Poker-AI.org
http://poker-ai.org/phpbb/

Parallel/Multi-threaded Clustering?
http://poker-ai.org/phpbb/viewtopic.php?f=24&t=2602

Author:  cantina [ Sat Sep 28, 2013 9:19 pm ]
Post subject:  Parallel/Multi-threaded Clustering?

Does anybody know of an available algorithm that can cluster large datasets in a multi-threaded fashion with results comparable to Xmeans? I found the paper below on PXM, but haven't found an implementation of it anywhere.

ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=6324625&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D6324625

I found this code for multi-threaded k-means, but Xmeans is much better and more stable:
www.codethinked.com/multi-threaded-k-me ... -in-net-40

Author:  ibot [ Thu Nov 21, 2013 12:24 am ]
Post subject:  Re: Parallel/Multi-threaded Clustering?

Any luck? About to start working more on clustering now. How are you working with the data?

Author:  cantina [ Fri Nov 22, 2013 5:18 pm ]
Post subject:  Re: Parallel/Multi-threaded Clustering?

I think I just went with the single-threaded implementation of Xmeans for the flop and FarthestFirst for the turn, using Weka. The data was based on past/future statistical projections for each hand, as mentioned in another thread. It didn't work well for me -- maybe too many dimensions to the data?

FarthestFirst is a fast but weak clusterer. Xmeans might have taken a month to cluster the turn.
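
For reference, a minimal sketch of farthest-first traversal, the idea behind clusterers of this kind: pick one starting center at random, then repeatedly add the point farthest from all centers chosen so far, and finally assign every point to its nearest center. This is not Weka's implementation; `farthest_first` and `sq_dist` are hypothetical names, and squared Euclidean distance is an assumption. It makes a single pass per center with no refinement step, which is why it is fast but tends to give weaker clusters than k-means-style methods.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k, seed=0):
    """Choose k centers by farthest-first traversal and label each point."""
    rng = random.Random(seed)
    # The first center is a random data point.
    centers = [points[rng.randrange(len(points))]]
    # nearest[i] = squared distance from points[i] to its closest center so far.
    nearest = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        # The next center is the point farthest from every chosen center.
        idx = max(range(len(points)), key=nearest.__getitem__)
        centers.append(points[idx])
        nearest = [min(d, sq_dist(p, centers[-1])) for d, p in zip(nearest, points)]
    # Assign each point to its nearest center; centers are never moved afterwards.
    labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
    return centers, labels
```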

Author:  flopnflush [ Tue Nov 26, 2013 11:46 am ]
Post subject:  Re: Parallel/Multi-threaded Clustering?

I have just adapted the KMeansPlusPlusClusterer from Apache Commons Math to support multithreading and point-frequencies (e.g. for suit isomorphisms). If anyone needs it: https://github.com/flopnflush/kmeans
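
The point-frequency idea can be sketched as a weighted k-means update, where a point with frequency w counts as w identical copies without being stored w times. This is only an illustration in Python, not the code from the linked repository; `weighted_kmeans_step`, `sq_dist`, and the `weights` parameter are hypothetical names.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_kmeans_step(points, weights, centers):
    """One Lloyd iteration in which points[i] counts as weights[i] copies."""
    k, dim = len(centers), len(centers[0])
    sums = [[0.0] * dim for _ in range(k)]  # weighted coordinate sums per cluster
    mass = [0.0] * k                        # total weight per cluster
    for p, w in zip(points, weights):
        # Assignment step: nearest center, independent of the weight.
        c = min(range(k), key=lambda j: sq_dist(p, centers[j]))
        mass[c] += w
        for d in range(dim):
            sums[c][d] += w * p[d]
    # Update step: weighted mean; an empty cluster keeps its old center.
    return [
        tuple(s / m for s in row) if m > 0 else centers[j]
        for j, (row, m) in enumerate(zip(sums, mass))
    ]
```

With points `[(0.0,), (3.0,), (10.0,)]`, weights `[1, 2, 1]`, and centers `[(1.0,), (9.0,)]`, the first center moves to the weighted mean of 0 and 3 (counted twice), the same result as clustering the expanded list `[0, 3, 3, 10]`.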

Author:  proud2bBot [ Tue Nov 26, 2013 9:43 pm ]
Post subject:  Re: Parallel/Multi-threaded Clustering?

If you are using KMeans/KMeans++ you should perform multiple runs (using different seeds) and choose the best clustering from all runs. I guess it's easier to run each run in its own thread, as you don't need to change any existing algorithms.
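
A minimal sketch of the multiple-runs approach, assuming "best" means the lowest sum of squared distances to the nearest center (the thread does not say which criterion is used). The k-means here is a plain Lloyd's algorithm with random initial centers rather than KMeans++, and `kmeans`, `best_of_runs`, and `run_map` are hypothetical names.

```python
import random
from functools import partial


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, seed, iters=50):
    """Plain Lloyd's k-means from random initial centers; returns (cost, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: group points by nearest center.
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda j: sq_dist(p, centers[j]))].append(p)
        # Update step: mean of each group; an empty group keeps its old center.
        new_centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    cost = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return cost, centers


def best_of_runs(points, k, seeds, run_map=map):
    """Run k-means once per seed and keep the run with the lowest cost.

    run_map is any map-like callable; the built-in map runs the seeds serially.
    """
    results = run_map(partial(kmeans, points, k), seeds)
    return min(results, key=lambda r: r[0])
```

The runs are independent, so the clustering code itself stays untouched: passing the `map` method of a `concurrent.futures` executor as `run_map` spreads the seeds across workers. In CPython, threads will not speed up pure-Python arithmetic like this, so a process pool is the more useful choice there; `partial` over a top-level function is used so the task can be pickled.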

Author:  flopnflush [ Tue Nov 26, 2013 11:04 pm ]
Post subject:  Re: Parallel/Multi-threaded Clustering?

Yes, that would have been the easier solution, but I have done it now anyway. Might still be the better solution if I run it on Amazon EC2 instances with many threads. And I think you also need more RAM if you perform multiple runs simultaneously.

I have added the MultiKMeansPlusPlusClusterer class to my code now, which performs multiple runs and chooses the best solution.

How many runs do you usually perform?

Author:  proud2bBot [ Wed Nov 27, 2013 12:18 am ]
Post subject:  Re: Parallel/Multi-threaded Clustering?

I always performed around 10-20 different runs, but it mainly depends on the data, so you can't name a good value in advance...

Author:  flopnflush [ Thu Nov 28, 2013 11:04 pm ]
Post subject:  Re: Parallel/Multi-threaded Clustering?

The cool thing is that I can now run each trial in its own EC2 instance with 32 threads. This reduces the time needed for clustering tremendously.
