Poker-AI.org http://poker-ai.org/phpbb/ |
|
Parallel/Multi-threaded Clustering? http://poker-ai.org/phpbb/viewtopic.php?f=24&t=2602 |
Author: | cantina [ Sat Sep 28, 2013 9:19 pm ] |
Post subject: | Parallel/Multi-threaded Clustering? |
Does anybody know of an algorithm that can cluster large datasets in a multi-threaded fashion with results comparable to Xmeans? I found the paper below on PXM, but haven't found an implementation anywhere: ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=6324625&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D6324625 I also found this code for multi-threaded k-means, but Xmeans is much better and more stable: www.codethinked.com/multi-threaded-k-me ... -in-net-40 |
Author: | ibot [ Thu Nov 21, 2013 12:24 am ] |
Post subject: | Re: Parallel/Multi-threaded Clustering? |
Any luck? About to start working more on clustering now. How are you working with the data? |
Author: | cantina [ Fri Nov 22, 2013 5:18 pm ] |
Post subject: | Re: Parallel/Multi-threaded Clustering? |
I think I just went with the single-threaded implementation of Xmeans for the flop and FarthestFirst for the turn, using Weka. The data was based on past/future statistical projections for each hand, as mentioned in another thread. It didn't work well for me -- maybe too many dimensions to the data? FarthestFirst is a fast but weak clusterer. Xmeans might have taken a month to cluster the turn. |
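For readers unfamiliar with it, farthest-first traversal can be sketched in a few lines. This is a minimal stdlib-only Python illustration of the general technique, not Weka's FarthestFirst code; the function names `farthest_first_centers` and `assign` are made up for this example. It shows why the method is fast (one pass over the data per center) and also why it is weak (centers are never refined after they are picked).

```python
import math
import random

def farthest_first_centers(points, k, seed=0):
    """Pick k centers by farthest-first traversal (illustrative sketch)."""
    rng = random.Random(seed)
    # Start from one randomly chosen point.
    centers = [points[rng.randrange(len(points))]]
    # Distance from each point to its nearest chosen center so far.
    nearest = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        # The next center is the point farthest from all current centers.
        idx = max(range(len(points)), key=nearest.__getitem__)
        centers.append(points[idx])
        nearest = [min(d, math.dist(p, points[idx]))
                   for p, d in zip(points, nearest)]
    return centers

def assign(points, centers):
    """Label each point with the index of its closest center."""
    return [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            for p in points]
```

Usage would be along the lines of `labels = assign(points, farthest_first_centers(points, k))`, with `points` as a list of equal-length numeric tuples.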
Author: | flopnflush [ Tue Nov 26, 2013 11:46 am ] |
Post subject: | Re: Parallel/Multi-threaded Clustering? |
I have just adapted the KMeansPlusPlusClusterer from Apache Commons Math to support multithreading and point-frequencies (e.g. for suit isomorphisms). If anyone needs it: https://github.com/flopnflush/kmeans |
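One way to read "point-frequencies" is that each unique point is stored once together with a count of how many original hands map to it (for example after collapsing suit isomorphisms), and the count is used as a weight in the k-means update. The Python sketch below illustrates that idea only; it is an assumption about the general approach, not code from the linked repository, and `weighted_centroid` and `weighted_cost` are hypothetical names.

```python
def weighted_centroid(points, weights):
    """Mean of the points, where each point counts `weight` times."""
    total = sum(weights)
    dims = len(points[0])
    return tuple(
        sum(w * p[d] for p, w in zip(points, weights)) / total
        for d in range(dims)
    )

def weighted_cost(points, weights, center):
    """Frequency-weighted sum of squared distances to one center."""
    return sum(
        w * sum((a - b) ** 2 for a, b in zip(p, center))
        for p, w in zip(points, weights)
    )
```

The result matches what plain k-means would compute on the expanded dataset with every point repeated `weight` times, but without holding the duplicates in memory.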
Author: | proud2bBot [ Tue Nov 26, 2013 9:43 pm ] |
Post subject: | Re: Parallel/Multi-threaded Clustering? |
If you are using KMeans/KMeans++, you should perform multiple runs (using different seeds) and choose the best clustering from all runs. I guess it's easier to run each of the runs in its own thread, as you don't need to change any existing algorithms. |
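The "one run per seed, keep the best" idea can be sketched as follows. This is a minimal stdlib-only Python illustration, assuming "best" means the lowest within-cluster sum of squared distances; `kmeans` and `best_of_runs` are names invented for this example, and the k-means here uses a plain random start rather than k-means++ seeding.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

def kmeans(points, k, seed, iters=50):
    """Plain Lloyd's k-means from a seeded random start; returns (cost, centers)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[nearest].append(p)
        # Recompute each center as its group's mean; keep the old one if empty.
        new_centers = [
            tuple(sum(xs) / len(g) for xs in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    cost = sum(min(math.dist(p, c) ** 2 for c in centers) for p in points)
    return cost, centers

def best_of_runs(points, k, seeds, workers=4):
    """Run k-means once per seed concurrently and keep the lowest-cost result."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda s: kmeans(points, k, s), seeds))
    return min(results, key=lambda r: r[0])
```

Note that in CPython a thread pool will not speed up this pure-Python loop; it only shows the structure. For real parallelism one would use a process pool (with a top-level function in place of the lambda) or a runtime with true threads, as in the Java implementations discussed in this thread.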
Author: | flopnflush [ Tue Nov 26, 2013 11:04 pm ] |
Post subject: | Re: Parallel/Multi-threaded Clustering? |
Yes, that would have been the easier solution, but I have done it now anyway. Might still be the better solution if I run it on Amazon EC2 instances with many threads. And I think you also need more RAM if you perform multiple runs simultaneously. I have added the MultiKMeansPlusPlusClusterer class to my code now, which performs multiple runs and chooses the best solution. How many runs do you usually perform? |
Author: | proud2bBot [ Wed Nov 27, 2013 12:18 am ] |
Post subject: | Re: Parallel/Multi-threaded Clustering? |
I always performed around 10-20 different runs, but it mainly depends on the data, so you can't name a good parameter beforehand... |
Author: | flopnflush [ Thu Nov 28, 2013 11:04 pm ] |
Post subject: | Re: Parallel/Multi-threaded Clustering? |
The cool thing is that I can now run each trial in its own EC2 instance with 32 threads. This reduces the time needed for clustering tremendously. |
All times are UTC |