Poker-AI.org

Poker AI and Botting Discussion Forum

All times are UTC




Posted: Sat Sep 28, 2013 9:19 pm
Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 437
Does anybody know of an available algorithm that can cluster large datasets in a multi-threaded fashion with results comparable to Xmeans? I found the paper below on PXM, but haven't found an implementation of it anywhere.

ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=6324625&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D6324625

I found this code for multi-threaded k-means, but Xmeans is much better and more stable:
www.codethinked.com/multi-threaded-k-me ... -in-net-40


Posted: Thu Nov 21, 2013 12:24 am
Regular Member

Joined: Tue Mar 05, 2013 9:19 pm
Posts: 50
Any luck? About to start working more on clustering now. How are you working with the data?


Posted: Fri Nov 22, 2013 5:18 pm
Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 437
I think I just went with the single-threaded implementation of Xmeans for the flop and FarthestFirst for the turn, using Weka. The data was based on past/future statistical projections for each hand, as mentioned in another thread. It didn't work well for me; maybe the data had too many dimensions?

FarthestFirst is a fast but weak clusterer. Xmeans might have taken a month to cluster the turn.
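
For readers who have not used it, here is a minimal Python sketch of the farthest-first traversal idea, to show why it is fast: each new center costs one pass over the data, and the centers are never refined afterwards. This is an illustration only, not Weka's FarthestFirst code; the function name and parameters are made up for the example.

```python
import math
import random

def farthest_first(points, k, seed=0):
    """Pick k centers by farthest-first traversal, then label each point
    with the index of its nearest center. Hypothetical helper, not Weka's API."""
    rng = random.Random(seed)
    # Start from a random point.
    centers = [points[rng.randrange(len(points))]]
    # dist[i] = distance from point i to its nearest center chosen so far.
    dist = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        # The next center is the point farthest from all current centers.
        far = max(range(len(points)), key=dist.__getitem__)
        centers.append(points[far])
        dist = [min(d, math.dist(p, centers[-1])) for d, p in zip(dist, points)]
    # Centers are never moved after selection, which is why the result is coarse.
    labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
    return centers, labels

if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 0)]
    print(farthest_first(pts, 3))
```

Because the centers are always actual data points and outliers tend to get picked first, the clusters are usually rougher than what k-means or Xmeans would give on the same data.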


Posted: Tue Nov 26, 2013 11:46 am
Junior Member

Joined: Sat Nov 02, 2013 2:21 pm
Posts: 26
I have just adapted the KMeansPlusPlusClusterer from Apache Commons Math to support multithreading and point-frequencies (e.g. for suit isomorphisms). If anyone needs it: https://github.com/flopnflush/kmeans
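
To illustrate what point frequencies mean here, below is a rough single-threaded Python sketch of weighted k-means, where point i counts as if it appeared weights[i] times (for example, the number of hands in one suit-isomorphism class). It is not the code from the linked repository and not the Apache Commons Math API; the function name, parameters, and the weighted k-means++ style seeding are assumptions made for the example. It also assumes there are at least k distinct points with positive weight.

```python
import math
import random

def weighted_kmeans(points, weights, k, iters=50, seed=0):
    """Lloyd's k-means where point i counts weights[i] times.
    Hypothetical sketch; assumes at least k distinct points with positive weight."""
    rng = random.Random(seed)
    # Seeding: first center drawn by weight, later ones by weight * squared distance.
    centers = [rng.choices(points, weights=weights)[0]]
    while len(centers) < k:
        d2 = [w * min(math.dist(p, c) ** 2 for c in centers)
              for p, w in zip(points, weights)]
        centers.append(rng.choices(points, weights=d2)[0])
    dim = len(points[0])
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center (weights do not matter here).
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: each center becomes the weighted mean of its members.
        new_centers = []
        for c in range(k):
            total = sum(w for w, l in zip(weights, labels) if l == c)
            if total == 0:
                new_centers.append(centers[c])  # empty cluster: keep the old center
                continue
            new_centers.append(tuple(
                sum(w * p[j] for p, w, l in zip(points, weights, labels) if l == c) / total
                for j in range(dim)))
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels

if __name__ == "__main__":
    print(weighted_kmeans([(0.0,), (1.0,), (10.0,)], [3, 1, 1], k=2))
```

The weights only enter the seeding and the centroid update, so clustering one representative per isomorphism class with its frequency should give the same centroids as clustering every duplicate, at a fraction of the cost.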


Posted: Tue Nov 26, 2013 9:43 pm
Senior Member

Joined: Mon Mar 11, 2013 10:24 pm
Posts: 216
If you are using KMeans/KMeans++ you should perform multiple runs (using different seeds) and choose the best clustering from all runs. I guess it's easier to run each of the runs in its own thread, as you don't need to change any existing algorithms.
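
A minimal Python sketch of that idea: run an unchanged single-threaded k-means once per seed, in parallel, and keep the run with the lowest within-cluster sum of squared distances. The function names, the random initialisation, and the choice of cost are assumptions for the example. Note that in CPython, threads will not speed up pure-Python number crunching because of the GIL; swapping in ProcessPoolExecutor (with a module-level function instead of the lambda) is one way around that. In Java or .NET, plain threads would do.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

def kmeans_once(points, k, seed, iters=100):
    """One plain Lloyd's k-means run from a seed-dependent random start.
    Returns (cost, centers), where cost is the sum of squared distances."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[nearest].append(p)
        # Mean of each group; an empty group keeps its previous center.
        new_centers = [
            tuple(sum(xs) / len(g) for xs in zip(*g)) if g else centers[c]
            for c, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    cost = sum(min(math.dist(p, c) ** 2 for c in centers) for p in points)
    return cost, centers

def best_of_runs(points, k, seeds, workers=4):
    """Run kmeans_once for every seed in parallel and return the cheapest result."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda s: kmeans_once(points, k, s), seeds))
    return min(results, key=lambda r: r[0])

if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(best_of_runs(pts, 2, seeds=range(10)))
```

Each run holds its own centers and assignments, so memory use grows with the number of runs executing at the same time.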


Posted: Tue Nov 26, 2013 11:04 pm
Junior Member

Joined: Sat Nov 02, 2013 2:21 pm
Posts: 26
Yes, that would have been the easier solution, but I have done it now anyway. It might still be the better solution if I run it on Amazon EC2 instances with many threads. And I think you also need more RAM if you perform multiple runs simultaneously.

I have added the MultiKMeansPlusPlusClusterer class to my code now, which performs multiple runs and chooses the best solution.

How many runs do you usually perform?


Posted: Wed Nov 27, 2013 12:18 am
Senior Member

Joined: Mon Mar 11, 2013 10:24 pm
Posts: 216
I always performed around 10-20 different runs, but it mainly depends on the data, so you can't name a good value in advance.


Posted: Thu Nov 28, 2013 11:04 pm
Junior Member

Joined: Sat Nov 02, 2013 2:21 pm
Posts: 26
The cool thing is that I can now run each trial in its own EC2 instance with 32 threads. This reduces the time needed for clustering tremendously.

