Poker-AI.org http://poker-ai.org/phpbb/ |
|
Best way to program a Poker MCTS ? http://poker-ai.org/phpbb/viewtopic.php?f=24&t=2552 |
Page 1 of 1 |
Author: | vlad2048 [ Mon Aug 12, 2013 7:48 pm ] |
Post subject: | Best way to program a Poker MCTS ? |
Hi there ! First post I'm trying to implement the MCTS algorithm described in many papers. I already have a simple MonteCarlo algorithm working where I do the selection and a bunch of expansions to get the average value of each actions and choose the best one. Basically, I have a GameState class and to find the best action, I do: Code: foreach (action in state.PossibleActions) And basically choose the move that gives the best average value.newState = state.Clone(); for i=1 to 1000 while (!newState.IsFinished) newState.ChooseActionOrBoardCard value = newState.PayoutForMe deduce the averageValueForMove After reading a few papers on MCTS, I think I roughly understand the algorithm (this one was the best: http://www.ai.rug.nl/~mwiering/Tom_van_ ... Thesis.pdf) My question is about how to implement it ? (I've looked a bit at https://code.google.com/p/opentestbed, but didn't understand it all) 1) Is it best to have a Node<-DecisionNode, OpponentNode, ChanceNode hierarchy of classes that each wrap one GameState clone ? 2) If every node has a copy of a GameState, it's going to take a lot of memory, is that normal ? 3) I'm thinking about putting the OpponentModels inside the GameState as they will be updated everytime the GameStates change. Is that the best way ? I see that OpenTestBed puts them in a shared Config and calls Config.Model.AssumeTemporarily(GameState), I don't really understand what it's doing exactly there... 4) In OpenTestBed, I cannot see the ChanceNode, has it been optimized away ? 5) In the AkiRealbot (from the DarmstadtPokerBots), I see they have a hierarchy of nodes AND of arcs between the nodes. Any advantage to using arcs ? It seems wasteful to me, the nodes should have the arc information. 6) I understand that the main advantage of MCTS over my simple algorithm is that it will balance exploitation vs exploration. Any other advantages ? I'm trying to get my head around it, and hopefully avoid some annoying design mistakes here. I'm using C#, but I believe a Java viewpoint should be just as helpful. Thanks guys, Vlad |
Page 1 of 1 | All times are UTC |
Powered by phpBB® Forum Software © phpBB Group http://www.phpbb.com/ |