Poker-AI.org
http://poker-ai.org/phpbb/

Best way to program a Poker MCTS ?
http://poker-ai.org/phpbb/viewtopic.php?f=24&t=2552
Page 1 of 1

Author:  vlad2048 [ Mon Aug 12, 2013 7:48 pm ]
Post subject:  Best way to program a Poker MCTS ?

Hi there ! First post

I'm trying to implement the MCTS algorithm described in many papers.
I already have a simple MonteCarlo algorithm working where I do the selection and a bunch of expansions to get the average value of each actions and choose the best one.
Basically, I have a GameState class and to find the best action, I do:
Code:
foreach (action in state.PossibleActions)
  newState = state.Clone();
  for i=1 to 1000
    while (!newState.IsFinished)
      newState.ChooseActionOrBoardCard
    value = newState.PayoutForMe
  deduce the averageValueForMove
And basically choose the move that gives the best average value.

After reading a few papers on MCTS, I think I roughly understand the algorithm (this one was the best: http://www.ai.rug.nl/~mwiering/Tom_van_ ... Thesis.pdf)
My question is about how to implement it ? (I've looked a bit at https://code.google.com/p/opentestbed, but didn't understand it all)

1) Is it best to have a Node<-DecisionNode, OpponentNode, ChanceNode hierarchy of classes that each wrap one GameState clone ?
2) If every node has a copy of a GameState, it's going to take a lot of memory, is that normal ?
3) I'm thinking about putting the OpponentModels inside the GameState as they will be updated everytime the GameStates change. Is that the best way ? I see that OpenTestBed puts them in a shared Config and calls Config.Model.AssumeTemporarily(GameState), I don't really understand what it's doing exactly there...
4) In OpenTestBed, I cannot see the ChanceNode, has it been optimized away ?
5) In the AkiRealbot (from the DarmstadtPokerBots), I see they have a hierarchy of nodes AND of arcs between the nodes. Any advantage to using arcs ? It seems wasteful to me, the nodes should have the arc information.
6) I understand that the main advantage of MCTS over my simple algorithm is that it will balance exploitation vs exploration. Any other advantages ?

I'm trying to get my head around it, and hopefully avoid some annoying design mistakes here.
I'm using C#, but I believe a Java viewpoint should be just as helpful.

Thanks guys,
Vlad

Page 1 of 1 All times are UTC
Powered by phpBB® Forum Software © phpBB Group
http://www.phpbb.com/