Poker-AI.org Poker AI and Botting Discussion Forum

Re: CFRM nodes with p=0
I will give average strategy sampling a try when I have the time. I glanced through the paper; intuitively I think it should also be possible to skip moves with negative regret when p = 0, not always (as my earlier test showed), but with some non-zero probability of still exploring them. I will retest this at some point.
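Something like the following is what I have in mind, just a sketch (the EXPLORE_PROB constant and the helper are made up, not from my actual code; rnd is the same Random field used in the code in the original post):

Code:
   // Sketch only: when the trained player's reach probability p is zero,
   // skip actions with non-positive regret, but still explore them with a
   // small probability so their regrets can eventually recover.
   // EXPLORE_PROB is a made-up constant.
   private static final double EXPLORE_PROB = 0.05;

   private boolean shouldExplore(double p, double regret) {
      if (p > 0) {
         return true; // normal traversal: try every action
      }
      if (regret > 0) {
         return true; // never skip actions with positive regret
      }
      return rnd.nextDouble() < EXPLORE_PROB; // occasionally revisit the rest
   }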

Statistics: Posted by eiisolver — Sun Oct 25, 2015 7:21 pm


Re: CFRM nodes with p=0
Average strategy sampling sort of does this:
viewtopic.php?f=24&t=5
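Roughly: at the trained player's own infosets, each action is only walked into with a probability derived from the cumulative (average) strategy, so actions that the average strategy rarely plays get sampled less and less often. A minimal sketch of that probability, assuming the parameter names from the Gibson et al. paper (epsilon, tau, beta) and an illustrative cumulativeStrategy array:

Code:
   // Sketch only: per-action sampling probability used by average strategy
   // sampling (Gibson et al., NIPS 2012). cumulativeStrategy holds the
   // summed (average) strategy at the infoset; epsilon, tau and beta are
   // the exploration parameters from the paper. Names are illustrative,
   // not taken from any particular implementation.
   static double sampleProbability(double[] cumulativeStrategy, int action,
         double epsilon, double tau, double beta) {
      double total = 0;
      for (double s : cumulativeStrategy) {
         total += s;
      }
      double rho = (beta + tau * cumulativeStrategy[action]) / (beta + total);
      return Math.max(epsilon, Math.min(1.0, rho));
   }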

Statistics: Posted by somehomelessguy — Sat Oct 24, 2015 9:27 pm


Re: CFRM nodes with p=0
The algorithm converged quickly to an equilibrium, and at first glance the obtained strategy looked promising. But upon closer inspection I found the fold probability for 44 as the first action to be 100%, and there was no way the algorithm would ever be able to learn that a call or a raise would be better: the sub-nodes below call and/or raise, which all contained pure strategies, would never update their regret values due to my "optimization".

So the obtained equilibrium was certainly not (close to) a Nash equilibrium...

Statistics: Posted by eiisolver — Sat Oct 17, 2015 1:33 pm


CFRM nodes with p=0
Code:
   // p  = reach probability of the trained player along the current path
   // op = probability of the sampled opponent path
   public double betNode(GameState state, double p, double op) {
      if (player == trainPlayer) {
         // Trained player to act: walk every possible action.
         Node node = nodeMap.getNode(state, trainPlayer);
         double[] strategy = node.getRegretBasedStrategy(possibleActions);
         // Weight for the average-strategy update: own reach p, corrected
         // by 1/op for the sampled opponent path.
         double factor = (1.0 / op) * p;
         node.updateCumulativeStrategy(strategy, factor);// DOES NOTHING IF p = 0!
         double[] u = new double[NUM_ACTIONS];
         double ev = 0;
         for (int i : possibleActions) {
            state.playerAction(i);
            u[i] = node(state, p * strategy[i], op);
            state.undo();
            ev += u[i] * strategy[i];
         }
         node.updateRegrets(u, ev, possibleActions);
         return ev;
      } else {
         // Opponent to act: sample a single action from its current strategy.
         playerStrategies[player].getStrategy(state, scratchStrategy);
         int action = Node.sampleStrategy(rnd, scratchStrategy);
         state.playerAction(action);
         double result = node(state, p, op * scratchStrategy[action]);
         state.undo();
         return result;
      }
   }


Suppose we arrive at infoset A: the trained player is faced with a big raise and has extremely bad cards. The regret-based strategy says: 100% fold. All actions are still tried, and when attempting the raise action we sample an opponent reraise and reach infoset B with the trained player to move. Because the raise action had probability zero in A's regret strategy, the parameter p is now zero. The regret strategy at B is (also) 100% fold.

Does it make any sense to evaluate moves other than fold at infoset B? Because p = 0, those other moves don't contribute to B's EV; the only merit would be to refine B's regret strategy in the hope that B will be reached with p > 0 at some point in the future. But isn't that a waste of time? Wouldn't it be better (leading to faster convergence) to concentrate effort on moves in infosets reached with p > 0? Or would skipping such moves lead to incorrect results? Or have I misunderstood something in the algorithm?
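To make the question concrete, the kind of skip I have in mind would look roughly like this in the action loop above (just a sketch, not tested):

Code:
         for (int i : possibleActions) {
            // Sketch of the optimization in question: when p == 0, only
            // descend into actions the regret-based strategy actually plays.
            if (p == 0 && strategy[i] == 0) {
               u[i] = 0; // subtree skipped; nodes below it never update regrets
               continue;
            }
            state.playerAction(i);
            u[i] = node(state, p * strategy[i], op);
            state.undo();
            ev += u[i] * strategy[i];
         }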

Statistics: Posted by eiisolver — Tue Sep 29, 2015 6:31 pm

