Poker-AI.org

Re: CFRM nodes with p=0

2015-10-25T19:21:14+00:00

Thank you poor homeless guy for this link, it is really interesting. It describes exactly the kind of optimization I was looking for.

I will give the average strategy sampling a try when I have the time. I glanced through the paper, intuitively I think it should also be possible to skip moves with negative regret when p=0, but not always (as I tested), but with some non-zero probability, I will also retest this some time.

Statistics: Posted by eiisolver — Sun Oct 25, 2015 7:21 pm

Re: CFRM nodes with p=0

2015-10-24T21:27:02+00:00

You can't ignore them completely, but you don't necessarily have to traverse them every time.

Average strategy sampling sort of does this:
viewtopic.php?f=24&t=5

Statistics: Posted by somehomelessguy — Sat Oct 24, 2015 9:27 pm

Re: CFRM nodes with p=0

2015-10-17T13:33:34+00:00

Update: I tried this in an experiment (heads up no limit) and it did not work out.

The algorithm converged quickly to an equilibrium. At first glance the obtained strategy look promising. But upon closer inspection I found fold probability for 44 as first action to be 100%, and there was no way the algorithm would ever be able to learn that a call or a raise would be better because regret values of sub nodes of call and/or raise, containing all pure strategies, would never update their regret values due to my "optimization".

So the obtained equilibrium was certainly not (close to) a Nash-equilibrium...

Statistics: Posted by eiisolver — Sat Oct 17, 2015 1:33 pm

CFRM nodes with p=0

2015-09-29T18:31:04+00:00

Hi, I am trying to learn CFRM and I am playing with external sampling. My code for bet nodes looks something like (inspired by amax code, which was very helpful):

Code:

   public double betNode(GameState state, double p, double op) {
      if (player == trainPlayer) {
         Node node = nodeMap.getNode(state, trainPlayer);
         double[] strategy = node.getRegretBasedStrategy(possibleActions);
         double factor = (1.0 / op) * p;
         node.updateCumulativeStrategy(strategy, factor);// DOES NOTHING IF p = 0!
         double[] u = new double[NUM_ACTIONS];
         double ev = 0;
         for (int i : possibleActions) {
            state.playerAction(i);
            u[i] = node(state, p * strategy[i], op);
            state.undo();
            ev += u[i] * strategy[i];
         }
         node.updateRegrets(u, ev, possibleActions);
         return ev;
      } else {
         playerStrategies[player].getStrategy(state, scratchStrategy);
         int action = Node.sampleStrategy(rnd, scratchStrategy);
         state.playerAction(action);
         double result = node(state, p, op * scratchStrategy[action]);
         state.undo();
         return result;
      }
   }

Suppose we get at infoset A, trained player is faced with a big raise and has extremely bad cards. The regret strategy says: 100% fold. All actions are tried, and when attempting a raise action, we sample an opponent reraise and we reach infoset B with trained player to move. Because the raise action at infoset A was not in A's regret strategy, parameter p is now zero. The regret strategy at B is (also) 100% fold.

Does it make any sense to evaluate other moves than fold at infoset B? Because p = 0, other moves don't contribute to B's EV, so the only merit would be to refine B's regret strategy in the hope that some time in the future B will be reached with p > 0. But isn't that a waste of time? Wouldn't it be better (leading faster to convergence) to concentrate efforts at moves in infosets reached with p>0? Or would skipping such moves lead to incorrect results? Or have I misunderstood something in the algorithm?

Statistics: Posted by eiisolver — Tue Sep 29, 2015 6:31 pm