Code:
public double betNode(GameState state, double p, double op) {
    if (player == trainPlayer) {
        Node node = nodeMap.getNode(state, trainPlayer);
        double[] strategy = node.getRegretBasedStrategy(possibleActions);
        double factor = (1.0 / op) * p;
        node.updateCumulativeStrategy(strategy, factor); // DOES NOTHING IF p = 0!
        // Walk every action of the trained player.
        double[] u = new double[NUM_ACTIONS];
        double ev = 0;
        for (int i : possibleActions) {
            state.playerAction(i);
            u[i] = node(state, p * strategy[i], op);
            state.undo();
            ev += u[i] * strategy[i];
        }
        node.updateRegrets(u, ev, possibleActions);
        return ev;
    } else {
        // Sample a single opponent action and scale op by its probability.
        playerStrategies[player].getStrategy(state, scratchStrategy);
        int action = Node.sampleStrategy(rnd, scratchStrategy);
        state.playerAction(action);
        double result = node(state, p, op * scratchStrategy[action]);
        state.undo();
        return result;
    }
}
Suppose we arrive at infoset A: the trained player faces a big raise and holds extremely bad cards. The regret strategy says: 100% fold. All actions are tried anyway, and when the raise action is attempted, we sample an opponent reraise and reach infoset B, with the trained player to move. Because the raise action had zero probability in A's regret strategy, parameter p is now zero. The regret strategy at B is also 100% fold.
Does it make any sense to evaluate moves other than fold at infoset B? Because p = 0 the cumulative-strategy update does nothing, and because B's strategy is 100% fold the other moves don't contribute to B's EV either, so the only merit would be to refine B's regret strategy in the hope that at some time in the future B will be reached with p > 0. But isn't that a waste of time? Wouldn't it be better (converging faster) to concentrate effort on moves in infosets reached with p > 0? Or would skipping such moves lead to incorrect results? Or have I misunderstood something in the algorithm?
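For concreteness, here is a minimal, untested sketch of the pruning I have in mind. It reuses the fields and helpers from the code above (betNode, node, Node.sampleStrategy and so on), so treat it as an assumption about how this could look, not working code:

Code:
public double betNode(GameState state, double p, double op) {
    if (player == trainPlayer) {
        Node node = nodeMap.getNode(state, trainPlayer);
        double[] strategy = node.getRegretBasedStrategy(possibleActions);
        if (p == 0) {
            // Pruned path: the trained player never reaches this infoset,
            // so skip the regret and cumulative-strategy updates and sample
            // a single action instead, returning a sampled EV estimate for
            // the ancestors that still need a value for this subtree.
            int action = Node.sampleStrategy(rnd, strategy);
            state.playerAction(action);
            double result = node(state, 0.0, op);
            state.undo();
            return result;
        }
        // ... otherwise the full traversal with updates, exactly as above ...
    }
    // ... opponent branch unchanged ...
}

If I reason correctly, the sampled return still has expectation equal to B's EV under B's current strategy, so the regret updates at the ancestors stay unbiased (just noisier); what stops is the refinement of B's own regrets while p = 0, which is exactly the trade-off I'm asking about.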