Poker-AI.org http://poker-ai.org/phpbb/
Best response calculation http://poker-ai.org/phpbb/viewtopic.php?f=24&t=2671
Page 1 of 1
Author: nemesis17 [ Thu Dec 26, 2013 6:23 pm ]
Post subject: Best response calculation
I changed the amax code to solve some preflop games, but I am trying to use the 169 isomorphic starting hands (buckets) instead of the 1326 hole-card combos to make it faster. My code (chance) samples the preflop hands according to their preflop weight (suited hands, pocket pairs, offsuit hands), and I figure the rest of the CFRM code doesn't need adjusting once the hands are sampled with the correct weight, right?

The part that doesn't seem to work is the best response calculation, and I would appreciate some help figuring out what is wrong (it must be really obvious, but I am missing it completely). The Decision.cs best response implementation was changed to the following:

Code:
    public double BestResponse(int brplayer, HoleDistribution[] distributions)
    {
        int bropponent = brplayer ^ 1;
        var op = new double[distributions[bropponent].HoleCount];
        double sum = 0;

        for (int i = 0; i < distributions[brplayer].HoleCount; i++)
        {
            var phole = distributions[brplayer].Holes[i];
            double opsum = 0;

            for (int j = 0; j < distributions[bropponent].HoleCount; j++)
            {
                var ohole = distributions[bropponent].Holes[j];
                op[j] = HoleDistribution.Combos[i, j];
                opsum += op[j];
            }

            for (int j = 0; j < distributions[bropponent].HoleCount; j++)
                op[j] = op[j] * distributions[bropponent].Holes[j].Probability / opsum;

            sum += phole.RelativeProbability * BestResponse(brplayer, distributions, i, op);
        }

        return sum;
    }

HoleDistribution.Combos[i, j], with i, j in [0..168], is a simple lookup table that returns the number of non-conflicting combos between hand i and hand j. For example, if i corresponds to 22 and j is 32o, the number of non-conflicting combos is 6, because if one player holds 22 there are only 6 combos of 32o the opponent can hold. Using this I am hopefully computing the correct opponent reach probabilities for each hand type.

The other method in the same class is unchanged from the original amax code:

Code:
    public override double BestResponse(int brplayer, HoleDistribution[] distributions, int hand, double[] op)
    {
        int bropponent = brplayer ^ 1;

        if (player == brplayer)
        {
            double bestev = -double.MaxValue;

            for (int i = 0; i < children.Length; i++)
                bestev = Math.Max(bestev, children[i].BestResponse(brplayer, distributions, hand, op));

            return bestev;
        }
        else
        {
            double ev = 0;

            for (int i = 0; i < children.Length; i++)
            {
                var newop = new double[distributions[bropponent].HoleCount];

                for (int h = 0; h < distributions[bropponent].HoleCount; h++)
                {
                    var s = GetNormalizedAverageStrategy(h);
                    newop[h] = s[i] * op[h];
                }

                ev += children[i].BestResponse(brplayer, distributions, hand, newop);
            }

            return ev;
        }
    }

The Showdown.cs BestResponse is shown below. Here I also changed the TrainChanceSampling method:

Code:
    public override double BestResponse(int brplayer, HoleDistribution[] distributions, int i, double[] op)
    {
        double ev = 0;

        for (int j = 0; j < op.Length; j++)
        {
            double equity = HoleDistribution.Equity(i, j);

            if (equity > 0.5)
                ev += value * op[j];
            else if (equity < 0.5)
                ev -= value * op[j];
        }

        return ev;
    }

    public override double TrainChanceSampling(int trainplayer, Iteration iteration, double p, double op)
    {
        return (iteration.GetShowdownValue(trainplayer) * 2 * value - value) * op;
    }

HoleDistribution.Equity(i, j), with i, j in [0..168], is a simple LUT holding the preflop equity of hand i vs hand j. GetShowdownValue just looks up this table and returns the equity of the train player's sampled hand vs the opponent's.

The Fold.cs BestResponse and TrainChanceSampling look like this:

Code:
    public override double BestResponse(int brplayer, HoleDistribution[] distributions, int hand, double[] op)
    {
        double ev = 0;

        for (int i = 0; i < op.Length; i++)
            ev += value * op[i];

        return brplayer == 0 ? ev : -ev;
    }

    public override double TrainChanceSampling(int trainplayer, Iteration iteration, double p, double op)
    {
        if (trainplayer == 0)
            return op * value;
        else
            return op * -value;
    }

When running this on a simple Push/Fold game, the best response values look wrong and do not converge:

Code:
    ChanceSampling 120 seconds
            0 |  0.0s |     i/s | BR 1.3224 + 1.2831 = 2.6055
     10000000 |  3.2s | 3132832 | BR 0.0747 + 0.6275 = 0.7021
     20000000 |  6.3s | 3162055 | BR 0.0752 + 0.6242 = 0.6994
     30000000 |  9.4s | 3185389 | BR 0.0741 + 0.6286 = 0.7027
     40000000 | 12.5s | 3194122 | BR 0.0800 + 0.6297 = 0.7096
     50000000 | 15.6s | 3198976 | BR 0.0813 + 0.6236 = 0.7050
     60000000 | 18.8s | 3196079 | BR 0.0784 + 0.6206 = 0.6990
     70000000 | 21.9s | 3198537 | BR 0.0755 + 0.6214 = 0.6969
     80000000 | 25.0s | 3200896 | BR 0.0741 + 0.6235 = 0.6976
     90000000 | 28.1s | 3201252 | BR 0.0749 + 0.6252 = 0.7001
    100000000 | 31.2s | 3204101 | BR 0.0757 + 0.6265 = 0.7022
    ....

This should be a simple error, but I am stuck here and would appreciate another pair of eyes to help me find it. Thanks.
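For reference, the Combos lookup described in the post (number of opponent combos of hand class j that do not share a card with one fixed combo of hand class i) could be filled along the following lines. This is only a minimal sketch: the class name `PreflopCombos`, the `(hi, lo, suited)` tuple description of a hand class and the card encoding `rank * 4 + suit` are all assumptions made for illustration and are not taken from the amax code.

```csharp
using System;
using System.Collections.Generic;

public static class PreflopCombos
{
    // Assumed encoding: card = rank * 4 + suit, rank 0..12 (0 = deuce), suit 0..3.
    // Lists every concrete two-card combo of a hand class.
    // For pocket pairs (hi == lo) the suited flag is ignored.
    public static List<(int, int)> Enumerate(int hi, int lo, bool suited)
    {
        var combos = new List<(int, int)>();
        for (int s1 = 0; s1 < 4; s1++)
        {
            for (int s2 = 0; s2 < 4; s2++)
            {
                if (hi == lo)
                {
                    // Pair: count each unordered suit pair once (6 combos).
                    if (s1 < s2) combos.Add((hi * 4 + s1, lo * 4 + s2));
                }
                else if (suited == (s1 == s2))
                {
                    // Suited: 4 combos, offsuit: 12 combos.
                    combos.Add((hi * 4 + s1, lo * 4 + s2));
                }
            }
        }
        return combos;
    }

    // Number of combos of class b that share no card with one fixed combo of class a.
    // By suit symmetry the count is the same whichever combo of a is fixed.
    public static int NonConflicting((int hi, int lo, bool suited) a,
                                     (int hi, int lo, bool suited) b)
    {
        var held = Enumerate(a.hi, a.lo, a.suited)[0];
        int count = 0;
        foreach (var (c1, c2) in Enumerate(b.hi, b.lo, b.suited))
        {
            bool conflict = c1 == held.Item1 || c1 == held.Item2
                         || c2 == held.Item1 || c2 == held.Item2;
            if (!conflict) count++;
        }
        return count;
    }
}
```

With this encoding, `PreflopCombos.NonConflicting((0, 0, false), (1, 0, false))` is the 22 vs 32o case from the post and gives 6.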
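One thing that may be worth comparing in the posted Showdown.cs: TrainChanceSampling pays out in proportion to equity, `(equity * 2 * value - value) * op`, while BestResponse pays the full `value` according to whether equity is above or below 0.5. The sketch below puts both forms side by side so the difference can be checked numerically. The class `ShowdownSketch` and the `equity` array (a stand-in for `HoleDistribution.Equity(i, j)` with the best-response hand i fixed) are hypothetical, and whether the equity-weighted form is the intended payoff for this game is an assumption, not something the post confirms.

```csharp
using System;

public static class ShowdownSketch
{
    // Equity-weighted showdown value, matching the payoff used in the posted
    // TrainChanceSampling: (2 * equity - 1) * value per unit of opponent reach.
    public static double EquityWeightedValue(double value, double[] equity, double[] op)
    {
        double ev = 0;
        for (int j = 0; j < op.Length; j++)
            ev += (2 * equity[j] - 1) * value * op[j];
        return ev;
    }

    // Sign-based showdown value, mirroring the posted Showdown.cs BestResponse:
    // the whole pot goes to whichever bucket has more than 50% equity.
    public static double SignBasedValue(double value, double[] equity, double[] op)
    {
        double ev = 0;
        for (int j = 0; j < op.Length; j++)
        {
            if (equity[j] > 0.5) ev += value * op[j];
            else if (equity[j] < 0.5) ev -= value * op[j];
        }
        return ev;
    }
}
```

For example, with `value = 1`, equities `{0.6, 0.3}` and opponent reach `{0.5, 0.5}`, the equity-weighted form gives 0.2 * 0.5 - 0.4 * 0.5 = -0.1, while the sign-based form gives 0.5 - 0.5 = 0. With 169 buckets the preflop equities are rarely close to 0 or 1, so the two forms can differ noticeably.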
All times are UTC