Best response calculation

2013-12-26T18:23:14+00:00

I changed the amax code to solve some preflop games, but I am trying to use the 169 isomorphic starting hands (buckets) instead of all 1326 combos to make it faster.

My code (chance-)samples the preflop hands according to their preflop weights (suited hands, pocket pairs, offsuit hands). I figure the rest of the CFRM code doesn't need adjusting once the hands are sampled with the correct weights, right?
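For reference, a minimal sketch of the per-class weights I mean, assuming ranks are indexed 0..12 and a class is addressed by (high rank, low rank, suited). The class name and method names here are made up for illustration and are not part of the amax code. Note these are only the marginal weights of one player's hand; the joint distribution of both players' classes also depends on card removal, which this sketch does not model.

```csharp
using System;

public static class PreflopClasses
{
    // Number of concrete two-card combos in each isomorphic class.
    public const int PairCombos = 6;     // C(4,2) ways to pick two suits
    public const int SuitedCombos = 4;   // one combo per suit
    public const int OffsuitCombos = 12; // 4 suits for the high card * 3 for the low card

    // Hypothetical addressing: (highRank, lowRank, suited), ranks 0..12.
    // The 'suited' flag is ignored for pocket pairs.
    public static int ComboCount(int highRank, int lowRank, bool suited)
    {
        if (highRank == lowRank) return PairCombos;
        return suited ? SuitedCombos : OffsuitCombos;
    }

    // Sum of combo counts over all 169 classes; equals C(52,2) = 1326.
    public static int TotalCombos()
    {
        int total = 0;
        for (int hi = 0; hi < 13; hi++)
        {
            for (int lo = 0; lo <= hi; lo++)
            {
                if (hi == lo)
                    total += ComboCount(hi, lo, false);
                else
                    total += ComboCount(hi, lo, true) + ComboCount(hi, lo, false);
            }
        }
        return total;
    }

    // Marginal probability of being dealt a hand from the given class.
    public static double Prior(int highRank, int lowRank, bool suited)
    {
        return ComboCount(highRank, lowRank, suited) / 1326.0;
    }
}
```

With these weights, 13 pairs, 78 suited and 78 offsuit classes add up to 13 * 6 + 78 * 4 + 78 * 12 = 1326 combos.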

The part that doesn't seem to work is the best response calculation, and I would appreciate some help figuring out what is wrong (it must be really obvious, but I am missing it completely).

The Decision.cs best response implementation was changed to the following:

Code:

public double BestResponse(int brplayer, HoleDistribution[] distributions)
      {
         int bropponent = brplayer ^ 1;

         var op = new double[distributions[bropponent].HoleCount];
         double sum = 0;

         for (int i = 0; i < distributions[brplayer].HoleCount; i++)
         {
            var phole = distributions[brplayer].Holes[i];

            double opsum = 0;

            for (int j = 0; j < distributions[bropponent].HoleCount; j++)
            {
               var ohole = distributions[bropponent].Holes[j];
               op[j] = HoleDistribution.Combos[i, j];
               opsum += op[j];
            }

            for (int j = 0; j < distributions[bropponent].HoleCount; j++)
               op[j] = op[j] * distributions[bropponent].Holes[j].Probability / opsum;

            sum += phole.RelativeProbability * BestResponse(brplayer, distributions, i, op);
         }

         return sum;
      }

HoleDistribution.Combos[i, j], with i, j in [0..168], is a simple lookup table that returns the number of non-conflicting combos between hands i and j. For example, if i corresponds to 22 and j to 32o, the number of non-conflicting combos is 6, because if one player holds 22 there are only 6 combos of 32o the opponent can hold. Using this, I am hopefully computing the correct opponent reach probabilities for each hand type.
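To make the table definition concrete, here is a rough sketch of how one entry could be computed by brute force. It assumes a card encoding of rank * 4 + suit and addresses a class by (high rank, low rank, suited); these names and the encoding are only for illustration and are not how HoleDistribution is actually implemented. It fixes one concrete combo for the first hand (by suit symmetry the choice should not matter) and counts the combos of the second class that share no card with it.

```csharp
using System;
using System.Collections.Generic;

public static class ComboTable
{
    // Card encoding (an assumption of this sketch): card = rank * 4 + suit.
    static int Card(int rank, int suit)
    {
        return rank * 4 + suit;
    }

    // Expands a class (hi, lo, suited) into its concrete two-card combos.
    // For pocket pairs the 'suited' flag is ignored.
    public static List<(int, int)> Expand(int hi, int lo, bool suited)
    {
        var combos = new List<(int, int)>();
        for (int s1 = 0; s1 < 4; s1++)
        {
            for (int s2 = 0; s2 < 4; s2++)
            {
                if (hi == lo)
                {
                    // Count each unordered pair of suits once.
                    if (s1 < s2) combos.Add((Card(hi, s1), Card(lo, s2)));
                }
                else if (suited == (s1 == s2))
                {
                    combos.Add((Card(hi, s1), Card(lo, s2)));
                }
            }
        }
        return combos;
    }

    // Number of opponent combos that share no card with one fixed hero combo.
    public static int NonConflicting(int heroHi, int heroLo, bool heroSuited,
                                     int oppHi, int oppLo, bool oppSuited)
    {
        var hero = Expand(heroHi, heroLo, heroSuited)[0];
        int count = 0;
        foreach (var (a, b) in Expand(oppHi, oppLo, oppSuited))
        {
            bool conflict = a == hero.Item1 || a == hero.Item2
                         || b == hero.Item1 || b == hero.Item2;
            if (!conflict) count++;
        }
        return count;
    }
}
```

With deuces as rank 0 and treys as rank 1, NonConflicting(0, 0, false, 1, 0, false) reproduces the 22 vs 32o case from the example.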

The other method in the same class is unchanged from the original amax code.

Code:

public override double BestResponse(int brplayer, HoleDistribution[] distributions, int hand, double[] op)
      {
         int bropponent = brplayer ^ 1;

         if (player == brplayer)
         {
            double bestev = -double.MaxValue;

            for (int i = 0; i < children.Length; i++)
               bestev = Math.Max(bestev, children[i].BestResponse(brplayer, distributions, hand, op));

            return bestev;
         }
         else
         {
            double ev = 0;

            for (int i = 0; i < children.Length; i++)
            {
               var newop = new double[distributions[bropponent].HoleCount];

               for (int h = 0; h < distributions[bropponent].HoleCount; h++)
               {
                  var s = GetNormalizedAverageStrategy(h);
                  newop[h] = s[i] * op[h];
               }

               ev += children[i].BestResponse(brplayer, distributions, hand, newop);
            }

            return ev;
         }
      }

The Showdown.cs BestResponse is shown below. Here I also changed the TrainChanceSampling method:

Code:

public override double BestResponse(int brplayer, HoleDistribution[] distributions, int i, double[] op)
      {
         double ev = 0;

         for (int j = 0; j < op.Length; j++)
         {
            double equity = HoleDistribution.Equity(i, j);
            if (equity > 0.5)
               ev += value * op[j];
            else if (equity < 0.5)
               ev -= value * op[j];
         }

         return ev;
      }

public override double TrainChanceSampling(int trainplayer, Iteration iteration, double p, double op)
      {
         return (iteration.GetShowdownValue(trainplayer) * 2 * value - value) * op;
      }

HoleDistribution.Equity(i, j), with i, j in [0..168], is a simple LUT with the preflop equity of hand i vs hand j.
GetShowdownValue just looks up this table and returns the equity of the train player's sampled hand vs the opponent's hand.
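To spell out the two showdown formulas side by side: TrainChanceSampling scales the payoff linearly in equity, while the showdown BestResponse only checks which side of 0.5 the equity falls on. A minimal sketch of both, with hypothetical names that are not in the amax code. The two agree only when equity is exactly 0, 0.5 or 1; whether this difference is related to the best response numbers is an open question here.

```csharp
using System;

public static class ShowdownPayoff
{
    // Payoff that is linear in equity, as in TrainChanceSampling:
    // equity * 2 * value - value, i.e. (2 * equity - 1) * value.
    public static double Linear(double equity, double value)
    {
        return equity * 2 * value - value;
    }

    // Payoff that only looks at the sign of (equity - 0.5),
    // as in the showdown BestResponse loop body.
    public static double Thresholded(double equity, double value)
    {
        if (equity > 0.5) return value;
        if (equity < 0.5) return -value;
        return 0.0;
    }
}
```

For example, with equity 0.8 and value 10, the linear form gives about 6 while the thresholded form gives 10.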

And the Fold.cs BestResponse and TrainChanceSampling look like:

Code:

public override double BestResponse(int brplayer, HoleDistribution[] distributions, int hand, double[] op)
      {
         double ev = 0;

         for (int i = 0; i < op.Length; i++)
            ev += value * op[i];

         return brplayer == 0 ? ev : -ev;
      }

public override double TrainChanceSampling(int trainplayer, Iteration iteration, double p, double op)
      {
         if (trainplayer == 0)
            return op * value;
         else
            return op * -value;
      }

When I run this on a simple push/fold game, I get wrong best response values that don't converge:

Code:

ChanceSampling 120 seconds
         0 |        0.0s |        i/s | BR 1.3224 + 1.2831 = 2.6055
  10000000 |        3.2s |    3132832 | BR 0.0747 + 0.6275 = 0.7021
  20000000 |        6.3s |    3162055 | BR 0.0752 + 0.6242 = 0.6994
  30000000 |        9.4s |    3185389 | BR 0.0741 + 0.6286 = 0.7027
  40000000 |       12.5s |    3194122 | BR 0.0800 + 0.6297 = 0.7096
  50000000 |       15.6s |    3198976 | BR 0.0813 + 0.6236 = 0.7050
  60000000 |       18.8s |    3196079 | BR 0.0784 + 0.6206 = 0.6990
  70000000 |       21.9s |    3198537 | BR 0.0755 + 0.6214 = 0.6969
  80000000 |       25.0s |    3200896 | BR 0.0741 + 0.6235 = 0.6976
  90000000 |       28.1s |    3201252 | BR 0.0749 + 0.6252 = 0.7001
 100000000 |       31.2s |    3204101 | BR 0.0757 + 0.6265 = 0.7022
....

This should be a simple error, but I am stuck here and would appreciate another pair of eyes to help me find it.

Thanks.

Statistics: Posted by nemesis17 — Thu Dec 26, 2013 6:23 pm

Poker-AI.org
