Poker-AI.org
http://poker-ai.org/phpbb/

Average Strategy Sampling
http://poker-ai.org/phpbb/viewtopic.php?f=24&t=5
Page 1 of 1

Author:  cantina [ Tue Mar 05, 2013 3:34 pm ]
Post subject:  Average Strategy Sampling

From the paper Efficient Monte Carlo Counterfactual Regret Minimization in Games with Many Player Actions.
http://poker.cs.ualberta.ca/publications/NIPS12.pdf

In the same format as Amax's posts:
Code:
      public override double TrainAverageStrategySampling(int trainplayer, Iteration iteration, double q)
      {
         int hole = iteration.GetBucket(street, player);

         // You must also divide the utility by Q at terminal nodes (showdown and fold)

        const double e = 0.5;
        const double t = 1000;
        const double b = 100000;

         var s = GetStrategy(hole);

         if (player == trainplayer)
         {
            var u = new double[children.Length];

            double ev = 0;

            double cs_sum = 0;

            for (int i = 0; i < children.Length; i++)
                cs_sum += cumulativeStrategy[hole, i];

            for (int i = 0; i < children.Length; i++)
            {
                double ap = Math.Max(e, (b + t * cumulativeStrategy[hole, i]) / (b + cs_sum));
                if (rnd.Value.NextDouble() < ap)
                {
                   u[i] = children[i].TrainAverageStrategySampling(trainplayer, iteration, q * Math.Min(1, ap));
                   ev += u[i] * s[i];
                }
            }

            for (int i = 0; i < children.Length; i++)
               regret[hole, i] += u[i] - ev;

            return ev;

         }
         else
         {
            for (int i = 0; i < children.Length; i++)
               cumulativeStrategy[hole, i] += s[i] / q;

            int a = SampleStrategy(s);
            return children[a].TrainAverageStrategySampling(trainplayer, iteration, q);
         }
      }
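
For the "divide by Q" comment above, a terminal node might look something like this (ShowdownNode, Node, and GetShowdownValue are just placeholder names for whatever your tree uses, not part of the original code):
Code:
      // Sketch only: terminal nodes return the sampled payoff weighted by 1/q,
      // which keeps the importance-sampled utility estimate unbiased.
      public class ShowdownNode : Node
      {
         public override double TrainAverageStrategySampling(int trainplayer, Iteration iteration, double q)
         {
            // GetShowdownValue is a placeholder for however you score the sampled deal.
            return GetShowdownValue(trainplayer, iteration) / q;
         }
      }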


Or, Average Strategy Sampling as a probing strategy.
http://poker.cs.ualberta.ca/publications/AAAI12-generalmccfr.pdf
Code:
      public override double TrainAverageStrategyProbing(int trainplayer, Iteration iteration, double q, bool probe)
      {
         int hole = iteration.GetBucket(street, player);

        const double e = 0.5;

         var s = GetStrategy(hole);

         if (probe)
         {
            int a = SampleStrategy(s);
            return children[a].TrainAverageStrategyProbing(trainplayer, iteration, q, true);
         }
         else if (player == trainplayer)
         {
            var u = new double[children.Length];

            double ev = 0;

            double cs_sum = 0;

            for (int i = 0; i < children.Length; i++)
                cs_sum += cumulativeStrategy[hole, i];

            for (int i = 0; i < children.Length; i++)
            {
                double ap = cs_sum <= 0 ? 1 : Math.Max(e, cumulativeStrategy[hole, i] / cs_sum);
                if (rnd.Value.NextDouble() <= ap)
                {
                   u[i] = children[i].TrainAverageStrategyProbing(trainplayer, iteration, q / ap, probe);
                }
                else
                {
                   u[i] = children[i].TrainAverageStrategyProbing(trainplayer, iteration, q, true);
                }

                ev += u[i] * s[i];
            }

            for (int i = 0; i < children.Length; i++)
               regret[hole, i] += (u[i] - ev) * q;

            return ev;

         }
         else
         {
            for (int i = 0; i < children.Length; i++)
               cumulativeStrategy[hole, i] += s[i] * q;

            int a = SampleStrategy(s);
            return children[a].TrainAverageStrategyProbing(trainplayer, iteration, q, probe);
         }
      }
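
Both snippets assume the usual regret-matching helpers. A minimal sketch of what GetStrategy and SampleStrategy could look like (the field names match the code above; the rest is just one plausible way to write them):
Code:
      private double[] GetStrategy(int hole)
      {
         var s = new double[children.Length];
         double sum = 0;

         // Regret matching: strategy proportional to positive cumulative regret.
         for (int i = 0; i < children.Length; i++)
         {
            s[i] = Math.Max(0, regret[hole, i]);
            sum += s[i];
         }

         for (int i = 0; i < children.Length; i++)
            s[i] = sum > 0 ? s[i] / sum : 1.0 / children.Length; // uniform if no positive regret

         return s;
      }

      private int SampleStrategy(double[] s)
      {
         // Sample an action index according to the strategy probabilities.
         double r = rnd.Value.NextDouble();
         double acc = 0;

         for (int i = 0; i < s.Length; i++)
         {
            acc += s[i];
            if (r < acc)
               return i;
         }

         return s.Length - 1; // guard against floating-point round-off
      }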

Author:  proud2bBot [ Mon Mar 11, 2013 10:37 pm ]
Post subject:  Re: Average Strategy Sampling

Thanks for sharing the code, Nasher. I wonder if anyone has used both approaches and has experience with which one converges faster. I'm personally using average strategy sampling (w/o probing) and it converges much faster than MCCFRM in my case (I have a lot of different betting options).

Author:  cantina [ Tue Mar 12, 2013 6:58 pm ]
Post subject:  Re: Average Strategy Sampling

I've tried a union of the two, which converges extremely fast, but with limited accuracy. The latter version I posted, AS-Probing, might need the threshold and bonus parameters to work (I haven't tested it thoroughly). Plain probing is what I'm using.

I wonder how well these probabilistic approaches would work with an abstraction that is very close to the underlying game. I've somehow got the idea in my head that these algorithms take advantage of the fact that some abstractions are over-trained (and therefore more exploitable in the real game) after a certain convergence.

Author:  proud2bBot [ Tue Mar 12, 2013 9:34 pm ]
Post subject:  Re: Average Strategy Sampling

What do you mean by overtrained? I see abstractions as orthogonal to the algorithmic solving. If you are talking about overtraining in terms of, e.g., hitting our FDs a lot at the beginning and changing the strategy to highly value them, then it's the job of parameters like exploration to avoid getting stuck in a local optimum. If you are using ASS, you can see that it's not faster at the beginning, but once the first xM iterations are performed, more and more actions are not sampled anymore (or rather, sampled only very rarely).

Author:  cantina [ Wed Mar 13, 2013 7:42 am ]
Post subject:  Re: Average Strategy Sampling

I should have said 'over-fitting' instead of 'over-training.' By this I meant the point where the real game exploitability begins to increase while the abstracted game exploitability decreases. These approaches get to that threshold quickly. In an abstraction with high real-game correlation I wonder if they wouldn't cease to converge after a certain point.

Author:  proud2bBot [ Wed Mar 13, 2013 12:52 pm ]
Post subject:  Re: Average Strategy Sampling

I can't prove it, but in my understanding, an abstraction that is closer to the real game should improve our strategy in terms of exploitability in both the abstract and the real game, as long as you still sample hands/boards completely at random.

Author:  cantina [ Wed Mar 13, 2013 7:56 pm ]
Post subject:  Re: Average Strategy Sampling

I always have to think hard to understand your meaning, p2bb. I'm not sure if that's a good or a bad thing. :) I'm saying these probabilistic CFRM algorithms might not work as well in very large abstractions, close to the real game. In the ASS paper, they did some comparisons to show the opposite, but in smaller abstractions.

Author:  proud2bBot [ Wed Mar 13, 2013 8:34 pm ]
Post subject:  Re: Average Strategy Sampling

Hahaha, sorry. My point was that I cannot see a reason why a probabilistic algorithm should perform worse for a "real world abstraction" compared to a regular one. It has been shown that larger abstractions don't necessarily lead to better results in general, but that was independent of the algorithm. The only essential difference I see between ASS and regular MC-CFRM is that we don't necessarily sample all branches of the tree anymore. This can lead to one worst-case scenario: one branch (call the action x) performs badly, hence is rarely chosen, and the cumulative strategy values of our other options grow so much that branch x takes a long time to be sampled at its real optimal frequency. However, this is avoided by a) waiting until the algo really converges (which may take a while) and b) choosing "good" parameters for exploration and bonus.
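
To make the exploration/bonus point concrete, here is a toy calculation with the e/t/b constants from Nasher's first snippet (the counts are made up, only the shape of the formula matters):
Code:
      using System;

      class AvgStratSamplingToy
      {
         const double e = 0.5, t = 1000, b = 100000;

         // ap = max(e, (b + t * cs[a]) / (b + cs_sum)), same formula as in the trainplayer branch above.
         static double Ap(double csAction, double csSum)
            => Math.Max(e, (b + t * csAction) / (b + csSum));

         static void Main()
         {
            // Early on the bonus b dominates, so even a rarely chosen action is (almost) always sampled:
            Console.WriteLine(Ap(10, 1000));       // ~1.09 -> always taken
            // Much later, a branch that was rarely chosen drops to the exploration floor e:
            Console.WriteLine(Ap(100, 10000000));  // max(0.5, ~0.02) = 0.5
         }
      }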

Author:  Coffee4tw [ Wed Mar 13, 2013 8:40 pm ]
Post subject:  Re: Average Strategy Sampling

Didn't UofA publish a paper that shows that a bigger abstraction (therefore closer to the real game) doesn't necessarily lead to less exploitability in the real world versus a smaller abstraction?

Ah yes, this is the paper:
Abstraction Pathologies in Extensive Games
Authors: Kevin Waugh, David Schnizlein, Michael Bowling, and Duane Szafron.
Date 2009
http://poker.cs.ualberta.ca/publication ... action.pdf

Author:  cantina [ Wed Mar 13, 2013 10:03 pm ]
Post subject:  Re: Average Strategy Sampling

That's true, c4tw, but I was thinking big-big, like slumbot's abstraction in the 2012 ACPC (half a terabyte, unabstracted flop/turn).

Author:  cantina [ Wed Mar 13, 2013 10:22 pm ]
Post subject:  Re: Average Strategy Sampling

proud2bBot wrote:
a) waiting until the algo really converges (which may take a while) and b) choosing "good" parameters for exploration and bonus.


Those metrics aren't perfect.
a) How do you know when your algo *really* converges?
b) Same question.

You're essentially trading accuracy for speed by using educated guesses. In a very large abstraction (i.e. close to the real-world game), those inadequacies make you more exploitable, wouldn't you say? In a game where over-fitting is expected, that inaccuracy is a good thing. Make sense? This is just speculation on my part, and what seemed to be happening in my experimentation with extreme estimators.

Author:  proud2bBot [ Wed Mar 13, 2013 10:43 pm ]
Post subject:  Re: Average Strategy Sampling

I see your argument and agree :-) My guess would be that the effect would be very small for the well-known algorithms (often you can find evaluations with best-response exploitability comparisons), while the speed-up is huge. So practically, algorithms like ASS will always perform better given a limited amount of CPU hours; I'd guess that if you crunch the numbers for several weeks, one might go back to plain CFRM (as it doesn't matter whether we converge after 10 or 100M iterations if we run 100B iterations), but like in many other areas, I guess we can only make guesses here.

Author:  Nose [ Mon Jul 01, 2013 7:48 am ]
Post subject:  Re: Average Strategy Sampling

Coffee4tw wrote:
Didn't UofA publish a paper that shows that a bigger abstraction (therefore closer to the real game) doesn't necessarily lead to less exploitability in the real world versus a smaller abstraction?

Ah yes, this is the paper:
Abstraction Pathologies in Extensive Games
Authors: Kevin Waugh, David Schnizlein, Michael Bowling, and Duane Szafron.
Date 2009
http://poker.cs.ualberta.ca/publication ... action.pdf


Not necessarily, but maybe worth mentioning: http://poker.cs.ualberta.ca/publications/ijcai2011_accelerated_best_response.pdf

http://poker.cs.ualberta.ca/publications/ijcai2011_accelerated_best_response.pdf (Page 2, Top left) wrote:
In this paper, we describe general techniques for accelerating best response calculations. The method uses the structure of information and utilities to avoid a full game tree traversal, while also being well-suited to parallel computation. As a result we are able for the first time to compute the worst case performance of non-trivial strategies in two-player limit Texas hold’em. After introducing these innovations, we use our technique to empirically answer a number of open questions related to abstraction and equilibrium approximation.

We show that in practice finer poker abstractions do produce better equilibrium approximations, but better worst-case performance does not always result in better performance in a tournament. These conclusions are drawn from evaluating the worst-case performance of over three dozen strategies involving ten total CPU years of computation.

Author:  Isildur11 [ Tue Jul 02, 2013 8:48 pm ]
Post subject:  Re: Average Strategy Sampling

Coffee4tw wrote:
Didn't UofA publish a paper that shows that a bigger abstraction (therefore closer to the real game) doesn't necessarily lead to less exploitability in the real world versus a smaller abstraction?


not if you train your CFR against a best response in a wider abstraction (cf CFR-BR)
