Poker-AI.org

Re: Average Strategy Sampling

2013-07-02T20:48:13+00:00

Coffee4tw wrote:

Didn't UofA publish a paper that shows that a bigger abstraction (therefore closer to the real game) doesn't necessarily lead to less exploitability in the real world versus a smaller abstraction?

not if you train your CFR against a best response in a wider abstraction (cf CFR-BR)

Statistics: Posted by Isildur11 — Tue Jul 02, 2013 8:48 pm

Re: Average Strategy Sampling

2013-07-01T07:48:04+00:00

Coffee4tw wrote:

Didn't UofA publish a paper that shows that a bigger abstraction (therefore closer to the real game) doesn't necessarily lead to less exploitability in the real world versus a smaller abstraction?

Ah yes, this is the paper:
Abstraction Pathologies in Extensive Games
Authors: Kevin Waugh, David Schnizlein, Michael Bowling, and Duane Szafron.
Date 2009
http://poker.cs.ualberta.ca/publication ... action.pdf

Not necessarily, but maybe worth mentioning: http://poker.cs.ualberta.ca/publications/ijcai2011_accelerated_best_response.pdf

http://poker.cs.ualberta.ca/publications/ijcai2011_accelerated_best_response.pdf (Page 2, Top left) wrote:

In this paper, we describe general techniques for accelerating best response calculations. The method uses the structure of information and utilities to avoid a full game tree traversal, while also being well-suited to parallel computation. As a result we are able for the first time to compute the worst case performance of non-trivial strategies in two-player limit Texas hold’em. After introducing these innovations, we use our technique to empirically answer a number of open questions related to abstraction and equilibrium approximation.

We show that in practice finer poker abstractions do produce better equilibrium approximations, but better worst-case performance does not always result in better performance in a tournament. These conclusions are drawn from evaluating the worst-case performance of over three dozen strategies involving ten total CPU years of computation.

Statistics: Posted by Nose — Mon Jul 01, 2013 7:48 am

Re: Average Strategy Sampling

2013-03-13T22:43:14+00:00

I see your argument and agree My guess would be that the effect would be very small however for the well-known algorithms (often, you can find evaluations with best-response exploitability comparisons), while the speed-up is huge. So practically, algorithms like ASS will always perform better given a limited amount of CPU hours; I'd guess if you crunch the numbers for like several weeks, one might go back to plain CFRM (as it doesnt matter if we converge after 10 or 100M iterations, if we run 100B iterations), but like many other areas, I guess we can only make guesses here.

Statistics: Posted by proud2bBot — Wed Mar 13, 2013 10:43 pm

Re: Average Strategy Sampling

2013-03-13T22:22:15+00:00

proud2bBot wrote:

a) waiting until the algo really converges (which may take a while) and b) choosing "good" parameters for exploration and bonus.

Those metrics aren't perfect.
a) How do you know when your algo *really* converges?
b) Same question.

You're essentially trading accuracy for speed by using educated guesses. In a very large abstraction (i.e. close to the real-world game), those inadequacies make you more exploitable, wouldn't you say? In a game where over-fitting is expected, that inaccuracy is a good thing. Make sense? This is just speculation on my part, and what seemed to be happening in my experimentation with extreme estimators.

Statistics: Posted by cantina — Wed Mar 13, 2013 10:22 pm

Re: Average Strategy Sampling

2013-03-13T22:03:56+00:00

That's true, c4tw, but I was thinking big-big, like slumbot's abstraction in the 2012 ACPC (half a terabyte, unabstracted flop/turn).

Statistics: Posted by cantina — Wed Mar 13, 2013 10:03 pm

Re: Average Strategy Sampling

2013-03-13T20:40:38+00:00

Statistics: Posted by Coffee4tw — Wed Mar 13, 2013 8:40 pm

Re: Average Strategy Sampling

2013-03-13T20:34:09+00:00

hahaha, sorry. My point was that I cannot see a reason why a probabilistic algorithm should perform worse for a "real world abstraction" compared to a regular one. It has been shown that larger abstractions dont need to lead to better results in general, but that was independent of the algorithm. The only effential difference I see between ASS and regular MC-CFRM is, that we don't necessary sample all branches of the tree anymore. This may lead to one worst case scenario: one branch (we call the action x) performs bad, hence is not chosen a lot of times, and then the cumulative strategy values of our other options increased so much that the branch x takes a long time to get chosen with their real optimal value. However, this is avoided by a) waiting until the algo really converges (which may take a while) and b) choosing "good" parameters for exploration and bonus.

Statistics: Posted by proud2bBot — Wed Mar 13, 2013 8:34 pm

Re: Average Strategy Sampling

2013-03-13T19:56:32+00:00

I always have to think hard to understand your meaning, p2bb. I'm not sure if that's a good or a bad thing. I'm saying these probabilistic CFRM algorithms might not work as well in very large abstractions, close to the real game. In the ASS paper, they did some comparisons to show the opposite, but in smaller abstractions.

Statistics: Posted by cantina — Wed Mar 13, 2013 7:56 pm

Re: Average Strategy Sampling

2013-03-13T12:52:33+00:00

I can't prove it, but in my understanding, an abstraction that is closer to the real game should improve our strategy in terms of exploitability in both, the abstract and the real game, as long as you still sample hands/boards completely random.

Statistics: Posted by proud2bBot — Wed Mar 13, 2013 12:52 pm

Re: Average Strategy Sampling

2013-03-13T07:42:22+00:00

I should have said 'over-fitting' instead of 'over-training.' By this I meant the point where the real game exploitability begins to increase while the abstracted game exploitability decreases. These approaches get to that threshold quickly. In an abstraction with high real-game correlation I wonder if they wouldn't cease to converge after a certain point.

Statistics: Posted by cantina — Wed Mar 13, 2013 7:42 am

Re: Average Strategy Sampling

2013-03-12T21:34:44+00:00

what do you mean with overtrained? I see abstractions orthogonal to the algorithmic solving. If you are talking about overtraining in terms of e.g. hitting our FDs at the beginning a lot and changing the strategy accordingly to highly value them, then its part of parameters like exploration to avoid getting stuck in a local optimum. If you are using ASS, you can see that its not faster at the beginning, but once the first xM iterations are performed, more and more actions are not sampled anymore (or better sampled very rarely).

Statistics: Posted by proud2bBot — Tue Mar 12, 2013 9:34 pm

Re: Average Strategy Sampling

2013-03-12T18:58:53+00:00

I've tried a union of the two, which converges extremely fast, but with limited accuracy. The latter version I posted, AS-Probing, might need the threshold and bonus requirements to work (I haven't tested it thoroughly). Plain probing is what I'm using.

I wonder how well these probabilistic approaches would work with an abstraction that is very close to the underlying game? I've somehow got the idea in my head that these algorithms take advantage of the fact that some abstractions are over-trained (and therefor more exploitable in the real game) after a certain convergence.

Statistics: Posted by cantina — Tue Mar 12, 2013 6:58 pm

Re: Average Strategy Sampling

2013-03-11T22:37:17+00:00

thanks for sharing the code Nasher. I wonder if anyone has used both approaches and has experience with which one converges faster. I am personally using the average strategy sampling (w/o probing) and it is converging much faster than MCCFRM in my case (having a lot of different betting options).

Statistics: Posted by proud2bBot — Mon Mar 11, 2013 10:37 pm

Average Strategy Sampling

2013-03-05T15:34:32+00:00

From the paper Efficient Monte Carlo Counterfactual Regret Minimization in Games with Many Player Actions.
http://poker.cs.ualberta.ca/publications/NIPS12.pdf

In the same format as Amax's posts:

Code:

      public override double TrainAverageStrategySampling(int trainplayer, Iteration iteration, double q)
      {
         int hole = iteration.GetBucket(street, player);

        // You must also devide the utility by Q at terminal nodes (showdown and fold)

        const double e = 0.5;
        const double t = 1000;
        const double b = 100000;

         var s = GetStrategy(hole);

         if (player == trainplayer)
         {
            var u = new double[children.Length];

            double ev = 0;

            double cs_sum = 0;

            for (int i = 0; i < children.Length; i++)
                cs_sum += cumulativeStrategy[hole, i];

            for (int i = 0; i < children.Length; i++)
            {
                double ap = Math.Max(e, (b + t * cumulativeStrategy[hole, i]) / (b + cs_sum));
                if (rnd.Value.NextDouble() < ap)
                {
                   u[i] = children[i].TrainAverageStrategySampling(trainplayer, iteration, q * Math.Min(1, ap));
                   ev += u[i] * s[i];
                }
            }

            for (int i = 0; i < children.Length; i++)
               regret[hole, i] += u[i] - ev;

            return ev;

         }
         else
         {
            for (int i = 0; i < children.Length; i++)
               cumulativeStrategy[hole, i] += s[i] / q;

            int a = SampleStrategy(s);
            return children[a].TrainAverageStrategySampling(trainplayer, iteration, q);
         }
      }

Or, Average Strategy Sampling as a probing strategy.
http://poker.cs.ualberta.ca/publications/AAAI12-generalmccfr.pdf

Code:

      public override double TrainAverageStrategyProbing(int trainplayer, Iteration iteration, double q, bool probe)
      {
         int hole = iteration.GetBucket(street, player);

        const double e = 0.5;

         var s = GetStrategy(hole);

         if (probe)
         {
            int a = SampleStrategy(s);
            return children[a].TrainAverageStrategyProbing(trainplayer, iteration, q, true);
         }
         else if (player == trainplayer)
         {
            var u = new double[children.Length];

            double ev = 0;

            double cs_sum = 0;

            for (int i = 0; i < children.Length; i++)
                cs_sum += cumulativeStrategy[hole, i];

            for (int i = 0; i < children.Length; i++)
            {
                double ap = cs_sum <= 0 ? 1 : Math.Max(e, cumulativeStrategy[hole, i]) / cs_sum);
                if (rnd.Value.NextDouble() <= ap)
                {
                   u[i] = children[i].TrainAverageStrategyProbing(trainplayer, iteration, q / ap, probe);
                }
                else
                {
                   u[i] = children[i].TrainAverageStrategyProbing(trainplayer, iteration, q, true);
                }

                   ev += u[i] * s[i];         
            }

            for (int i = 0; i < children.Length; i++)
               regret[hole, i] += (u[i] - ev) * q;

            return ev;

         }
         else
         {
            for (int i = 0; i < children.Length; i++)
               cumulativeStrategy[hole, i] += s[i] * q;

            int a = SampleStrategy(s);
            return children[a].TrainAverageStrategyProbing(trainplayer, iteration, q, probe);
         }
      }

Statistics: Posted by cantina — Tue Mar 05, 2013 3:34 pm