From the paper Efficient Monte Carlo Counterfactual Regret Minimization in Games with Many Player Actions.
http://poker.cs.ualberta.ca/publications/NIPS12.pdf

In the same format as Amax's posts:
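The sampling probabilities below follow the paper's average strategy sampling scheme: at the traversing player's nodes, each action a is explored independently with probability min(1, rho(I,a)), where rho(I,a) = max(epsilon, (beta + tau * sbar(I,a)) / (beta + sum_b sbar(I,b))). The constants e, t and b in the code correspond to epsilon, tau and beta.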
Code:
public override double TrainAverageStrategySampling(int trainplayer, Iteration iteration, double q)
{
    int hole = iteration.GetBucket(street, player);

    // Note: terminal nodes (showdown and fold) must also divide the utility by q.
    const double e = 0.5;      // exploration floor (epsilon)
    const double t = 1000;     // tau
    const double b = 100000;   // beta

    var s = GetStrategy(hole); // current strategy from regret matching

    if (player == trainplayer)
    {
        var u = new double[children.Length];
        double ev = 0;

        double cs_sum = 0;
        for (int i = 0; i < children.Length; i++)
            cs_sum += cumulativeStrategy[hole, i];

        for (int i = 0; i < children.Length; i++)
        {
            // Explore action i with probability min(1, ap); unsampled actions keep u[i] = 0.
            double ap = Math.Max(e, (b + t * cumulativeStrategy[hole, i]) / (b + cs_sum));
            if (rnd.Value.NextDouble() < ap)
            {
                u[i] = children[i].TrainAverageStrategySampling(trainplayer, iteration, q * Math.Min(1, ap));
                ev += u[i] * s[i];
            }
        }

        // Regret update for the traversing player.
        for (int i = 0; i < children.Length; i++)
            regret[hole, i] += u[i] - ev;

        return ev;
    }
    else
    {
        // Opponent node: accumulate the average strategy, weighted by 1/q, then sample one action.
        for (int i = 0; i < children.Length; i++)
            cumulativeStrategy[hole, i] += s[i] / q;

        int a = SampleStrategy(s);
        return children[a].TrainAverageStrategySampling(trainplayer, iteration, q);
    }
}
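For context, a minimal driver-loop sketch. The root node, the Iteration constructor, numPlayers and iterationCount are all assumptions (they are not shown in this post); each iteration deals/buckets the cards once and runs one traversal per player, starting with q = 1.
Code:
// Minimal training-loop sketch; root, Iteration(), numPlayers and iterationCount are hypothetical.
for (long it = 0; it < iterationCount; it++)
{
    var iteration = new Iteration();   // assumption: deals cards / fills the buckets for this iteration
    for (int p = 0; p < numPlayers; p++)
        root.TrainAverageStrategySampling(p, iteration, 1.0);  // q starts at 1
}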
Or, Average Strategy Sampling combined with probing, where an unsampled action is evaluated by a single on-policy probe to a terminal node instead of being assigned zero utility.
http://poker.cs.ualberta.ca/publications/AAAI12-generalmccfr.pdf

Code:
public override double TrainAverageStrategyProbing(int trainplayer, Iteration iteration, double q, bool probe)
{
    int hole = iteration.GetBucket(street, player);
    const double e = 0.5;      // exploration floor (epsilon)
    var s = GetStrategy(hole);

    if (probe)
    {
        // Probe: roll out to a terminal node following the current strategy; no updates are made.
        int a = SampleStrategy(s);
        return children[a].TrainAverageStrategyProbing(trainplayer, iteration, q, true);
    }
    else if (player == trainplayer)
    {
        var u = new double[children.Length];
        double ev = 0;

        double cs_sum = 0;
        for (int i = 0; i < children.Length; i++)
            cs_sum += cumulativeStrategy[hole, i];

        for (int i = 0; i < children.Length; i++)
        {
            // Epsilon-floored average-strategy probability; sample every action on the first visit.
            double ap = cs_sum <= 0 ? 1 : Math.Max(e, cumulativeStrategy[hole, i] / cs_sum);
            if (rnd.Value.NextDouble() <= ap)
            {
                // Sampled: recurse with the weight corrected by the sampling probability.
                u[i] = children[i].TrainAverageStrategyProbing(trainplayer, iteration, q / ap, probe);
            }
            else
            {
                // Not sampled: estimate the action's value with a single probe instead of zero.
                u[i] = children[i].TrainAverageStrategyProbing(trainplayer, iteration, q, true);
            }
            ev += u[i] * s[i];
        }

        for (int i = 0; i < children.Length; i++)
            regret[hole, i] += (u[i] - ev) * q;

        return ev;
    }
    else
    {
        // Opponent node: accumulate the average strategy, then sample one action.
        for (int i = 0; i < children.Length; i++)
            cumulativeStrategy[hole, i] += s[i] * q;

        int a = SampleStrategy(s);
        return children[a].TrainAverageStrategyProbing(trainplayer, iteration, q, probe);
    }
}
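Both methods rely on a SampleStrategy helper that isn't shown above. A straightforward sketch, assuming s sums to 1 and rnd is the same thread-local Random used in the code:
Code:
int SampleStrategy(double[] s)
{
    // Draw one action index from the distribution s by walking the cumulative sum.
    double r = rnd.Value.NextDouble();
    double acc = 0;
    for (int i = 0; i < s.Length; i++)
    {
        acc += s[i];
        if (r < acc)
            return i;
    }
    return s.Length - 1; // guard against floating-point rounding
}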