Poker-AI.org

Poker AI and Botting Discussion Forum
PostPosted: Tue Mar 05, 2013 3:34 pm 
Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 437
From the paper "Efficient Monte Carlo Counterfactual Regret Minimization in Games with Many Player Actions":
http://poker.cs.ualberta.ca/publications/NIPS12.pdf

In the same format as Amax's posts:
Code:
      public override double TrainAverageStrategySampling(int trainplayer, Iteration iteration, double q)
      {
         int hole = iteration.GetBucket(street, player);

         // Note: you must also divide the utility by q at terminal nodes (showdown and fold).

         const double e = 0.5;    // exploration: minimum sampling probability per action
         const double t = 1000;   // threshold
         const double b = 100000; // bonus: keeps early sampling close to uniform

         var s = GetStrategy(hole);

         if (player == trainplayer)
         {
            var u = new double[children.Length];

            double ev = 0;
            double cs_sum = 0;

            for (int i = 0; i < children.Length; i++)
               cs_sum += cumulativeStrategy[hole, i];

            for (int i = 0; i < children.Length; i++)
            {
               double ap = Math.Max(e, (b + t * cumulativeStrategy[hole, i]) / (b + cs_sum));
               if (rnd.Value.NextDouble() < ap)
               {
                  u[i] = children[i].TrainAverageStrategySampling(trainplayer, iteration, q * Math.Min(1, ap));
                  ev += u[i] * s[i];
               }
               // Unsampled actions keep u[i] = 0 and contribute nothing to ev.
            }

            for (int i = 0; i < children.Length; i++)
               regret[hole, i] += u[i] - ev;

            return ev;
         }
         else
         {
            // Opponent node: update the average strategy (weighted by 1/q), then sample one action.
            for (int i = 0; i < children.Length; i++)
               cumulativeStrategy[hole, i] += s[i] / q;

            int a = SampleStrategy(s);
            return children[a].TrainAverageStrategySampling(trainplayer, iteration, q);
         }
      }

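To illustrate the terminal-node comment above: a minimal sketch of what such an override might look like at a showdown node. ShowdownNode and GetShowdownValue are made-up names for this sketch, not classes from the code above.
Code:
      // Hypothetical terminal node, to show the "divide the utility by q" rule.
      public override double TrainAverageStrategySampling(int trainplayer, Iteration iteration, double q)
      {
         // Raw utility for trainplayer at this terminal (showdown or fold).
         double u = iteration.GetShowdownValue(trainplayer);

         // q is the probability that sampling reached this terminal, so
         // dividing by q keeps the sampled utility an unbiased estimate.
         return u / q;
      }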

Or, Average Strategy Sampling as a probing strategy.
http://poker.cs.ualberta.ca/publications/AAAI12-generalmccfr.pdf
Code:
      public override double TrainAverageStrategyProbing(int trainplayer, Iteration iteration, double q, bool probe)
      {
         int hole = iteration.GetBucket(street, player);

         const double e = 0.5; // exploration: minimum sampling probability per action

         var s = GetStrategy(hole);

         if (probe)
         {
            // Probe mode: walk to a terminal by pure sampling, with no updates.
            int a = SampleStrategy(s);
            return children[a].TrainAverageStrategyProbing(trainplayer, iteration, q, true);
         }
         else if (player == trainplayer)
         {
            var u = new double[children.Length];

            double ev = 0;
            double cs_sum = 0;

            for (int i = 0; i < children.Length; i++)
               cs_sum += cumulativeStrategy[hole, i];

            for (int i = 0; i < children.Length; i++)
            {
               double ap = cs_sum <= 0 ? 1 : Math.Max(e, cumulativeStrategy[hole, i] / cs_sum);
               if (rnd.Value.NextDouble() <= ap)
               {
                  // Sampled: recurse normally, adjusting the importance weight.
                  u[i] = children[i].TrainAverageStrategyProbing(trainplayer, iteration, q / ap, probe);
               }
               else
               {
                  // Not sampled: probe the action instead of assuming zero utility.
                  u[i] = children[i].TrainAverageStrategyProbing(trainplayer, iteration, q, true);
               }

               ev += u[i] * s[i];
            }

            for (int i = 0; i < children.Length; i++)
               regret[hole, i] += (u[i] - ev) * q;

            return ev;
         }
         else
         {
            for (int i = 0; i < children.Length; i++)
               cumulativeStrategy[hole, i] += s[i] * q;

            int a = SampleStrategy(s);
            return children[a].TrainAverageStrategyProbing(trainplayer, iteration, q, probe);
         }
      }

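Both methods assume the usual regret-matching helpers from Amax's posts. In case anyone is missing them, a minimal sketch (field names follow the snippets above; treat this as my assumption, not necessarily Nasher's exact code):
Code:
      // Regret matching: play each action in proportion to its positive regret.
      private double[] GetStrategy(int hole)
      {
         var s = new double[children.Length];
         double sum = 0;

         for (int i = 0; i < children.Length; i++)
            sum += Math.Max(0, regret[hole, i]);

         for (int i = 0; i < children.Length; i++)
            s[i] = sum > 0 ? Math.Max(0, regret[hole, i]) / sum : 1.0 / children.Length;

         return s;
      }

      // Draw one action index from the strategy distribution.
      private int SampleStrategy(double[] s)
      {
         double r = rnd.Value.NextDouble();
         for (int i = 0; i < s.Length - 1; i++)
         {
            r -= s[i];
            if (r < 0)
               return i;
         }
         return s.Length - 1;
      }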

PostPosted: Mon Mar 11, 2013 10:37 pm 
Senior Member

Joined: Mon Mar 11, 2013 10:24 pm
Posts: 216
Thanks for sharing the code, Nasher. I wonder if anyone has used both approaches and knows which one converges faster. I'm personally using average strategy sampling (without probing) and it converges much faster than plain MCCFRM in my case (I have a lot of different betting options).


PostPosted: Tue Mar 12, 2013 6:58 pm 
Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 437
I've tried a combination of the two, which converges extremely fast but with limited accuracy. The second version I posted, AS-probing, might need the threshold and bonus parameters to work well (I haven't tested it thoroughly). Plain probing is what I'm using.

I wonder how well these probabilistic approaches would work with an abstraction that is very close to the underlying game. I've somehow got the idea in my head that these algorithms take advantage of the fact that some abstractions are over-trained (and therefore more exploitable in the real game) after a certain convergence.


PostPosted: Tue Mar 12, 2013 9:34 pm 
Senior Member

Joined: Mon Mar 11, 2013 10:24 pm
Posts: 216
What do you mean by overtrained? I see abstractions as orthogonal to the algorithmic solving. If you are talking about overtraining in the sense of, e.g., hitting our flush draws a lot at the beginning and adjusting the strategy to overvalue them, then it's the job of parameters like exploration to avoid getting stuck in a local optimum. If you are using ASS, you can see that it's not faster at the beginning, but once the first xM iterations are done, more and more actions are not sampled anymore (or rather, are sampled very rarely).
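
To make that concrete, here is a rough illustration (my numbers) of how the sampling probability from Nasher's first snippet behaves, using e = 0.5, t = 1000, b = 100000:
Code:
      // ap = max(e, (b + t * cs[a]) / (b + cs_sum)), as in the first snippet.
      static double Ap(double cs_a, double cs_sum)
      {
         const double e = 0.5, t = 1000, b = 100000;
         return Math.Max(e, (b + t * cs_a) / (b + cs_sum));
      }

      // Early on the bonus b dominates and every action is sampled:
      //   Ap(0, 1000)        ~ 0.99
      // Once cs_sum is large, neglected actions drop to the exploration floor,
      // while actions the average strategy favours keep ap close to 1:
      //   Ap(0, 10000000)    = 0.5   (the floor e)
      //   Ap(9000, 10000000) ~ 0.90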


PostPosted: Wed Mar 13, 2013 7:42 am 
Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 437
I should have said 'over-fitting' instead of 'over-training.' By this I meant the point where real-game exploitability begins to increase while abstract-game exploitability still decreases. These approaches reach that threshold quickly. In an abstraction with high real-game correlation, I wonder if they wouldn't cease to converge after a certain point.


PostPosted: Wed Mar 13, 2013 12:52 pm 
Senior Member

Joined: Mon Mar 11, 2013 10:24 pm
Posts: 216
I can't prove it, but in my understanding, an abstraction that is closer to the real game should improve our strategy in terms of exploitability in both the abstract and the real game, as long as you still sample hands/boards completely at random.


PostPosted: Wed Mar 13, 2013 7:56 pm 
Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 437
I always have to think hard to understand your meaning, p2bb. I'm not sure if that's a good or a bad thing. :) I'm saying these probabilistic CFRM algorithms might not work as well in very large abstractions that are close to the real game. In the ASS paper, they did some comparisons that suggest the opposite, but only in smaller abstractions.


PostPosted: Wed Mar 13, 2013 8:34 pm 
Senior Member

Joined: Mon Mar 11, 2013 10:24 pm
Posts: 216
Hahaha, sorry. My point was that I can't see a reason why a probabilistic algorithm should perform worse for a "real world abstraction" compared to a regular one. It has been shown that larger abstractions don't necessarily lead to better results in general, but that was independent of the algorithm. The only essential difference I see between ASS and regular MC-CFRM is that we no longer necessarily sample all branches of the tree. This can lead to one worst-case scenario: a branch (call the action x) performs badly early, so it is rarely chosen, and meanwhile the cumulative strategy values of our other options grow so much that branch x takes a long time to be sampled often enough to reach its true value. However, this is avoided by a) waiting until the algorithm really converges (which may take a while) and b) choosing "good" parameters for exploration and bonus.


PostPosted: Wed Mar 13, 2013 8:40 pm 
Site Admin

Joined: Thu Feb 28, 2013 5:24 pm
Posts: 230
Didn't UofA publish a paper that shows that a bigger abstraction (therefore closer to the real game) doesn't necessarily lead to less exploitability in the real world versus a smaller abstraction?

Ah yes, this is the paper:
Abstraction Pathologies in Extensive Games
Authors: Kevin Waugh, David Schnizlein, Michael Bowling, and Duane Szafron.
Date: 2009
http://poker.cs.ualberta.ca/publication ... action.pdf

_________________
Cheers.


PostPosted: Wed Mar 13, 2013 10:03 pm 
Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 437
That's true, c4tw, but I was thinking big-big, like Slumbot's abstraction in the 2012 ACPC (half a terabyte, unabstracted flop/turn).


PostPosted: Wed Mar 13, 2013 10:22 pm 
Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 437
proud2bBot wrote:
a) waiting until the algorithm really converges (which may take a while) and b) choosing "good" parameters for exploration and bonus.


Those metrics aren't perfect.
a) How do you know when your algo *really* converges?
b) Same question.

You're essentially trading accuracy for speed by using educated guesses. In a very large abstraction (i.e., one close to the real game), those inaccuracies make you more exploitable, wouldn't you say? In an abstraction where over-fitting is expected, that inaccuracy is a good thing. Make sense? This is just speculation on my part, based on what seemed to be happening in my experiments with extreme estimators.
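
The closest thing to a real answer that I know of is to measure it directly: train in blocks and stop when a best-response exploitability estimate stops improving. A rough sketch (RunOneIteration and ComputeBestResponseValue are placeholders for your own trainer and evaluator):
Code:
      // Stop when measured exploitability improves by less than a tolerance.
      double previous = double.MaxValue;
      const double tolerance = 0.001; // in whatever units your evaluator returns

      while (true)
      {
         for (int i = 0; i < 10000000; i++)
            RunOneIteration(); // one sampled CFRM traversal

         double exploitability = ComputeBestResponseValue();
         if (previous - exploitability < tolerance)
            break; // converged to within tolerance
         previous = exploitability;
      }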


PostPosted: Wed Mar 13, 2013 10:43 pm 
Senior Member

Joined: Mon Mar 11, 2013 10:24 pm
Posts: 216
I see your argument and agree :-) My guess, however, is that for the well-known algorithms the effect is very small (you can often find evaluations with best-response exploitability comparisons), while the speed-up is huge. So in practice, algorithms like ASS will always perform better given a limited number of CPU hours. If you crunch the numbers for several weeks, one might go back to plain CFRM (it doesn't matter whether we converge after 10M or 100M iterations if we run 100B iterations), but as in many other areas, we can only make guesses here.


PostPosted: Mon Jul 01, 2013 7:48 am 
Regular Member

Joined: Sat May 25, 2013 7:36 am
Posts: 73
Coffee4tw wrote:
Didn't UofA publish a paper that shows that a bigger abstraction (therefore closer to the real game) doesn't necessarily lead to less exploitability in the real world versus a smaller abstraction?

Ah yes, this is the paper:
Abstraction Pathologies in Extensive Games
Authors: Kevin Waugh, David Schnizlein, Michael Bowling, and Duane Szafron.
Date 2009
http://poker.cs.ualberta.ca/publication ... action.pdf


Not necessarily, but maybe worth mentioning: http://poker.cs.ualberta.ca/publications/ijcai2011_accelerated_best_response.pdf

http://poker.cs.ualberta.ca/publications/ijcai2011_accelerated_best_response.pdf (Page 2, Top left) wrote:
In this paper, we describe general techniques for accelerating best response calculations. The method uses the structure of information and utilities to avoid a full game tree traversal, while also being well-suited to parallel computation. As a result we are able for the first time to compute the worst case performance of non-trivial strategies in two-player limit Texas hold’em. After introducing these innovations, we use our technique to empirically answer a number of open questions related to abstraction and equilibrium approximation.

We show that in practice finer poker abstractions do produce better equilibrium approximations, but better worst-case performance does not always result in better performance in a tournament. These conclusions are drawn from evaluating the worst-case performance of over three dozen strategies involving ten total CPU years of computation.


PostPosted: Tue Jul 02, 2013 8:48 pm 
Junior Member

Joined: Mon Jun 03, 2013 9:06 pm
Posts: 20
Coffee4tw wrote:
Didn't UofA publish a paper that shows that a bigger abstraction (therefore closer to the real game) doesn't necessarily lead to less exploitability in the real world versus a smaller abstraction?


Not if you train your CFR strategy against a best response in a wider abstraction (cf. CFR-BR).

