Poker-AI.org Poker AI and Botting Discussion Forum (http://poker-ai.org/phpbb/feed.php?f=24&t=5)

Re: Average Strategy Sampling (http://poker-ai.org/phpbb/viewtopic.php?t=5&p=4383#p4383)
Coffee4tw wrote:

Didn't UofA publish a paper showing that a bigger abstraction (and therefore one closer to the real game) doesn't necessarily lead to lower exploitability in the real game than a smaller abstraction?


Not if you train your CFR strategy against a best response computed in a wider abstraction (cf. CFR-BR).
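
For anyone unfamiliar, CFR-BR runs CFR for one player while the other always plays a best response to the CFR player's current strategy. A rough sketch of the training loop; ComputeBestResponse and CfrUpdate are hypothetical helpers I'm assuming, not code from this thread:
Code:
      // Hypothetical sketch of the CFR-BR idea: one player updates with CFR,
      // the other always best-responds to the CFR player's current strategy.
      // ComputeBestResponse and CfrUpdate are assumed helpers.
      public void TrainCfrBr(int trainplayer, long iterations)
      {
         int opponent = 1 - trainplayer;
         for (long iter = 0; iter < iterations; iter++)
         {
            // Opponent plays a best response (computed in the wider
            // abstraction) to the CFR player's current strategy ...
            var br = ComputeBestResponse(opponent, GetCurrentStrategy(trainplayer));
            // ... and the CFR player does a normal regret update against it.
            CfrUpdate(trainplayer, br);
         }
      }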

Statistics: Posted by Isildur11 — Tue Jul 02, 2013 8:48 pm


Re: Average Strategy Sampling (http://poker-ai.org/phpbb/viewtopic.php?t=5&p=4372#p4372)
Coffee4tw wrote:

Didn't UofA publish a paper showing that a bigger abstraction (and therefore one closer to the real game) doesn't necessarily lead to lower exploitability in the real game than a smaller abstraction?

Ah yes, this is the paper:
Abstraction Pathologies in Extensive Games
Kevin Waugh, David Schnizlein, Michael Bowling, and Duane Szafron (2009)
http://poker.cs.ualberta.ca/publication ... action.pdf


Not necessarily, but maybe worth mentioning: http://poker.cs.ualberta.ca/publications/ijcai2011_accelerated_best_response.pdf

http://poker.cs.ualberta.ca/publications/ijcai2011_accelerated_best_response.pdf (Page 2, Top left) wrote:

In this paper, we describe general techniques for accelerating best response calculations. The method uses the structure of information and utilities to avoid a full game tree traversal, while also being well-suited to parallel computation. As a result we are able for the first time to compute the worst case performance of non-trivial strategies in two-player limit Texas hold’em. After introducing these innovations, we use our technique to empirically answer a number of open questions related to abstraction and equilibrium approximation.

We show that in practice finer poker abstractions do produce better equilibrium approximations, but better worst-case performance does not always result in better performance in a tournament. These conclusions are drawn from evaluating the worst-case performance of over three dozen strategies involving ten total CPU years of computation.
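
Their accelerations aside, the basic best-response recursion is simple: maximize at the responder's decision points and take the strategy-weighted expectation everywhere else. A toy sketch for a perfect-information tree (all names hypothetical); a real poker best response has to maximize per information set, weighted by the opponent's reach probabilities, and that full-tree cost is exactly what the paper attacks:
Code:
      // Toy best-response value recursion for a perfect-information tree.
      // Utility, nodeId and the Node layout are assumptions, not the
      // paper's accelerated method.
      public double BestResponseValue(int responder, double[][] strategy)
      {
         if (children == null)                 // terminal node
            return Utility(responder);         // assumed payoff helper

         double value = player == responder ? double.MinValue : 0;
         for (int i = 0; i < children.Length; i++)
         {
            double child = children[i].BestResponseValue(responder, strategy);
            if (player == responder)
               value = Math.Max(value, child);         // responder maximizes
            else
               value += strategy[nodeId][i] * child;   // expectation under the fixed strategy
         }
         return value;
      }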

Statistics: Posted by Nose — Mon Jul 01, 2013 7:48 am


2013-03-13T22:43:14+00:00 2013-03-13T22:43:14+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=5&p=3192#p3192 <![CDATA[Re: Average Strategy Sampling]]> My guess would be that the effect would be very small however for the well-known algorithms (often, you can find evaluations with best-response exploitability comparisons), while the speed-up is huge. So practically, algorithms like ASS will always perform better given a limited amount of CPU hours; I'd guess if you crunch the numbers for like several weeks, one might go back to plain CFRM (as it doesnt matter if we converge after 10 or 100M iterations, if we run 100B iterations), but like many other areas, I guess we can only make guesses here.

Statistics: Posted by proud2bBot — Wed Mar 13, 2013 10:43 pm


Re: Average Strategy Sampling (http://poker-ai.org/phpbb/viewtopic.php?t=5&p=3191#p3191)
proud2bBot wrote:

a) waiting until the algo really converges (which may take a while) and b) choosing "good" parameters for exploration and bonus.


Those criteria aren't perfect.
a) How do you know when your algo *really* converges?
b) How do you know your parameters are really "good"?

You're essentially trading accuracy for speed by using educated guesses. In a very large abstraction (i.e., one close to the real game), those inaccuracies make you more exploitable, wouldn't you say? In a smaller abstraction, where over-fitting is expected, that inaccuracy may actually be a good thing. This is just speculation on my part, but it matches what seemed to be happening in my experiments with extreme estimators.
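
For what it's worth, the usual stand-in for "really converges" is to measure exploitability periodically and stop when it plateaus. A sketch; Train, ComputeExploitability and the threshold are all assumptions on my part:
Code:
      // Hypothetical stopping rule: train in batches and stop once measured
      // exploitability (a best-response value against the current average
      // strategy) stops improving. Train and ComputeExploitability are
      // assumed helpers.
      public void TrainUntilConverged()
      {
         double previous = double.MaxValue;
         while (true)
         {
            Train(10000000);                        // one batch of iterations
            double expl = ComputeExploitability();  // e.g. in milli-big-blinds/hand
            if (previous - expl < 1.0)              // plateau: improvement below threshold
               break;
            previous = expl;
         }
      }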

Statistics: Posted by cantina — Wed Mar 13, 2013 10:22 pm


Re: Average Strategy Sampling (http://poker-ai.org/phpbb/viewtopic.php?t=5&p=3190#p3190)
[post body missing from the feed]

Statistics: Posted by cantina — Wed Mar 13, 2013 10:03 pm
Re: Average Strategy Sampling (http://poker-ai.org/phpbb/viewtopic.php?t=5&p=3182#p3182)

Ah yes, this is the paper:
Abstraction Pathologies in Extensive Games
Kevin Waugh, David Schnizlein, Michael Bowling, and Duane Szafron (2009)
http://poker.cs.ualberta.ca/publication ... action.pdf

Statistics: Posted by Coffee4tw — Wed Mar 13, 2013 8:40 pm


Re: Average Strategy Sampling (http://poker-ai.org/phpbb/viewtopic.php?t=5&p=3181#p3181)
[post body missing from the feed]

Statistics: Posted by proud2bBot — Wed Mar 13, 2013 8:34 pm
Re: Average Strategy Sampling (http://poker-ai.org/phpbb/viewtopic.php?t=5&p=3177#p3177)

I'm saying these probabilistic CFRM algorithms might not work as well in very large abstractions close to the real game. The ASS paper includes comparisons suggesting the opposite, but only in smaller abstractions.

Statistics: Posted by cantina — Wed Mar 13, 2013 7:56 pm


Re: Average Strategy Sampling (http://poker-ai.org/phpbb/viewtopic.php?t=5&p=3173#p3173)
[post body missing from the feed]

Statistics: Posted by proud2bBot — Wed Mar 13, 2013 12:52 pm
Re: Average Strategy Sampling (http://poker-ai.org/phpbb/viewtopic.php?t=5&p=3169#p3169)
[post body missing from the feed]

Statistics: Posted by cantina — Wed Mar 13, 2013 7:42 am
Re: Average Strategy Sampling (http://poker-ai.org/phpbb/viewtopic.php?t=5&p=3156#p3156)
[post body missing from the feed]

Statistics: Posted by proud2bBot — Tue Mar 12, 2013 9:34 pm
Re: Average Strategy Sampling (http://poker-ai.org/phpbb/viewtopic.php?t=5&p=3151#p3151)
I wonder how well these probabilistic approaches would work with an abstraction that is very close to the underlying game. I've somehow got the idea in my head that these algorithms take advantage of the fact that some abstractions become over-trained (and therefore more exploitable in the real game) after a certain point in convergence.

Statistics: Posted by cantina — Tue Mar 12, 2013 6:58 pm


Re: Average Strategy Sampling (http://poker-ai.org/phpbb/viewtopic.php?t=5&p=3120#p3120)
[post body missing from the feed]

Statistics: Posted by proud2bBot — Mon Mar 11, 2013 10:37 pm
Average Strategy Sampling (http://poker-ai.org/phpbb/viewtopic.php?t=5&p=15#p15)

http://poker.cs.ualberta.ca/publications/NIPS12.pdf

In the same format as Amax's posts:
Code:
      public override double TrainAverageStrategySampling(int trainplayer, Iteration iteration, double q)
      {
         int hole = iteration.GetBucket(street, player);

         // Note: you must also divide the utility by q at terminal nodes (showdown and fold).

         // ASS parameters from the NIPS12 paper: e = exploration, t = threshold, b = bonus.
         const double e = 0.5;
         const double t = 1000;
         const double b = 100000;

         var s = GetStrategy(hole);

         if (player == trainplayer)
         {
            var u = new double[children.Length];

            double ev = 0;

            double cs_sum = 0;

            for (int i = 0; i < children.Length; i++)
               cs_sum += cumulativeStrategy[hole, i];

            for (int i = 0; i < children.Length; i++)
            {
               // Sample action i with probability max(e, (b + t*s[i]) / (b + sum(s))),
               // so actions with a large average-strategy weight are sampled more often.
               double ap = Math.Max(e, (b + t * cumulativeStrategy[hole, i]) / (b + cs_sum));
               if (rnd.Value.NextDouble() < ap)
               {
                  u[i] = children[i].TrainAverageStrategySampling(trainplayer, iteration, q * Math.Min(1, ap));
                  ev += u[i] * s[i];
               }
            }

            for (int i = 0; i < children.Length; i++)
               regret[hole, i] += u[i] - ev;

            return ev;
         }
         else
         {
            for (int i = 0; i < children.Length; i++)
               cumulativeStrategy[hole, i] += s[i] / q;

            int a = SampleStrategy(s);
            return children[a].TrainAverageStrategySampling(trainplayer, iteration, q);
         }
      }
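
For reference, the terminal-node counterpart that the comment above refers to might look like this (a sketch; GetUtility is an assumed helper returning the raw showdown/fold payoff for the training player):
Code:
      // Sketch of a terminal node (assumption, not from the original post):
      // the sampled utility is importance-weighted by dividing by q.
      public override double TrainAverageStrategySampling(int trainplayer, Iteration iteration, double q)
      {
         return GetUtility(trainplayer, iteration) / q;
      }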


Or, Average Strategy Sampling as a probing strategy.
http://poker.cs.ualberta.ca/publications/AAAI12-generalmccfr.pdf
Code:
      public override double TrainAverageStrategyProbing(int trainplayer, Iteration iteration, double q, bool probe)
      {
         int hole = iteration.GetBucket(street, player);

         // Exploration parameter.
         const double e = 0.5;

         var s = GetStrategy(hole);

         if (probe)
         {
            // Probe mode: roll out one action per node, sampled from the
            // current strategy, without touching regrets or averages.
            int a = SampleStrategy(s);
            return children[a].TrainAverageStrategyProbing(trainplayer, iteration, q, true);
         }
         else if (player == trainplayer)
         {
            var u = new double[children.Length];

            double ev = 0;

            double cs_sum = 0;

            for (int i = 0; i < children.Length; i++)
               cs_sum += cumulativeStrategy[hole, i];

            for (int i = 0; i < children.Length; i++)
            {
               // Sample action i in proportion to its average-strategy
               // weight, but never with probability below e.
               double ap = cs_sum <= 0 ? 1 : Math.Max(e, cumulativeStrategy[hole, i] / cs_sum);
               if (rnd.Value.NextDouble() <= ap)
               {
                  // Explore the action fully, correcting the weight by 1/ap.
                  u[i] = children[i].TrainAverageStrategyProbing(trainplayer, iteration, q / ap, probe);
               }
               else
               {
                  // Otherwise estimate its utility with a cheap probe.
                  u[i] = children[i].TrainAverageStrategyProbing(trainplayer, iteration, q, true);
               }

               ev += u[i] * s[i];
            }

            for (int i = 0; i < children.Length; i++)
               regret[hole, i] += (u[i] - ev) * q;

            return ev;
         }
         else
         {
            for (int i = 0; i < children.Length; i++)
               cumulativeStrategy[hole, i] += s[i] * q;

            int a = SampleStrategy(s);
            return children[a].TrainAverageStrategyProbing(trainplayer, iteration, q, probe);
         }
      }
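
Both methods lean on a SampleStrategy helper that isn't shown. A minimal sketch of what it's assumed to do, i.e. draw one action index from the distribution s:
Code:
      // Minimal sketch (assumption, not from the original post): sample one
      // action index from the distribution s, which is assumed to sum to 1.
      private int SampleStrategy(double[] s)
      {
         double r = rnd.Value.NextDouble();
         double acc = 0;
         for (int i = 0; i < s.Length; i++)
         {
            acc += s[i];
            if (r < acc)
               return i;
         }
         return s.Length - 1;   // guard against floating-point rounding
      }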

Statistics: Posted by cantina — Tue Mar 05, 2013 3:34 pm

