 Post subject: Re: Monte Carlo Sampling for Regret Minimization in Extensive Games
PostPosted: Wed Jan 27, 2010 9:27 pm 
Senior member

Posts: 125
Favourite Bot: wetbot
Quote:
This is a good idea. However, I just lent out my copy of the book. Would you be able to post the exact rules?

Ah, I forgot to mention: that example is "forced" pot-limit, meaning any bet or raise must be for exactly "pot". I still think you should roll with it. One disadvantage of your method (which goes along with its very cool advantages!) is that without being able to generate a best response, it is difficult to test. This game is non-trivial, and by priming the "hero" with hands around the range cutoffs of the known solution, you should be able to get a sanity test on the "Nash"-ness of your bot. If you didn't parse all of that 2+2 thread, the solution to the game (the range cutoffs) is given in the middle of post #5. As for the exact rules of the game, make your tree look like this:
Code:
>>> temp.details()
Dead Money =  1
Acted   Bets    Pot Name
None (0, 0)   1.0 root
0     (0, 0)   1.0 root;CHECK
  1    (0, 0)   1.0 root;CHECK;check
  1    (0, 1)   2.0 root;CHECK;potbet
0     (0, 1)   2.0 root;CHECK;potbet;FOLD
0     (1, 1)   3.0 root;CHECK;potbet;CALL
0     (4, 1)   6.0 root;CHECK;potbet;POTBET
  1    (4, 1)   6.0 root;CHECK;potbet;POTBET;fold
  1    (4, 4)   9.0 root;CHECK;potbet;POTBET;call
0     (1, 0)   2.0 root;POTBET
  1    (1, 0)   2.0 root;POTBET;fold
  1    (1, 1)   3.0 root;POTBET;call
  1    (1, 4)   6.0 root;POTBET;potbet
0     (1, 4)   6.0 root;POTBET;potbet;FOLD
0     (4, 4)   9.0 root;POTBET;potbet;CALL

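The only non-obvious part of that listing is the bet sizing: every bet or raise is for exactly the pot, and a pot-sized raise means you call first and then bet the pot that results from the call, which is how the (0, 1) node becomes (4, 1). A tiny sketch of that arithmetic (my own throwaway code, not from any solver in this thread):
Code:
// Sketch (my own, not from any code in this thread) of the pot-limit bet
// sizing used in the listing above: facing a bet you call first, then bet
// the pot that results from the call.
#include <cassert>

double pot_raise(double dead, double my_bet, double opp_bet)
{
   double to_call   = opp_bet - my_bet;                      // 1 at the (0, 1) node
   double pot_after = dead + opp_bet + (my_bet + to_call);   // 3 after calling
   return to_call + pot_after;                               // call 1, then bet 3 -> put in 4
}

int main()
{
   assert(pot_raise(1.0, 0.0, 0.0) ==  1.0);  // opening pot bet: root;POTBET -> (1, 0)
   assert(pot_raise(1.0, 0.0, 1.0) ==  4.0);  // pot raise: root;CHECK;potbet;POTBET -> (4, 1)
   assert(pot_raise(1.0, 1.0, 4.0) == 12.0);  // what the next raise would be, were the game not capped at one raise
}
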
Quote:
Yes, but once hero knows the hero's hand, the strategy would not be very useful. It would be playing without looking at your cards.

Not following you here. I was just asking questions to make sure I understand what your implementation does. It seems to solve individual hands in the full continuous 0,1 game with uncapped betting in real time. That alone is a hell of an achievement. On the other hand, it cannot provide a solution to the full game. It solves one hand at a time. I think the MCCFRM algorithms were designed so that after some large number of iterations, the pi_ave values would be "useful" in the sense that the solution to a game could be published, and the algorithm would no longer be needed to "play the game". You've inverted it by using the algo in real time, but your resulting pi_ave values won't have the same utility at the end. Right?

New questions:
(1) You seem to be using quite a low value for epsilon (1e-4). The paper says in the Experimental Results section, "we found that epsilon = 0.6 worked well across all games". How did you land at your value, and have you tried to "optimize" its value? Of course, your sampling scheme is quite different as well, so perhaps that is a confounding factor.
(2) For a given "hero hand", how do you tell when it has converged on the solution? How many iterations does this generally take?
(3) You seem to have quite the penchant for outside-the-box thinking. Do you see a way to uncouple the betting from discretization and solve for a full no-limit 0,1 game?


 
 Post subject: Re: Monte Carlo Sampling for Regret Minimization in Extensive Games
PostPosted: Thu Jan 28, 2010 12:23 am 
Regular member

Posts: 77
Favourite Bot: homebrew
I'm very tired and won't have much time for this until the end of the week; I will devote more time to it then. In the meantime:

wetbot wrote:
Quote:
Yes, but once hero knows the hero's hand, the strategy would not be very useful. It would be playing without looking at your cards.

Not following you here. I was just asking questions to make sure I understand what your implementation does. It seems to solve individual hands in the full continuous 0,1 game with uncapped betting in real time. That alone is a hell of an achievement.

Yes, I misunderstood you; thanks. You correctly describe what I'm aiming at and what it will do once I debug it successfully.

wetbot wrote:
On the other hand, it cannot provide a solution to the full game. It solves one hand at a time. I think the MCCFRM algorithms were designed so that after some large number of iterations, the pi_ave values would be "useful" in the sense that the solution to a game could be published, and the algorithm would no longer be needed to "play the game". You've inverted it by using the algo in real time, but your resulting pi_ave values won't have the same utility at the end. Right?


That is correct. However, if it turns out to be too slow, I could pre-compute and save the strategy, using much the same code.

wetbot wrote:
New questions:
(1) You seem to be using quite a low value for epsilon (1e-4). The paper says in the Experimental Results section, "we found that epsilon = 0.6 worked well across all games". How did you land at your value, and have you tried to "optimize" its value? Of course, your sampling scheme is quite different as well, so perhaps that is a confounding factor.


The epsilon you refer to is a totally different epsilon; blame my bad naming scheme. The eps=1e-4 is there to prevent a division by zero when calculating p through regret matching. Perhaps I shouldn't have posted such intermediate-stage code; it's definitely not yet right.
Their eps has to do with the sampling scheme, as you observe.
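
Roughly, the only place my eps appears is the regret-matching step, something like this (a sketch of the idea, not the exact code from my solver):
Code:
// Sketch only (not the exact code from my solver): eps guards the case where
// no action has positive cumulative regret yet, which would otherwise give a
// zero denominator.
#include <algorithm>
#include <cstddef>
#include <vector>

std::vector<double> regret_matching(const std::vector<double>& cum_regret, double eps = 1e-4)
{
   std::vector<double> p(cum_regret.size());
   double denom = 0.0;
   for (double r : cum_regret) denom += std::max(r, 0.0);
   if (denom < eps) {
      // no positive regret anywhere yet: fall back to the uniform strategy
      std::fill(p.begin(), p.end(), 1.0 / p.size());
   } else {
      for (std::size_t n = 0; n < p.size(); ++n)
         p[n] = std::max(cum_regret[n], 0.0) / denom;
   }
   return p;
}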

wetbot wrote:
(2) For a given "hero hand", how do you tell when it has converged on the solution? How many iterations does this generally take?


I define another epsilon (as in epsilon-Nash strategy), outside the routine I posted. The regret is bounded by the sum of the positive immediate CFR at each node, divided by T (the number of samples).
Code:
   for (int t=0;t<T;t++) {
      // one outcome-sampled pass through the tree for this hero hand
      sample_strategy_profile(hero_hand, hero);

      // bound on the average regret: sum the positive immediate CFR over all
      // non-terminal, non-chance nodes, then divide by the number of samples so far
      R_bound=0.0;
      for (it=strategy_profile.begin();it!=strategy_profile.end();++it) {
         if (!it->is_terminal && !it->is_chance) {
            for (n=0;n<NUM_CHILDREN;n++) {
               R_bound += max(it->imm_cfr[n], 0.0);
            }
         }
      }
      R_bound/=(t+1);   // t+1 samples taken so far (dividing by t would blow up at t=0)
      file << R_bound << " " << endl;

      // keep sampling until the bound drops below epsilon or we hit the cap
      if (R_bound>epsilon) T++;
      if (T>T_max) break;
   }


It takes about 5 s for a million samples on a game tree with stacks of about 10 chips each. It would probably need fewer samples to converge, were it converging to the correct strategy. I won't quote eps numbers, since I am not yet converging to Nash and they are meaningless at this point.

wetbot wrote:
(3) You seem to have quite the penchant for outside-the-box thinking. Do you see a way to uncouple the betting from discretization and solve for a full no-limit 0,1 game?

Well, I like to do things my way! My plan is to do NL, perhaps with a choice at each node of fold/call/raise/jam, where the raise size is decided by a rule-of-thumb formula. Kind of a semi-discretisation. I tried writing formula-based NL poker bots; they were all crap, as they took no account of the opponent's strategy. Perhaps, though, I can resurrect some of that work with respect to bet sizing.

If you feel able to share, I'd be interested in seeing your chance-sampled implementation. You can PM me about it if you'd prefer.


 
 Post subject: Re: Monte Carlo Sampling for Regret Minimization in Extensive Games
PostPosted: Thu Jan 28, 2010 8:30 pm 
Senior member

Posts: 125
Favourite Bot: wetbot
Quote:
I could pre-compute and save the strategy, using much the same code.

Hmm. Code would be reusable, yes, but wouldn't you have to discretize to pre-compute?
Quote:
The eps=1e-4 is to disallow a division by zero when calculating p through regret matching.

Yeah, I'm running into the same problem now, too. Note that if either pi_i or pi_not_i goes to zero, the algo will have no effect on the cumulative CFR (because it is adding or subtracting a product containing a zero). So what is the point of forcing q>0, when, if sampled, it will have no effect on the CFR? I don't recall seeing this addressed anywhere. Since my implementation isn't working yet either, I'm going to try allowing q to be zero when the respective pi is zero.
Code:
(*rit)->parent->imm_cfr[n] += (1.0/sigma_h_a - 1.0)*u[actor]*pi_z/((*rit)->pi_i_h*q);
(*rit)->parent->imm_cfr[n] -= u[actor]*pi_z/((*rit)->pi_i_h*q);

Why was it necessary to deviate from the paper here? How did you come up with these changes?
Quote:
I'd be interested in seeing your chance sampled implementation.

Chance-sampled is at least an order of magnitude easier to implement, because the documentation available is so much better. The two primary sources are the "Regret Minimization in Games with Incomplete Information" paper by Zinkevich et al. and Mike Johanson's thesis, specifically pp. 41-53. I will PM you my very poor implementation, but I would pretty much urge you to ignore it and roll your own from those docs. Mine is very sloppy.


 
 Post subject: Re: Monte Carlo Sampling for Regret Minimization in Extensive Games
PostPosted: Thu May 06, 2010 3:36 pm 
Regular member

Posts: 77
Favourite Bot: homebrew
Just to clear up my own mess:

The code I posted earlier in this thread is wrong, but not by much. Working with wetbot, I converged to a working version.

You DO have to discretise the [0,1] game (I was wrong).

Outcome sampling is awkward to implement because the documentation is not that good. It does converge quickly to a decent approximation for large games, though, which may be useful.


 
 Post subject: Re: Monte Carlo Sampling for Regret Minimization in Extensive Games
PostPosted: Tue Jun 08, 2010 6:05 pm 
Regular member

Posts: 95
Favourite Bot: man
Here's my implementation of outcome sampling in pseudocode, without the Greek:

Code:
train(node)

   if node is terminal
      for each player   value[player] = node.chance_sampled_value[player] * reach[opponent] / sample_probability
   
   else
      strategy = regret_matching(regret)
      cumulative_strategy += strategy * reach[player] / sample_probability

      sampled_action = sample_strategy(strategy, exploration)
      sample_probability *= probability of sample_strategy returning sampled_action
      reach[player] *= strategy[sampled_action]

      train(children[sampled_action])
      
      regret[sampled_action] += value[player]
      value[player] *= strategy[sampled_action]
      for each action   regret[action] -= value[player]

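If it helps to see it as real code, here is roughly how that could look in C++. This is only a sketch, not my actual implementation: it assumes two actions at every decision node and a chance-sampled payoff already stored at the terminal nodes.
Code:
// Rough C++ rendering of the pseudocode above (a sketch only, not my actual
// implementation): two actions per decision node, chance-sampled payoffs at leaves.
#include <algorithm>
#include <cstdlib>

const int    NUM_ACTIONS = 2;
const double EXPLORATION = 0.6;   // the epsilon from the Lanctot et al. paper

struct Node {
   bool   is_terminal;
   int    player;                          // whose turn it is at this node
   double terminal_value[2];               // chance-sampled payoff at leaves
   double regret[NUM_ACTIONS];
   double cumulative_strategy[NUM_ACTIONS];
   Node*  children[NUM_ACTIONS];
};

void regret_matching(const double* regret, double* strategy)
{
   double denom = 0.0;
   for (int a = 0; a < NUM_ACTIONS; a++) denom += std::max(regret[a], 0.0);
   for (int a = 0; a < NUM_ACTIONS; a++)
      strategy[a] = denom > 0.0 ? std::max(regret[a], 0.0) / denom : 1.0 / NUM_ACTIONS;
}

// reach[] is read only here (a modified copy is passed down the tree);
// value[] is filled in on the way back up.
void train(Node* node, double reach[2], double sample_probability, double value[2])
{
   if (node->is_terminal) {
      for (int p = 0; p < 2; p++)
         value[p] = node->terminal_value[p] * reach[1 - p] / sample_probability;
      return;
   }
   int me = node->player;

   double strategy[NUM_ACTIONS];
   regret_matching(node->regret, strategy);
   for (int a = 0; a < NUM_ACTIONS; a++)
      node->cumulative_strategy[a] += strategy[a] * reach[me] / sample_probability;

   // sample one action, mixing the current strategy with a uniform exploration term
   double sample_prob[NUM_ACTIONS];
   for (int a = 0; a < NUM_ACTIONS; a++)
      sample_prob[a] = EXPLORATION / NUM_ACTIONS + (1.0 - EXPLORATION) * strategy[a];
   double r = (double)std::rand() / RAND_MAX, acc = 0.0;
   int sampled = NUM_ACTIONS - 1;
   for (int a = 0; a < NUM_ACTIONS; a++) { acc += sample_prob[a]; if (r < acc) { sampled = a; break; } }

   double new_reach[2] = { reach[0], reach[1] };
   new_reach[me] *= strategy[sampled];
   train(node->children[sampled], new_reach, sample_probability * sample_prob[sampled], value);

   // only the sampled action has a non-zero sampled counterfactual value;
   // every action is then debited the (sampled) value of the node itself
   node->regret[sampled] += value[me];
   value[me] *= strategy[sampled];
   for (int a = 0; a < NUM_ACTIONS; a++) node->regret[a] -= value[me];
}

The division by sample_probability is the importance-sampling correction that keeps the regret and strategy estimates unbiased even though only one line of play is walked per iteration.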


 
 Post subject: Re: Monte Carlo Sampling for Regret Minimization in Extensive Games
PostPosted: Sun Dec 12, 2010 6:21 pm 
PokerAI fellow

Posts: 1115
Favourite Bot: Johnny #5
supert wrote:
Quote:
Quote:
Well, I also found a mistake in calculating pi_i_h.

What was it? I am taking what you have done and cobbling together a discretized version of Outcome Sampling CFRM. It's not working yet. Did you start out with a discretized version, or did you skip that entirely?

child->pi_i_h should be divided by pi_i_h[n]

How would this apply to the code you posted previously?


 
 Post subject: Re: Monte Carlo Sampling for Regret Minimization in Extensive Games
PostPosted: Wed Nov 09, 2011 9:27 pm 
PokerAI fellow

Posts: 1115
Favourite Bot: Johnny #5
I'm playing around with something similar right now. Can anybody tell me how supert is calculating the utility?

u[actor]=payoff()

Is u[actor] then Z * outcome? Or, just the outcome of the sampled hand for player actor?


 
 Post subject: Re: Monte Carlo Sampling for Regret Minimization in Extensive Games
PostPosted: Thu Nov 10, 2011 4:23 pm 
Regular member

Posts: 95
Favourite Bot: man
Here are my C# implementations of chance sampling, external sampling and outcome sampling for one-card poker. Also included is a best-response calculation for verifying the results.


Attachments:
CFRMTest.zip [6.73 KB]
Downloaded 159 times
 
 Post subject: Re: Monte Carlo Sampling for Regret Minimization in Extensive Games
PostPosted: Fri Nov 11, 2011 7:29 pm 
Regular member

Posts: 95
Favourite Bot: man
I have also created a new sampling scheme that seems to beat the previous ones, but more about that later.

Attachment:
ocp.png [ 49.17 KB | Viewed 2949 times ]


 
 Post subject: Re: Monte Carlo Sampling for Regret Minimization in Extensive Games
PostPosted: Thu Nov 17, 2011 12:54 am 
Regular member

Posts: 95
Favourite Bot: man
An updated version with a holdem river solver.


Attachments:
RiverSolver.zip [105.85 KB]
Downloaded 144 times
 
 Post subject: Re: Monte Carlo Sampling for Regret Minimization in Extensive Games
PostPosted: Thu Dec 01, 2011 7:55 pm 
Senior member

Posts: 124
Favourite Bot: coming
Thank you for sharing!

Code:
   
static Decision CreateGame()
      {
         return
            new Decision
            (
               0,
               new Decision
               (
                  1,
                  new Showdown(2),
                  new Fold(1)
               ),
               new Decision
               (
                  1,
                  new Decision
                  (
                     0,
                     new Showdown(2),
                     new Fold(-1)
                  ),
                  new Showdown(1)
               )
            );
      }


I looked at your code and I don't understand the meaning of the values in Showdown() and Fold(). Can you help me?


 
 Post subject: Re: Monte Carlo Sampling for Regret Minimization in Extensive Games
PostPosted: Thu Dec 01, 2011 9:22 pm 
PokerAI fellow

Posts: 1239
Favourite Bot: my bot
It's the amount won or lost at showdown or on a fold.


 
 Post subject: Re: Monte Carlo Sampling for Regret Minimization in Extensive Games
PostPosted: Thu Dec 01, 2011 10:14 pm 
Regular member

Posts: 95
Favourite Bot: man
Yeah, it's a little confusing:

Fold(N): player 0 wins N, player 1 loses N
Showdown(N): better hand wins N, worse hand loses N
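
In code the convention amounts to something like this (a sketch of the idea only, not the C# in the attachment):
Code:
// Sketch of the payoff convention only (not the attached C# code).  N is
// signed, so the Fold(-1) in the game above means player 0 folded and loses 1.
struct Payoff { double p0, p1; };

Payoff fold_value(double n)                         // Fold(N)
{
   return { n, -n };                                // player 0 wins N, player 1 loses N
}

Payoff showdown_value(double n, int better_hand)    // Showdown(N)
{
   return better_hand == 0 ? Payoff{ n, -n }        // better hand wins N, worse hand loses N
                           : Payoff{ -n, n };
}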


 
 Post subject: Re: Monte Carlo Sampling for Regret Minimization in Extensive Games
PostPosted: Thu Dec 01, 2011 10:46 pm 
PokerAI fellow

Posts: 1115
Favourite Bot: Johnny #5
Should the amount won/lost be the whole pot, or just what trainplayer put in?


 
 Post subject: Re: Monte Carlo Sampling for Regret Minimization in Extensive Games
PostPosted: Thu Dec 01, 2011 11:46 pm 
Senior member

Posts: 124
Favourite Bot: coming
c2008 wrote:
Should the amount won/lost be the whole pot, or just what trainplayer put in?


That's what confused me. I guess both amounts can be used, as long as the blinds/antes are the same for both players.


 
 Post subject: Re: Monte Carlo Sampling for Regret Minimization in Extensive Games
PostPosted: Tue Dec 06, 2011 11:34 am 
Junior member

Posts: 11
Favourite Bot: phil ivey
Can someone please help me understand why chance-sampled CFRM would be better than vanilla CFRM for poker-style games? While watching the lecture I got the impression that I don't quite understand what an iteration is in vanilla CFRM, so maybe that is the problem.

What I was thinking is: instead of sampling the hands for both players, why not just iterate through all the possible matchups? This would guarantee that all matchups are played through the same number of times. With chance-sampled CFRM, some hands will end up being slightly more likely than others. I guess the idea is that after a large number of iterations this difference will be insignificant, but it still seems like it introduces an unnecessary error.

So my questions are:
1. How is an iteration in vanilla CFRM different from one in chance-sampled CFRM?
2. Is there a name for what I am suggesting, and wouldn't it be better than chance-sampled CFRM? So far my tests show that it's about the same per iteration, but my iterations are faster because they avoid the cost of sampling hands.


 
 Post subject: Re: Monte Carlo Sampling for Regret Minimization in Extensive Games
PostPosted: Wed Dec 07, 2011 1:00 pm 
PokerAI fellow

Posts: 1239
Favourite Bot: my bot
ymca wrote:
Can someone please help me understand why chance-sampled CFRM would be better than vanilla CFRM for poker-style games? While watching the lecture I got the impression that I don't quite understand what an iteration is in vanilla CFRM, so maybe that is the problem.

What I was thinking is: instead of sampling the hands for both players, why not just iterate through all the possible matchups? This would guarantee that all matchups are played through the same number of times. With chance-sampled CFRM, some hands will end up being slightly more likely than others. I guess the idea is that after a large number of iterations this difference will be insignificant, but it still seems like it introduces an unnecessary error.

So my questions are:
1. How is an iteration in vanilla CFRM different from one in chance-sampled CFRM?
2. Is there a name for what I am suggesting, and wouldn't it be better than chance-sampled CFRM? So far my tests show that it's about the same per iteration, but my iterations are faster because they avoid the cost of sampling hands.


I think this thread should answer some of your questions. If you search around you will find others, but I think this one is a good starting point.

Using this nomenclature:
(1) Vanilla CFRM (when it was originally published, it was simply called CFRM): Johanson & Zinkevich
(2) Chance-sampled CFRM (formerly known as "Poker-Specific" CFRM): Johanson & Zinkevich
(3) Outcome-sampled CFRM: Lanctot
(4) External-sampled CFRM: Lanctot

Vanilla CFRM visits all nodes, even ones that are very rarely reached in real play. Chance-sampled CFRM visits nodes as dictated by the random draw of cards. Chance sampling is faster because it doesn't expend a lot of effort on situations that don't arise very often. Somewhere (it's repeated in this thread) MBJ said (3) and (4) were slower than (2).
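
In rough code, the difference per iteration is just this (my own sketch with stub definitions, not code from the papers):
Code:
#include <random>
#include <vector>

struct Deal { int hole[2]; double prob; };   // one matchup of private cards
struct Node { };                             // stand-in for a real game-tree node

// Stubs: replace with your own deal enumeration / sampling and your CFR walk.
std::vector<Deal> all_deals()              { return {}; }
Deal sample_deal(std::mt19937& rng)        { (void)rng; return {}; }
void cfr_walk(Node*, const Deal&, double)  { }

// One iteration of vanilla CFRM: walk the whole tree, weighting every line by
// the probability of the chance events on it.
void vanilla_iteration(Node* root)
{
   for (const Deal& d : all_deals())
      cfr_walk(root, d, d.prob);
}

// One iteration of chance-sampled CFRM: deal once and walk only the part of
// the tree consistent with that deal; sampling takes the place of the weight.
void chance_sampled_iteration(Node* root, std::mt19937& rng)
{
   cfr_walk(root, sample_deal(rng), 1.0);
}

What ymca describes, iterating over every matchup in turn, is essentially the vanilla loop; chance sampling spreads the same work over many cheap iterations instead.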


 
 Post subject: Re: Monte Carlo Sampling for Regret Minimization in Extensive Games
PostPosted: Thu Dec 08, 2011 11:18 am 
Junior member

Posts: 11
Favourite Bot: phil ivey
Thanks spears. I remember reading that thread a while back but have probably forgotten most of the details. I'll have another look at it and at some of the relevant papers.


 
 Post subject: Re: Monte Carlo Sampling for Regret Minimization in Extensive Games
PostPosted: Sat Dec 17, 2011 2:31 am 
PokerAI fellow

Posts: 1115
Favourite Bot: Johnny #5
In CSCFRM, texture buckets can just be defined as additional branching in the game tree, correct? For example, the training function in a texture class would just return the child node of the appropriate bucket to its parent, without taking the branches connected to the other texture buckets into consideration?


 
 Post subject: Re: Monte Carlo Sampling for Regret Minimization in Extensive Games
PostPosted: Fri Dec 23, 2011 11:34 am 
Senior member

Posts: 124
Favourite Bot: coming
I tried to compute a push/fold strategy for NLHE with amax's code.
I get something close to what can be found here: http://www.holdemresources.net/hr/sngs/hucalculator.html

I also tried to collapse isomorphic starting hands, using 169 hand classes instead of all 1326 combos, but the results were not good. I have an idea why, but it is not clear in my mind.

Give it a try; I'd like to compare notes on this.


Attachments:
Push_Fold.7z [870.25 KB]
Downloaded 100 times
 