



 Post subject: Monte Carlo Sampling for Regret Minimization in Extensive Games
PostPosted: Fri Jan 08, 2010 9:56 pm 
Regular member

Posts: 64
Location: Edinburgh, Scotland
E-mail: flounderhead@gmail.com
Favourite Bot: Kryten
Another Waugh et al. NIPS 22 paper. It is not directly poker-related, but it may still be of interest to anyone working on regret minimization.

Monte Carlo Sampling for Regret Minimization in Extensive Games
Authors: Marc Lanctot, Kevin Waugh, Martin Zinkevich, and Michael Bowling
Proceedings: Advances in Neural Information Processing Systems 22
Editors: Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, and A. Culotta
Pages: 1078-1086
Year: 2009


Abstract:

Sequential decision-making with multiple agents and imperfect information is commonly modeled as an extensive game. One efficient method for computing Nash equilibria in large, zero-sum, imperfect information games is counterfactual regret minimization (CFR). In the domain of poker, CFR has proven effective, particularly when using a domain-specific augmentation involving chance outcome sampling. In this paper, we describe a general family of domain-independent CFR sample-based algorithms called Monte Carlo counterfactual regret minimization (MCCFR) of which the original and poker-specific versions are special cases. We start by showing that MCCFR performs the same regret updates as CFR on expectation. Then, we introduce two sampling schemes: outcome sampling and external sampling, showing that both have bounded overall regret with high probability. Thus, they can compute an approximate equilibrium using self-play. Finally, we prove a new tighter bound on the regret for the original CFR algorithm and relate this new bound to MCCFR’s bounds. We show empirically that, although the sample-based algorithms require more iterations, their lower cost per iteration can lead to dramatically faster convergence in various games.

http://books.nips.cc/papers/files/nips2 ... 9_0363.pdf
Supplementals (Algorithm and Proofs): http://books.nips.cc/papers/files/nips2 ... .extra.zip
Slide (one): http://books.nips.cc/papers/files/nips2 ... _slide.pdf
Audio (50 second abstract only): http://books.nips.cc/papers/files/nips2 ... 9_0363.mp3
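For anyone meeting this for the first time: the building block shared by CFR and all of the MCCFR variants is regret matching at each information set. A minimal sketch of that step (my own illustration in C++, not code from the paper):

Code:
#include <algorithm>
#include <vector>

// Regret matching: the current strategy at an information set is proportional
// to the positive cumulative regrets; if none are positive, play uniformly.
std::vector<double> regret_matching(const std::vector<double>& cum_regret) {
    std::vector<double> strategy(cum_regret.size());
    double pos_sum = 0.0;
    for (double r : cum_regret) pos_sum += std::max(r, 0.0);
    for (std::size_t a = 0; a < cum_regret.size(); ++a) {
        strategy[a] = (pos_sum > 0.0)
                          ? std::max(cum_regret[a], 0.0) / pos_sum
                          : 1.0 / cum_regret.size();
    }
    return strategy;
}

The sampling schemes in the paper differ only in how the regrets that feed this step are estimated.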


 
 Post subject: Re: Monte Carlo Sampling for Regret Minimization in Extensive Ga
PostPosted: Fri Jan 22, 2010 11:44 pm 
Regular member

Posts: 77
Favourite Bot: homebrew
There is a related presentation here: http://videolectures.net/icml09_lanctot_mccfr/.

Anyone use this paper? I am trying to do an implementation for the [0,1] game.
I am having trouble getting it to converge on the same strategy each time (given the same hand).

There is a difference in equation 10 (last term in weighting) between the presentation and the paper. Not sure which is correct yet.


 
 Post subject: Re: Monte Carlo Sampling for Regret Minimization in Extensive Games
PostPosted: Sat Jan 23, 2010 12:41 am 
Senior member

Posts: 125
Favourite Bot: wetbot
Quote:
Anyone use this paper? I am trying to do an implementation for the [0,1] game.

I am currently implementing Chance-Sampled CFRM. What discretization of [0,1] are you using? My implementation seems to work well for "poker" games, but performs strangely in, say, the integerized [0,99] game. Not sure why yet.


 
 Post subject: Re: Monte Carlo Sampling for Regret Minimization in Extensive Ga
PostPosted: Sat Jan 23, 2010 11:32 am 
Regular member

Posts: 77
Favourite Bot: homebrew
wetbot wrote:
Quote:
Anyone use this paper? I am trying to do an implementation for the [0,1] game.

I am currently implementing Chance-Sampled CFRM. What discretization of [0,1] are you using? My implementation seems to work well for "poker" games, but performs strangely in, say, the integerized [0,99] game. Not sure why yet.


I am not discretising the [0,1] game. I don't think I need to. I'm not using chance sampling but outcome sampling, as described in Lanctot's paper above. The bot plays well (it can beat me) but still does odd things like folding the nuts occasionally.


 
 Post subject: Re: Monte Carlo Sampling for Regret Minimization in Extensive Games
PostPosted: Sat Jan 23, 2010 10:54 pm 
Senior member

Posts: 125
Favourite Bot: wetbot
OK, thanks for this. I have gone back and re-read the paper in detail. I infer from the fact that they left chance-sampled CFRM out of their experiments that they, too, have found it doesn't perform well for 'non-poker' games, which I have confirmed. It would be nice to know, though, where 'poker' ends and 'non-poker' begins; chance-sampled CFRM does not work well for one-card poker, for instance. I suppose this is an example of an area they highlight for future research.

Conversely, I wonder if outcome-sampling or external sampling works 'well' for 'poker' games, and if so, could either be better than chance-sampling? The paper is sadly silent on this.
Quote:
I'm not using chance sampling but outcome sampling, as described in Lanctot's paper above.

I salute you. Even after implementing chance-sampling (and outcome-sampling looks pretty darn similar), I don't think I can quite cut through the math in the paper to turn it into an implementation. Maybe you could post some more user-friendly explanation or pseudocode?
Quote:
I am not discretising the [0,1] game. I don't think I need to.

I don't see how this is possible within the context of these algorithms. Perhaps you've developed something novel? [0,1] has an infinite strategy space. Note that discretized [0,1] with n discretization points is the same as One-Card Poker with an n-card deck. They use One-Card Poker with n = 500 in one of their experiments.


 
 Post subject: Re: Monte Carlo Sampling for Regret Minimization in Extensive Games
PostPosted: Sat Jan 23, 2010 11:02 pm 
Senior member

Posts: 125
Favourite Bot: wetbot
Oh, also:
Quote:
I am having trouble getting it to converge on the same strategy each time (given the same hand).

This is not particularly surprising, given that they are non-deterministic algorithms and, depending on your particular game, an infinite number of equilibria may exist. The algorithm may be converging on a different equilibrium each time. Or, you might have a bug. ;) But, if you can rule out the latter, I'd suspect the former. What specific rules for [0,1] are you using?
Quote:
There is a difference in equation 10 (last term in weighting) between the presentation and the paper. Not sure which is correct yet.

The very bottom of p.5 in the paper, under Eq 10, says that "an alternative form for Eq 10 is recommended for implementation", which can be found in the supplemental materials. This probably explains the "discrepancy" you noted.


 
 Post subject: Re: Monte Carlo Sampling for Regret Minimization in Extensive Games
PostPosted: Sun Jan 24, 2010 12:40 pm 
PokerAI fellow

Posts: 7731
Favourite Bot: V12
Just a side remark - I really like the combination of a paper plus an uploaded video lecture/presentation.

_________________
indiana


 
 Post subject: Re: Monte Carlo Sampling for Regret Minimization in Extensive Ga
PostPosted: Mon Jan 25, 2010 10:50 am 
Regular member

Posts: 77
Favourite Bot: homebrew
wetbot wrote:
Oh, also:
Quote:
I am having trouble getting it to converge on the same strategy each time (given the same hand).

This is not particularly surprising, given that they are non-deterministic algorithms and, depending on your particular game, an infinite number of equilibria may exist. The algorithm may be converging on a different equilibrium each time. Or, you might have a bug. ;)

Probably a bug. It should never fold when it holds 1, for example.

wetbot wrote:
But, if you can rule out the latter, I'd suspect the former. What specific rules for [0,1] are you using?


Each player gets a number between 0 and 1. One posts the small blind, the other the big blind. Then the small blind acts first. Choices: fold, call, or raise by 1. Betting continues until both have called or one folds.
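To make the rules concrete, the showdown/fold payoff boils down to something like this (a rough sketch with made-up names, not my actual payoff() function, which also has to handle the blinds):

Code:
// Payoff of the [0,1] game for one player at a terminal node (sketch only).
// hand_me / hand_opp are the dealt numbers; invested_* is what each player
// has put in the pot (blind plus calls/raises). Names are illustrative.
double payoff_01(double hand_me, double hand_opp,
                 double invested_me, double invested_opp,
                 bool i_folded, bool opp_folded) {
    if (i_folded)   return -invested_me;   // I forfeit everything I put in
    if (opp_folded) return  invested_opp;  // I win everything the opponent put in
    // showdown: the higher number wins the whole pot
    if (hand_me > hand_opp) return  invested_opp;
    if (hand_me < hand_opp) return -invested_me;
    return 0.0;                            // an exact tie has probability zero
}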

wetbot wrote:
Quote:
There is a difference in equation 10 (last term in weighting) between the presentation and the paper. Not sure which is correct yet.

The very bottom of p.5 in the paper, under Eq 10, says that "an alternative form for Eq 10 is recommended for implementation", which can be found in the supplemental materials. This probably explains the "discrepancy" you noted.


No, the same formula appears in two different forms. I will just have to go through the derivations. I may email the author to ask.


 
 Post subject: Re: Monte Carlo Sampling for Regret Minimization in Extensive Ga
PostPosted: Mon Jan 25, 2010 10:52 am 
Regular member

Posts: 77
Favourite Bot: homebrew
indiana wrote:
Just a side remark - I really like the combination of a paper plus an uploaded video lecture/presentation.


Totally -- I had trouble with the paper (not too well written imho, though brilliant idea) until I saw the presentation.


 
 Post subject: Re: Monte Carlo Sampling for Regret Minimization in Extensive Ga
PostPosted: Mon Jan 25, 2010 10:58 am 
Regular member

Posts: 77
Favourite Bot: homebrew
wetbot wrote:
OK, thanks for this. I have gone back and re-read the paper in detail. I infer from the fact that they left chance-sampled CFRM out of their experiments that they, too, have found it doesn't perform well for 'non-poker' games, which I have confirmed. It would be nice to know, though, where 'poker' ends and 'non-poker' begins; chance-sampled CFRM does not work well for one-card poker, for instance. I suppose this is an example of an area they highlight for future research.


This is interesting. Maybe I should use chance sampling, since you have had good results with it. I mainly wanted to use outcome sampling because it's easy to code (in theory) and should be very fast.

wetbot wrote:
Conversely, I wonder if outcome-sampling or external sampling works 'well' for 'poker' games, and if so, could either be better than chance-sampling? The paper is sadly silent on this.
Quote:
I'm not using chance sampling but outcome sampling, as described in Lanctot's paper above.

I salute you. Even after implementing chance-sampling (and outcome-sampling looks pretty darn similar), I don't think I can quite cut through the math in the paper to turn it into an implementation. Maybe you could post some more user-friendly explanation or pseudocode?


I'm all into sharing:
Code:
node_list_it sample_strategy_profile(double hero_hand, int hero) {
   // outcome-sampling mccfr
   list<node_list_it> z;   // a terminal history
   vector<node_list_it> terminal_nodes;   // a vector of terminal nodes to sample from
   list<node_list_it>::iterator it, itc;
   int players=strategy_profile.begin()->player_state.size();
   double eps=1e-4;
   double decay=0.999;
   double pi_z, pi_i_h[players], sigma_h_a, q;
   double cfr_sum;   // accumulated regret
   int action, actor;
   int u[players];   // each player's payoff at the sampled terminal history
   double hands_sample[players];
   int n;
   node_list_it current_node, child, terminal_node;
   double sample_strategy_probs[NUM_CHILDREN]={0.2,0.2,0.6};
   
   // initialise probabilities
   for (n=0;n<players;n++) {
      pi_i_h[n]=1.0;
   }
   q=1.0; pi_z=1.0;
   
   current_node = strategy_profile.begin(); // root node
   //cout << "forward march" << endl;
   // sample a terminal history, z:
   // traverse forward
   while (!current_node->is_terminal) {
      if (current_node->is_chance) {
         // root node: pick opponent's hole cards (hand) from modelled distribution
         // no performance penalty for using un-abstracted cards for opp's hole
         for (n=0;n<players;n++) {
            hands_sample[n]=RNG();
         }
         action=0;
         //cout << "c " << opponent_hand << " ";
      } else {
         // normal node
         // sample probs
         action=choose_random_int(sample_strategy_probs, NUM_CHILDREN);
         q*=sample_strategy_probs[action];                  // sample probability
         pi_z*=current_node->pi[action];
         pi_i_h[current_node->actor]*=current_node->pi[action];   // history chosen by i probability
         current_node->children[action]->pi_i_h = pi_i_h[current_node->actor];
      }

      current_node->action=action;
      
      
      // sample using strategy profile so that we don't need to know opponent strategy
      // choose a child node, sample prob is currently strategy prob
      // traverse forward to compute actor's probability of playing to reach each prefix of history, pi^sigma_i(h)
      
      //cout << "pl" << current_node->actor << " action:" << current_node->action;
      //cout << " q:" << q << " pi_z:" << pi_z;
      //cout << " is_terminal:" << current_node->is_terminal;
      //cout << " ID" << &*current_node << endl;

      // walk the path
      // & store history
      z.push_back(current_node);
      current_node = current_node->children[action];
      current_node->visits++;
   }
   z.push_back(current_node);   // the terminal node
   //cout << "pl" << current_node->actor << " action:" << current_node->action;
   //cout << " q:" << q << " pi_z//:" << pi_z;
   //cout << " is_terminal:" << current_node->is_terminal;
   //cout << " ID" << &*current_node << endl;
   
   /////////////////////////////////////////////////////////
   

   // current node is now terminal, so find payoff
   for (actor=0;actor<players;actor++) {
      u[actor]=payoff(current_node->player_state, current_node->parent->action, actor, hands_sample);
   }
   hands_sample[hero]=hero_hand;   // the hero's hand is known only to himself
                           // and therefore his expected payoff is different
   u[hero]=payoff(current_node->player_state, current_node->parent->action, hero, hands_sample);

   current_node->payoff*=decay;
   current_node->payoff+=(1.-decay)*u[current_node->parent->actor];
   
   //cout << "backward march" << endl;
   // traverse backward to compute actor's probability of playing remaining actions pi^sigma_i(h,z)
   it=z.end();
   while (it!=z.begin()) {
      --it;   // now node at back (or before)
      cfr_sum=0.0;
      node temp=**it;
      action =(*it)->parent->action;   // action taken to get us here
      actor  =(*it)->parent->actor;      // player who acted to get us here
      
      // visit all siblings
      for (n=0;n<NUM_CHILDREN;n++) {      // pi(h)
         // during backward traversal, imm_cfr for information set on path (i.e. strategy node) is computed to update accumulated regret
         // is node in history or not
         sigma_h_a=(*it)->parent->pi[n];
         if ((*it)->parent->children[n]==*it) {
            // regret update if in history
            (*it)->parent->imm_cfr[n] += (1.0-sigma_h_a)*u[actor]*pi_z/((*it)->pi_i_h*q);
         } else {
            // regret update if not in history
            (*it)->parent->imm_cfr[n] -= sigma_h_a*u[actor]*pi_z/((*it)->pi_i_h*q);
         }
         cfr_sum += max((*it)->parent->imm_cfr[n], 0.0);
      }
      
      // regret matching update
      if (!(*it)->is_chance) {
         if (cfr_sum>eps) {
            for (n=0;n<NUM_CHILDREN;n++) {
               (*it)->parent->pi[n]=max(eps, (*it)->parent->imm_cfr[n])/cfr_sum;
            }
         } else {
            (*it)->parent->pi[n]=1.0/NUM_CHILDREN;
         }
         
         for (n=0;n<NUM_CHILDREN;n++) {
            // update of cumulative strategy table
            (*it)->parent->pi_ave[n] += pi_i_h[n]*(*it)->parent->pi[n];
         }
      }
   }
   
   it=z.begin();
   while ((*it)->is_chance) {
      it++;
   }
   return *it;
}



wetbot wrote:
Quote:
I am not discretising the [0,1] game. I don't think I need to.

I don't see how this is possible within the context of these algorithms. Perhaps you've developed something novel? [0,1] has an infinite strategy space. Note that discretized 0,1 with n-discretizations is the same as One-Card Poker with an n-deck. They use One-Card Poker with n=500 as one of their experiments.


I compute the strategy once the bot has its hand. I.e. I deal the hands, the bot calculates the strategy for that hand, and the opponent's hand is sampled and we only store his average strategy anyway.

It's fast, so I don't need to pre-compute.
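In case it isn't obvious from the function above, the per-hand driver is nothing more than this (a sketch; the iteration count is a placeholder, not a tuned number):

Code:
// Deal the bot its number, run outcome-sampled iterations for that hand only,
// then read the bot's strategy off pi_ave at each decision node it can reach.
double hero_hand = RNG();
int hero = 0;
for (int iter = 0; iter < 100000; ++iter) {
    sample_strategy_profile(hero_hand, hero);
}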


By the way, is there a way to subscribe to a topic so I get emailed when someone replies? Otherwise I keep missing your replies.


Last edited by supert on Mon Jan 25, 2010 11:15 am, edited 1 time in total.

 
 Post subject: Re: Monte Carlo Sampling for Regret Minimization in Extensive Ga
PostPosted: Mon Jan 25, 2010 11:08 am 
Regular member

Posts: 77
Favourite Bot: homebrew
Don't know if this helps, but this is the tree I sample and the immediate cfr at each node.


Attachments:
strategy.jpg
strategy.jpg [ 179.19 KB | Viewed 7354 times ]
 
 Post subject: Re: Monte Carlo Sampling for Regret Minimization in Extensive Ga
PostPosted: Mon Jan 25, 2010 11:18 am 
PokerAI fellow

Posts: 1239
Favourite Bot: my bot
supert wrote:
By the way, is there a way to subscribe to a topic so I get emailed when someone replies? Otherwise I keep missing your replies.


I never used it myself but you could try:
- user control panel
- board prefs
- edit posting defaults
- notify me upon replies by default


 
 Post subject: Re: Monte Carlo Sampling for Regret Minimization in Extensive Ga
PostPosted: Mon Jan 25, 2010 11:35 am 
PokerAI fellow

Posts: 1239
Favourite Bot: my bot
supert wrote:
Don't know if this helps, but this is the tree I sample and the immediate cfr at each node.
Image


I can't view this image.


 
 Post subject: Re: Monte Carlo Sampling for Regret Minimization in Extensive Ga
PostPosted: Mon Jan 25, 2010 12:08 pm 
Senior member

Posts: 226
Location: Stockholm
Favourite Bot: ---
Quote:
wetbot wrote:
Quote:
I am not discretising the [0,1] game. I don't think I need to.

I don't see how this is possible within the context of these algorithms. Perhaps you've developed something novel? [0,1] has an infinite strategy space. Note that discretized [0,1] with n discretization points is the same as One-Card Poker with an n-card deck. They use One-Card Poker with n = 500 in one of their experiments.


I compute the strategy once the bot has its hand. I.e. I deal the hands, the bot calculates the strategy for that hand, and the opponent's hand is sampled and we only store his average strategy anyway.

It's fast, so I don't need to pre-compute.


Since you're on a computer dealing numbers between [0,1] you've already discretized it.
I believe solving games with truly continuous elements requires a different tool set.


 
 Post subject: Re: Monte Carlo Sampling for Regret Minimization in Extensive Ga
PostPosted: Mon Jan 25, 2010 7:58 pm 
Regular member

Posts: 77
Favourite Bot: homebrew
djhemlig wrote:
Since you're on a computer dealing numbers between [0,1] you've already discretized it.
I believe solving games with truly continuous elements requires a different tool set.


Since I deal with only one bot hand at a time, and the opponent holds a distribution of hands, and I only store the average strategy over that distribution, I don't need to discretise. Conceptually, I have not discretised. If the computer went from double to quad precision it would not change my answer.


 
 Post subject: Re: Monte Carlo Sampling for Regret Minimization in Extensive Ga
PostPosted: Tue Jan 26, 2010 12:12 am 
Regular member

Posts: 77
Favourite Bot: homebrew
Correction - the regret update in my code above should be:

Code:
if ((*rit)->parent->children[n]==*rit) {
            // regret update if in history
            (*rit)->parent->imm_cfr[n] += (1.0/sigma_h_a - 1.0)*u[actor]*pi_z/((*rit)->pi_i_h*q);
         } else {
            // regret update if not in history
            (*rit)->parent->imm_cfr[n] -= u[actor]*pi_z/((*rit)->pi_i_h*q);
         }


results make more sense now.
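For the record, here is the outcome-sampling regret update as I read it from the paper and the supplementals, written out in LaTeX so the subscripts survive (worth checking against the paper rather than trusting my transcription): with z the sampled terminal history, q(z) its sampling probability, I an information set of player i on the path, and a the action actually taken at I,

Code:
\tilde r(I,a)  \mathrel{+}= w \,\bigl( \pi_i^\sigma(z[I]a, z) - \pi_i^\sigma(z[I], z) \bigr)
    \quad \text{for the sampled action } a,
\qquad
\tilde r(I,a') \mathrel{-}= w \,\pi_i^\sigma(z[I], z)
    \quad \text{for every other action } a',
\qquad \text{where} \quad
w = \frac{u_i(z)\,\pi_{-i}^\sigma(z)}{q(z)} .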


 
 Post subject: Re: Monte Carlo Sampling for Regret Minimization in Extensive Games
PostPosted: Tue Jan 26, 2010 6:39 pm 
Senior member

Posts: 125
Favourite Bot: wetbot
I'm afraid I have a lot of questions, because I think what you have done is very interesting. I hope you can entertain them.
Quote:
results make more sense now.

Does that mean that when you prime it with hand 0.0 that it never folds? You implied that you allow an infinite number of raises, correct? Is there a way to generate the Best Response to your epsilon-nash strategy to check the epsilon?
Quote:
current_node->payoff*=decay;
current_node->payoff+=(1.-decay)*u[current_node->parent->actor];

What are these two lines doing, and why? I don't see them as part of the algorithm described by the paper.
I think you are doing something novel here, possibly extending the algorithm beyond its initial intended use, so I want to check my understanding of what you are doing.
Quote:
double sample_strategy_probs[NUM_CHILDREN]={0.2,0.2,0.6};

You seem to be sampling based on this fixed probability triple, also not suggested by the paper. Why did you do this, and how did you pick these numbers?
Quote:
Since I deal with only one bot hand at a time, and the opponent holds a distribution of hands, and I only store the average strategy over that distribution, I don't need to discretise.

You deal a hand to the bot, then find the epsilon-Nash solution for just that hand, right? So, each bot hand is played epsilon-Nash in real-time? Which is very cool, but then I suppose the average strategy you are capturing in pi_ave is the average probability triple for the bot at each node, without tracking the hands he has held? So, after the bot has seen some large number of hands, the probability triple may be epsilon-Nash, but doesn't tell us which specific hands fall in which category (ie can't distinguish value-bets from bluff-bets)?


 
 Post subject: Re: Monte Carlo Sampling for Regret Minimization in Extensive Ga
PostPosted: Wed Jan 27, 2010 12:05 am 
Regular member

Posts: 77
Favourite Bot: homebrew
wetbot wrote:
I'm afraid I have a lot of questions, because I think what you have done is very interesting. I hope you can entertain them.
Quote:
results make more sense now.

Does that mean that when you prime it with hand 0.0 that it never folds? You implied that you allow an infinite number of raises, correct? Is there a way to generate the Best Response to your epsilon-nash strategy to check the epsilon?


Well, I also found a mistake in calculating pi_i_h.
It should always fold or check with 0.0, which it now does.
The number of raises is limited by the players' starting stacks.
I don't know how to calculate best-response for the strategy, though I can see it would be useful. I haven't been doing this very long.

wetbot wrote:
Quote:
current_node->payoff*=decay;
current_node->payoff+=(1.-decay)*u[current_node->parent->actor];

What are these two lines doing, and why? I don't see them as part of the algorithm described by the paper.
I think you are doing something novel here, possibly extending the algorithm beyond its initial intended use, so I want to check my understanding of what you are doing.


I just store a moving average of the payoff for the last player to move. It's for monitoring purposes; I don't use it.

wetbot wrote:
Quote:
double sample_strategy_probs[NUM_CHILDREN]={0.2,0.2,0.6};

You seem to be sampling based on this fixed probability triple, also not suggested by the paper. Why did you do this, and how did you pick these numbers?


Yep. It was easy to code. So long as q_j>0 the result should converge to the same strategy.
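The back-of-envelope reason, for what it's worth (my own argument, not quoted from the paper): any sampling distribution with full support gives an unbiased estimate after reweighting, so the choice of q only affects the variance, and hence the speed of convergence, not what you converge to. In LaTeX:

Code:
\mathbb{E}_{z \sim q}\!\left[ \frac{u_i(z)\,\pi^{\sigma}(z)}{q(z)} \right]
  = \sum_{z} q(z)\, \frac{u_i(z)\,\pi^{\sigma}(z)}{q(z)}
  = \sum_{z} u_i(z)\,\pi^{\sigma}(z) .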

wetbot wrote:
Quote:
Since I deal with only one bot hand at a time, and the opponent holds a distribution of hands, and I only store the average strategy over that distribution, I don't need to discretise.

You deal a hand to the bot, then find the epsilon-Nash solution for just that hand, right? So, each bot hand is played epsilon-Nash in real-time?


Yes. I want to be able to do approximate real-time strategies. This approach seems very flexible and could extend to >2 players and to calculating best-response to non-Nash opponents (I hope).

wetbot wrote:
Which is very cool,

:theboss

wetbot wrote:
but then I suppose the average strategy you are capturing in pi_ave is the average probability triple for the bot at each node, without tracking the hands he has held? So, after the bot has seen some large number of hands, the probability triple may be epsilon-Nash, but doesn't tell us which specific hands fall in which category (ie can't distinguish value-bets from bluff-bets)?


Not quite sure what you're saying here... pi_ave is the decision probability triple at each decision node, i.e. the average strategy over samplings, and it is this average strategy that converges to the Nash strategy. I don't track the opponent's previous hands.
I don't understand your last sentence; what do you mean?


 
 Post subject: Re: Monte Carlo Sampling for Regret Minimization in Extensive Games
PostPosted: Wed Jan 27, 2010 1:20 am 
Senior member

Posts: 125
Favourite Bot: wetbot
Quote:
Does that mean that when you prime it with hand 0.0 that it never folds?
It should always fold or check with 0.0, which it now does.

Sorry, I forgot we are using opposing hand scales from one another. I should have said "Does that mean that when you prime it with the best hand that it never folds"?

However, in regards to the worst hand, it should be bluffing some percentage of the time. Though if you have set it up with unequal blinds (which the SB has to at least "call" to see showdown), it might not. You might want to just set the pot = 1 unit, and allow the first-to-act to check. In other words, emulate example 17.3 from Mathematics of Poker starting on p. 211, but still allow an unlimited number of raises. The solution to that example with a 2-bet cap is here: http://forumserver.twoplustwo.com/47/sc ... op-493645/

Quote:
Well, I also found a mistake in calculating pi_i_h.

What was it? I am taking what you have done and cobbling together a discretized version of Outcome Sampling CFRM. It's not working yet. Did you start out with discretized, or did you skip that entirely?

I noticed a couple of possible bugs in your "regret matching update" section:
When you set pi[n] = 1.0/NUM_CHILDREN, don't you have to loop through all the children (i.e. all the pi[n]'s) here? It appears to me that it's outside the loop.
When you update the cumulative strategy table, you reference pi_i_h[n] while looping through the n children, but when you calculate pi_i_h, it's indexed by actor, not by child. Or maybe a pointer reference is missing instead?

Quote:
I don't understand your last sentence, what do you mean?

Suppose instead of generating epsilon-Nash for a given hand in real-time, you wanted to precalculate the epsilon-Nash solution for all hands. You could do this by randomly sampling a very large number of "hero" hands, and as you said, the pi_ave moves to Nash values. You would then know raise, call, fold X,Y, and Z% of hands at each node based on the node's pi_ave probability triple. But you won't know which hands are which. The X% of bet/raise hands will be a composite of betting for value and bluffing, so you won't be able to play the epsilon-Nash for a given hand from these triples, because it's impossible to know which bucket the given hand belongs in. Let me know if I'm understanding it correctly. In contrast, with any discretized version, you are going to have a probability triple for each possible hand at each node, so in that case, you have compiled enough information to play Nash.


 
 Post subject: Re: Monte Carlo Sampling for Regret Minimization in Extensive Ga
PostPosted: Wed Jan 27, 2010 8:12 pm 
Regular member

Posts: 77
Favourite Bot: homebrew
Quote:
However, in regards to the worst hand, it should be bluffing some percentage of the time. Though if you have set it up with unequal blinds (which the SB has to at least "call" to see showdown), it might not. You might want to just set the pot = 1 unit, and allow the first-to-act to check. In other words, emulate example 17.3 from Mathematics of Poker starting on p. 211, but still allow an unlimited number of raises. The solution to that example with a 2-bet cap is here: http://forumserver.twoplustwo.com/47/sc ... op-493645/


This is a good idea. However, I just lent out my copy of the book. Would you be able to post the exact rules? As I understand it from your statement, it is 1 in the pot, 2 chips each, limit raise of 1.

Quote:
Quote:
Well, I also found a mistake in calculating pi_i_h.

What was it? I am taking what you have done and cobbling together a discretized version of Outcome Sampling CFRM. It's not working yet. Did you start out with discretized, or did you skip that entirely?

child->pi_i_h should be divided by pi_i_h[n]

Quote:
I noticed a couple of possible bugs in your "regret matching update" section:
When you set pi[n] = 1.0/NUM_CHILDREN, don't you have to loop through all the children (i.e. all the pi[n]'s) here? It appears to me that it's outside the loop.


Yep, thanks.

Quote:
When you update the cumulative strategy table, you reference pi_i_h[n] while looping through the n children, but when you calculate pi_i_h, it's indexed by actor, not by child. Or maybe a pointer reference is missing instead?


:piss should be ->action not ->actor, I think. Thanks. I am sure there are more bugs.

Quote:
Suppose instead of generating epsilon-Nash for a given hand in real-time, you wanted to precalculate the epsilon-Nash solution for all hands. You could do this by randomly sampling a very large number of "hero" hands, and as you said, the pi_ave moves to Nash values.


Yes, but once the hero knows his hand, that strategy would not be very useful. It would be like playing without looking at your cards.

Quote:
You would then know raise, call, fold X,Y, and Z% of hands at each node based on the node's pi_ave probability triple. But you won't know which hands are which. The X% of bet/raise hands will be a composite of betting for value and bluffing, so you won't be able to play the epsilon-Nash for a given hand from these triples, because it's impossible to know which bucket the given hand belongs in. Let me know if I'm understanding it correctly.


I think you're correct, if I've understood.

Quote:
In contrast, with any discretized version, you are going to have a probability triple for each possible hand at each node, so in that case, you have compiled enough information to play Nash.


Yes, and your game tree would be N times larger, where N is the number of discrete hands you can hold.


 