Poker-AI.org

Poker AI and Botting Discussion Forum




PostPosted: Wed Jul 17, 2013 3:31 pm
Junior Member

Joined: Mon Apr 22, 2013 11:46 am
Posts: 34
a strategy is a set of action probabilities for every situation that can occur. in a nash equilibrium, no player can unilaterally alter their strategy to improve their EV.

during the self play convergence, open shoving 200bb will be tried until it's refuted. this could mean calling with AA 100% of the time and calling with 27o 0.002% of the time. clearly it would be better if we only called with AA, but the response is still enough to refute the open push. if we didn't have a response which refuted it, our opponent would be able to gain EV with the open push and would start using it during the self play iterations.
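
in code terms, the equilibrium condition is just "zero best-response gain". a minimal sketch (mine, in Python, for a generic two-player zero-sum matrix game, not anyone's actual solver):

Code:
def best_response_gain(A, x, y):
    """How much each player could gain by deviating from (x, y) in a
    zero-sum matrix game where A holds the row player's payoffs.
    Both gains are ~0 exactly when (x, y) is a Nash equilibrium."""
    m, n = len(x), len(y)
    ev = sum(x[i] * A[i][j] * y[j] for i in range(m) for j in range(n))
    br_row = max(sum(A[i][j] * y[j] for j in range(n)) for i in range(m))
    br_col = max(-sum(x[i] * A[i][j] for i in range(m)) for j in range(n))
    return br_row - ev, br_col + ev

# plain rock-paper-scissors: uniform vs uniform leaves no gain for either
rps = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
u = [1.0 / 3] * 3
print(best_response_gain(rps, u, u))  # (0.0, 0.0)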


PostPosted: Wed Jul 17, 2013 3:44 pm
Site Admin

Joined: Sun Feb 24, 2013 9:39 pm
Posts: 642
@OneDay In support of your pov http://poker-ai.org/archive/www.pokerai ... 329#p41329


PostPosted: Wed Jul 17, 2013 7:43 pm
Regular Member

Joined: Sun Mar 03, 2013 11:55 am
Posts: 64
somehomelessguy wrote:
a strategy is a set of action probabilities for every situation that can occur. in a nash equilibrium, no player can unilaterally alter their strategy to improve their EV.

during the self play convergence, open shoving 200bb will be tried until it's refuted. this could mean calling with AA 100% of the time and calling with 27o 0.002% of the time. clearly it would be better if we only called with AA, but the response is still enough to refute the open push. if we didn't have a response which refuted it, our opponent would be able to gain EV with the open push and would start using it during the self play iterations.

This makes sense. However, surely it's possible that we stop playing an action not because of the opponent's response, but because we've found a better alternative. That would still prevent the response from evolving into a sensible/optimal response to that action.


PostPosted: Wed Jul 17, 2013 8:19 pm
Site Admin

Joined: Sun Feb 24, 2013 9:39 pm
Posts: 642
Can you think of a toy game, one we can solve with CFRM or FP, that demonstrates the problem?


PostPosted: Wed Jul 17, 2013 9:01 pm
Junior Member

Joined: Mon Apr 22, 2013 11:46 am
Posts: 34
OneDayItllWork wrote:
somehomelessguy wrote:
a strategy is a set of action probabilities for every situation that can occur. in a nash equilibrium, no player can unilaterally alter their strategy to improve their EV.

during the self play convergence, open shoving 200bb will be tried until it's refuted. this could mean calling with AA 100% of the time and calling with 27o 0.002% of the time. clearly it would be better if we only called with AA, but the response is still enough to refute the open push. if we didn't have a response which refuted it, our opponent would be able to gain EV with the open push and would start using it during the self play iterations.

This makes sense. However, surely it's possible that we stop playing an action not because of the opponent's response, but because we've found a better alternative. That would still prevent the response from evolving into a sensible/optimal response to that action.


yes, if there's always a better alternative, he can't switch to the open push with any of his hands to gain EV, therefore the criterion for a nash equilibrium holds. it is still possible that the open push generates positive EV (quite obviously, open pushing AA preflop is +EV), but that doesn't matter. he will earn more by raising small etc.

the response to the open push which will evolve is one where none of the opponent's hands can gain EV by switching to this action. this response is not necessarily "the best", but still enough to refute the open push. there does exist a dominating nash equilibrium which has the "best" response, but afaik no efficient algorithm exists to find that specific equilibrium.


PostPosted: Wed Jul 17, 2013 10:28 pm
Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 437
External Sampling looks something like below. Notice that the regret is always updated and every child node is traversed, regardless of strategy probability (when it's trainplayer's turn).

Code:

double train(trainplayer)
{
  // current strategy via regret matching (positive regrets, normalized)
  s[] = normalizedPositive(regret);

  if (turn == trainplayer) {

    // walk every child, no matter how unlikely the action currently is
    double ev = 0;
    for (i = 0; i < children.length; i++) {
      u[i] = children[i].train(trainplayer);
      ev += s[i] * u[i];    // node EV = strategy-weighted child EVs
    }

    for (i = 0; i < children.length; i++) {
      regret[i] += u[i] - ev;
      cumulative_strategy[i] += s[i];
    }

    return ev;

  } else {

    // opponent's turn: sample a single action from the current strategy
    a = Sample(s);
    return children[a].train(trainplayer);

  }

}


Code:
rootnode.train(0);
rootnode.train(1);
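
For completeness, the normalizedPositive step above is just regret matching. A minimal Python sketch (mine, not part of the original pseudocode):

Code:
# regret matching: play actions in proportion to positive cumulative regret
def normalized_positive(regret):
    positive = [max(r, 0.0) for r in regret]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    # no positive regret anywhere: fall back to the uniform strategy
    return [1.0 / len(regret)] * len(regret)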


PostPosted: Wed Jul 17, 2013 11:49 pm
Junior Member

Joined: Mon Apr 22, 2013 11:46 am
Posts: 34
Nasher wrote:
External Sampling looks something like below. Notice that the regret is always updated and every child node is traversed, regardless of strategy probability (when it's trainplayer's turn).



IIRC external sampling samples the opponent's actions, so if they do not contain the open push, it's never traversed.


PostPosted: Thu Jul 18, 2013 10:49 am
Regular Member

Joined: Sun Mar 03, 2013 11:55 am
Posts: 64
I've been doing testing with my algo, trying to work out this behaviour, and I think I have a problem here:

We start off with no idea how to play poker, so all-in shoves with high ranges work pretty well. As the calling range tightens up, the shoving hands move higher and higher. We eventually work out that we can produce more profit by playing the higher end of our range a little slower, and we stop the first-in preflop shoves with the top of our range. By now the strategy is folding almost everything to first-in shoves, which is what we want.

Now, as we all know, the most complex hands to play are the marginal ones, so at the same time as we're working out how to play the top of the range, we're really struggling with the marginal hands. That, combined with the fact that we're now folding almost everything to first-in shoves, leads to a strategy shift: we start shoving marginal hands such as 55 / 66 / 77 / 88 / AJ / AT, as we see more profit from forcing folds than from trying to play them post-flop. So the calling strategy then changes: we start calling all-in shoves with 88+, AT+ or whatever. First-in shoves then become a bad idea again and get eradicated, never shifting back up to the top of the range, leaving the response strategy stuck at calling with 88+, AT+.

What I can do about this, I have no idea... it's an interesting problem though. I'm guessing it's an issue with all chance-sampled optimal-strategy calculation methods.


Last edited by OneDayItllWork on Thu Jul 18, 2013 12:39 pm, edited 1 time in total.

PostPosted: Thu Jul 18, 2013 11:48 am
Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 437
somehomelessguy wrote:
IIRC external sampling samples the opponent's actions, so if they do not contain the open push, it's never traversed.

It samples the opponent's action when it's not training that player, yes. But otherwise it touches every node. You train both players, yes?


PostPosted: Thu Jul 18, 2013 9:21 pm
Junior Member

Joined: Mon Apr 22, 2013 11:46 am
Posts: 34
@OneDay
i don't know how your algorithm works, but that would not be an issue using CFRM. with that calling range, big hands would start switching back to push, assuming we're still talking about 200bb-deep stacks.
Nasher wrote:
It samples the op's action when it's not training that player, yes. But, otherwise, it touches every node. You train both players, yes?


i suppose the response to the open push is "touched" when we are not training the player who responds to it, yes. but if the opponent never pushes, it's never updated.


PostPosted: Fri Jul 19, 2013 6:58 am
Regular Member

Joined: Sun Mar 03, 2013 11:55 am
Posts: 64
somehomelessguy wrote:
@OneDay
i don't know how your algorithm works, but that would not be an issue using CFRM. with that calling range, big hands would start switching back to push, assuming we're still talking about 200bb-deep stacks.

It may switch back eventually - I've not done much with it yet. As you can imagine, doing anything game-theory-related with 6-max NL is a slow process!


PostPosted: Fri Jul 19, 2013 10:40 pm
Junior Member

Joined: Fri Apr 05, 2013 2:21 am
Posts: 11
Here's a toy game with a dominated choice. Player 1 decides "A" or "B" and then both players play rock-paper-scissors. If player 1 picks A, the winner gets 1 point. If he picks B then player 1 gets 0.5 points for winning with rock and 1 point for other wins. Player 2 always gets 1 point for wins.

Obviously player 1 should never pick B but player 2 still needs an optimal strategy against it. If his strategy is suboptimal it may make it profitable for player 1 to pick B.
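
To make that concrete, here's a regret-matching self-play sketch (mine, in Python; B subgame only, payoffs as described above). The average strategies converge to roughly P1 = (0.40, 0.27, 0.33) and P2 = (0.33, 0.27, 0.40) over rock/paper/scissors, so player 2's optimal response does shift away from uniform:

Code:
# Regret-matching self-play on the "B" subgame.
# Rows/cols: rock, paper, scissors. Entries are player 1's payoffs
# (rock wins score only 0.5 for player 1; player 2's wins score 1).
A = [[0.0, -1.0, 0.5],
     [1.0, 0.0, -1.0],
     [-1.0, 1.0, 0.0]]

def strategy(regret):
    pos = [max(r, 0.0) for r in regret]
    t = sum(pos)
    return [p / t for p in pos] if t > 0 else [1.0 / 3] * 3

r1, r2 = [0.0] * 3, [0.0] * 3    # cumulative regrets
c1, c2 = [0.0] * 3, [0.0] * 3    # cumulative strategies

for _ in range(100000):
    s1, s2 = strategy(r1), strategy(r2)
    # EV of each pure action against the opponent's current mix
    u1 = [sum(A[i][j] * s2[j] for j in range(3)) for i in range(3)]
    u2 = [-sum(A[i][j] * s1[i] for i in range(3)) for j in range(3)]
    ev1 = sum(s1[i] * u1[i] for i in range(3))
    ev2 = sum(s2[j] * u2[j] for j in range(3))
    for k in range(3):
        r1[k] += u1[k] - ev1
        r2[k] += u2[k] - ev2
        c1[k] += s1[k]
        c2[k] += s2[k]

avg = lambda c: [round(x / sum(c), 3) for x in c]
print("P1 rock/paper/scissors:", avg(c1))  # ~ [0.400, 0.267, 0.333]
print("P2 rock/paper/scissors:", avg(c2))  # ~ [0.333, 0.267, 0.400]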

Tysen


PostPosted: Fri Jul 19, 2013 11:38 pm
Junior Member

Joined: Mon Apr 22, 2013 11:46 am
Posts: 34
trojanrabbit wrote:
Here's a toy game with a dominated choice. Player 1 decides "A" or "B" and then both players play rock-paper-scissors. If player 1 picks A, the winner gets 1 point. If he picks B then player 1 gets 0.5 points for winning with rock and 1 point for other wins. Player 2 always gets 1 point for wins.

Obviously player 1 should never pick B but player 2 still needs an optimal strategy against it. If his strategy is suboptimal it may make it profitable for player 1 to pick B.

Tysen


it may, but player 2 doesn't need an optimal strategy. he just needs a strategy that doesn't suck too badly, which i believe is OP's problem. optimal would be to pick more scissors when player 1 has chosen B (if i'm thinking correctly), but playing uniformly 1/3, 1/3, 1/3 would be good enough and still a nash equilibrium strategy.

because the dominated choice gets refuted quickly, the strategy against poor choices rarely converges to optimal, which is why you sometimes see stuff like calling 200bb shoves with 46s.
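
a quick sanity check of the 1/3, 1/3, 1/3 claim (my sketch, same payoff matrix as in the snippet above, player 1's view): every row EV against uniform comes out <= 0, so player 1 gains nothing by switching to B, and the equilibrium condition holds:

Code:
A = [[0.0, -1.0, 0.5],
     [1.0, 0.0, -1.0],
     [-1.0, 1.0, 0.0]]
uniform = [1.0 / 3] * 3
row_evs = [sum(a * p for a, p in zip(row, uniform)) for row in A]
print(row_evs)            # ~[-0.167, 0.0, 0.0]
print(max(row_evs) <= 0)  # True: B is (weakly) refuted by uniform play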


PostPosted: Sat Aug 10, 2013 6:35 pm
Veteran Member

Joined: Mon Mar 04, 2013 9:40 pm
Posts: 269
I saw this formula dealing with when to open shove which makes sense:

http://www.reddit.com/r/poker/comments/ ... hove_math/

So generally an open shove UTG would assume a calling range of 66+, ATs+, KQs, AJo+. 77 is at .41 equity, which yields .79BB long term. That is pretty close to a coin flip but still +EV long term. 66 and AJo are at the low end of the range at .36; I am going to guess that is the breakeven point. It looks like the open-shove calling range would be anything above .41, or AQ+, 88+.

This would generally be the basis for countering any of the common short-stack strategies (like 20-40bb). You could probably use this for an open shove from any position, although you could call lighter depending on his position, as his shoving range gets wider. The above is the worst-case scenario. For an open shove from the button, he would assume your calling range to be 55+, A9s+, KQs, ATo+, KQo, so for your call to be profitable you could call maybe 77+, AJo+, ATs+.

I think that is how you would generally handle an open shove, but you may just want to avoid variance and only call with the nuts.
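
The arithmetic behind that kind of formula is just fold equity plus showdown equity when called. A simplified heads-up sketch (mine; ignores splits, and the numbers are purely illustrative):

Code:
def shove_ev(pot, stack, p_fold, equity_when_called):
    """Simplified open-shove EV in big blinds, heads-up, no splits.
    pot   - dead money already in the middle (the blinds)
    stack - effective stack the shover risks
    """
    ev_fold = p_fold * pot
    ev_called = (1 - p_fold) * (equity_when_called * (pot + stack)
                                - (1 - equity_when_called) * stack)
    return ev_fold + ev_called

# e.g. 1.5bb in blinds, 20bb effective, villain folds 85%,
# 36% equity when called:
print(shove_ev(1.5, 20.0, 0.85, 0.36))  # ~ +0.52bb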


PostPosted: Tue Feb 04, 2014 1:15 am
Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 437
Did you ever find a solution for this, OneDayItllWork?


PostPosted: Fri Mar 07, 2014 10:32 pm
Regular Member

Joined: Sat May 25, 2013 7:36 am
Posts: 73
Is it really an issue?

Let's continue where OneDay stopped. We are left with a defending range for calling 200BB open shoves. This range is 88+, AT+. Our opponent knows that, and he knows some hands (obviously very strong ones) that crush this range, so he decides to shove them all-in.

Fine. What size is our range? 88+, AT+ consists of 106 combos (roughly 8% of our starting hands). For simplicity let's assume our range has 15% equity against each hand in his shoving range. So the EV of each of his hands in that spot is:

EV(opp_hand) = p(we fold) * 1.0BB + p(we call and lose) * 200BB - p(we call and win) * 200BB = 0.92 * 1.0BB + 0.08 * 0.85 * 200BB - 0.08 * 0.15 * 200BB = +12.12BB
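
A two-line sanity check of that arithmetic (my sketch, same assumed numbers):

Code:
p_fold, p_call, opp_equity, stack = 0.92, 0.08, 0.85, 200.0
ev = p_fold * 1.0 + p_call * (opp_equity * stack - (1 - opp_equity) * stack)
print(ev)  # ~12.12 BB, matching the figure above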

But what happens now if he minbets? His range will be way weaker than our optimal strategy expects, so our defending range will now crush his minbet-attacking range. Since there are way more hands that cannot be shoved but can be minbet, and his very strong hands are no longer in this range (because he does not have an infinite supply of aces), he will likely lose more BBs there than he gains by shoving his monsters. On this point you have to trust the frequencies.

So you happily pay him off in a few spots to get your money back, with interest, in other spots.

Now one can argue that he might put weaker hands in his open-shoving range and bluff us off, but he can't do so profitably, because ... yeah, you don't want to run 23o into 88+, AT+ ...

Just my two cents ...


PostPosted: Sun Mar 09, 2014 1:54 am
Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 437
I see your point, Nose. Ideally, you're right. The thing is (and the thing I missed when commenting on this before), strategies aren't really fully developed against "very" sub-optimal play. They're quickly eliminated from sampling/updates in CFRM. That can leave the strategy open to exploitation in those areas, as they're essentially near-random. You need DBR (data-biased response) strategies for those situations.


PostPosted: Sun Mar 09, 2014 6:25 am
Regular Member

Joined: Sat May 25, 2013 7:36 am
Posts: 73
I am not developing 'deep stack' (100BB+) strategies, so I can only speculate about the algorithm converging too fast.

To give them names, let's say SB open shoves and BB has to decide whether to call.

Let's look at the moment when SB eliminates his shoving range (meaning all buckets have 0 probability of open shoving).

In my understanding of CFRM this happens when, on average, for each hand SB could be holding in that spot, he fully regrets not having played another action. This tells us that BB must have come up with a defending regret strategy which is just strong enough to prevent SB from playing any of his hands more profitably by shoving than by taking another action (i.e. minbetting).

This means that in case our bot is playing SB and gets KK (which a very suboptimal player might be tempted to just shove, because his fishy opponent calls with hands like 88, TT, ...), he (our bot) will extract more value from this hand than our opponent will - because that's what his experience, represented by the frequencies, tells him.

So we've come up with a (not too bad) defending regret strategy for BB. Over time (back to the learning process) this strategy accumulates into the average strategy (in ASS thanks to the train player's exploration guarantee epsilon, when the train player is SB), and in the limit the average strategy equals the regret strategy, so I assume the defending strategy to be fully developed.
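
For reference, the ASS action-sampling rule I have in mind (as I recall it from the Gibson et al. average-strategy-sampling paper, so treat this sketch as an approximation) looks like this in Python:

Code:
import random

def ass_sample(cum_strategy, eps=0.05, beta=1e6, tau=1000.0):
    """Decide which of the train player's own actions to walk into.
    Each action is explored independently with probability
    max(eps, (beta + tau * s[a]) / (beta + sum(s))); epsilon keeps even
    'abandoned' actions (like the open shove) alive forever."""
    total = sum(cum_strategy)
    picked = []
    for a, s in enumerate(cum_strategy):
        rho = max(eps, (beta + tau * s) / (beta + total))
        if random.random() < min(rho, 1.0):
            picked.append(a)
    return picked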

Or am I missing your point?


PostPosted: Mon Mar 10, 2014 7:45 am
Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 437
Nose wrote:
In my understanding of CFRM this happens when, on average, for each hand SB could be holding in that spot, he fully regrets not having played another action. This tells us that BB must have come up with a defending regret strategy which is just strong enough to prevent SB from playing any of his hands more profitably by shoving than by taking another action (i.e. minbetting).

It could also be due to variance. ;) SB's strategy could switch depending on how the remaining parts converge.


PostPosted: Mon Mar 10, 2014 9:36 pm
Regular Member

Joined: Sat May 25, 2013 7:36 am
Posts: 73
Nasher wrote:
It could also be due to variance. ;)


Yeah, variance is the ultimate thought-terminating cliché. What I can argue against it is:
A) this is highly unlikely to happen in the early stages of the game (branches of the tree with a short action sequence)
B) ASS implements a mechanism that ensures convergence up to a certain point (epsilon, beta and tau)

I basically see two options to enforce more stability:
A) Increase the values of beta, epsilon and tau
B) Run several simulations and take the average of all strategies

But in the end there is only one certain thing in life ...

[EDIT] Nope, I just realised option A does not help. Against variance there is only option B (sketched below)
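
A minimal sketch of option B (mine; assumes each run's cumulative strategy has already been normalized per information set):

Code:
def average_runs(runs):
    """Average the strategies from several independent CFRM runs,
    entry by entry, for one information set."""
    n = len(runs)
    return [sum(run[a] for run in runs) / n for a in range(len(runs[0]))]

# e.g. three runs' call/fold mix at the same infoset:
print(average_runs([[0.20, 0.80], [0.26, 0.74], [0.23, 0.77]]))  # ~[0.23, 0.77]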

