Poker-AI.org

Re: Very sub-optimal play

2014-03-17T12:53:09+00:00

I looked into a few spots and sort of caught this phenomenon. I think one can live with it.

Column 1 means fold; Column 2 means call

Statistics: Posted by Nose — Mon Mar 17, 2014 12:53 pm

Re: Very sub-optimal play

2014-03-10T21:36:55+00:00

Nasher wrote:

It could also be due to variance.

Yeah, variance is the ultimate thought-terminating cliché. What I can argue against it is that:
A) this is highly unlikely to happen in the early stages of the game (branches of the tree with a short action sequence)
B) ASS has implemented a mechanism that ensures convergence up to a certain point (epsilon, beta and tau)

I basically see two options to enforce more stability
A) Increase the values of beta, epsilon and tau
B) Run several simulations and take the average of all strategies

But in the end there is only one certain thing in life ...

[EDIT] Nope, I just see option A does not help. Against variance there is only option B

Statistics: Posted by Nose — Mon Mar 10, 2014 9:36 pm

Re: Very sub-optimal play

2014-03-10T07:45:22+00:00

Nose wrote:

In my understanding of CFRM this happens when, on average, for each hand SB could be holding in that spot, he fully regrets not having played another action. This tells us that BB must have come up with a defending regret-strategy which is just strong enough to prevent SB from playing any of his hands more profitably by shoving than by taking another action (e.g. min-betting).

It could also be due to variance. SB's strategy could switch depending on how the remaining parts converge.

Statistics: Posted by cantina — Mon Mar 10, 2014 7:45 am

Re: Very sub-optimal play

2014-03-09T06:25:34+00:00

I am not developing 'deep stack' (100 BB +) strategies so I can only speculate about the algorithm converging too fast.

To give them names let's say SB open shoves and BB has to decide to call.

Let's look at the moment when SB eliminates his shoving range (means all buckets have 0 probability of open shoving)

In my understanding of CFRM this happens when, on average, for each hand SB could be holding in that spot, he fully regrets not having played another action. This tells us that BB must have come up with a defending regret-strategy which is just strong enough to prevent SB from playing any of his hands more profitably by shoving than by taking another action (e.g. min-betting).

This means, in case our bot is playing SB and gets KK (which a very suboptimal player might be tempted to just shove because his fishy opponent calls with hands like 88, TT, ... ) he (our bot) will extract more value from this hand than our opponent will - because that's what his experience represented by the frequencies tells him.

So we've come up with a (not too bad) defending regret-strategy for BB. Over time (back to the learning process) this strategy accumulates in the average strategy (in ASS this is ensured by the train player's exploration guarantee epsilon, when the train player is SB), and in the limit the average strategy will be equal to the regret-strategy, so I assume the defending strategy to be fully developed.

Or am I missing your point?

Statistics: Posted by Nose — Sun Mar 09, 2014 6:25 am

Re: Very sub-optimal play

2014-03-09T01:54:51+00:00

I see your point, Nose. Ideally, you're right. The thing is (and the thing I missed when commenting on this before), strategies aren't really fully developed for "very" sub-optimal play. They're quickly eliminated from sampling/updates in CFRM. It could leave the strategy open to exploitation in those areas as they're essentially near random. You need DBR strategies, for those situations.

Statistics: Posted by cantina — Sun Mar 09, 2014 1:54 am

Re: Very sub-optimal play

2014-03-07T22:32:03+00:00

Is it really an issue?

Let's continue where OneDay stopped. We are left with a defending range for calling 200BB open shoves. This range is 88+, AT+. Our opponent knows that, he knows some (obviously very strong) hands that crush this range, and he decides to shove them all in.

Fine. What size is our range? 88+, AT+ consists of 106 combos (roughly 8% of our starting hands). For simplicity let's assume our range has 15% equity against each hand in his shoving range. So the EV of each of his hands in that spot is:

EV(opp_hand) = p('we fold')*1.0BB + p('we call and lose')*200.0BB - p('we call and win')*200.0BB
             = 0.92*1.0BB + 0.85*0.08*200.0BB - 0.15*0.08*200.0BB
             = +12.12BB
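The arithmetic above can be checked with a few lines of Python. This is only a sketch of the post's simplified model: the function name `shove_ev` and its parameters are made up for illustration, and the 15% equity and the 1 BB picked up when we fold are the post's own simplifying assumptions, not measured values.

```python
# Combos in the calling range 88+, AT+:
# each pocket pair has 6 combos, each unpaired hand (suited + offsuit) has 16.
pairs = ["88", "99", "TT", "JJ", "QQ", "KK", "AA"]
unpaired = ["AT", "AJ", "AQ", "AK"]
range_combos = 6 * len(pairs) + 16 * len(unpaired)
total_combos = 52 * 51 // 2  # 1326 possible starting hands

# The post rounds the calling frequency to 8%.
call_freq = round(range_combos / total_combos, 2)


def shove_ev(call_freq, caller_equity, stack_bb=200.0, dead_money_bb=1.0):
    """EV in BB of open shoving one hand (hypothetical helper, simplified model)."""
    fold_part = (1.0 - call_freq) * dead_money_bb             # we fold, he wins the blind
    win_part = call_freq * (1.0 - caller_equity) * stack_bb   # we call and lose
    lose_part = call_freq * caller_equity * stack_bb          # we call and win
    return fold_part + win_part - lose_part


ev = shove_ev(call_freq, caller_equity=0.15)
print(range_combos, call_freq, round(ev, 2))  # 106 0.08 12.12
```

Changing `caller_equity` or `call_freq` shows how sensitive the result is to the two assumed numbers.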

But what happens now if he minbets? His range will be way weaker than our optimal strategy expects so our defending range will now crush his minbet-attacking range. Since there are way more hands that cannot be shoved but minbet and he does not have his very strong hands in this range (because he does not have an infinite amount of aces) he will likely lose more BBs than he gains by shoving his monsters. About this you have to trust the frequencies.

So you happily pay him off in a few spots to get your money with some interest back in other spots.

Now one can argue that he might put weaker hands in his open-shoving range and bluff us off, but he can't profitably, because ... Yeah, you don't want to run with 23o against 88+,AT+ ...

Just my two cents ...

Statistics: Posted by Nose — Fri Mar 07, 2014 10:32 pm

Re: Very sub-optimal play

2014-02-04T01:15:02+00:00

Did you ever find a solution for this, OneDayItllWork?

Statistics: Posted by cantina — Tue Feb 04, 2014 1:15 am

Re: Very sub-optimal play

2013-08-10T18:35:09+00:00

I saw this formula dealing with when to open shove which makes sense:

http://www.reddit.com/r/poker/comments/ ... hove_math/

So generally an open shove UTG would assume that calling range of 66+,ATs+,KQs,AJo+. 77 is .41 which yields .79BB long term. That is pretty close to a coin flip but still +EV long term. 66 and AJo are at the low end of the range at .36. I am going to guess that is the breakeven point. It looks like the open shove call range would be anything above .41 or AQ+, 88+.

This would generally be the basis for countering any short stack strategy which is common (like 20-40bb). You could probably use this for any open shove from any position although you could call lighter based on his position as his shoving range is wider. The above is worst case scenario. Like an open shove from the button he would assume your calling range to be 55+,A9s+,KQs,ATo+,KQo so for your call to be profitable you could call maybe 77+, AJo+, ATs+.

I think that is how you would generally handle an open shove but you may just want to avoid variance and only call with the nuts.

Statistics: Posted by shalako — Sat Aug 10, 2013 6:35 pm

Re: Very sub-optimal play

2013-07-19T23:38:19+00:00

trojanrabbit wrote:

Here's a toy game with a dominated choice. Player 1 decides "A" or "B" and then both players play rock-paper-scissors. If player 1 picks A, the winner gets 1 point. If he picks B then player 1 gets 0.5 points for winning with rock and 1 point for other wins. Player 2 always gets 1 point for wins.

Obviously player 1 should never pick B but player 2 still needs an optimal strategy against it. If his strategy is suboptimal it may make it profitable for player 1 to pick B.

Tysen

it may, but player 2 doesn't need an optimal strategy. he just needs a strategy which doesn't suck enough, which i believe is OP's problem. optimal would be to pick more scissors when 1 has chosen B (if i'm thinking correctly), but playing uniformly 1/3, 1/3, 1/3 would be good enough and a nash equilibrium strategy.
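The claim that uniform play is good enough can be checked numerically. The sketch below assumes the payoffs are zero-sum from player 1's point of view (a win is worth +1, or +0.5 for a rock win after B; a loss costs 1; a tie is 0). That reading of the toy game, and every name in the code, is an assumption made for illustration.

```python
from fractions import Fraction

MOVES = ["rock", "paper", "scissors"]
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def p1_payoff(choice, m1, m2):
    """Player 1's payoff for first-stage choice 'A' or 'B' and the two RPS moves."""
    if m1 == m2:
        return Fraction(0)
    if BEATS[m1] == m2:  # player 1 wins
        if choice == "B" and m1 == "rock":
            return Fraction(1, 2)
        return Fraction(1)
    return Fraction(-1)  # player 2 wins


def best_response_value(choice, p2_strategy):
    """Best EV player 1 can get after `choice` against a fixed player 2 mix."""
    return max(
        sum(p2_strategy[m2] * p1_payoff(choice, m1, m2) for m2 in MOVES)
        for m1 in MOVES
    )


uniform = {m: Fraction(1, 3) for m in MOVES}
always_scissors = {"rock": Fraction(0), "paper": Fraction(0), "scissors": Fraction(1)}

value_a = best_response_value("A", uniform)
value_b_uniform = best_response_value("B", uniform)
value_b_bad = best_response_value("B", always_scissors)
print(value_a, value_b_uniform, value_b_bad)  # 0 0 1/2
```

Against the uniform response, switching to B gains player 1 nothing, so B is refuted. Against a poor response such as always playing scissors, B becomes profitable, which is the situation Tysen describes.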

because the dominated choice gets refuted quickly, the strategy against poor choices rarely converges to optimal, which is why you see stuff like calling 200bb shoves with 46s sometimes.

Statistics: Posted by somehomelessguy — Fri Jul 19, 2013 11:38 pm

Re: Very sub-optimal play

2013-07-19T22:40:15+00:00

Statistics: Posted by trojanrabbit — Fri Jul 19, 2013 10:40 pm

Re: Very sub-optimal play

2013-07-19T06:58:51+00:00

somehomelessguy wrote:

@OneDay
i don't know how your algorithm works, but that would not be an issue using CFRM. with that calling range big hands would start switching back to push, assuming we're still talking about 200bb deep.

It may switch back eventually - I've not done much with it yet. As you can imagine, doing anything game theory related with 6max NL is a slow process!

Statistics: Posted by OneDayItllWork — Fri Jul 19, 2013 6:58 am

Re: Very sub-optimal play

2013-07-18T21:21:45+00:00

Nasher wrote:

It samples the op's action when it's not training that player, yes. But, otherwise, it touches every node. You train both players, yes?

i suppose the response to open push is "touched" when we are not training the player which responds to the open push, yes. but if the opponent never pushes it's never updated.

Statistics: Posted by somehomelessguy — Thu Jul 18, 2013 9:21 pm

Re: Very sub-optimal play

2013-07-18T11:48:46+00:00

somehomelessguy wrote:

IIRC external sampling samples the opponents actions, so if they do not contain the open push, it's not traversed.

It samples the op's action when it's not training that player, yes. But, otherwise, it touches every node. You train both players, yes?

Statistics: Posted by cantina — Thu Jul 18, 2013 11:48 am

Re: Very sub-optimal play

2013-07-18T12:39:13+00:00

Doing testing with my algo, trying to work out this behaviour. I think I have a problem here:

We start off with no idea how to play poker, so all-in shoves for high ranges work pretty well. As we tighten up the calling range the shoving hands move higher and higher. We eventually work out that we can produce more profit by playing the higher end of our range a little slower and stop the first-in shoves PF with the top of our range. By now, the optimal strategy is folding almost everything to first-in shoves, which is what we want.

Now, as we all know, the most complex hands to play are the marginal ones, so at the same time as we're working out how to play the top of the range, we're really struggling with the marginal hands. That combined with the fact we're also now folding everything to first in shoves leads to a strategy shift to shove marginal hands such as 55 / 66 / 77 / 88 / AJ / AT, as we see more profit forcing the folds than by trying to play them post flop. So the calling strategy then changes, we start calling all in shoves with 88 AT+ or whatever. First in shoves then become a bad idea again and get eradicated, never shifting up to the top of the range, leaving the response strategy as calling with 88 AT+.

What I can do about this, I have no idea... it's an interesting problem though. I'm guessing an issue with all chance sampled optimal strategy calculation methods.

Statistics: Posted by OneDayItllWork — Thu Jul 18, 2013 10:49 am

Re: Very sub-optimal play

2013-07-17T23:49:45+00:00

Nasher wrote:

External Sampling looks something like below. Notice the regret is always updated and every child node traversed regardless of strategy probability (when it's trainplayer's turn).

IIRC external sampling samples the opponents actions, so if they do not contain the open push, it's not traversed.

Statistics: Posted by somehomelessguy — Wed Jul 17, 2013 11:49 pm

Re: Very sub-optimal play

2013-07-17T22:28:43+00:00

External Sampling looks something like below. Notice the regret is always updated and every child node traversed regardless of strategy probability (when it's trainplayer's turn).

Code:

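double train(trainplayer)
{

  // Current strategy: regret matching over the positive regrets.
  s[] = normalizedPositive(regret);

  if (turn == trainplayer) {

    // Traverse every child, whatever its current probability.
    ev = 0;
    for (i = 0; i < children.length; i++) {
      u[i] = children[i].train(trainplayer);
      ev += s[i] * u[i];   // node value is the strategy-weighted average
    }

    for (i = 0; i < children.length; i++) {
      regret[i] += u[i] - ev;
      cumulative_strategy[i] += s[i];
    }

    return ev;

  } else {

    // Opponent node: sample a single action from the current strategy.
    a = Sample(s);
    return children[a].train(trainplayer);

  }

}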

Code:

rootnode.train(0);
rootnode.train(1);
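For anyone who wants to run something, here is a minimal Python sketch of the same external-sampling idea on Kuhn poker (three cards, one card per player). It is an illustration under stated assumptions, not the code used by anyone in this thread: the names `Node`, `payoff`, `train` and `solve` are invented here, and one deliberate difference from the pseudocode is that the cumulative strategy is accumulated at the sampled opponent nodes, as in the usual textbook formulation.

```python
import random

ACTIONS = "pb"  # p = pass (check or fold), b = bet (or call)


class Node:
    """Regret and average-strategy accumulators for one information set."""

    def __init__(self):
        self.regret = [0.0, 0.0]
        self.cumulative_strategy = [0.0, 0.0]

    @staticmethod
    def _normalise(values):
        total = sum(values)
        return [v / total for v in values] if total > 0 else [0.5, 0.5]

    def strategy(self):
        # Regret matching: play in proportion to positive regret.
        return self._normalise([max(r, 0.0) for r in self.regret])

    def average_strategy(self):
        return self._normalise(self.cumulative_strategy)


def payoff(cards, history, player):
    """Payoff to `player` at a terminal history, or None if play continues."""
    if history in ("pp", "bb", "pbb"):  # showdown
        stake = 2 if history.endswith("bb") else 1
        winner = 0 if cards[0] > cards[1] else 1
        return stake if winner == player else -stake
    if history in ("bp", "pbp"):  # the last actor folded to a bet
        folder = (len(history) - 1) % 2
        return -1 if folder == player else 1
    return None


def train(nodes, cards, history, trainplayer):
    value = payoff(cards, history, trainplayer)
    if value is not None:
        return value
    turn = len(history) % 2
    node = nodes.setdefault(f"{cards[turn]}{history}", Node())
    s = node.strategy()
    if turn == trainplayer:
        # Traverse every action, even those with zero probability.
        u = [train(nodes, cards, history + a, trainplayer) for a in ACTIONS]
        ev = sum(si * ui for si, ui in zip(s, u))
        for i in range(len(ACTIONS)):
            node.regret[i] += u[i] - ev
        return ev
    # Opponent node: record the strategy, then sample one action.
    for i in range(len(ACTIONS)):
        node.cumulative_strategy[i] += s[i]
    a = random.choices(range(len(ACTIONS)), weights=s)[0]
    return train(nodes, cards, history + ACTIONS[a], trainplayer)


def solve(iterations, seed=0):
    random.seed(seed)
    nodes = {}
    for _ in range(iterations):
        cards = random.sample([1, 2, 3], 2)  # 1 = Jack, 2 = Queen, 3 = King
        train(nodes, cards, "", 0)
        train(nodes, cards, "", 1)
    return nodes


if __name__ == "__main__":
    result = solve(20000)
    for key in sorted(result):
        print(key, [round(p, 3) for p in result[key].average_strategy()])
```

Each key is the holder's card followed by the action history, so `3b` is the second player holding the King and facing a bet. Responses to actions the opponent stops sampling simply stop being visited, which is the effect debated in this thread.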

Statistics: Posted by cantina — Wed Jul 17, 2013 10:28 pm

Re: Very sub-optimal play

2013-07-17T21:01:03+00:00

OneDayItllWork wrote:

somehomelessguy wrote:

a strategy is a set of action probabilities for every situation that can occur. a nash equilibrium states that no player can alter their strategy to improve their EV.

during the self play convergence, openshoving 200bb will be tried until it's refuted. this could mean calling with AA 100% of the time, and calling with 27o 0.002% of the time. clearly it would be better if we only called with AA, but the response is still enough to refute the openpush. if we didn't have a response which refutes it, our opponent will be able to gain EV by using the openpush and it will start to use it during the self play iterations.

This makes sense. However, surely it is possible that we don't stop playing an action purely because of the opponent response, but we could stop playing an action because we have found a better alternative. That would still prevent the response from evolving into a sensible/optimal response to that action.

yes, if there's always a better alternative, he can't switch to the open push with any of his hands to gain EV, therefore the criteria for nash equilibrium holds. it is still possible that open push generates positive EV (or quite obviously, open pushing AA preflop is +EV), but that doesn't matter. he will earn more by raising small etc.

the response to open push which will evolve is one where none of the opponent hands can gain EV by switching to this action. this response is not necessarily "the best", but still enough to refute the open push. there does exist a dominating nash equilibrium which has the "best" response, but afaik no efficient algorithm exists to find that specific equilibrium.

Statistics: Posted by somehomelessguy — Wed Jul 17, 2013 9:01 pm

Re: Very sub-optimal play

2013-07-17T20:19:26+00:00

Can you think of a toy game that demonstrates the problem that we can solve with CFRM or FP?

Statistics: Posted by spears — Wed Jul 17, 2013 8:19 pm

Re: Very sub-optimal play

2013-07-17T19:43:32+00:00

somehomelessguy wrote:

Statistics: Posted by OneDayItllWork — Wed Jul 17, 2013 7:43 pm

Re: Very sub-optimal play

2013-07-17T15:44:53+00:00

@OneDay In support of your pov http://poker-ai.org/archive/www.pokerai ... 329#p41329

Statistics: Posted by spears — Wed Jul 17, 2013 3:44 pm

Re: Very sub-optimal play

2013-07-17T15:31:03+00:00

Statistics: Posted by somehomelessguy — Wed Jul 17, 2013 3:31 pm

Re: Very sub-optimal play

2013-07-17T15:21:27+00:00

spears wrote:

In CFRM and FP the action probabilities don't jump from their initial values to 0 or 1 in one iteration so I don't think what you describe actually happens. There has to be some advantage to such a slow method

Either way, we're still not going to know what the correct response is to a shove UTG.

Statistics: Posted by OneDayItllWork — Wed Jul 17, 2013 3:21 pm

Re: Very sub-optimal play

2013-07-17T14:55:18+00:00

Quote:

The theory is that we are unbeatable, so if someone plays our strategy, it's a draw, if not, they lose.

I don't know if this is relevant to the discussion, but this isn't quite true. There are lots of strategies that will draw with a NE strategy.

Also, I don't know if having more than 2 players affects things either.

Statistics: Posted by spears — Wed Jul 17, 2013 2:55 pm

Re: Very sub-optimal play

2013-07-17T14:49:01+00:00

Statistics: Posted by spears — Wed Jul 17, 2013 2:49 pm

Re: Very sub-optimal play

2013-07-17T14:10:40+00:00

Ok, so let's say an action is forced, and Mr_X shoves 32o UTG, then everyone folds to us on the BB. Our optimal response to that is to call with everything. If that's the only time the action is forced, then our response to call with everything will remain in place. While their initial move was definitely suboptimal, so is our response, and it'll never be updated.

Would this problem also occur with fictitious play / CFRM?

Statistics: Posted by OneDayItllWork — Wed Jul 17, 2013 2:10 pm

Re: Very sub-optimal play

2013-07-17T12:00:11+00:00

OneDayItllWork wrote:

If I was to force it to take an action, that still wouldn't be a realistic range for that action, so that node still wouldn't be solvable.

The pseudo optimal strategy needs to perform the action for us to have a pseudo optimal range on that node. If that action is never taken by the strategy, that node can't possibly have a range.

I think CFRM and Fictitious Play both force actions to be taken at least once in the way the action probabilities are initialised. As the algorithm runs the probability of unrealistic actions drops quickly, but the response action to the unrealistic action is still in place. If your algorithm is significantly different to CFRM or Fictitious Play then I can see you might have a problem, but without knowing more about the algorithm it is difficult to comment.
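One way this comes about in CFRM is regret matching: with all regrets at zero the strategy falls back to uniform, so every action is played at the start, and an action only disappears once its regret has gone non-positive while another is positive. The function below is a minimal sketch of that rule; the name `regret_matching` is chosen here for illustration, and fictitious play initialises differently.

```python
def regret_matching(regrets):
    """Turn cumulative regrets into action probabilities."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total <= 0.0:
        # No positive regret yet: play every action with equal probability.
        return [1.0 / len(regrets)] * len(regrets)
    return [p / total for p in positive]


initial = regret_matching([0.0, 0.0, 0.0])
later = regret_matching([-5.0, 2.0, 6.0])
print(initial)
print(later)  # [0.0, 0.25, 0.75]
```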

Anyway, another suggestion: just use the response of a "nearby" node

Statistics: Posted by spears — Wed Jul 17, 2013 12:00 pm

Re: Very sub-optimal play

2013-07-17T11:38:34+00:00

spears wrote:

Or do you mean actions that are never taken regardless of holding??

This

If I was to force it to take an action, that still wouldn't be a realistic range for that action, so that node still wouldn't be solvable.

The pseudo optimal strategy needs to perform the action for us to have a pseudo optimal range on that node. If that action is never taken by the strategy, that node can't possibly have a range.

Statistics: Posted by OneDayItllWork — Wed Jul 17, 2013 11:38 am

Re: Very sub-optimal play

2013-07-17T11:10:04+00:00

OneDayItllWork wrote:

But in 1 card poker there are no actions that are never taken. Every node therefore has an associated range and can be solved.

I think http://www.cs.cmu.edu/~ggordon/poker/ disagrees. player 1 never bets on the first round if he holds a 6

- Or do you mean actions that are never taken regardless of holding?
- Could you force your algorithm to take all actions at least once at the beginning of training?

Statistics: Posted by spears — Wed Jul 17, 2013 11:10 am

Re: Very sub-optimal play

2013-07-17T09:30:11+00:00

I've got a 1 card poker solver that I wrote. But in 1 card poker there are no actions that are never taken. Every node therefore has an associated range and can be solved.

I'm working 'outside the box' as they say here. Designing a new method of machine learning that specifically aims to (pseudo) solve complex high variance games with massive game trees. But I can't figure out in my head how to deal with this problem.

My eventual aim is to produce a decent 6 Max optimal bot.

Statistics: Posted by OneDayItllWork — Wed Jul 17, 2013 9:30 am

Re: Very sub-optimal play

2013-07-17T08:49:29+00:00

Maybe you could get an answer to your question by using amax's cfrm code to solve one card poker.

Statistics: Posted by spears — Wed Jul 17, 2013 8:49 am

Re: Very sub-optimal play

2013-07-16T21:52:17+00:00

why is that difficult to solve? You assume the worst until told otherwise as an unknown..so your range is something like AA-TT, AQ, AK

Oh..nevermind..if they are playing optimally how do you solve it. I keep forgetting you guys are going for the NE which is way way over my head...

Statistics: Posted by shalako — Tue Jul 16, 2013 9:52 pm

Re: Very sub-optimal play

2013-07-16T21:32:08+00:00

I can only answer pragmatically -- look at probing CFRM.

If s(i) = 0 then there is no strategy update, but you still traverse the tree and update child nodes.

Statistics: Posted by cantina — Tue Jul 16, 2013 9:32 pm

Re: Very sub-optimal play

2013-07-16T20:48:05+00:00

This is purely hypothetical at the mo. I don't understand how we can solve a node that has no range associated with it.

Statistics: Posted by OneDayItllWork — Tue Jul 16, 2013 8:48 pm

Re: Very sub-optimal play

2013-07-16T20:39:40+00:00

What algorithm are you using, CS-CFRM?

Statistics: Posted by cantina — Tue Jul 16, 2013 8:39 pm

Re: Very sub-optimal play

2013-07-16T20:32:45+00:00

But what would the range of that action be?

Statistics: Posted by OneDayItllWork — Tue Jul 16, 2013 8:32 pm

Re: Very sub-optimal play

2013-07-16T19:09:01+00:00

You still probe the action....

Statistics: Posted by cantina — Tue Jul 16, 2013 7:09 pm

Re: Very sub-optimal play

2013-07-16T19:03:09+00:00

I understand that, but if an action is never taken, how do we attribute a range to that action to be able to solve it?

Statistics: Posted by OneDayItllWork — Tue Jul 16, 2013 7:03 pm

Re: Very sub-optimal play

2013-07-16T19:00:58+00:00

In the full game tree there are sub-optimal actions; there are all actions in the full game tree. It doesn't mean your strategy will converge to use them all, though.

Statistics: Posted by cantina — Tue Jul 16, 2013 7:00 pm

Re: Very sub-optimal play

2013-07-16T18:58:06+00:00

I can't get my head around how we'd solve that, even in the full game tree.

The situation in my head is this:
For us to solve it, we have to be able to put him on a range, but we assume everyone is playing optimally. As no optimal player makes that move, there's no range we can attribute to that action, and therefore it can't be solved.

Please explain where my thought process is going wrong.

Statistics: Posted by OneDayItllWork — Tue Jul 16, 2013 6:58 pm

Re: Very sub-optimal play

2013-07-16T17:38:38+00:00

Well, if you solved the full game, it would account for that opponent action, albeit suboptimal. However, for abstract games, you would do state translation if a branch doesn't exist in your game.
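As a rough illustration of state translation, the sketch below maps an observed bet onto the closest size that exists in the abstract game. The nearest-size rule, the function name `translate_bet` and the list of abstract sizes are all assumptions made for this example; real translation schemes are usually more careful (for instance ratio-based or randomised mappings).

```python
def translate_bet(observed_bb, abstract_sizes_bb):
    """Return the abstract bet size (in BB) closest to the observed bet."""
    return min(abstract_sizes_bb, key=lambda size: abs(size - observed_bb))


# Hypothetical open sizes available in the abstraction, in big blinds.
ABSTRACT_OPEN_SIZES_BB = [2.0, 3.0, 10.0, 50.0]

shove = translate_bet(200.0, ABSTRACT_OPEN_SIZES_BB)
small = translate_bet(2.4, ABSTRACT_OPEN_SIZES_BB)
print(shove, small)  # 50.0 2.0
```

The bot would then respond as if the opponent had taken the translated action, which only helps if the abstraction contains a branch reasonably close to what actually happened.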

Statistics: Posted by cantina — Tue Jul 16, 2013 5:38 pm

Very sub-optimal play

2013-07-16T16:41:34+00:00

After many years of exploitive bots, I thought I'd have a play with optimal bots. I have yet to produce a bot worth shouting about, let alone unleash on the world, so this is a hypothetical question.

So, optimal bots are produced by some form of strategy convergence. That is, they devise strategies to beat an existing strategy until they can't improve any more. That produces some form of game tree which tells us what to do in each situation. The theory is that we are unbeatable, so if someone plays our strategy, it's a draw, if not, they lose.

So let's say I've built my 400TB game tree for full ring NL, we sit down in EP, UTG fires off a 200BB open shove.

We take a look at our game tree to work out what we should do, but this action is so sub-optimal, that no node exists. It's not something we'd ever do, and therefore we don't know how to respond to it. How do we deal with this kind of situation if an optimal bot is unleashed on the real world?

Statistics: Posted by OneDayItllWork — Tue Jul 16, 2013 4:41 pm