Poker-AI.org

Poker AI and Botting Discussion Forum

All times are UTC




PostPosted: Fri Jul 03, 2015 5:53 pm 
Offline
Junior Member

Joined: Sat Apr 26, 2014 7:29 am
Posts: 34
Hi,

I wonder if you guys have any thoughts on whether it is smart to roll out all remaining cards when all active players are all-in, for evolution-based bots during the evolution (6-max in my case)?

I implemented it, but at the moment I don't use it because I'm not sure it is a good idea. One point is that I train on fixed hands (to cache all hand-related calculations). I'm afraid that the bots learn to go all-in too often (because then they get what their hand is worth, whereas without going all-in they depend on the following cards, or lose everything by folding).

With all-in rollout I think I get a kind of (potentially bad?) discontinuity. E.g. what is won, or even who wins, changes abruptly for the same hand between a large pot (almost all-in) and an all-in. Since my bots see the same hand many times, I'm not sure they learn something smart from that.

It is hard for me to evaluate that without training 2 sets of bots for months...

I already play all possible bot actions (so for 30% call, 50% raise half pot, 20% raise pot I play all of them and weight the winnings); compared to that, all-in rollout is cheap. But in first tests the all-in rollout bots seemed weaker (however, since they were bred from normal bots for only a short time, that does not say much).
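To make the "play all actions and weight the winnings" step concrete, here is a minimal sketch. The function name, the action labels and the winnings figures are made up for illustration; they are not SkyBot's actual code or data.

```python
def weighted_winnings(action_probs, winnings_by_action):
    """Expected winnings when every action is played out and weighted by its probability."""
    total_prob = sum(action_probs.values())
    if abs(total_prob - 1.0) > 1e-9:
        raise ValueError("action probabilities must sum to 1")
    return sum(p * winnings_by_action[a] for a, p in action_probs.items())


# The mixed strategy from the post: 30% call, 50% raise half pot, 20% raise pot.
probs = {"call": 0.30, "raise_half_pot": 0.50, "raise_pot": 0.20}
# Hypothetical result of playing each branch to the end of the hand.
winnings = {"call": 40.0, "raise_half_pot": -10.0, "raise_pot": 120.0}

ev = weighted_winnings(probs, winnings)
print(ev)
```

Evaluating every branch removes the sampling noise of the bot's own randomisation, which is presumably why it is worth the extra cost during training.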


PostPosted: Fri Jul 03, 2015 6:49 pm 
Offline
Veteran Member

Joined: Mon Mar 04, 2013 9:40 pm
Posts: 269
Hey,

I think I understand what you're trying to say, but I'm not exactly sure. One thing I didn't see in your post is that stacking off depends almost entirely on the effective-stack SPR (stack-to-pot ratio), so there is really not much simulation or math needed. There are really only two situations to cover: either you jam on the villain or the villain jams on you. Both are totally dependent on SPR, and calling or folding should be based on SOEQ, or the minimum equity required to jam or call. All that really matters is that you're playing to the effective stack; the all-in situations will then play themselves.

What you don't want to do is put yourself into a low-SPR situation and be forced to fold, realizing zero equity. You're just better off folding. An example situation to avoid is HU vs a short-stacker preflop raise. You will have to call the short stacker with a much tighter range, as you're likely to face a jam a large portion of the time postflop. This requires at least one equity calculation to determine whether the preflop call will be profitable. All you're doing is basically determining the SPR and SOEQ on the flop to see if you will have more equity than SOEQ on average. If so, you can make the call, and on average you will have enough equity to call off the short stacker's more-than-likely jam.
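A rough sketch of the arithmetic behind this, assuming "SOEQ" means the break-even equity for calling off a stack (my reading of the post, not a confirmed definition). The function names are illustrative, and the SPR form assumes a heads-up pot where the villain jams exactly the effective stack.

```python
def spr(effective_stack, pot):
    """Stack-to-pot ratio: effective stack divided by the current pot."""
    return effective_stack / pot


def min_equity_to_call(pot_before_call, call_amount):
    """Break-even equity for a call: amount risked divided by the final pot."""
    return call_amount / (pot_before_call + call_amount)


def stack_off_equity(spr_value):
    """Break-even equity to call a jam of the whole effective stack.

    With pot P and stack S = spr * P, the call risks S to win a final pot
    of P + 2S, so the requirement is S / (P + 2S) = spr / (1 + 2 * spr).
    """
    return spr_value / (1.0 + 2.0 * spr_value)


# Pot is 100, effective stack 200: villain jams 200, hero must call 200.
ratio = spr(200, 100)
needed = min_equity_to_call(pot_before_call=100 + 200, call_amount=200)
print(ratio, needed, stack_off_equity(ratio))
```

The lower the SPR, the lower the equity needed to stack off, which is the reason a preflop call against a short stack commits you to calling many flop jams.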

Hope this helps.


PostPosted: Sat Jul 04, 2015 4:47 pm 
Offline
Junior Member

Joined: Sat Apr 26, 2014 7:29 am
Posts: 34
shalako wrote:
Hey,

I think I understand what you're trying to say, but I'm not exactly sure. One thing I didn't see in your post is that stacking off depends almost entirely on the effective-stack SPR (stack-to-pot ratio), so there is really not much simulation or math needed. There are really only two situations to cover: either you jam on the villain or the villain jams on you. Both are totally dependent on SPR, and calling or folding should be based on SOEQ, or the minimum equity required to jam or call. All that really matters is that you're playing to the effective stack; the all-in situations will then play themselves.

What you don't want to do is put yourself into a low-SPR situation and be forced to fold, realizing zero equity. You're just better off folding. An example situation to avoid is HU vs a short-stacker preflop raise. You will have to call the short stacker with a much tighter range, as you're likely to face a jam a large portion of the time postflop. This requires at least one equity calculation to determine whether the preflop call will be profitable. All you're doing is basically determining the SPR and SOEQ on the flop to see if you will have more equity than SOEQ on average. If so, you can make the call, and on average you will have enough equity to call off the short stacker's more-than-likely jam.

Hope this helps.


Thanks.
My question is not about when to go all-in. It is whether I should do a rollout to determine the winner of a hand when 2 bots battle during evolution. The winner/winnings for the same hand change dramatically between an all-in with rollout and a large-pot hand without rollout (that would not happen in a normal game, where the winner is the same player and he gets the whole pot).

It seems clear that, according to the law of large numbers, I can do all-in rollout if I play enough hands (same justification as for your answer; however, with that many hands I would probably not need to roll out...). But with the roughly 100k hands (or 1.2M or 2M with permutations) I play to find the winner, it looks like I'm far away from "large numbers" (meaning rollout or no rollout has a significant influence on the winnings).

All that probably means I should play many more hands to find a winner. So many that rollout and no rollout make no significant difference. Maybe I should check what order of magnitude that is. But I probably cannot play that many hands in training for each fight :-(
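For a back-of-the-envelope feel for that order of magnitude, here is a sketch under strong simplifying assumptions: every all-in is an independent two-player pot with the same equity, ties are ignored, and results are measured as a fraction of the pot. Fixed, repeated hands break the independence assumption, so treat the numbers as a rough guide only. The function names are illustrative.

```python
import math


def std_error_of_allin_share(equity, n_hands):
    """Standard error of the average realised pot share over n all-ins.

    Without rollout each all-in pays 1 (win) or 0 (loss) pots, a Bernoulli
    outcome with variance equity * (1 - equity). With rollout the share is
    exactly `equity`, so this is the noise that rollout removes.
    """
    return math.sqrt(equity * (1.0 - equity) / n_hands)


def hands_needed(equity, target_error):
    """All-in hands needed to push the standard error below target_error."""
    return math.ceil(equity * (1.0 - equity) / target_error ** 2)


se = std_error_of_allin_share(equity=0.75, n_hands=100_000)
n = hands_needed(equity=0.75, target_error=0.001)
print(se, n)
```

The error shrinks only with the square root of the number of hands, so halving the noise costs four times as many hands.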


PostPosted: Sat Jul 04, 2015 7:42 pm 
Offline
Site Admin

Joined: Sun Feb 24, 2013 9:39 pm
Posts: 642
@Skybot I definitely don't understand your post 100%.

- All-in is a legal bet so surely your simulation should play it. The evolutionary algorithm should learn when to go all-in and when not to.
- Why is it harder to evaluate all-in bets than another bet?
- I see no good reason why there should be a discontinuity between 99% all in and 100% all in.
- Is the algorithm converging? You will see weird results if it isn't.
- Do evolutionary algorithms get caught in local minima?
- Does your algorithm learn betting frequencies? I.e., given the same situation, will the resultant strategy always bet the same way? It shouldn't.
- Does hero know villain's strategy? The algorithm could potentially be sped up if it did.


PostPosted: Sat Jul 04, 2015 10:38 pm 
Offline
Junior Member

Joined: Sat Apr 26, 2014 7:29 am
Posts: 34
[Edit]: maybe I should add more about my setup:
- Rule based bot logic, many parameters that I try to optimize by evolution
- 2 bots fight for survival by playing 100k or more hands out of a pool of about 1M fixed hands. They play all seat permutations (e.g. 20 for 3vs3, or 6+6 for 5vs1_and_1vs5) and get a score for that.
[/Edit]
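The seat counts quoted above match the number of ways to choose which seats one bot's copies occupy at a 6-max table. A small sketch (the function name is illustrative, and whether SkyBot generates the seatings this way is an assumption):

```python
from itertools import combinations


def seatings(n_seats, n_copies_of_a):
    """All ways to pick the seats taken by bot A's copies; bot B fills the rest."""
    return list(combinations(range(n_seats), n_copies_of_a))


three_vs_three = seatings(6, 3)                   # 3 copies of A vs 3 copies of B
five_vs_one = seatings(6, 5) + seatings(6, 1)     # 5vs1 plus 1vs5
print(len(three_vs_three), len(five_vs_one))
```

Playing every seating on the same hand cancels positional luck between the two bots, at the cost of multiplying the number of hands played.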

>>> @Skybot I definitely don't understand your post 100%.
My question is whether, when I let 2 bots fight in a hand, I should roll out the rest of the cards if they are all-in (so they get 25% and 75% of the pot, for example). The discontinuity is there because if they are almost all-in I do not roll out, so 100% of the pot goes to the winner (maybe the lucky winner, who would get just 25% with rollout). If I could play an extremely large number of hands the effect would not exist (but then rolling out would not be needed either).
Note: my bots see the same hand many times, because I must cache all expensive calculations, so I have a fixed set of hands they train on. So every x generations they see a hand again.
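The discontinuity can be shown with a tiny payout function. This is only a sketch of the scoring rule as described in this post, for a two-player pot with ties ignored; the function and parameter names are made up.

```python
def hero_payout(pot, hero_equity, hero_wins_dealt_board, all_in, use_rollout):
    """Chips awarded to hero at showdown for one fixed training hand."""
    if all_in and use_rollout:
        # Rollout: hero gets his equity share of the pot, whatever the dealt board is.
        return hero_equity * pot
    # No rollout (or not all-in): the fixed board decides and the winner takes all.
    return float(pot) if hero_wins_dealt_board else 0.0


# Same hand, hero is a 25% underdog but the fixed board happens to favour him.
almost_all_in = hero_payout(198, 0.25, True, all_in=False, use_rollout=True)
fully_all_in = hero_payout(200, 0.25, True, all_in=True, use_rollout=True)
print(almost_all_in, fully_all_in)
```

Putting the last two chips in drops the lucky underdog's payout from 198 to 50 on this hand, and since the hand pool is fixed, the bots can see that same jump on every revisit.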

>> - All-in is a legal bet so surely your simulation should play it. The evolutionary algorithm should learn when to go all-in and when not to.
They go all-in. The question is how the discontinuity influences those learned decisions. At the moment I train without rolling out on all-in to be on the safe side.

>> - Why is it harder to evaluate all-in bets than another bet?
See previous.

>> - I see no good reason why there should be a discontinuity between 99% all in and 100% all in.
See first point. For a very large number of hands, yes. But I cannot play that many hands, I think.

>> - Is the algorithm converging? You will see weird results if it isn't.
I just want to beat low stakes. So a good local minimum is good enough for me. There is no total order, of course (so maybe Min1Bot > Min2Bot and Min2Bot > Min3Bot but Min1Bot < Min3Bot). But normally the bots get better during training for a very long time until they hit a local minimum.

>> - Do evolutionary algorithms get caught in local minima?
Yes (at least with my settings). But I can force them out of it by random mutations/noise/playing fewer hands to have more variance. However, the size of the minima depends on the bot logic I try to optimize. Sadly I think with my current logic I have some very wide local minima so it is hard to get to good ones (e.g. my current bots like bluffing and slow-playing a little too much for my taste) :-(

>> - Does your algorithm learn betting frequencies? ie Given the same situation will the resultant strategy always bet the same way? It shouldn't
My bots return a set of actions and percentages of which one will be randomly chosen according to the percentages. At training I evaluate all of them and weight the winnings.

>> - Does hero know villain's strategy? The algorithm could potentially be speeded up if it did.
No. During the evolution the bots play vs mutations of themselves. So they implicitly know the enemy will play at least similarly to themselves. So I implicitly kind of search for a Nash EQ inside the current local minimum with respect to the bots' logic and the pool of fixed hands.
To play vs humans I have hundreds of bots at different local minima or that have proved strong, and I could pick one that should be good vs that human. As said, I'm happy with low stakes atm, even just picking a fixed one works if I don't pick a strange one (I play zoom, so most players do not recognize the leaks of my bots).


PostPosted: Sun Jul 05, 2015 11:14 am 
Offline
Junior Member

Joined: Thu Nov 14, 2013 2:56 pm
Posts: 12
It sounds like your evolutionary algorithm is exploiting the fact that choosing to go all-in often maximizes variance, and then this in turn gives these members of the population a good chance of coming out on top due to random sampling?

When you say "rollout" do you mean:

1) You are doing just a single rollout of the remaining cards and allocating the pot accordingly.
2) You are doing several monte-carlo rollouts of the remaining cards and averaging to get a final EV/equity expectation to allocate.
3) You are traversing the tree of all possible remaining rollouts and averaging to get a final EV/equity expectation to allocate.

If you are currently doing (1) then try (2) or (3) depending on how much processor time you can afford (or how well you can pre-compute lookups...). This should significantly reduce the variance of these all-in situations.

If this doesn't work, then you could consider penalizing the fitness function of all-in plays by reducing the allocated EV/equity somehow (you could try just a fixed reduction value or possibly even something based off the expected variance itself, etc).
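One possible shape for such a variance-based penalty is sketched below. The `risk_weight` parameter and the choice of standard deviation as the risk measure are assumptions for illustration, not something tested in this thread.

```python
import statistics


def risk_adjusted_fitness(results, risk_weight=0.1):
    """Mean result minus a penalty proportional to the spread of the results.

    `results` holds per-hand (or per-match) winnings; a larger `risk_weight`
    favours bots that win steadily over bots that win by swinging wildly.
    """
    mean = statistics.fmean(results)
    spread = statistics.pstdev(results)
    return mean - risk_weight * spread


steady = risk_adjusted_fitness([10.0, 10.0, 10.0])
swingy = risk_adjusted_fitness([-90.0, 10.0, 110.0])
print(steady, swingy)
```

Both example bots have the same average winnings, but the swingy one scores lower, so selection no longer rewards variance for its own sake.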

Juk :)


PostPosted: Sun Jul 05, 2015 1:40 pm 
Offline
Junior Member

Joined: Sat Apr 26, 2014 7:29 am
Posts: 34
jukofyork wrote:
When you say "rollout" do you mean:

1) You are doing just a single rollout of the remaining cards and allocating the pot accordingly.
2) You are doing several monte-carlo rollouts of the remaining cards and averaging to get a final EV/equity expectation to allocate.
3) You are traversing the tree of all possible remaining rollouts and averaging to get a final EV/equity expectation to allocate.


I'm doing 1) if I understand you right. So on an all-in on the flop or turn, I roll out all remaining cards. I enumerate all possibilities: (remainingCards choose 1) or (remainingCards choose 2) evaluations. I get basically no preflop all-ins with my bots (except AA vs AA). So handling those like an all-in on the flop should not be a problem (deal the regular flop and roll out the rest).
I don't see how doing 2) or 3) gives different results. Wouldn't 2) and 3) approximate what I get with 1)? Except for preflop all-ins, of course.
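A minimal sketch of the enumeration described here, for two players. Cards are integers 0..51 and the hand evaluator is passed in by the caller; `toy_strength` is a deliberately crude placeholder and NOT a real poker evaluator, it only exists so the example runs.

```python
from itertools import combinations


def allin_equity(hole_a, hole_b, board, strength):
    """Enumerate every completion of the board; return (A's pot share, boards seen)."""
    used = set(hole_a) | set(hole_b) | set(board)
    deck = [c for c in range(52) if c not in used]
    missing = 5 - len(board)          # 2 on the flop, 1 on the turn
    share, count = 0.0, 0
    for extra in combinations(deck, missing):
        full_board = tuple(board) + extra
        a = strength(hole_a, full_board)
        b = strength(hole_b, full_board)
        share += 1.0 if a > b else 0.5 if a == b else 0.0  # split ties
        count += 1
    return share / count, count


def toy_strength(hole, board):
    """Placeholder ranking: board cards pairing a hole card, then the high card."""
    ranks = [c % 13 for c in hole]
    paired = sum(1 for c in board if c % 13 in ranks)
    return (paired, max(ranks))


equity_a, n_boards = allin_equity((12, 25), (0, 14), (3, 17, 30), toy_strength)
print(equity_a, n_boards)
```

Heads-up on the flop this visits (45 choose 2) = 990 boards and on the turn 44, so with a cached or table-driven evaluator the full enumeration stays cheap and carries no sampling noise.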

Or do you mean not just rolling out on all-in, but for not-all-in too? I cannot give the bots all possible river cards and see what they would do. Decisions based on different cards are expensive in my approach (because I have to cache calculations for each card).

jukofyork wrote:
It sounds like your evolutionary algorithm is exploiting the fact that choosing to go all-in often maximizes variance, and then this in turn gives these members of the population a good chance of coming out on top due to random sampling?

Not sure I get that. With rollout, the all-in minimizes the variance, because the bots no longer have the variance of the remaining cards. So yes, the bots without rollout see more variance in the all-in case during training than the rollout bots do. But I don't see how they can exploit that: when I compare the 2 approaches I let the bots run on another set of hands, so the larger variance could equally likely bite them in the ass. So intuitively I would say the rollout bots should have an advantage if I use rollout when comparing the 2, because they expect the lower variance they will see? But they did not seem to (however, as said earlier, the tests don't say much).

It is a good point. I probably need to get many more validation hand sets to see how large the variance for fights between the 2 approaches is.

@all: thanks guys. I see I need more detailed data. I think I need more validation hand sets to measure the variance for fights between the 2 approaches. And I probably need a larger fixed hand pool for training. And I should probably check the effect of the very few preflop all-ins.
I probably have to train bots with both approaches for some weeks to do a better comparison. Even then it could still be that one approach just luckily found a better local minimum :-(


PostPosted: Sun Jul 05, 2015 2:43 pm 
Offline
Junior Member

Joined: Thu Nov 14, 2013 2:56 pm
Posts: 12
SkyBot wrote:
So on an all-in on the flop or turn, I roll out all remaining cards. I enumerate all possibilities: (remainingCards choose 1) or (remainingCards choose 2) evaluations.

Ah, ignore my post then (when you said "rollout" I wasn't sure if you were just picking 1/2 cards for river/turn respectively and then doing a single evaluation for whatever came).

Juk :)


PostPosted: Mon Jul 06, 2015 4:59 pm 
Offline
Site Admin

Joined: Sun Feb 24, 2013 9:39 pm
Posts: 642
Ok, I think I understand now. For an all-in you calculate each player's equity on, say, the flop, and allocate his winnings accordingly. For non all-in bets you play the cards and see who actually wins. So you see much higher variance in non-all-in bets. Correct? (If so, my and your understanding of roll out is opposite!)

So you could:
1. Run more hands so that the all-in case and the non-all-in case are closer
2. Play the cards for the all-in case so at least it is the same as the non all-in case
3. Choose the hands that you play so that the equity as calculated on the flop is actually realised.

1 and 2 are too slow. 3 is hard to figure out how to do.


PostPosted: Mon Jul 06, 2015 6:49 pm 
Offline
Junior Member

Joined: Sat Apr 26, 2014 7:29 am
Posts: 34
spears wrote:
Ok, I think I understand now. For an all-in you calculate each player's equity on, say, the flop, and allocate his winnings accordingly. For non all-in bets you play the cards and see who actually wins. So you see much higher variance in non-all-in bets. Correct? (If so, my and your understanding of roll out is opposite!)

So you could:
1. Run more hands so that the all-in case and the non-all-in case are closer
2. Play the cards for the all-in case so at least it is the same as the non all-in case
3. Choose the hands that you play so that the equity as calculated on the flop is actually realised.

1 and 2 are too slow. 3 is hard to figure out how to do.


Thank you.
1 seems like the easy fix, but it is a hard trade-off against the number of generations I can create in a given time.
I think 2 is what I do atm to fix the problem. Except that by saying it is slower you maybe mean the opposite. I currently play just the normal board cards on all-in, which is much faster than what I call rollout (enumerate all), because on rollout I actually draw all cards instead of just 1 or 2. If you mean the opposite, i.e. enumerating all in the non-all-in case, then yes, that is much too expensive.

I thought about 3 at earlier stages (because I only have the resources to play few hands, I wanted a representative sample of all possible hands). It seemed like a hard problem, so I stayed with my randomly generated hand pools. But I should maybe check the literature again to see if I find anything; this is far beyond my own knowledge/skill.

I am currently thinking about changing the fitness function (atm just who wins more money). The idea is to penalize losses (e.g. subtract an extra 10% of the money lost on a loss). I feel like that should reduce variance in general. But does it make the rollout problem worse (an unlucky loss is punished more)? I have to think about it more.
But that would also make my bots more passive, and that would not be bad for my current generation.
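The loss penalty could look roughly like this. It is a sketch of the idea as stated above, with a made-up function name and the 10% figure from the post as the default.

```python
def penalized_fitness(hand_results, loss_penalty=0.10):
    """Sum of per-hand winnings, with every loss made `loss_penalty` larger."""
    total = 0.0
    for result in hand_results:
        if result < 0:
            total += result * (1.0 + loss_penalty)  # e.g. -100 counts as -110
        else:
            total += result
    return total


break_even = penalized_fitness([100.0, -100.0])
print(break_even)
```

A bot that wins 100 and loses 100 now scores about -10 instead of 0, so a break-even gamble is selected against; note that an unlucky loss on a fixed board is penalized in exactly the same way as a deserved one.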

