[Edit]: Maybe I should add more about my setup:
- Rule-based bot logic with many parameters that I try to optimize by evolution
- Two bots fight for survival by playing 100k or more hands drawn from a pool of about 1M fixed hands. They play all seat permutations (e.g. 20 for 3vs3, or 6+6 for 5vs1_and_1vs5) and get a score for that (see the sketch right below).
[/Edit]
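To make that concrete, here is a minimal sketch of the evaluation loop. This is not my actual code; `play_hand`, the parameter names, and the 3vs3 case are illustrative placeholders:

```python
import itertools
import random

def play_hand(hand, seating):
    """Placeholder for the real (expensive, cached) hand simulation.
    Should return the combined winnings of bot_a's seats."""
    return 0.0

def evaluate_matchup(bot_a, bot_b, hand_pool, n_hands=100_000, seats=6):
    """Score bot_a vs bot_b on a sample of the fixed, pre-cached hands.
    Each hand is replayed under every seat assignment (in the 3vs3 case:
    C(6,3) = 20 assignments), so positional luck cancels out."""
    total = 0.0
    for hand in random.sample(hand_pool, n_hands):
        for a_seats in itertools.combinations(range(seats), seats // 2):
            seating = [bot_a if s in a_seats else bot_b for s in range(seats)]
            total += play_hand(hand, seating)
    return total
```

Replaying every hand under every seat assignment is what lets two bots be compared on far fewer hands than a plain random sample would need.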
>>> @Skybot I definitely don't understand your post 100%.
My question is: when I let two bots fight in a hand and they are all-in, should I roll out the remaining cards and split the pot by equity (so they get e.g. 25% and 75% of the pot)? The discontinuity exists because when they are almost all-in I do not roll out, so 100% of the pot goes to the winner (maybe the lucky winner, who would only get 25% with a rollout). If I could play an extremely large number of hands the effect would not exist (but then rolling out would not be needed either).
Note: my bots see the same hand many times, because I must cache all expensive calculations, so I have a fixed set of hands they train on. Every x generations they see a given hand again.
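To illustrate the discontinuity as a toy sketch (all names hypothetical, not my actual code):

```python
import random

def settle_all_in(pot, equities, rollout, rng=random):
    """equities: e.g. {'hero': 0.25, 'villain': 0.75}, estimated over the
    cards still to come. With rollout=True the pot is split by equity
    (smooth); with rollout=False a single runout decides everything."""
    if rollout:
        return {p: pot * eq for p, eq in equities.items()}
    players = list(equities)
    winner = rng.choices(players, weights=[equities[p] for p in players])[0]
    return {p: (pot if p == winner else 0.0) for p in players}
```

With `rollout=True` the 25%-equity bot always books exactly 0.25 * pot; with `rollout=False` it books the whole pot 25% of the time, which on a finite fixed hand pool never quite averages out.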
>> - All-in is a legal bet so surely your simulation should play it. The evolutionary algorithm should learn when to go all-in and when not to.
They do go all-in. The question is how the discontinuity influences those learned decisions. At the moment I train without rolling out on all-in, to be on the safe side.
>> - Why is it harder to evaluate all-in bets than another bet?
See previous.
>> - I see no good reason why there should be a discontinuity between 99% all in and 100% all in.
See the first point. For a very large number of hands, yes. But I don't think I can play that many hands.
>> - Is the algorithm converging? You will see weird results if it isn't.
I just want to beat low stakes, so a good local minimum is good enough for me. There is no total order, of course (so maybe Min1Bot > Min2Bot and Min2Bot > Min3Bot but Min1Bot < Min3Bot). But normally the bots keep getting better during training for a very long time until they hit a local minimum.
>> - Do evolutionary algorithms get caught in local minima?
Yes (at least with my settings). But I can force them out of it with random mutations/noise, or by playing fewer hands to get more variance. However, the size of the minima depends on the bot logic I try to optimize. Sadly, I think my current logic has some very wide local minima, so it is hard to get to good ones (e.g. my current bots like bluffing and slow-playing a little too much for my taste).
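Roughly what I mean by forcing them out, as an illustrative sketch (not my real mutation operator; all parameter values are made up):

```python
import random

def mutate(params, sigma=0.05, kick_prob=0.02, kick_sigma=0.5, rng=random):
    """Small Gaussian noise on every parameter, plus an occasional much
    larger 'kick' so a lineage can jump out of a wide local minimum."""
    return [p + rng.gauss(0.0, kick_sigma if rng.random() < kick_prob else sigma)
            for p in params]
```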
>> - Does your algorithm learn betting frequencies? ie Given the same situation will the resultant strategy always bet the same way? It shouldn't
My bots return a set of actions with percentages, and one action is randomly chosen according to those percentages. During training I evaluate all of the actions and weight the winnings by their percentages.
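In sketch form (the `simulate` helper stands in for the actual hand evaluation; names are illustrative):

```python
import random

def act_live(strategy, state, rng=random):
    """strategy(state) -> list of (action, prob) pairs; live play samples one."""
    actions, probs = zip(*strategy(state))
    return rng.choices(actions, weights=probs)[0]

def score_training(strategy, state, simulate):
    """Training evaluates every action and weights its winnings by its prob."""
    return sum(prob * simulate(action, state) for action, prob in strategy(state))
```

Scoring all branches instead of sampling one removes another source of variance from the fitness signal.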
>> - Does hero know villain's strategy? The algorithm could potentially be speeded up if it did.
No. During evolution the bots play vs mutations of themselves, so they implicitly know the enemy will play at least similarly to themselves. So I implicitly kind of search for a Nash equilibrium inside the current local minimum, with respect to the bot logic and the pool of fixed hands.
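Schematically the loop looks something like this (illustrative, not my real code):

```python
def evolve(bot, mutate, score, generations=1000, pool_size=8):
    """score(a, b) > 0 means bot a beats bot b on the fixed hand pool.
    The opponent is always a mutation of the current champion, so each
    bot is only ever tuned against near-copies of its own strategy."""
    for _ in range(generations):
        challengers = [mutate(bot) for _ in range(pool_size)]
        scored = [(score(c, bot), c) for c in challengers]
        best_score, best = max(scored, key=lambda t: t[0])
        if best_score > 0:
            bot = best
    return bot
```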
To play vs humans I have hundreds of bots at different local minima, or bots that have proved strong, and I could pick one that should be good vs that particular human. As said, I'm happy with low stakes atm; even just picking a fixed one works if I don't pick a strange one (I play Zoom, so most players do not recognize the leaks of my bots).