First off, thanks for all your replies, I really appreciate it. To avoid any misunderstanding: I'm not the author; I am merely considering reproducing their work and building from there.
Quote:
As far as I can tell, none of your tests check what would happen if your opponent adapted to you. It is possible that they could adapt to you quicker than you adapt to them so you would always lose. Has this player ever played against humans?
So there is no specific test for this, and the player has never played against humans. However, there is one bot (opponent) that may give an idea of how the player (ASHE) would react to a changing strategy: the 'random gambler [RG]', which 'randomly switches from other highly exploitable strategies every 50 hands'. In one of the tests the player is trained against all the highly exploitable strategies, and then plays vs RG. The authors observe:
"[...] ASHE did not switch its strategy as RG did after every 50 hands; rather, as more hands were played, ASHE reached a strategy that exploited RG based on the overall distribution of its moves across the highly exploitable strategies it might randomly choose. That strategy, while being close to static after a few hundreds of hands, was much more effective in exploiting RG compared to SB's approximated equilibrium strategy."
My intuition is that, with an LSTM, the bot may react, potentially quickly, to past actions. E.g. if the opponent raised the player in the last 4 hands, that will be in memory, and the player can already (slightly) change its strategy accordingly (especially if it has already encountered opponents raising that often). This might be a very wrong intuition; I may just be believing in NN magic.
Also, about exploitability: the player has a pure strategy (deterministic w.r.t. its inputs), but since it keeps past states in memory and uses them as inputs, it will never be in the same state twice, and it would be very difficult to predict its actions.
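To make that intuition concrete, here is a minimal sketch (PyTorch) of an LSTM policy conditioning on a history of opponent actions. To be clear, this is not the ASHE architecture from the paper; the layer sizes and feature encoding are my own assumptions, purely to illustrate the idea:
Code:
# Minimal sketch of the intuition above. NOT the ASHE architecture;
# layer sizes and the feature encoding are my own assumptions.
import torch
import torch.nn as nn

class HistoryPolicy(nn.Module):
    def __init__(self, n_features=10, hidden=64, n_actions=3):
        super().__init__()
        # The LSTM's hidden state summarizes past hands, e.g.
        # "the opponent raised the last 4 hands".
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)  # fold / call / raise

    def forward(self, history):
        # history: (batch, hands_so_far, n_features), an encoding of
        # past actions, bet sizes, board cards, etc.
        out, _ = self.lstm(history)
        # Decide from the latest summary: the same current situation
        # can map to a different action when the remembered history
        # differs, which is also why the "pure strategy" above is
        # hard to predict in practice.
        return torch.softmax(self.head(out[:, -1]), dim=-1)

policy = HistoryPolicy()
fake_history = torch.randn(1, 20, 10)  # 20 hands of made-up features
print(policy(fake_history))            # distribution over actions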
Quote:
[...] to model opponent strategies: I think this can be done by simply counting action frequencies and showdown hand frequencies and then using some statistical logic to deduce the opponent strategy.
Though I believe that the statistical approach could model the opponent accurately, I don't see it adapting efficiently to a changing strategy, or to an opponent trying to exploit the player. The way I see it, it would require keeping additional sets of stats with shorter memory (e.g. 50, 200 or 1000 hands), with the trade-off between noise (variance) and reaction speed never being really satisfying.
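To illustrate the trade-off I mean, here is a quick sketch of the same stat (opponent raise frequency) tracked with a short window, a long window, and an exponential decay; the window sizes and decay constant are arbitrary placeholders:
Code:
# Short memory reacts fast but is noisy; long memory is stable but
# slow to notice a strategy switch. Parameters are placeholders.
from collections import deque

class WindowedFreq:
    def __init__(self, window):
        self.events = deque(maxlen=window)  # 1 = raised, 0 = did not

    def update(self, raised):
        self.events.append(raised)

    def freq(self):
        return sum(self.events) / len(self.events) if self.events else 0.0

class DecayedFreq:
    """Exponential moving average; decay near 1 means long memory."""
    def __init__(self, decay=0.98):
        self.decay, self.value = decay, 0.0

    def update(self, raised):
        self.value = self.decay * self.value + (1 - self.decay) * raised

short, long_, ema = WindowedFreq(50), WindowedFreq(1000), DecayedFreq()
for raised in [1, 1, 0, 1]:  # toy stream of observed opponent actions
    for tracker in (short, long_, ema):
        tracker.update(raised)
print(short.freq(), long_.freq(), ema.value)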
Quote:
I have read these papers and the whole dissertation on that subject a while ago and I am rather skeptical. I also asked the author some questions but never got a reply. I also don't think the results vs Slumbot are significant as there was a "defective" version of slumbot running at ACPC 2017
Thanks for the info. I am currently trying to reach him as well. This could be a red flag, though I will still consider reproducing the work to verify its validity. It might just be a hidden gem.
Quote:
This agent has pretty unusual playing stats that make me believe that it would lose to all halfway solid Nash Agents
That makes sense, as the player is focused on exploiting and has no notion of Nash equilibria. It would need much more training and complexity to compete against NE agents, I think. Do you think that most players online would be halfway solid Nash agents (honest question)?
Quote:
At the beginning i was working a lot on modeling opponents using population statistics, showdowns, bluff frequencies etc. It is perfect approach to find opponent weaknesses, but whenever you exploit opponent you are open for being exploited yourself. Humans are far more better in switching strategies [...]
This is the problem I am trying to explore / alleviate. Can I ask at what stakes you play? I have the feeling that at micro (and even low) stakes, many players barely adapt.
To give some context about me: I have played poker for a few years, and I have worked on AI / ML for a while. I am new to the poker AI and botting scene though. I would like to develop a bot that plays 6max at easy levels (micro stakes), and especially I would like it to focus on exploiting weak players. Hopefully I will minimize my losses vs strong players. I feel like NE has been researched a lot, and it would be costly to arrive at a reasonably good result (correct me if I'm wrong).
I currently have the first building blocks set up: a fast hand (and equity) evaluator, local bot-vs-bot simulations, an 'api' to play online, a tight-aggressive rule-based bot, and reasonable hand and state abstractions. So I'm searching for a direction to develop an AI that suits me. I'm open to any advice / suggestion. By the way, I've learnt a lot through this forum and will try to post when I have something.
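For instance, the equity part can be sketched with simple Monte Carlo rollouts. Here I use the third-party treys evaluator (pip install treys) instead of my own, just to show the idea:
Code:
# Monte Carlo all-in equity vs one random hand, preflop.
# Uses the 'treys' package for hand evaluation.
import random
from treys import Card, Deck, Evaluator

def equity(hero, n_trials=10_000):
    """Estimate hero's preflop all-in equity vs a random hand."""
    ev = Evaluator()
    wins = ties = 0
    for _ in range(n_trials):
        deck = Deck()
        # Remove hero's cards before dealing the rest.
        deck.cards = [c for c in deck.cards if c not in hero]
        random.shuffle(deck.cards)
        villain = deck.draw(2)
        board = deck.draw(5)
        h, v = ev.evaluate(board, hero), ev.evaluate(board, villain)
        if h < v:        # treys: lower score = stronger hand
            wins += 1
        elif h == v:
            ties += 1
    return (wins + ties / 2) / n_trials

hero = [Card.new('Ah'), Card.new('As')]
print(f"AA vs random: ~{equity(hero):.2%}")  # should land around 85%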