Poker-AI.org

Poker AI and Botting Discussion Forum
PostPosted: Sat May 18, 2019 6:43 pm 
New Member (Joined: Sun Mar 10, 2019 10:22 pm, Posts: 6)
Dynamic Adaptation and Opponent Exploitation in Computer Poker
by: Xun Li and Risto Miikkulainen

Abstract
As a classic example of imperfect-information games, Heads-Up No-Limit Texas Hold'em (HUNL) has been studied extensively in recent years. While state-of-the-art approaches based on Nash equilibrium have been successful, they lack the ability to model and exploit opponents effectively. This paper presents an evolutionary approach to discovering opponent models based on Long Short-Term Memory neural networks and Pattern Recognition Trees. Experimental results showed that poker agents built with this method can adapt to opponents they have never seen in training and exploit weak strategies far more effectively than Slumbot 2017, one of the cutting-edge Nash-equilibrium-based poker agents. In addition, agents evolved through playing against relatively weak rule-based opponents tied statistically with Slumbot in heads-up matches. Thus, the proposed approach is a promising new direction for building high-performance adaptive agents in HUNL and other imperfect-information games.

http://nn.cs.utexas.edu/downloads/papers/xun.aaai18.pdf


PostPosted: Sat May 18, 2019 6:55 pm 
New Member (Joined: Sun Mar 10, 2019 10:22 pm, Posts: 6)
I'd be happy to hear opinions on this paper, especially on the use of LSTMs and a genetic algorithm. Although there seems to be more work on convolutional networks, LSTMs make sense in my opinion, since poker is a sequence after all (of actions, or of hands). The results in this paper are quite promising as well.

For those interested, there is a slightly older paper by the same authors with a simpler architecture and training process: http://nn.cs.utexas.edu/downloads/papers/xun.aaai17.pdf


PostPosted: Mon May 20, 2019 8:29 am 
Site Admin (Joined: Sun Feb 24, 2013 9:39 pm, Posts: 642)
Quote:
I'd be happy to hear opinions on this paper.


I've only spent about 15 minutes reading this. As far as I can tell, none of your tests check what would happen if your opponent adapted to you. It is possible that they could adapt to you quicker than you adapt to them, so you would always lose. Has this player ever played against humans? My long-term aim is to produce an NE player that can then be varied slightly to exploit obvious opponent weaknesses, without also introducing large exploitabilities in itself.

I'm not convinced that you require LSTMs to model opponent strategies: I think this can be done by simply counting action frequencies and showdown hand frequencies and then using some statistical logic to deduce the opponent strategy. If I remember correctly this was the approach in Southey's paper. You do need to abstract hands and boards though, or you will never gather sufficient data. You will not get sufficient data from a single opponent to deduce his entire strategy in a realistic timescale, so you have to find how close he is to opponents you have encountered before on whom you have more data. I think this was the approach used by Ponsen.

Once you have the opponent strategy you can find the best response against it, and then mix this with your NE strategy for safety. My experiments on toy games suggest that you can exploit a lot, without becoming too exploitable yourself.


PostPosted: Mon May 20, 2019 2:36 pm 
New Member (Joined: Mon May 20, 2019 2:21 pm, Posts: 6)
I have read these papers and the whole dissertation on that subject a while ago and I am rather skeptical. I also asked the author some questions but never got a reply. I also don't think the results vs Slumbot are significant, as there was a "defective" version of Slumbot running at the ACPC 2017 which was highly exploitable by river aggression. This version of Slumbot even lost to Viliam Lisý's Simple Rule Agent.

There was a participant called ASHE in the 2017 ACPC championship that finished 7th out of 15. This agent has pretty unusual playing stats that make me believe it would lose to all halfway solid Nash agents (and it did, in fact, lose quite significantly to places 1-6 during this competition).

Now, if you publish a paper you obviously want the results to be positive, and I doubt many of the folks reading the paper had enough background knowledge to ask the right questions. I am fairly certain the results ASHE achieved vs Slumbot are against the defective version, which makes them pretty much meaningless.


PostPosted: Tue May 21, 2019 2:43 am 
Senior Member (Joined: Fri Nov 25, 2016 10:42 pm, Posts: 122)
"Once you have the opponent strategy you can find the best response against it, and then mix this with your NE strategy for safety. My experiments on toy games suggest that you can exploit a lot, without becoming too exploitable yourself."

At the beginning I worked a lot on modeling opponents using population statistics, showdowns, bluff frequencies, etc. It is a great approach for finding opponent weaknesses, but whenever you exploit an opponent you open yourself up to being exploited. Humans are far better at switching strategies and tricking the bot in different ways, so the best approach for a bot is to play a base Nash-equilibrium strategy and only slightly exploit the opponent, so that humans can hardly notice it.
E.g. if you see that a human folds on the river 55% of the time, every bluff is profitable, but it might make sense to increase bluffs only a bit, say 33% more bluffs, which is very hard for any human to recognize.

If you take a more extreme exploitative approach, which is easier for humans to detect, it is very unlikely that any AI will be better than humans at detecting adaptations anytime soon. Humans could especially trick bots: start by bluff-raising a lot, e.g. on the flop; the bot would quickly collect stats showing the raise stat is too high and would adjust by folding much less, betting with fewer bluffs, 3-bet bluffing, etc.; then the human can switch back to a balanced or value-heavy strategy, and it would take time until the average raise stat returns to normal.


PostPosted: Tue May 21, 2019 4:14 pm 
New Member (Joined: Sun Mar 10, 2019 10:22 pm, Posts: 6)
First off, thanks for all your replies, I really appreciate it. To avoid any misunderstanding: I'm not the author; I am merely considering reproducing their work and building from there.

Quote:
As far as I can tell, none of your tests check what would happen if your opponent adapted to you. It is possible that they could adapt to you quicker than you adapt to them, so you would always lose. Has this player ever played against humans?

So there is no specific test for this, and the player has never played against humans. However, there is one bot (opponent) which may give an idea of how the player (ASHE) would react to a changing strategy: the 'random gambler [RG]', which 'randomly switches from other highly exploitable strategies every 50 hands'. In one of the tests the player is trained against all the highly exploitable strategies, and then plays vs RG. The authors observe:
"[...] ASHE did not switch its strategy as RG did after every 50 hands; rather, as more hands were played, ASHE reached a strategy that exploited RG based on the overall distribution of its moves across the highly exploitable strategies it might randomly choose. That strategy, while being close to static after a few hundreds of hands, was much more effective in exploiting RG compared to SB's approximated equilibrium strategy."
My intuition is that, with an LSTM, the bot may react, potentially quickly, to past actions. E.g. if an opponent raised the player in the last 4 hands, it will be in memory, and the player can already (slightly) change its strategy accordingly (especially if it has already encountered an opponent raising that often). This might be a very wrong intuition; believing in NN magic ;). Also, about exploitability: the player has a pure strategy (deterministic w.r.t. its inputs), but since it keeps the past states in memory and uses them as inputs, it will never be in the same state twice, and it would be very difficult to predict its actions.

Quote:
[...] to model opponent strategies: I think this can be done by simply counting action frequencies and showdown hand frequencies and then using some statistical logic to deduce the opponent strategy.

Though I believe the statistical approach could model the opponent accurately, I don't see it adapting efficiently to a changing strategy, or to an opponent trying to exploit the player. The way I see it, it would require keeping additional sets of stats with shorter memories (e.g. 50, 200, or 1000 hands), with the trade-off between noise (variance) and reaction speed never being really satisfying.

Quote:
I have read these papers and the whole dissertation on that subject a while ago and I am rather skeptical. I also asked the author some questions but never got a reply. I also don't think the results vs Slumbot are significant, as there was a "defective" version of Slumbot running at the ACPC 2017

Thanks for the info. I am currently also trying to reach him. This could be a red flag. Though I will still consider reproducing the work to verify its validity. It might just be a hidden gem.

Quote:
This agent has pretty unusual playing stats that make me believe it would lose to all halfway solid Nash agents

That makes sense, as the player is focused on exploiting and has no notion of Nash equilibria. It would need much more training and complexity to compete against NE agents, I think. Do you think that most players online would be halfway solid Nash agents (honest question)?

Quote:
At the beginning I worked a lot on modeling opponents using population statistics, showdowns, bluff frequencies, etc. It is a great approach for finding opponent weaknesses, but whenever you exploit an opponent you open yourself up to being exploited. Humans are far better at switching strategies [...]

This is the problem I am trying to explore / alleviate. Can I ask what stakes you play? I have the feeling that at micro (and even low) stakes, many players barely adapt.

To give some context about me: I have played poker for a few years, and I have worked on AI / ML for a while, but I am new to the poker AI and botting scene. I would like to develop a bot that plays 6-max at easy levels (micro stakes), and in particular I would like it to focus on exploiting weak players, while hopefully minimizing my losses vs strong players. I feel like NE has been researched a lot, and it would be costly to arrive at a reasonably good result (correct me if I'm wrong).
I currently have the first building blocks set up: a fast hand (and equity) evaluator, local bot-vs-bot simulations, an 'api' to play online, a tight-aggressive rule-based bot, and reasonable hand and state abstraction. So I'm searching for a direction to develop an AI that will suit me. I'm open to any advice / suggestions. By the way, I've learnt a lot through this forum and will try to post when I have something.


PostPosted: Tue May 21, 2019 5:42 pm 
New Member (Joined: Mon May 20, 2019 2:21 pm, Posts: 6)
Quote:
That makes sense, as the player is focused on exploiting and has no notion of Nash equilibria. It would need much more training and complexity to compete against NE agents, I think. Do you think that most players online would be halfway solid Nash agents (honest question)?


Yeah, but the stats are so far off that ASHE should be totally exploitable even by the weakest opponents. From a quick look, the training resulted in hyper-aggressive play, and it doesn't even seem to adapt to opponents very much.

Also, I don't think good online players nowadays make that many clear mistakes. Actually, a lot of decisions are pretty close in EV anyway in an estimated equilibrium. Judging from the stats, I am pretty sure every halfway decent player could absolutely destroy the version of ASHE that participated in the ACPC 2017 (not sure which version that was, though, or whether it's actually the bot from the paper).

ASHE might be very effective at exploiting super weak strategies, but I don't think any player in today's online games is as terrible as the random gambler from the paper. Again, I understand why the author structured the paper that way, but in my opinion this approach is in no way as strong as the paper suggests.

Happy to be proven wrong though, and if you really want to go forward with implementing this and are looking for a strong Nash agent to test against, let me know :)


PostPosted: Wed May 22, 2019 1:09 am 
Senior Member (Joined: Fri Nov 25, 2016 10:42 pm, Posts: 122)
Quote:
This is the problem I am trying to explore / alleviate. Can I ask what stakes you play? I have the feeling that at micro (and even low) stakes, many players barely adapt.


My bot plays up to $5/$10 blinds, no-limit hold'em ring games and tournaments, but on sites which allow my bots to play and give 100% rakeback; with such good rakeback it is much easier to win.
Before most online poker rooms got populated by bots, yes, micro players were slower to adapt. Today poker has become so much harder that even micro-stakes players (regulars) notice if you c-bet against them with high frequency, or 3-bet, etc. Most players at mid and high stakes play a GTO strategy which they learn from solvers, and vs fish they play exploitatively.

E.g. my bot overbet-bluffs a lot on turns/rivers where the board runout is bad for the opponent and his range is capped (weak). The population folds around 80% vs 2x overbets. At some point a player hits a strong hand, calls, and sees the bluff. That way he can adjust to call much looser. For a human it is enough to see one showdown to draw the logical conclusion. That is why my bot looks at all hands from the opponent to see how often he folded vs overbets, but also looks at the last 10 samples for a call with a weak top pair or mid pair (a light call). If there was one, it means he most likely adjusted, even though his average fold stat could still be high. Then my bot starts overbetting with a balanced bluff ratio, and after some time villain will see sometimes a value hand, sometimes a bluff, so he might start to fold too much again, which will again show up in the last N samples.
So I think the best approach is looking at the last N samples plus the showdowns in those N samples and drawing conclusions (whether villain made a light call, or, when villain bets, whether he bluffed us with some pure bluff, which would mean he is bluff-heavy); see the sketch below. It is very similar to what humans do: observe a recent sample and draw conclusions using showdowns.

Neural networks are good for many different things, but poker is too complex a game for neural networks given the current state of the art.


Last edited by mlatinjo on Wed May 22, 2019 1:25 am, edited 1 time in total.

PostPosted: Wed May 22, 2019 1:22 am 
Senior Member (Joined: Fri Nov 25, 2016 10:42 pm, Posts: 122)
Quote:
Happy to be proven wrong though, and if you really want to go forward with implementing this and are looking for a strong Nash agent to test against, let me know :)


Did you test your Nash bot on any real-money tables yet? I don't know anyone who has won with a 100% Nash strategy; the rake is too high to play 100% Nash, you need to exploit weak opponents to get a solid winrate. The other big issue is that on most poker sites your Nash agent would face collusion and get destroyed completely (an unexploitable strategy gets brutally exploited). It's not only bots that collude but also humans, on mid and high stakes especially.


PostPosted: Wed May 22, 2019 6:13 am 
New Member (Joined: Mon May 20, 2019 2:21 pm, Posts: 6)
Indeed, the rake is an issue. I have played around 100k hands at 50NL (HU only) at around breakeven, +0.5bb/100, with ~14bb/100 paid in rake :lol: ... I assume it would get better at higher limits as the rake decreases (not sure how much of that effect would be negated by a decreasing winrate, though). I haven't tested anything higher than 50NL.

Collusion might be a problem in multiplayer games; I am talking strictly HU though.


PostPosted: Wed May 22, 2019 10:57 am 
Site Admin (Joined: Sun Feb 24, 2013 9:39 pm, Posts: 642)
Quote:
I don't know anyone who has won with a 100% Nash strategy


You do now. https://en.wikipedia.org/wiki/Libratus


PostPosted: Thu May 23, 2019 1:42 pm 
New Member (Joined: Sun Mar 10, 2019 10:22 pm, Posts: 6)
All right, so even low-stakes players are reasonably good. I used to play a couple of years ago, mostly tournaments / SnGs, so I guess it was much simpler than what we get in today's cash games.

There might be further issues when using full Nash strategies in multiplayer, such as:
Quote:
Third, in imperfect information games with more than two players and multiple equilibria, if the opponents are not following the same equilibrium as approximated by an equilibrium-based agent, the agent’s performance cannot be guaranteed (Ganzfried 2016).

Though in the future we could imagine a bot aware of multiple Nash equilibria that picks between them in a smart way.
Thanks for mentioning Ponsen's approach earlier; I have to say it's very clean how he mixes the general and opponent-specific distributions.

Quote:
So I think the best approach is looking at the last N samples plus the showdowns in those N samples and drawing conclusions (whether villain made a light call, or, when villain bets, whether he bluffed us with some pure bluff, which would mean he is bluff-heavy).

That does sound like it could adapt quickly, especially if we see that the opponent played a hand that is way off the range we estimated and react accordingly.

Nevertheless, I have decided to go ahead and implement this paper (or something close to it). My expectations are not too high, but it will be fun to implement. In the long term I see it only as an opponent-modeling technique to be put into a larger architecture.

I'll get back to you once it's up and running, and might take you up on that offer of testing vs a strong Nash agent :)


PostPosted: Thu May 23, 2019 9:58 pm 
Senior Member (Joined: Fri Nov 25, 2016 10:42 pm, Posts: 122)
user456 wrote:
Indeed, the rake is an issue. I have played around 100k hands at 50NL (HU only) at around breakeven, +0.5bb/100, with ~14bb/100 paid in rake :lol: ... I assume it would get better at higher limits as the rake decreases (not sure how much of that effect would be negated by a decreasing winrate, though). I haven't tested anything higher than 50NL.

Collusion might be a problem in multiplayer games; I am talking strictly HU though.


That seems to be a good result; if you get a good rakeback deal, it is an excellent profit.


PostPosted: Thu May 23, 2019 9:59 pm 
Senior Member (Joined: Fri Nov 25, 2016 10:42 pm, Posts: 122)
spears wrote:
Quote:
I don't know anyone who has won with a 100% Nash strategy


You do now. https://en.wikipedia.org/wiki/Libratus


I was referring to 6-max and 9-max tables.


PostPosted: Tue Oct 01, 2019 2:37 pm 
New Member (Joined: Sun Mar 10, 2019 10:22 pm, Posts: 6)
Hello, I'm sharing a report on some work that is closely related to this paper:
https://mega.nz/#!voNThaaQ!N8tfyIcp_7jb ... KDe_MSschg
This work will be continued, but I thought I'd briefly share the information.

In a few words: the LSTM + genetic algorithm approach of ASHE was reused (with some tweaks) and extended to a 6-max sit-and-go setting. The bot is trained and tested against various weak bots. The report then gives a qualitative analysis of the strategy that developed. One contribution of this work is the bot itself (6-max SnG agents being less explored); the other is to extract, from the bot's developed strategy, information on how to beat another (human-readable) beginner strategy.

The main insights for those considering a similar path are:
-the genetic algorithm / neuroevolution is effectively learning. The number of games played before estimating the fitness of an agent must be high enough, otherwise an all-in-or-fold strategy develops. Heuristically, the bare minimum is to play four times 6 games (one for each position) at a table to get a reasonable estimate of an agent's fitness (see the sketch after this list). Games last about 150 hands.
-as the opponents are beaten almost to the maximum, the neural network architecture (and the LSTM approach) has not been seriously put to the test (yet), so no firm conclusion can be drawn. Adaptation is also a tricky thing to measure, and it is not possible to determine whether long-term memory is used. To further test the capacity of the model and get a stronger bot, it should be confronted with stronger opponents.
-some hardware is required, but it is not excessive. An AWS EC2 c5.18xlarge instance was used, and the code is written to exploit multi-threading. Simulating 100,000 games took between one and two hours. Training against one table (= set of opponents) requires simulating 420,000 games; the last implementation uses four different tables, so training took about 24 hours. Though a fast hand evaluator (OMPEval) was used, calculating equity is still the most time-consuming part of training. For comparison, training ASHE (as presented in the paper above) would take roughly 5 hours.

Feel free to shoot questions / critiques. The code is now being restructured so that it is easier to use :) A useful element for pursuing this work would be one (or even multiple) strong SnG bots to train / validate against. Is there an accessible one that is considered a reference for that game type?

ps: if an admin feels like this post should be located elsewhere, please tell me so

