Poker-AI.org

Poker AI and Botting Discussion Forum
PostPosted: Thu May 01, 2014 3:56 pm
Junior Member | Joined: Sun Mar 17, 2013 10:03 pm | Posts: 25

Hello guys,

I want to do some research on adaptive opponent modeling, specifically on detecting and reacting to strategy changes by the opponent.
In light of this, are you currently using any method that handles this kind of situation?

I'm doing this research for school, and am not interested in using any of it afterwards. Furthermore, I will post my findings here, so you can benefit from any ideas, suggestions, ...

Any help, ideas, or shared knowledge is highly appreciated.
Thank you in advance.


PS. I'm using the open source testbed from http://www.poker-ai.org/archive/www.pok ... f=3&t=3393 for testing. The bot I'm building is based on the MCTSBot delivered with it. That bot uses a windowing technique (FLORA2), but it does not work well at all.


PostPosted: Thu May 01, 2014 5:46 pm
Veteran Member | Joined: Mon Mar 04, 2013 9:40 pm | Posts: 269

Well, it's not very difficult at all, but your bot is going to have to keep very detailed stats on the villain (mine tracks about 150 variables). When I think of being adaptive, I believe the bot needs to realize when the villain is using fold equity against you, meaning he is bluffing more than he should, 3-betting more than he should, etc. So your bot just needs to adapt until equilibrium is reached, meaning you're not over-folding or over-calling.

The flip side of this is that your bot needs to use fold equity against the villain until he adapts as well. I believe the bot should be extremely over-aggressive right out of the box. Forcing the villain to adapt, rather than adapting yourself, is the better policy imo.
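A minimal sketch of the "stop over-folding" half of that, assuming you track the villain's bluff frequency from showdowns (it ignores any equity we have against his value bets, so it errs on the tight side and is only a floor on calling):

Code:
    public class PotOddsDefense {
        // Call when the villain's observed bluff frequency makes calling +EV
        // at the current pot odds.
        public static boolean shouldCall(double bluffFreq, double pot, double toCall) {
            double evCall = bluffFreq * pot - (1.0 - bluffFreq) * toCall;
            return evCall > 0;  // folding when this is positive is over-folding
        }
    }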


PostPosted: Thu May 01, 2014 6:07 pm
Regular Member | Joined: Tue Mar 05, 2013 9:19 pm | Posts: 50

I've recently thought about this quite a lot - it's a challenging problem. I've developed a simulation-based agent but haven't had time to implement adaptive modelling just yet.
The best idea I've seen yet for tackling this is:
The myth of opp. modelling and VPIP: Driving straight in a bend

Essentially, track several moving averages for different player statistics over time to see whether they are 'shifting gear'.
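A minimal sketch of that idea in Java (the testbed's language), assuming a per-villain stat feed; the smoothing factors and threshold are made-up values, not tuned ones:

Code:
    // Track a fast and a slow exponential moving average of one stat (VPIP
    // here) and flag a possible 'gear change' when they diverge.
    public class DriftDetector {
        private double fast = 0.25, slow = 0.25;        // neutral VPIP prior
        private static final double FAST_ALPHA = 0.10;  // reacts over ~10 hands
        private static final double SLOW_ALPHA = 0.01;  // reacts over ~100 hands
        private static final double THRESHOLD = 0.08;

        /** vpip = 1.0 if the villain voluntarily put money in this hand, else 0.0 */
        public boolean update(double vpip) {
            fast += FAST_ALPHA * (vpip - fast);
            slow += SLOW_ALPHA * (vpip - slow);
            return Math.abs(fast - slow) > THRESHOLD;   // true = possible shift
        }
    }

You'd run one of these per tracked statistic (VPIP, PFR, aggression, ...) and treat agreement between several detectors as stronger evidence of a strategy change.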

I guess another way could be to look at a single opponent's data (once you've collected enough..) and then cluster their strategies based on PFR and VPIP etc.
But you have to estimate k, or use a clustering algorithm that finds the number of clusters automatically. This way you can find distinct strategies within their whole collection of data.
Train opponent models for each of their sub-strategies, and then identify which one they are currently playing by looking only at their last x hands of data.
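A sketch of that matching step, with made-up centroids for illustration (in practice they'd come out of the clustering):

Code:
    // Label the villain's last-x-hands window with the nearest discovered
    // strategy, by squared distance in (VPIP, PFR) space.
    public class StrategyMatcher {
        private static final double[][] CENTROIDS = {
            {0.15, 0.12},   // e.g. tight-aggressive
            {0.45, 0.08},   // e.g. loose-passive
            {0.35, 0.28},   // e.g. loose-aggressive
        };

        public static int nearestStrategy(double vpip, double pfr) {
            int best = 0;
            double bestDist = Double.MAX_VALUE;
            for (int k = 0; k < CENTROIDS.length; k++) {
                double dv = vpip - CENTROIDS[k][0];
                double dp = pfr - CENTROIDS[k][1];
                double dist = dv * dv + dp * dp;
                if (dist < bestDist) {
                    bestDist = dist;
                    best = k;
                }
            }
            return best;  // index of the opponent model to use
        }
    }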


PostPosted: Fri May 02, 2014 1:45 pm
Junior Member | Joined: Sun Mar 17, 2013 10:03 pm | Posts: 25

My idea now is to have one Nash equilibrium model and one adaptive model.

As the game goes on, I build up the adaptive model.
The more accurate the adaptive model gets, the more I deviate from the Nash equilibrium and use the learned model.
And when the opponent changes strategy, or unknown situations are encountered, the bot can fall back on the 'safe' model.
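A minimal sketch of the mixing step, assuming both models expose an action distribution; how the confidence weight is computed (sample size, recent prediction accuracy, ...) is the open question:

Code:
    public class BlendedPolicy {
        // Convex combination of the 'safe' and exploitative action
        // distributions; confidence in [0,1] shifts weight to the learned model.
        public static double[] mix(double[] nashProbs, double[] adaptiveProbs, double confidence) {
            double[] out = new double[nashProbs.length];
            for (int i = 0; i < out.length; i++) {
                out[i] = (1.0 - confidence) * nashProbs[i] + confidence * adaptiveProbs[i];
            }
            return out;  // still sums to 1 over the legal actions
        }
    }

Dropping confidence back towards 0 when a strategy change is detected gives exactly the fallback behaviour described above.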

What do you guys think of this approach?


PostPosted: Fri May 02, 2014 1:55 pm
Regular Member | Joined: Tue Mar 05, 2013 9:19 pm | Posts: 50

What limit/game are you playing? Heads-up, 6-max, etc..?


PostPosted: Fri May 02, 2014 4:10 pm
Junior Member | Joined: Sun Mar 17, 2013 10:03 pm | Posts: 25

ibot wrote:
What limit/game are you playing? Heads-up, 6-max, etc..?
My goal is to build an opponent model that can be used in multiplayer no-limit poker, but I don't see how that affects my suggested approach.


PostPosted: Fri May 02, 2014 5:35 pm
Regular Member | Joined: Tue Mar 05, 2013 9:19 pm | Posts: 50

The current state of the art tends to be limited to 3-player poker, with most papers focused on heads-up and limit poker. That's not to say it's not possible to get some near-Nash-equilibrium strategy for no-limit multiplayer poker, but it's a huge challenge in itself.

That's generally the reason that simulation-based (MCTS etc..) agents are built for these games.


PostPosted: Fri May 02, 2014 5:48 pm
Senior Member | Joined: Mon Mar 11, 2013 10:24 pm | Posts: 216

Before looking at whether an opponent adjusts to you, you should have an idea about how to build a good opponent model for a stable opponent (one that doesn't change its strategy), which is hard enough. Most players at lower limits will never really make player-specific adjustments, and if they do, it's only based on stats; it's rare that you will find anyone analyzing your strategy and trying to exploit leaks in it. However, as said, it's very tough to build a good opponent model, given that you have just a few hundred hands and even less showdown (SD) information.


PostPosted: Fri May 02, 2014 8:20 pm
Junior Member | Joined: Sun Mar 17, 2013 10:03 pm | Posts: 25

I understand. But it is not my intention to build a state-of-the-art poker bot, or a winning bot for that matter. It is the research that is important, and in my case that is research on adaptive opponent modeling in poker. I'm a recreational poker player, and I've been educated in machine learning, programming and statistics, all of which could come in handy here. And with this research, I hope I can help all botters who are currently working with an adaptive opponent model.

For example, in the idea above, it would not be important to have a Nash equilibrium model. Any 'good' model would suffice to test this approach.

But I think you are correct that I should concentrate first on a usable opponent model. If I'm correct, the two biggest challenges are the small dataset and finding the right features. I've been reading some papers on the matter, but I haven't found a clear 'winning' approach.


PostPosted: Fri May 02, 2014 9:15 pm
Regular Member | Joined: Tue Mar 05, 2013 9:19 pm | Posts: 50

I've recently finished my dissertation on a very similar topic, essentially what you're aiming for but without the adaptive modelling. To combat the lack of data, I recommend clustering on existing data and then building opponent models for the strategies you discover within it. Then assign a player to one of these strategies until you have enough data on them.

Another problem arises in the sense of what is 'enough' data. I played for 20k hands in simulations, and even then you can have <1000 hands of river data on opponents. A possible solution is to iteratively merge the unique data and the clustered data, e.g. copy the closest cluster of data for a player, then every 500 hands replace 500 hands of cluster data with the unique data, thus gradually building a more accurate and unique opponent model while combating the lack-of-data problem.
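A sketch of that merge, where Hand stands in for whatever one training example looks like:

Code:
    import java.util.ArrayList;
    import java.util.List;

    // Start from a copy of the closest cluster's hands; every BATCH observed
    // hands, replace BATCH cluster hands with the villain's own.
    public class MergingDataset {
        public static class Hand { /* features of one observed hand */ }

        private static final int BATCH = 500;
        private final List<Hand> clusterHands;
        private final List<Hand> ownHands = new ArrayList<>();

        public MergingDataset(List<Hand> closestCluster) {
            this.clusterHands = new ArrayList<>(closestCluster);
        }

        public void observe(Hand h) {
            ownHands.add(h);
            if (ownHands.size() % BATCH == 0 && clusterHands.size() >= BATCH) {
                clusterHands.subList(0, BATCH).clear();  // drop a batch of borrowed data
            }
        }

        public List<Hand> trainingData() {               // retrain the model on this
            List<Hand> all = new ArrayList<>(clusterHands);
            all.addAll(ownHands);
            return all;
        }
    }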

However, the assumption here is that the player is playing a single strategy. There are ways I can think of to make it adaptive, however..

How long do you have for your project? It does sound interesting, and was what I was initially aiming to do my project on :)


PostPosted: Sat May 03, 2014 2:05 pm
Junior Member | Joined: Sun Mar 17, 2013 10:03 pm | Posts: 25

Interesting idea for the model. Can I read more about your project somewhere, or was it a private project?

I'll start with implementing my idea with a 'safe' model and an 'opponent' model. I find it natural that the more confident I get that I can predict my opponent, the more risk I take to exploit him, and vice versa. It has the advantage of being somewhat independent of the models used, so work would not be lost if I change/alter the adaptive model. Or am I wrong/is this a bad idea?

About the time-frame: the project should be finished in about a month, so there is not much time to 'linger around'. I can work on it full-time though, so progress should be made quickly. I've already done much of the 'getting familiar with the code' work, and with the problem. It's time to test/find possible solutions. ;)


PostPosted: Mon May 05, 2014 7:13 pm
Regular Member | Joined: Tue Mar 05, 2013 9:19 pm | Posts: 50

I'll hopefully be able to upload it here soon, although there are similar ones to mine already in the papers section. A search for MCTS Poker & Poki Poker Paper should get you good results :)

Sounds like a viable solution, although a month is a short amount of time for this project - make sure you know exactly what you want to do first, and then implement. Ask questions here along the way and we'll try to help out - good luck!


PostPosted: Mon May 05, 2014 7:21 pm
Site Admin | Joined: Sun Feb 24, 2013 9:39 pm | Posts: 642

http://poker-ai.org/archive/pokerai.org ... 634&hilit=
http://www.poker-ai.org/archive/pokerai ... &sk=t&sd=a


PostPosted: Tue May 06, 2014 12:14 pm
Junior Member | Joined: Sun Mar 17, 2013 10:03 pm | Posts: 25

Thanks, really appreciate the help.
I'll let the adaptive thing go for now, and concentrate on an opponent model for the no-limit multiplayer variant. So this is what I've come to so far (I've thought all of this up myself, so it's very likely there are (big) faults in it; do not hesitate to set me straight!):

THE PROBLEM
For every opponent, I need a model of two probabilities: the probability of the opponent's cards and of the opponent's actions. Useful information about these probabilities is hidden in the opponent's previous actions, as these actions are (or rather can be) based on (1) his hole cards c, (2) the public game state S_i (all previous actions, community cards, stack sizes, ...) and (3) the types of players he (or she, let's not discriminate ;)) faces.

note: S_i is the game state right up to the moment the opponent needs to make action a_i

To summarize, I need
  • P(c | S_i, L), with L the collection of opponent player types
  • P(a_i | S_i, L) = Σ over all hole cards c of P(c | S_i, L) * f(c, S_i, L)
where f() is the opponent-specific function (read: strategy) that gives the action, given the information summed up above.

ASSUMPTION 1 - the opponent's strategy is deterministic: when faced with exactly the same situation, he will make exactly the same action. Wrong assumption: f(c, S_i, L) has to change into the probability P(a_i | c, S_i, L).

If I'm correct (which almost never happens), we can calculate (Bayes' rule):
P(c | S_{i-1}, a_{i-1}, L) = P(a_{i-1} | c, S_{i-1}, L) * P(c | S_{i-1}, L) / P(a_{i-1} | S_{i-1}, L)
  • P(a_{i-1} | c, S_{i-1}, L) is the output of the model
  • P(c | S_{i-1}, L) we have calculated before
  • P(a_{i-1} | S_{i-1}, L) has also been calculated before; besides, it's just a normalisation constant and can be omitted

As the actions of the other players do not tell us anything about this opponent's hole cards, P(c | S_{i-1}, a_{i-1}, L) = P(c | S_i, L) as long as we stay in the same betting round. When new community cards appear, we can simply adjust the probabilities by eliminating the hole cards that are no longer possible.*

If what's stated above is correct, the problem should come down to finding (a good approximation of) f(c, S_i, L). Is this correct?
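If it is, the update loop is small. A sketch, where beliefs[c] holds P(c | S_{i-1}, L) over all hole-card combos and ActionModel is a hypothetical stand-in for the learned f():

Code:
    public class BeliefUpdater {
        interface ActionModel {
            double prob(int action, int holeCards, Object state);  // P(a | c, S, L)
        }

        static void bayesUpdate(double[] beliefs, int observedAction, Object state, ActionModel f) {
            double norm = 0.0;                                   // becomes P(a | S, L)
            for (int c = 0; c < beliefs.length; c++) {
                beliefs[c] *= f.prob(observedAction, c, state);  // P(a|c,S,L) * P(c|S,L)
                norm += beliefs[c];
            }
            for (int c = 0; c < beliefs.length; c++) {
                beliefs[c] /= norm;                              // renormalise: P(c | S, a, L)
            }
        }
    }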


STEP 1: simplify the problem with more assumptions, abstractions, ...
To see what simplifications are possible, let's go over each set and see how it influences the opponent's actions:

  • Hole cards c: There are 1225 (50*49/2) possible hole-card combinations for every opponent. Before the flop (as everyone here should know, see security question), we can abstract these to 169 different situations; after the flop, no such abstraction is possible (most of the time), due to the importance of suits. A small sketch of the 169-class mapping is at the end of this post.

    Many bots use bucketing, where hole cards are grouped based on their hand strength. I might take this approach if it turns out to be necessary, but I have not looked into it yet.
  • Actions a: In no-limit, the number of different raise amounts is a problem. Again, bucketing should be a viable option.
  • Player types L: These influence the meaning of a player's actions. For example, you will react differently to a call from a loose-passive player than to a call from a tight-aggressive player (or you should ;)).

    I have not seen this in any current opponent model. It implies that you assume your opponent is adaptive. For now, I'll ignore this too. Later, I might look into it and categorize every player as tight/loose and passive/aggressive.
  • Gamestate S: the big one. I will discuss this in the next reply.


Other remarks:
* Different opponents cannot have the same hole cards, so you could take this into consideration, but I think this would only help if the reads were very strong. Just to say: it's not for any time soon. :)
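As promised, a minimal sketch of the 169-class preflop mapping (ranks 0..12; the indexing scheme is an arbitrary choice): 13 pairs plus 78 suited plus 78 offsuit rank combinations.

Code:
    public class PreflopAbstraction {
        // A preflop hand is fully described by its two ranks and suitedness.
        public static int preflopClass(int rank1, int rank2, boolean suited) {
            int hi = Math.max(rank1, rank2), lo = Math.min(rank1, rank2);
            if (hi == lo) return hi;                 // 13 pairs: classes 0..12
            int off = hi * (hi - 1) / 2 + lo;        // 0..77 among unpaired rank combos
            return 13 + (suited ? off : 78 + off);   // 13 + 78 + 78 = 169 classes
        }
    }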


Last edited by Sacré d'Jeu on Tue May 06, 2014 3:40 pm, edited 3 times in total.

PostPosted: Tue May 06, 2014 1:13 pm
Site Admin | Joined: Sun Feb 24, 2013 9:39 pm | Posts: 642

I don't think the deterministic assumption - that a player will act the same way in the same situation - is a good one. Bluffs and slowplays really only work because they aren't deterministic. But you can still use Bayes' rule to determine the chance of a player taking an action or holding certain cards. (I haven't checked your maths.)

You will need to use bucketing on hand strength to make the problem tractable. And you need to take account of hand potential, or variance, too. You will not be able to distinguish a particular player's preference for individual hands this way. C'est la vie. A made hand and a draw of equal current strength will be played differently throughout the hand, because a missed draw will usually be folded on the river.

You can observe how different strategies play against one another by simulation. Then use Bayes' rule to determine the strategy from observations of many games in the wild. (The details of this bit are a bit hazy at the moment, but I'm fairly sure it's possible.) The best you can ever achieve will be a probability of a strategy. It might be useful to read http://www.poker-ai.org/archive/www.pok ... =64&t=4037 too
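A sketch of that last idea, with StrategyModel as a hypothetical stand-in for one simulated strategy's action model:

Code:
    public class StrategyInference {
        interface StrategyModel {
            double actionProb(int action, Object state);  // P(a | S, strategy), cards marginalised out
        }

        // Bayes update of a posterior over a fixed set of candidate strategies.
        static void updatePosterior(double[] posterior, int action, Object state,
                                    StrategyModel[] strategies) {
            double norm = 0.0;
            for (int k = 0; k < posterior.length; k++) {
                posterior[k] *= strategies[k].actionProb(action, state);
                norm += posterior[k];
            }
            for (int k = 0; k < posterior.length; k++) {
                posterior[k] /= norm;  // posterior[k] = P(strategy k | observations)
            }
        }
    }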


PostPosted: Tue May 06, 2014 2:29 pm
Junior Member | Joined: Sun Mar 17, 2013 10:03 pm | Posts: 25

Yes, you are right, the assumption is wrong, but the reasoning stays the same if you think of f() as a function that gives the probability P(a | c, S, L).

The important conclusion is that it seems I only need one model to give me both probabilities.


PostPosted: Tue May 06, 2014 3:06 pm
Site Admin | Joined: Sun Feb 24, 2013 9:39 pm | Posts: 642

I've edited my post above.

At some stage you do have to take account of the fact that the villain will adapt. If you don't, your strategy will be a best response, and it will be too predictable. So you have at least three ways of dealing with this:
- Continually adapt your play to the latest information about his play. The problem with this is that you can't adapt quickly enough.
- Predict how he will adapt to you. This is hard, but I guess just about possible.
- Play a strategy that exploits him without being too exploitable. This is also hard, but maybe http://poker.cs.ualberta.ca/publication ... -rnash.pdf will provide some ideas.


PostPosted: Tue May 13, 2014 8:36 pm
Junior Member | Joined: Sun Mar 17, 2013 10:03 pm | Posts: 25

OK, so here's my general plan for the opponent model. I encourage you to find and point out any flaws, errors, etc. in it:
'I' stands for an information set of the game state; see more below.
I learn the opponent-specific distribution P(a | hc ∈ b, I) based on the actions in hands where the hole cards are shown.

For each hand:
1) I divide all possible starting hands into 9 buckets based on hand strength (HS) and hand potential (HP) (e.g. one bucket with low HS and medium HP).
The initial probability that the opponent has a hand from a certain bucket, P(hc ∈ b | I), is proportional to the number of hands in the bucket.

2) When MCTS asks for the probability of an action a by the opponent, I calculate it as:
P(a | I) = Σ P(hc ∈ b | I) * P(a | hc ∈ b, I) over all buckets b

3) When the opponent makes an action a, I update the belief distribution over the buckets:
P(hc ∈ b | a, I) = P(hc ∈ b | I) * P(a | hc ∈ b, I) / P(a | I)

4) When a new round starts and new community cards are revealed, I map the belief distribution to 9 new buckets (again based on HS and HP), using a predefined transition function that depends on the community cards.

5) When we get to the river, I transform these into 6 buckets, based only on HS.

6) When the EV of a showdown is needed in MCTS, I calculate it as follows (e.g. with 2 opponents at the showdown):
EV = Σ EV(B) * P1(hc ∈ b1 | I) * P2(hc ∈ b2 | I) over all possible combinations B = {b1, b2}
with EV(B): I win if my HS is higher than the bucket's, lose if it's lower. If my HS lies within the bucket, I'll try to calculate what percentage of the time I lose, draw and win. (Maybe for now, I'll just count it as a draw.)

7) If a real showdown happens, I update the learned opponent model P(a | hc ∈ b, I), based on the actions in this hand.

8) To start, I will learn a general model, based on many hands from many different players.


The only thing left is what my abstracted information set should look like. I'm still thinking about that.
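A sketch of steps 2), 3) and 6), with BucketActionModel as a hypothetical stand-in for the learned P(a | hc ∈ b, I):

Code:
    public class BucketModelMath {
        interface BucketActionModel {
            double prob(int action, int bucket, Object infoSet);  // P(a | hc ∈ b, I)
        }

        // step 2: P(a | I) = Σ_b P(hc ∈ b | I) * P(a | hc ∈ b, I)
        static double actionProb(double[] beliefs, int a, Object info, BucketActionModel m) {
            double p = 0.0;
            for (int b = 0; b < beliefs.length; b++) p += beliefs[b] * m.prob(a, b, info);
            return p;
        }

        // step 3: Bayes update of the bucket beliefs after observing action a
        static void observeAction(double[] beliefs, int a, Object info, BucketActionModel m) {
            double norm = actionProb(beliefs, a, info, m);
            for (int b = 0; b < beliefs.length; b++) beliefs[b] *= m.prob(a, b, info) / norm;
        }

        // step 6, two opponents: EV = Σ EV(B) * P1(hc ∈ b1 | I) * P2(hc ∈ b2 | I),
        // where evVsBuckets[b1][b2] holds the precomputed EV(B) for my hand.
        static double showdownEV(double[] p1, double[] p2, double[][] evVsBuckets) {
            double ev = 0.0;
            for (int b1 = 0; b1 < p1.length; b1++)
                for (int b2 = 0; b2 < p2.length; b2++)
                    ev += evVsBuckets[b1][b2] * p1[b1] * p2[b2];
            return ev;
        }
    }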


PostPosted: Tue May 13, 2014 8:54 pm
Site Admin | Joined: Sun Feb 24, 2013 9:39 pm | Posts: 642

I think the general principle is pretty good. You might need a few more buckets though.

Sacré d'Jeu wrote:
I learn the opponent-specific distribution P(a | hc ∈ b, I) based on the actions in hands where the hole cards are shown.

I think that is biased. That's why I suggested simulation.

Sacré d'Jeu wrote:
7) If a real showdown happens, I update the learned opponent model P(a | hc ∈ b, I), based on the actions in this hand.

How do you calculate the weighting of the update?


PostPosted: Wed May 14, 2014 9:24 am
Junior Member | Joined: Sun Mar 17, 2013 10:03 pm | Posts: 25

spears wrote:
Sacré d'Jeu wrote:
I learn the opponent-specific distribution P(a | hc ∈ b, I) based on the actions in hands where the hole cards are shown.

I think that is biased. That's why I suggested simulation.

You are absolutely right. I understand that some people assume that, if he folds, he has a bad hand, and learn from those actions as well. Maybe I'll do that.

Can you elaborate on your suggestion? I don't understand how you would do this.


spears wrote:
Sacré d'Jeu wrote:
7) If a real showdown happens, I update the learned opponent model P(a | hc ∈ b, I), based on the actions in this hand.

How do you calculate the weighting of the update?

I haven't decided on the algorithm/model I'll use yet. I'm thinking of using a decision tree or a neural network.
The weighting of the instances is also still up in the air. :)

