Poker-AI.org

Re: Adaptive Opponent Modeling

2014-05-29T23:42:59+00:00

Quote:

yeah it has been a dilemma for me for quite a while. When the villain is leading the betting post flop it's very accurate, but when the bot is leading and the villain is check-calling it gets difficult because of the slow-playing range. So what I did was keep the possible value hands in a separate range in case he decided to "let me know" he had a hand and raise me. So I have assumed that he does NOT have those hands until he lets me know that he does, which is why I have avoided weighting.

Quote:

I'd be very interested in a comparison of different algorithms for predicting opponents' ranges. Obviously we will always have a bias w.r.t. the folded hands (which we will not see). However, even if we only focus on hands that went to SD, what does a good evaluator function look like? It obviously can't just check whether the actual hand is in the range and return 1 if so, otherwise 0 - that way a stupid predictor (always 100% range) would be best. So we need to combine:
- how much weight does the actual hand have in the prediction range
- how large is the predicted range
Ideally, the measure should be scaled from 0 to 1, but I'm still lacking a good idea to normalize it. Did you put some thought into this? If so, it would be cool if we could select like 1M games from real money games as a test bed and compare our algorithms to see how an algorithm compares to others.

Hmm..I see what you mean now. I will think on this.

Statistics: Posted by shalako — Thu May 29, 2014 11:42 pm

Re: Adaptive Opponent Modeling

2014-05-29T16:05:59+00:00

I think weighting hands is always more accurate than a binary model. There are some cases where the weight will be 0, e.g. a 5bet preflop with 72o should be 0 or very close to it. But a hand like 55 on a 865r board, for example, could sometimes call and sometimes raise versus a cbet - after villain calls you don't want to assume that he has all sets, nor do you want to assume he never has a set, so you actually want to weight its probability.
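As a rough illustration of what weighting means here, the sketch below multiplies each hand's prior weight by the probability that it takes the observed action and renormalises (a plain Bayes update). The hand classes, prior weights and call frequencies are made-up numbers for illustration only, not anything measured.

```python
def update_range(weights, action_probs):
    """Bayes update: weight each hand by P(observed action | hand), then renormalise."""
    posterior = {h: w * action_probs.get(h, 0.0) for h, w in weights.items()}
    total = sum(posterior.values())
    if total == 0:
        raise ValueError("observed action has zero probability for every hand")
    return {h: w / total for h, w in posterior.items()}


# Hypothetical prior over three hand classes on an 865r board.
prior = {"55": 0.1, "A8": 0.4, "KQ": 0.5}
# Hypothetical P(call vs cbet | hand): the set calls half the time, raises otherwise.
p_call = {"55": 0.5, "A8": 0.9, "KQ": 0.3}

after_call = update_range(prior, p_call)
```

After the call, 55 keeps a reduced but non-zero weight, which is the middle ground between "he has all sets" and "he never has a set".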

I'd be very interested in a comparison of different algorithms for predicting opponents' ranges. Obviously we will always have a bias w.r.t. the folded hands (which we will not see). However, even if we only focus on hands that went to SD, what does a good evaluator function look like? It obviously can't just check whether the actual hand is in the range and return 1 if so, otherwise 0 - that way a stupid predictor (always 100% range) would be best. So we need to combine:
- how much weight does the actual hand have in the prediction range
- how large is the predicted range
Ideally, the measure should be scaled from 0 to 1, but I'm still lacking a good idea to normalize it. Did you put some thought into this? If so, it would be cool if we could select like 1M games from real money games as a test bed and compare our algorithms to see how an algorithm compares to others.
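One possible evaluator that combines both points is a normalised log score: treat the predicted weights as a probability distribution, look up the probability given to the hand actually shown, and compare it with what a uniform range would have given. This is only a sketch of one option; the function name and the choice of the log score are assumptions, not something anyone in the thread proposed.

```python
import math


def range_score(weights, actual_hand):
    """Score a predicted range against the hand shown at showdown.

    weights must contain every candidate hand (zero weights allowed).
    1.0 = all weight on the actual hand, 0.0 = no better than a uniform
    range over the same candidates, negative = worse than uniform.
    """
    n = len(weights)
    if n < 2:
        raise ValueError("need at least two candidate hands")
    total = sum(weights.values())
    p = weights.get(actual_hand, 0.0) / total
    if p == 0.0:
        return float("-inf")  # the predictor ruled out the hand that was shown
    return 1.0 + math.log(p) / math.log(n)


uniform = {"AA": 1.0, "KK": 1.0, "QQ": 1.0, "72o": 1.0}
sharp = {"AA": 0.7, "KK": 0.2, "QQ": 0.1, "72o": 0.0}
```

The "always 100% range" predictor scores 0 on every hand, a wide range is penalised automatically because normalising spreads its weight thin, and the score can be averaged over a showdown test bed. It is not bounded below, so clipping at 0 would be needed to get a strict 0 to 1 scale.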

Statistics: Posted by proud2bBot — Thu May 29, 2014 4:05 pm

Re: Adaptive Opponent Modeling

2014-05-28T21:01:02+00:00

Quote:

I don't believe that you should float with hands with absolutely zero equity. The good bots also almost never do that. You should use backdoor draws, overcards etc. for your floats and bluffraises and only use some very trashy hands in very rare situations, so you can hit different boards.

Yeah, mine does not float with any zero equity hands either. As you said, it's backdoor draws, overcards, a pair and an overcard etc.

Quote:

Basing the whole thing on his distribution is of course a good idea, you should for instance bluff more and bet bigger if your range is much stronger than his, but in practice it is pretty hard to determine his range correctly if he also adapts.

Well that is the trick..determining his range correctly. It is the key to everything imo. I spent a good year working on this trying to get the bot to range as accurately as possible from pre to the river. It was not an easy task. I had the main code after three weeks but the fine tuning was extremely time consuming. The little differences made a big impact on equity calculations. Studying how the pros do this was invaluable..one player in particular.

What I really want to do next is test the range finder vs HH files that go to showdown to determine the accuracy (ie whether the hand was in the bot's predicted range). By the river you can seriously narrow his range just from the fact that he can only be betting a small number of hands for value.

The grey areas are the 3 bet bluff ranges and slowplaying ranges. Some people think you should never eliminate hands from a person's range and should weight them instead, but I am not so sure about that. I came up with a solution to those problems and so far it is working ok but it still bothers me that it could be more accurate.

Statistics: Posted by shalako — Wed May 28, 2014 9:01 pm

Re: Adaptive Opponent Modeling

2014-05-28T19:48:41+00:00

I am of course also slowplaying, bluffraising etc. But even for that I use a parametrization based on my EHS2 and on the river my HS.

I am pretty much parameterising my range like it is described in Mathematics of Poker.

I don't believe that you should float with hands with absolutely zero equity. The good bots also almost never do that. You should use backdoor draws, overcards etc. for your floats and bluffraises and only use some very trashy hands in very rare situations, so you can hit different boards.

Basing the whole thing on his distribution is of course a good idea, you should for instance bluff more and bet bigger if your range is much stronger than his, but in practice it is pretty hard to determine his range correctly if he also adapts.

Statistics: Posted by HontoNiBaka — Wed May 28, 2014 7:48 pm

Re: Adaptive Opponent Modeling

2014-05-28T16:37:18+00:00

Quote:

For instance I have my standard opening ranges based on a database analysis of winning players. The opening range itself is not pseudo GTO yet, but if I face a 3bet for instance, I will only fold 55-60% of my range, so no one can exploit me through loose 3betting.

I am very interested in your winning player database. How did you come up with that?

Quote:

I rank my hands in this spot based on EHS2 and some heuristics and select the worst ones for a fold, the best ones for a raise and the ones in the middle for a call.

I would be careful using the above approach as it is exploitable. You're going to have to slow play some of your value hands on the flop or else you're telegraphing that you have a marginal holding. Balancing this out is not easy. Move some of your floating hands into your raising range (like backdoor combo draws) and some of your value hands into your calling range.

Quote:

Of course I see a lot of weaknesses in my approach. The biggest one seems to be that I don't look far into the future for my decisions. I might make a good 3bet bluff based on his fold to 3bet and his fold to continuation bet, but I will never be able to plan my exploit through the whole hand. A good algorithm would already consider his fold to river bet and loosen up my preflop range based on situations that will occur way later in the hand.

Yeah, the pros seem to have things all planned out on future streets, especially on taking away the pot later in the hand. Floating with complete air with the intention of making a big bluff later is not easy to do for a bot. However, if the villain's range is mostly air then I suppose the GTO approach would be to call X% of the time so that he is not getting value on his bluffs and raise/xr on one of the later streets, preferably on a scare card or something. Technically it would be better to float with some hands that have at least some equity if called, I guess.

So maybe considering his fold to river bet % is not really what you should do, and it should have more to do with his range and hand combinatorics?

Statistics: Posted by shalako — Wed May 28, 2014 4:37 pm

Re: Adaptive Opponent Modeling

2014-05-28T11:52:15+00:00

spears wrote:

HontoNiBaka wrote:

I am currently working on my advisor again, currently it is pretty much just a set of rules.

What terms are the rules written in? eg
- If I have a flush draw and there is so much in pot and villain has 3 bet then check
- If I have a 70% chance of winning at showdown vs a uniform hand yada yada
- Something else?

Are they deterministic, ie give the same action for the same situation, or non-deterministic, ie give a probability of action in a given situation?

I am using a combination of 2 models. The first one is pretty much like you describe it, it is my exploitive model.

For instance I have an AK high FD, villain raises me on a flop, he has a high fold to flop 3bet -> I 3bet.

The only opponent modeling I am using here are stats and laplace corrected stats.
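For readers unfamiliar with the term, a Laplace-corrected stat adds pseudo-counts so that a frequency based on very few observations does not jump to 0% or 100%. A minimal sketch, assuming the standard add-one form for a two-outcome stat (the function name and the `alpha` parameter are illustrative):

```python
def laplace_rate(successes, opportunities, alpha=1.0):
    """Frequency estimate with add-alpha smoothing for a two-outcome stat."""
    return (successes + alpha) / (opportunities + 2.0 * alpha)


# Villain folded to a flop 3bet 3 times out of 3: raw stat says 100%,
# the corrected stat is more cautious.
fold_to_flop_3bet = laplace_rate(3, 3)
```

With no observations at all the estimate is 0.5, and it converges to the raw frequency as the sample grows.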

My second model is a pseudo GTO model based on expert knowledge and some basic formulas.

For instance I have my standard opening ranges based on a database analysis of winning players. The opening range itself is not pseudo GTO yet, but if I face a 3bet for instance, I will only fold 55-60% of my range, so no one can exploit me through loose 3betting.

On the flop for instance, I will only fold 40-50% to a pot sized continuation bet, because my opponent has 1:1 odds on his potential bluff and also some equity, that is why I don't fold exactly 50% as the odds itself would dictate.
I rank my hands in this spot based on EHS2 and some heuristics and select the worst ones for a fold, the best ones for a raise and the ones in the middle for a call.
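The arithmetic behind the 50% figure and the fold/call/raise split can be sketched as follows. The break-even formula is the usual one for a pure bluff; the 45% fold and 15% raise fractions and the placeholder hand names are assumptions chosen for the example, and the ranking itself (EHS2 plus heuristics) is taken as given.

```python
def bluff_break_even(bet, pot):
    """Fraction of the time a zero-equity bluff must succeed to break even."""
    return bet / (bet + pot)


def split_range(hands_weakest_first, fold_frac, raise_frac):
    """Fold the weakest hands, raise the strongest, call with the middle."""
    n = len(hands_weakest_first)
    n_fold = round(n * fold_frac)
    n_raise = min(round(n * raise_frac), n - n_fold)
    folds = hands_weakest_first[:n_fold]
    calls = hands_weakest_first[n_fold:n - n_raise]
    raises = hands_weakest_first[n - n_raise:]
    return folds, calls, raises


# A pot-sized bet risks 1 to win 1, so a pure bluff needs 50% folds.
needed = bluff_break_even(bet=1.0, pot=1.0)

# Hypothetical range of 20 hands already sorted weakest to strongest.
ranked = ["hand%02d" % i for i in range(20)]
folds, calls, raises = split_range(ranked, fold_frac=0.45, raise_frac=0.15)
```

Folding a bit less than the break-even fraction reflects that the bettor's bluffs usually have some equity when called.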

Both models are deterministic.

I get some real randomness by combining both models. For instance I might conclude based on the number of hands I have from my opponent, that I want to play my exploitive strategy 20% of the time and my pseudo GTO model 80% of the times, I decide with a random number generator which one I will use.
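A sketch of that mixing step is shown below. How the exploitive share should grow with the number of hands is not specified in the post, so the linear ramp, the `full_trust_at` threshold and the `max_exploit` cap are all hypothetical.

```python
import random


def choose_model(n_hands, rng=random, full_trust_at=500, max_exploit=0.5):
    """Pick a model at random; the exploitive share grows with the sample size."""
    p_exploit = max_exploit * min(1.0, n_hands / full_trust_at)
    return "exploitive" if rng.random() < p_exploit else "pseudo_gto"


# With 200 hands on the opponent this ramp gives the 20% / 80% split.
rng = random.Random(0)
picks = [choose_model(200, rng) for _ in range(10000)]
exploit_share = picks.count("exploitive") / len(picks)
```

Each model stays deterministic on its own; the randomness comes only from which one is consulted.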

Of course I see a lot of weaknesses in my approach. The biggest one seems to be that I don't look far into the future for my decisions. I might make a good 3bet bluff based on his fold to 3bet and his fold to continuation bet, but I will never be able to plan my exploit through the whole hand. A good algorithm would already consider his fold to river bet and loosen up my preflop range based on situations that will occur way later in the hand.

The same goes for my pseudo GTO approach, my preflop play may already give me such a weak river range, that making a call, that would be balanced if you looked at the riverplay as isolated, will be very bad in context of the whole hand.

Statistics: Posted by HontoNiBaka — Wed May 28, 2014 11:52 am

Re: Adaptive Opponent Modeling

2014-05-27T17:26:05+00:00

HontoNiBaka wrote:

I am currently working on my advisor again, currently it is pretty much just a set of rules.

Statistics: Posted by spears — Tue May 27, 2014 5:26 pm

Re: Adaptive Opponent Modeling

2014-05-27T14:07:54+00:00

Quote:

The problem with 4bet/5bet etc. stats is that they require following a player for a large number of hands before you get anything usable. This is generally the problem with simulation based approaches. In theory it sounds like a solid enough approach, however in practice getting the data you need on each player is quite tough..

When you have no data on villain, model him as the weighted average of all villains. When you have a little data on villain, model him as the weighted average of all villains using the most frequent stats. Use less frequent stats the more data you have on villain.
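One simple way to get this behaviour is to shrink each of villain's stats towards the population value, with the population counting as a fixed number of pseudo-observations. This is only one reading of the advice above; the function name and `prior_strength` value are assumptions.

```python
def blended_stat(player_hits, player_opps, population_rate, prior_strength=20.0):
    """Shrink a player's observed frequency towards the population frequency."""
    return (player_hits + prior_strength * population_rate) / (player_opps + prior_strength)


# No data: the estimate is just the population average.
unknown = blended_stat(0, 0, population_rate=0.6)
# 20 opportunities, all folds: halfway between the player and the population.
some_data = blended_stat(20, 20, population_rate=0.5)
```

Frequent stats such as VPIP collect opportunities quickly and soon reflect the individual player, while rare stats such as fold to 4bet stay close to the population until enough hands have been seen.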

Statistics: Posted by spears — Tue May 27, 2014 2:07 pm

Re: Adaptive Opponent Modeling

2014-05-27T13:57:10+00:00

Sacré d'Jeu wrote:

Now I need training data to build a general model. I want to use a mix of table-sizes and blindsizes, all playing NL Texas Hold'em.
I've read here somewhere about a website where you could buy these, but I've forgotten the name.

HandHQ released a large database of hand histories from several sites and in several different limits, see the second post by spears here: HandHQ DB.

Sacré d'Jeu wrote:

And I've been thinking about testing and how to compare different implementations:
- Playing against other bots: alwaysCall, alwaysRaise, SimpleBot.
Are there any other bots shared in the pokerworld, that I could use?

There are a limited number of multiplayer bots available, however there is a MCTSBot that you can use via opentestbed, along with a few others.

Sacré d'Jeu wrote:

- Playing against each other: in a heads-up setting, this is obvious. I guess, if you want to test with more players, you put in 'neutral' bots with the two bots you want to compare.

I can't imagine you'll learn much about multiplayer play using heads-up matches. I would go for running simulations against different types of players and several configurations. The more tests and the more hands played, the better.

HontoNiBaka wrote:

I think it will cover my bluffs pretty decently. For instance I am planning to learn how often a player folds to a 4Bet after he 3Bets. I plan to use his 3bet as feature, the 4Bet% of the 4 bettor, the 4 bettors fold to 5bet, PFR, a few moving averages of those stats and special features, like how many 4bets did the 4 bettor make in a row etc.

Statistics: Posted by ibot — Tue May 27, 2014 1:57 pm

Re: Adaptive Opponent Modeling

2014-05-27T13:47:13+00:00

Hey guys, a brief update:

- we've decided to include the bucketing in our research. I'm going to compare different bucketing options, some of them discussed here.
- I'm implementing a good part of the PT statistics into the feature-set. I hope to finish this today.

Now I need training data to build a general model. I want to use a mix of table-sizes and blindsizes, all playing NL Texas Hold'em.
I've read here somewhere about a website where you could buy these, but I've forgotten the name.

And I've been thinking about testing and how to compare different implementations:
- Playing against other bots: alwaysCall, alwaysRaise, SimpleBot.
Are there any other bots shared in the pokerworld, that I could use?

- Playing against each other: in a heads-up setting, this is obvious. I guess, if you want to test with more players, you put in 'neutral' bots with the two bots you want to compare.

- Johansson used exploitability against exploitation, but I feel that this would be too much work to implement.

Statistics: Posted by Sacré d'Jeu — Tue May 27, 2014 1:47 pm

Re: Adaptive Opponent Modeling

2014-05-27T13:40:30+00:00

I am currently working on my advisor again, currently it is pretty much just a set of rules.

My planned approach is to learn a model on a big database with Regression Trees and to classify each opponent I play against based on that model.

I think it will cover my bluffs pretty decently. For instance I am planning to learn how often a player folds to a 4Bet after he 3Bets. I plan to use his 3bet as feature, the 4Bet% of the 4 bettor, the 4 bettors fold to 5bet, PFR, a few moving averages of those stats and special features, like how many 4bets did the 4 bettor make in a row etc.

If I set fold to 1 and call/5Bet to 0, I should get a decent percentage of how often players fold to a 4bet in those situations. I will also learn models of how often they call vs how often they fold (1 vs all classification basically), to get a multiclass regression.

I plan to do the same for contibets for instance, with the feature space now also containing the flop cards.

Since for my bluffs it does not matter that much which exact range he continues with, and what really counts is his folding %, I think this should be a relatively solid model.

Then I classify my opponents based on that, for instance all players who I have 0 hands from will be classified the same way, and also all players who have the exact same stats at the moment will be classified the same. In my mind that model will allow more variations, than clustering.

When it comes to value betting or calling it will of course be more difficult, because here his range matters. The only exception may be calling preflop all ins, because then he cannot fold anymore and I can see all his AI hands, so I can learn a vector of probabilities for each hand with a relatively small bias.

When it comes to actually determining a range of hands, I was thinking about a semi GTO approach. Basically I was thinking about taking the % of hands my regression tells me he is holding and, through some sort of fictitious play, determining hands that would have a high EV against my perceived range.

I don't know if my ideas make sense though, I have only used CFRM so far, but of course that won't help much for 6 max.

Statistics: Posted by HontoNiBaka — Tue May 27, 2014 1:40 pm

Re: Adaptive Opponent Modeling

2014-05-22T13:38:20+00:00

Neat, thanks.

Statistics: Posted by spears — Thu May 22, 2014 1:38 pm

Re: Adaptive Opponent Modeling

2014-05-22T13:19:12+00:00

spears wrote:

Is your idea to divide the space with rectangles oriented on the principal axes?

Here is what you get with the pocket pairs removed:

So you can now break that up into say 3x3 or 4x4 (using either equal areas or equal densities), transform the bounding coordinates back to your original space, and then separately break the pocket pairs in 2, 3 or 4 more clusters, etc.
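A minimal sketch of the idea, assuming the input is a list of (strength, variance) points with the pocket pairs already removed. It rotates the points onto their principal axes using the closed form for a 2x2 covariance matrix, then cuts each rotated axis into k equal-count bins. Mapping the bin boundaries back to the original space and clustering the pairs separately are left out, and binning each axis independently is a simplification of "equal densities".

```python
import math


def pca_rotate(points):
    """Rotate 2-D points onto their principal axes (centred at the mean)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)  # angle of the first axis
    c, s = math.cos(theta), math.sin(theta)
    return [((x - mx) * c + (y - my) * s, -(x - mx) * s + (y - my) * c)
            for x, y in points]


def equal_density_bins(values, k):
    """Bin index 0..k-1 per value so that bins hold roughly equal counts."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * k // len(values)
    return bins


def grid_buckets(points, k=3):
    """Bucket id in 0..k*k-1 from a k x k grid in the rotated space."""
    rotated = pca_rotate(points)
    bx = equal_density_bins([p[0] for p in rotated], k)
    by = equal_density_bins([p[1] for p in rotated], k)
    return [a * k + b for a, b in zip(bx, by)]
```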

By the look of it, you might get even better clusters by scaling the factor 2 axis or using quadrilaterals instead of rectangles.

Depending on exactly what you want to do, you might also be able to improve on the "stepiness" of the clustered approximations by interpolation via triangular fuzzy set membership, etc.

The Spatial Sign transformation might map the values to something interesting in 1-dimensional space so it's worth a try.

Juk

Statistics: Posted by jukofyork — Thu May 22, 2014 1:19 pm

Re: Adaptive Opponent Modeling

2014-05-22T12:24:05+00:00

jukofyork wrote:

spears wrote:

I've tried clustering in weka. Example attached. I've messed about with stretching and compressing the strength axis but it hasn't been very successful. Need to think about this some more. Also wondering if points should be weighted by the number of instances. Will return to this tonight/tomorrow.

Have a look at using "Principal Component Analysis" and/or "Spatial Sign Transformation".

Juk

I think I understand PCA but not Spatial Sign Transformation. Is your idea to divide the space with rectangles oriented on the principal axes?

Statistics: Posted by spears — Thu May 22, 2014 12:24 pm

Re: Adaptive Opponent Modeling

2014-05-22T11:41:50+00:00

spears wrote:

Have a look at using "Principal Component Analysis" and/or "Spatial Sign Transformation".

Juk

Statistics: Posted by jukofyork — Thu May 22, 2014 11:41 am

Re: Adaptive Opponent Modeling

2014-05-21T16:11:13+00:00

Sacré d'Jeu wrote:

spears wrote:

- What is the timescale for this work?
- Could you summarize the current project objectives and plan?
- I'm wondering if you could use some overall project advice, rather than technical details which is what I've concentrated on so far.
- Given your initial stated objective, the strength mean/variance is something of a distraction if you are in a hurry. You could use ehs2 see page 25 of http://poker.cs.ualberta.ca/publication ... on.msc.pdf

- In about two weeks, I've to hand in my research.
- Goal is to build an as good as possible pokerbot (NL-multiplayer) with adaptive opponent modeling. I hope to have a working pokerbot by the beginning of next week. Then I've got a week for tweaking, testing and improving.
But the goal is not the most important part. It's more important I can show research and development, so don't worry about the result too much.

- My supervisor wasn't present today, so a final decision on the bucketing will be for tomorrow, but I'm guessing we will raise the means by some power (and maybe also the variance instead of a dummy point) and then use a clustering algorithm. The end results will not differ much though.
And with that, the outline of the bucketing is finished. I'll calculate the mean and variance after every possible flop for every hole, so I can create transition tables. Then do the same for flop->turn and turn->river.
(You are very kind to help me so much, and I really appreciate it!)

So, next problem: the simplified gamestate I = how can I describe the gamestate with a small number of features, so I can accurately model (most of) the opponents' possible strategies P(a | b*, I)?
I'm thinking the beliefs distribution b* of the opponents holecards has already much of the information about previous actions of the opponent, so I don't need to include such information here (correct me if I'm wrong).

Here is a first thought about the features I'll use:
- round (I'm thinking about eventually using a different model for each round, or preflop-postflop, but for now, I'll use only one)
- relative stacksize player (vs. potsize)
- absolute stacksize player (in BB)
- position (only against players still in the hand)
- relative amount to call (vs stacksize)
- absolute amount to call (in BB)
- size last raise (in BB)

- number of opponents (at the start of the hand)
- number of active opponents (= players who are still in the hand)
- average stacksize active opponents (or should I use max(player stacksize, opponent stacksize) and take the average of that?)
- average VPIP, AF and frequency actions active opponents

- number of opponents that raised this round
- average stacksize of active opponents that raised this round (same note as above)
- average VPIP, AF and frequency actions of active opponents that raised this round

- number of players all-in
- number of hands played
------------------------------------------------------------------------------------
I've also thought about:
1. The player's own statistics (VPIP, AF, frequency actions), but I'm not sure because the opponent does not make decisions based on his own statistics. Furthermore, it might cause the model to concentrate too much on these features, I guess. It should be a great help for the default model, as this would lead to different strategies for differently characterised players. But for an opponent-specific model, only statistics based on his last x actions/hands would make a difference, right?

2. Include information about the belief distribution of the opponents, as this implies information about the action sequence. (That would mean I'll have to keep a belief distribution for myself too).

3. If there are any statistics I don't need to use cause I'm using MCTS.

4. Information about the board (eg dry/wet board). I have statistics on this from the bucketing calculation. I don't think they are already implied in the beliefs distribution, so I could give, for example, the mean variance of the specific board over all possible holecards as a feature.
-----------------------------------
For the model, I'm thinking to use a NN with the belief distribution and the described gamestate as input, but I'll consult my supervisor for that one too.

- Thinking about writing an adaptive bot in less than two weeks makes me feel ill.
- Maybe you could build a reinforcement learning bot that learns which actions are good and which are bad given the strength of your hand and the context. That would be much less work than what you are doing at the moment.
- I'll try to think of some more ideas to cut down the work

Statistics: Posted by spears — Wed May 21, 2014 4:11 pm

Re: Adaptive Opponent Modeling

2014-05-21T14:41:09+00:00

spears wrote:

Statistics: Posted by Sacré d'Jeu — Wed May 21, 2014 2:41 pm

Re: Adaptive Opponent Modeling

2014-05-21T09:05:46+00:00

Statistics: Posted by spears — Wed May 21, 2014 9:05 am

Re: Adaptive Opponent Modeling

2014-05-20T21:36:08+00:00

The goal is to cluster together hands of similar strength and variance. This is slightly different from clustering together hands that have similar strategy, because you might expect, for example, a medium strength / low variance hand to be played the same way as a high strength / high variance hand. But your MCTS should find out how to play the different hands.

I put in the dummy point to force k means into making more divisions by strength and fewer by variance. You are right about raising the strength to a power but it hasn't been that successful. The justification for this is "expert knowledge", but it could be verified by testing bots using different schemes against one another.

You could of course use expert knowledge for the pre flop hands, but this becomes a large task for the post flop hands. That is why I advocated an algorithmic approach.

Statistics: Posted by spears — Tue May 20, 2014 9:36 pm

Re: Adaptive Opponent Modeling

2014-05-20T19:41:33+00:00

Yeah, I've been doing somewhat the same. (didn't want to post the raw data like you, thought it took too much space :p)

The number of clusters could also be altered (keeping in mind the trade-off between simplicity and accuracy).
The goal should be to cluster the starting hands that ask for the same strategy, right? So do we have an idea what a good clustering should look like? I'm guessing big buckets for bad holes, and smaller buckets for strong hands?

Raising the handstrength to a power is a way to cluster the low starting hands closer together while using a less dense clustering for strong hands, right?
And what's the reasoning behind the dummy point?

Tomorrow, I'm speaking with my supervisor, who has more experience with clustering. I'll come back to here then.

THANKS!!!!

Statistics: Posted by Sacré d'Jeu — Tue May 20, 2014 7:41 pm

Re: Adaptive Opponent Modeling

2014-05-20T17:38:17+00:00

I've added in a dummy point at strength = 0.3, variance = 0.5. Then raised the strength to power of 4 and used weka k means to get this:
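The preprocessing described here can be sketched as below, before the rows are handed to any k-means implementation (weka in this case). The sketch assumes the dummy point is added first and has its strength raised to the same power as the real hands, which is how the post reads; the function name and row layout are illustrative.

```python
def prepare_for_kmeans(rows, power=4, dummy=(0.3, 0.5)):
    """rows: (label, strength, variance) tuples. Returns the rows to cluster."""
    with_dummy = list(rows) + [("DUMMY", dummy[0], dummy[1])]
    # Raising strength to a power stretches the strong end of the axis and
    # squeezes the weak end, so weak hands fall closer together.
    return [(label, strength ** power, variance)
            for label, strength, variance in with_dummy]


sample = [("AA", 0.852, 0.0112), ("72", 0.3816, 0.1024)]
prepared = prepare_for_kmeans(sample)
```

The dummy row has to be dropped again after clustering, since it is not a real hand.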

The dummy point gets a cluster of its own, and there are four points in the strongest cluster.

Statistics: Posted by spears — Tue May 20, 2014 5:38 pm

Re: Adaptive Opponent Modeling

2014-05-20T16:41:21+00:00

I've transformed your data so it will go into weka as a csv

Code:

AAo, 0.852000, 0.011200
AKo, 0.670400, 0.061600
AQo, 0.662100, 0.059800
AJo, 0.653900, 0.058400
ATo, 0.646000, 0.057500
A9o, 0.627800, 0.055700
A8o, 0.619400, 0.055900
A7o, 0.609800, 0.056500
A6o, 0.599100, 0.057300
A5o, 0.599200, 0.059800
A4o, 0.590300, 0.061200
A3o, 0.582200, 0.062200
A2o, 0.573800, 0.062900
KAs, 0.653200, 0.059500
KKo, 0.824000, 0.013300
KQo, 0.634000, 0.072700
KJo, 0.625700, 0.071100
KTo, 0.617900, 0.070000
K9o, 0.599900, 0.067200
K8o, 0.583100, 0.065700
K7o, 0.575400, 0.065900
K6o, 0.566400, 0.066300
K5o, 0.557900, 0.067000
K4o, 0.548800, 0.068200
K3o, 0.540500, 0.069100
K2o, 0.532100, 0.070000
QAs, 0.644300, 0.057200
QKs, 0.614600, 0.070200
QQo, 0.799300, 0.014600
QJo, 0.602600, 0.082300
QTo, 0.594700, 0.081100
Q9o, 0.576600, 0.077700
Q8o, 0.560200, 0.075400
Q7o, 0.543000, 0.073800
Q6o, 0.536100, 0.073800
Q5o, 0.527700, 0.074100
Q4o, 0.518600, 0.075200
Q3o, 0.510200, 0.075800
Q2o, 0.501700, 0.076400
JAs, 0.635600, 0.055300
JKs, 0.605700, 0.068100
JQs, 0.581300, 0.079300
JJo, 0.774700, 0.016200
JTo, 0.575300, 0.091900
J9o, 0.556600, 0.088300
J8o, 0.540200, 0.085700
J7o, 0.523200, 0.083300
J6o, 0.506100, 0.081300
J5o, 0.499900, 0.081200
J4o, 0.490700, 0.082000
J3o, 0.482300, 0.082300
J2o, 0.473800, 0.082600
TAs, 0.627200, 0.053900
TKs, 0.597400, 0.066400
TQs, 0.572900, 0.077600
TJs, 0.552500, 0.088600
TTo, 0.750100, 0.018100
T9o, 0.540300, 0.098700
T8o, 0.523300, 0.096100
T7o, 0.506400, 0.093400
T6o, 0.489400, 0.090600
T5o, 0.472200, 0.088200
T4o, 0.465300, 0.088700
T3o, 0.456900, 0.088700
T2o, 0.448400, 0.088700
9As, 0.607700, 0.051100
9Ks, 0.578100, 0.062500
9Qs, 0.553600, 0.073000
9Js, 0.532500, 0.083700
9Ts, 0.515300, 0.094200
99o, 0.720600, 0.022200
98o, 0.508000, 0.104600
97o, 0.491200, 0.102400
96o, 0.474300, 0.099500
95o, 0.457200, 0.096300
94o, 0.438600, 0.094200
93o, 0.432600, 0.094000
92o, 0.424200, 0.093700
8As, 0.598700, 0.050800
8Ks, 0.560200, 0.060000
8Qs, 0.536000, 0.069600
8Js, 0.514900, 0.079900
8Ts, 0.497200, 0.090300
89s, 0.481000, 0.098800
88o, 0.691600, 0.026800
87o, 0.479400, 0.110800
86o, 0.462400, 0.108300
85o, 0.445400, 0.105000
84o, 0.427000, 0.102200
83o, 0.408700, 0.099000
82o, 0.402700, 0.098600
7As, 0.588400, 0.050900
7Ks, 0.551900, 0.059700
7Qs, 0.517700, 0.066800
7Js, 0.496800, 0.076300
7Ts, 0.479100, 0.086400
79s, 0.463000, 0.095400
78s, 0.450500, 0.103800
77o, 0.662400, 0.031700
76o, 0.453700, 0.115700
75o, 0.436800, 0.112800
74o, 0.418500, 0.109800
73o, 0.400400, 0.106100
72o, 0.381600, 0.102400
6As, 0.576800, 0.051200
6Ks, 0.542200, 0.059600
6Qs, 0.510200, 0.066300
6Js, 0.478400, 0.073000
6Ts, 0.460900, 0.082300
69s, 0.444900, 0.091100
68s, 0.432400, 0.100000
67s, 0.423200, 0.107500
66o, 0.632800, 0.037100
65o, 0.431300, 0.118800
64o, 0.413300, 0.116300
63o, 0.395300, 0.112500
62o, 0.376700, 0.108300
5As, 0.577000, 0.053800
5Ks, 0.533100, 0.059800
5Qs, 0.501200, 0.066100
5Js, 0.471800, 0.072400
5Ts, 0.442500, 0.078500
59s, 0.426700, 0.086500
58s, 0.414300, 0.095200
57s, 0.405100, 0.103200
56s, 0.399400, 0.109400
55o, 0.603200, 0.042300
54o, 0.414500, 0.120400
53o, 0.396900, 0.117200
52o, 0.378500, 0.113100
4As, 0.567300, 0.054900
4Ks, 0.523300, 0.060600
4Qs, 0.491300, 0.066600
4Js, 0.461900, 0.072600
4Ts, 0.435000, 0.078500
49s, 0.406700, 0.082900
48s, 0.394500, 0.090900
47s, 0.385500, 0.098600
46s, 0.380100, 0.105300
45s, 0.381600, 0.109900
44o, 0.570200, 0.047800
43o, 0.386400, 0.116900
42o, 0.368300, 0.113100
3As, 0.558400, 0.055400
3Ks, 0.514300, 0.060900
3Qs, 0.482200, 0.066700
3Js, 0.452800, 0.072300
3Ts, 0.425900, 0.077900
39s, 0.400200, 0.082100
38s, 0.374800, 0.086100
37s, 0.366000, 0.093200
36s, 0.360800, 0.099800
35s, 0.362600, 0.105100
34s, 0.351500, 0.104000
33o, 0.536900, 0.052900
32o, 0.359800, 0.110700
2As, 0.549300, 0.055600
2Ks, 0.505100, 0.061300
2Qs, 0.473000, 0.066700
2Js, 0.443500, 0.072000
2Ts, 0.416700, 0.077100
29s, 0.391000, 0.081000
28s, 0.368300, 0.085000
27s, 0.345800, 0.087700
26s, 0.340800, 0.093800
25s, 0.342800, 0.099200
24s, 0.332000, 0.098300
23s, 0.323000, 0.095200
22o, 0.503300, 0.057800

I've tried clustering in weka. Example attached. I've messed about with stretching and compressing the strength axis but it hasn't been very successful. Need to think about this some more. Also wondering if points should be weighted by the number of instances. Will return to this tonight/tomorrow.

Statistics: Posted by spears — Tue May 20, 2014 4:41 pm

Re: Adaptive Opponent Modeling

2014-05-20T12:52:41+00:00

The algorithm is finished:

Mean HS (in %)
A K Q J T 9 8 7 6 5 4 3 2
A 85,20 67,04 66,21 65,39 64,60 62,78 61,94 60,98 59,91 59,92 59,03 58,22 57,38
K 65,32 82,40 63,40 62,57 61,79 59,99 58,31 57,54 56,64 55,79 54,88 54,05 53,21
Q 64,43 61,46 79,93 60,26 59,47 57,66 56,02 54,30 53,61 52,77 51,86 51,02 50,17
J 63,56 60,57 58,13 77,47 57,53 55,66 54,02 52,32 50,61 49,99 49,07 48,23 47,38
T 62,72 59,74 57,29 55,25 75,01 54,03 52,33 50,64 48,94 47,22 46,53 45,69 44,84
9 60,77 57,81 55,36 53,25 51,53 72,06 50,80 49,12 47,43 45,72 43,86 43,26 42,42
8 59,87 56,02 53,60 51,49 49,72 48,10 69,16 47,94 46,24 44,54 42,70 40,87 40,27
7 58,84 55,19 51,77 49,68 47,91 46,30 45,05 66,24 45,37 43,68 41,85 40,04 38,16
6 57,68 54,22 51,02 47,84 46,09 44,49 43,24 42,32 63,28 43,13 41,33 39,53 37,67
5 57,70 53,31 50,12 47,18 44,25 42,67 41,43 40,51 39,94 60,32 41,45 39,69 37,85
4 56,73 52,33 49,13 46,19 43,50 40,67 39,45 38,55 38,01 38,16 57,02 38,64 36,83
3 55,84 51,43 48,22 45,28 42,59 40,02 37,48 36,60 36,08 36,26 35,15 53,69 35,98
2 54,93 50,51 47,30 44,35 41,67 39,10 36,83 34,58 34,08 34,28 33,20 32,30 50,33

Variance HS (*10^(-2))
A K Q J T 9 8 7 6 5 4 3 2
A 1,12 6,16 5,98 5,84 5,75 5,57 5,59 5,65 5,73 5,98 6,12 6,22 6,29
K 5,95 1,33 7,27 7,11 7,00 6,72 6,57 6,59 6,63 6,70 6,82 6,91 7,00
Q 5,72 7,02 1,46 8,23 8,11 7,77 7,54 7,38 7,38 7,41 7,52 7,58 7,64
J 5,53 6,81 7,93 1,62 9,19 8,83 8,57 8,33 8,13 8,12 8,20 8,23 8,26
T 5,39 6,64 7,76 8,86 1,81 9,87 9,61 9,34 9,06 8,82 8,87 8,87 8,87
9 5,11 6,25 7,30 8,37 9,42 2,22 10,46 10,24 9,95 9,63 9,42 9,40 9,37
8 5,08 6,00 6,96 7,99 9,03 9,88 2,68 11,08 10,83 10,50 10,22 9,90 9,86
7 5,09 5,97 6,68 7,63 8,64 9,54 10,38 3,17 11,57 11,28 10,98 10,61 10,24
6 5,12 5,96 6,63 7,30 8,23 9,11 10,00 10,75 3,71 11,88 11,63 11,25 10,83
5 5,38 5,98 6,61 7,24 7,85 8,65 9,52 10,32 10,94 4,23 12,04 11,72 11,31
4 5,49 6,06 6,66 7,26 7,85 8,29 9,09 9,86 10,53 10,99 4,78 11,69 11,31
3 5,54 6,09 6,67 7,23 7,79 8,21 8,61 9,32 9,98 10,51 10,40 5,29 11,07
2 5,56 6,13 6,67 7,20 7,71 8,10 8,50 8,77 9,38 9,92 9,83 9,52 5,78

A graph is shown below. So what do you think?
How can I share the raw data with you?

Statistics: Posted by Sacré d'Jeu — Tue May 20, 2014 12:52 pm

Re: Adaptive Opponent Modeling

2014-05-19T10:11:39+00:00

I've struggled more with the indexing algorithm than expected, but it's working now. Eventually, I've basically translated the C-code into Java. I'll upload the code somewhere when this project is over.

So now I'm doing this:
- I iterate over every possible 5-card board (= 2598960 times):
  - calculating the rank for every possible hole (using spears2p2),
  - then calculating the handstrength of every possible hole for the particular board,
  - and saving it with the corresponding index (0 - 168) in a text file.

- Then I go over the saved handstrengths and update the corresponding mean and variance, which results in the means and variances of every possible hole (169 in total)
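The second pass over the saved handstrengths can be avoided by updating the mean and variance while the boards are being enumerated. The sketch below uses Welford's online algorithm, which is a swapped-in technique and not necessarily what was used for the posted numbers; the names `RunningStats`, `stats` and `record` are made up for illustration.

```python
class RunningStats:
    """Welford's online algorithm for the mean and population variance."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self.m2 / self.n if self.n else 0.0


NUM_HOLE_CLASSES = 169  # preflop hole classes, indexed 0 - 168
stats = [RunningStats() for _ in range(NUM_HOLE_CLASSES)]


def record(hole_index, hand_strength):
    """Call once per (board, hole) pair instead of writing to a text file."""
    stats[hole_index].add(hand_strength)
```

With this, only 169 small objects are kept in memory, and the means and variances are available as soon as the board enumeration finishes.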

I'm doing the first iteration now. At the current speed, it will take around a day to complete. The second iteration shouldn't take that long. I'll post the results when I have them.

Statistics: Posted by Sacré d'Jeu — Mon May 19, 2014 10:11 am

Re: Adaptive Opponent Modeling

2014-05-17T06:47:38+00:00

Having thought about this maybe k-means clustering will work ok. You might have to stretch one axis or other to get the right division between strength and variance. Post your data and I'll do some experiments

Statistics: Posted by spears — Sat May 17, 2014 6:47 am

Re: Adaptive Opponent Modeling

2014-05-16T13:22:23+00:00

spears wrote:

If you use a clustering algorithm you need to ensure that it gives the type of clustering you want. You need to choose bins that make a good distinction between different strengths. Equal frequency binning is exactly the wrong solution because it doesn't distinguish between strengths well. There is no expert knowledge to process hands postflop so you can't use that.

I think some ad hoc approach that is a 2 dimensional equivalent of equal width binning would probably work quite well. Divide the space into equal size rectangles in such a way that the number of rectangles with any content is the number of bins.

I don't understand the graph though. AA has a strength of 85.2% (against a uniform hand). Where is that on your graph?

Oh, crap. I've calculated the numbers on the rank (using your 7-hand evaluator spears2p2). I'll be back.
(Thanks for the reaction about bucketing, I'll look what the actual graphic looks like, and then see how I'll do the division.)
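The two-dimensional equal-width binning suggested in the quote could be sketched roughly as follows. This is only one possible reading of the idea: the function names are hypothetical, the grid search is a naive loop, and the four sample points are taken from the mean and variance tables posted in this thread (reading the lower-left triangle as offsuit is an assumption).

```python
def equal_width_bins_2d(points, nx, ny):
    """Map each hand to a bucket id using an nx-by-ny grid of equal rectangles.

    points: dict of hand name -> (mean strength, variance).
    Only rectangles that contain at least one hand receive a bucket id.
    """
    xs = [p[0] for p in points.values()]
    ys = [p[1] for p in points.values()]
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)

    def cell(v, lo, hi, n):
        if hi == lo:
            return 0
        # The maximum value would land in cell n, so clamp it to n - 1.
        return min(int((v - lo) / (hi - lo) * n), n - 1)

    cells = {name: (cell(x, x_lo, x_hi, nx), cell(y, y_lo, y_hi, ny))
             for name, (x, y) in points.items()}
    ids = {c: i for i, c in enumerate(sorted(set(cells.values())))}
    return {name: ids[c] for name, c in cells.items()}


def find_grid(points, target_bins, max_side=20):
    """Return the first (nx, ny, buckets) with exactly target_bins non-empty cells."""
    for side_sum in range(2, 2 * max_side + 1):
        for nx in range(1, side_sum):
            ny = side_sum - nx
            if nx > max_side or ny > max_side:
                continue
            buckets = equal_width_bins_2d(points, nx, ny)
            if len(set(buckets.values())) == target_bins:
                return nx, ny, buckets
    return None


# (mean HS in %, variance * 10^2) read from the tables in this thread.
points = {"AA": (85.20, 1.12), "KK": (82.40, 1.33),
          "22": (50.33, 5.78), "32o": (32.30, 9.52)}
```

Stretching or compressing one axis, as discussed elsewhere in the thread, amounts to choosing different values for nx and ny.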

Statistics: Posted by Sacré d'Jeu — Fri May 16, 2014 1:22 pm

Re: Adaptive Opponent Modeling

2014-05-16T12:24:17+00:00

Statistics: Posted by spears — Fri May 16, 2014 12:24 pm

Re: Adaptive Opponent Modeling

2014-05-16T11:09:22+00:00

I've calculated the handstrength mean and variance for every starting hand; a graph of the results is shown below.

I should strategically divide these hands into buckets. I've been thinking about using a clustering algorithm, but maybe it's better to use 'expert knowledge'. What do you think?

I understand the paper now (it wasn't easy), but it's actually quite simple. I can reproduce the algorithm to create the tables.
I'm coding in Java. I'm going to try to write the indexing myself, because I need a variant where I only index the board cards (XXX | X | X).

Thanks for all help!

Statistics: Posted by Sacré d'Jeu — Fri May 16, 2014 11:09 am

Re: Adaptive Opponent Modeling

2014-05-14T17:23:00+00:00

So many nut boards ?
Good thing to know, one could do an intermediary boolean table to mark them

Statistics: Posted by Pitt — Wed May 14, 2014 5:23 pm

Re: Adaptive Opponent Modeling

2014-05-14T17:06:39+00:00

You can reduce the river to 42769 by removing nut boards

Statistics: Posted by spears — Wed May 14, 2014 5:06 pm

Re: Adaptive Opponent Modeling

2014-05-14T16:58:06+00:00

Just about the hand isomorphism paper: it can easily provide an exact isomorphism.
As you don't have much time for your work and it is already implemented, I recommend using it directly.

Exact index counts for flop / turn / river with imperfect recall (on the turn we don't know which card was the turn card):
[1 755, 16 432, 134 459]

The memory sizes at 81 bytes per index:
[142 155, 1 330 992, 10 891 179]
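The sizes follow directly from multiplying each index count by the 81 transition weights. A quick check, assuming one byte per weight (the byte size and the variable names are assumptions made for this sketch):

```python
INDEX_COUNTS = {"flop": 1755, "turn": 16432, "river": 134459}
VALUES_PER_INDEX = 9 * 9   # 81 transition weights per board index
BYTES_PER_VALUE = 1        # assumption: each weight quantised to one byte

table_bytes = {street: count * VALUES_PER_INDEX * BYTES_PER_VALUE
               for street, count in INDEX_COUNTS.items()}
total_bytes = sum(table_bytes.values())

for street, size in table_bytes.items():
    print(f"{street}: {size} bytes")
print(f"total: {total_bytes} bytes")
```

Under these assumptions all three tables together need roughly 12 MB, so heap space should not be a problem.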

Given your numbers, I assume you use perfect recall, this can easily be done as well.

I don't know what programming language you are using, but in C / C++ you can use the original GitHub code: https://github.com/kdub0/hand-isomorphism
And if you need a Java wrapper, I could modify the one I published to make perfect recall board indexing available.

Statistics: Posted by Pitt — Wed May 14, 2014 4:58 pm

Re: Adaptive Opponent Modeling

2014-05-14T16:06:03+00:00

Sacré d'Jeu wrote:

I'm capable of calculating the values, but how should I store them in a LUT? I'm afraid I will use all my heap space...
The three tables I need, are size 142.155 (exact), 5M and 200M (!) (last two are rough estimations)...
I probably should use a similar index-system as in viewtopic.php?f=25&t=2660.

- I didn't understand that paper.
- Devise a scheme to represent the board isomorph. eg (Aa,Ab,3a) means (Ac,Ah,3c) and also (Ah,As,3h) and a way of translating from a real board to the isomorph
- If you are going to keep this all in memory, create a hash table with the isomorph as the key and the vector of hole card strengths as the value. 200MB isn't big these days
- If you are going to put it on disk create a hash table with the isomorph as the key and a number as the value. Use the number as an index in a random access file
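The isomorph-plus-hash-table idea from the list above might look like the following. It is a brute-force sketch, not the indexing scheme from the hand isomorphism paper: it tries all 24 suit relabellings and keeps the smallest result, and it treats the board as an unordered set of cards. The names `canonical_board`, `count_isomorphic_flops` and `strength_table` are hypothetical.

```python
from itertools import combinations, permutations

RANKS = "23456789TJQKA"
SUITS = "cdhs"
LABELS = "abcd"  # abstract suit labels used in the isomorph


def canonical_board(cards):
    """Return the isomorph of a board, e.g. Ac Ah 3c -> ('3a', 'Aa', 'Ab')."""
    best = None
    for perm in permutations(range(4)):
        relabelled = tuple(sorted(
            (RANKS.index(c[0]), perm[SUITS.index(c[1])]) for c in cards))
        if best is None or relabelled < best:
            best = relabelled
    return tuple(RANKS[r] + LABELS[s] for r, s in best)


def count_isomorphic_flops():
    """Enumerate all 22100 flops and count the distinct isomorphs."""
    deck = [r + s for r in RANKS for s in SUITS]
    return len({canonical_board(flop) for flop in combinations(deck, 3)})


# In-memory variant: isomorph -> vector of hole card strengths.
strength_table = {}
key = canonical_board(["Ac", "Ah", "3c"])
strength_table[key] = [0.0] * 169  # placeholder values, to be filled in
```

For the on-disk variant the dictionary value would be a record number instead of the vector itself, used as an offset into a random access file.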

Statistics: Posted by spears — Wed May 14, 2014 4:06 pm

Re: Adaptive Opponent Modeling

2014-05-14T15:41:01+00:00

spears wrote:

- You have a preflop bucket distribution.
- You can get a preflop hole card distribution from that.
- The flop comes
- You can calculate the strength of every hole ***
- Since you know the strength of every hole and you know the distribution, you can calculate the bucket distribution

*** This is computationally challenging. But you could keep lookup tables for each flop isomorph, on disk if necessary.

That's what I meant in the first place, I think.

So I need a LUT for every isomorph flop, turn and river that gives me the 81 values so I can calculate the new bucket distribution:
b0_new = x00 * b0_old + x10 * b1_old + x20 * b2_old + x30 * b3_old + ...
b1_new = x01 * b0_old + x11 * b1_old + x21 * b2_old + x31 * b3_old + ...
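Written out, this update is a vector-matrix product in which x[i][j] is the share of old bucket i that ends up in new bucket j. A minimal sketch with three buckets instead of nine; the function name and the numbers are made up, and each row of x is assumed to sum to 1:

```python
def apply_transition(old_dist, x):
    """Compute b_new[j] = sum over i of x[i][j] * b_old[i]."""
    n_old = len(old_dist)
    n_new = len(x[0])
    return [sum(x[i][j] * old_dist[i] for i in range(n_old))
            for j in range(n_new)]


# Illustrative numbers only: 3 buckets instead of 9.
b_old = [0.5, 0.3, 0.2]
x = [
    [0.8, 0.2, 0.0],  # where the mass of old bucket 0 goes
    [0.1, 0.7, 0.2],  # old bucket 1
    [0.0, 0.3, 0.7],  # old bucket 2
]
b_new = apply_transition(b_old, x)
```

Because each row sums to 1, the new distribution still sums to 1 without an extra normalisation step.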

I'm capable of calculating the values, but how should I store them in a LUT? I'm afraid I will use all my heap space...
The three tables I need, are size 142.155 (exact), 5M and 200M (!) (last two are rough estimations)...

I probably should use a similar index-system as in http://www.poker-ai.org/phpbb/viewtopic.php?f=25&t=2660.

Statistics: Posted by Sacré d'Jeu — Wed May 14, 2014 3:41 pm

Re: Adaptive Opponent Modeling

2014-05-14T14:54:27+00:00

Sacré d'Jeu wrote:

spears wrote:

You definitely shouldn't be dealing with all those post flop isomorphisms. I think it should be possible to formulate the problem to make these large numbers go away. I'm thinking something like this:
- From pre-flop betting you know a bucket distribution for villain.
- You can translate that into a hole card distribution
- Then the flop comes
- You can update villain's bucket distribution
- Villain acts
- Update villain's bucket distribution
- .....

Yes, that's what I had in mind, but how would you suggest I do the highlighted updates then?

Statistics: Posted by spears — Wed May 14, 2014 2:54 pm

Re: Adaptive Opponent Modeling

2014-05-14T14:38:39+00:00

Sacré d'Jeu wrote:

spears wrote:

Sacré d'Jeu wrote:

I learn the opponent-specific distribution P (a | hc Є b, I ) based on the action in hands where the hole cards are shown.

I think that is biased. That's why I suggested simulation.

You are absolutely right. I understand that some people assume that, if he folds, he has a bad hand and also learn on these actions. Maybe I'll do that.

We might be writing at cross purposes here. I think that calculating a strategy from showdown hands will not be accurate because showdown hands are an unrepresentative sample.

When villain mucks, you can conclude that his hand is weaker than your shown hand. So you can figure out the cards he cannot have, and then update your estimates of the cards he had earlier in the hand.

It's reasonable to assume that villain only folds weak hands, but how do you make practical use of the assumption? For example, does he fold 100% of the weakest 20% or 50% of the weakest 40%?

Statistics: Posted by spears — Wed May 14, 2014 2:38 pm

Re: Adaptive Opponent Modeling

2014-05-14T14:14:17+00:00

spears wrote:

That is a great idea. Thank you, sir!

spears wrote:

Yes, that's what I had in mind, but how would you suggest I do the highlighted updates then?

When new community cards are shown, the average and variance of strength change for all hole cards. E.g. on a dry board, a drawing hand will fall back in strength and variance. In an ideal situation, I should be able to update the bucket distribution according to the number of hole cards that switch from one bucket to another. That is, I should know from which old buckets the hole cards in the new bucket come: b_new = 2% b1 + 6% b2 + 30% b3 + ...
And this transition is different for every type of flop.
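For one particular flop, the transition weights could be obtained by counting how many hole cards move from each old bucket to each new bucket. The sketch below assumes every hole card inside a bucket is equally likely, which is a simplification; the function name and the toy bucket assignments are invented for illustration.

```python
def transition_weights(holes, old_bucket, new_bucket, n_old, n_new):
    """Return weights[i][j]: share of old bucket i that lands in new bucket j."""
    counts = [[0] * n_new for _ in range(n_old)]
    for hole in holes:
        counts[old_bucket[hole]][new_bucket[hole]] += 1
    weights = []
    for row in counts:
        total = sum(row)
        weights.append([c / total if total else 0.0 for c in row])
    return weights


# Toy example: 4 hole cards, 3 buckets, made-up assignments for one flop.
holes = ["AA", "KK", "76s", "72o"]
old_bucket = {"AA": 2, "KK": 2, "76s": 1, "72o": 0}
new_bucket = {"AA": 2, "KK": 1, "76s": 2, "72o": 0}
weights = transition_weights(holes, old_bucket, new_bucket, 3, 3)
```

Here `weights[2]` is `[0.0, 0.5, 0.5]`: half of the old top bucket stays there and half drops one bucket on this flop. A different flop gives a different table, which is why one table per isomorphic board is needed.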

Statistics: Posted by Sacré d'Jeu — Wed May 14, 2014 2:14 pm

Re: Adaptive Opponent Modeling

2014-05-14T13:41:53+00:00

You are right not to use EHS. I would suggest using two parameters: average and variance of strength at showdown. A drawing hand will have a large variance, a made hand will have a low variance. I don't like current hand strength because it means nothing as it is never seen at showdown.

You definitely shouldn't be dealing with all those post flop isomorphisms. I think it should be possible to formulate the problem to make these large numbers go away. I'm thinking something like this:
- From pre-flop betting you know a bucket distribution for villain.
- You can translate that into a hole card distribution
- Then the flop comes
- You can update villain's bucket distribution
- Villain acts
- Update villain's bucket distribution
- .....
- Then the turn comes
- Update villain's bucket distribution
- ....

Statistics: Posted by spears — Wed May 14, 2014 1:41 pm

Re: Adaptive Opponent Modeling

2014-05-14T12:56:20+00:00

I'll give it some thought, and I'll come back on it later.

In the meantime, I have some questions about the bucketing and transition functions:

For each possible pair of hole cards, I would calculate the current HS and the Hand Potential separately (in contrast to EHS). I do this because a hand with great potential might be played differently from an already strong hand with less potential.

For Hand Potential, I thought about using HP = Ppot - Pneg. Is there a better way for this?

Every possible transition consists of 81 weight values (9 * 9 buckets). I've calculated that there are 1755 isomorphic flops possible, so this makes 81 * 1755 = 142155 values, just for the transition at the flop. For the turn and especially the river, the number is even higher.

Could you help me with how I should store all this information? I'm not looking for the fastest approach; a workable one will be fine.
Are there other botters using a similar approach? Has this been done before?

Statistics: Posted by Sacré d'Jeu — Wed May 14, 2014 12:56 pm

Re: Adaptive Opponent Modeling

2014-05-14T10:42:39+00:00

A simulation using a known strategy S will give you observations O (action frequencies and showdown strengths) and hence probability p(O|S). I think you should be able to turn that round to give p(S|O) using Bayes rule and some other reasonable assumptions. Sorry this is so vague: I don't have a lot of time and my maths is very poor.

I think you should be able to use Bayes rule for the update weight too. The default strategy is a "prior"
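One way to make this concrete is to compare a handful of candidate strategies and compute p(S|O) from observed action counts. The sketch assumes that actions are independent draws from fixed frequencies and that no frequency is zero; the strategy names and numbers are invented for illustration.

```python
import math


def strategy_posterior(prior, action_probs, observed_counts):
    """Return p(S | O) for each candidate strategy S.

    prior:           {strategy: p(S)}, e.g. the default strategy as prior
    action_probs:    {strategy: {action: p(action | S)}}, e.g. from simulation
    observed_counts: {action: number of times villain took it}
    """
    log_post = {}
    for s, p in prior.items():
        ll = math.log(p)
        for action, n in observed_counts.items():
            ll += n * math.log(action_probs[s][action])
        log_post[s] = ll
    # Subtract the maximum before exponentiating to avoid underflow.
    top = max(log_post.values())
    unnorm = {s: math.exp(v - top) for s, v in log_post.items()}
    z = sum(unnorm.values())
    return {s: v / z for s, v in unnorm.items()}


prior = {"tight": 0.5, "loose": 0.5}
action_probs = {
    "tight": {"fold": 0.7, "call": 0.2, "raise": 0.1},
    "loose": {"fold": 0.3, "call": 0.5, "raise": 0.2},
}
posterior = strategy_posterior(prior, action_probs,
                               {"fold": 2, "call": 6, "raise": 2})
```

With these numbers almost all of the posterior mass moves to the "loose" strategy after only ten observed actions.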

Statistics: Posted by spears — Wed May 14, 2014 10:42 am

Re: Adaptive Opponent Modeling

2014-05-14T09:24:43+00:00

spears wrote:

Sacré d'Jeu wrote:

I learn the opponent-specific distribution P (a | hc Є b, I ) based on the action in hands where the hole cards are shown.

I think that is biased. That's why I suggested simulation.

You are absolutely right. I understand that some people assume that, if he folds, he has a bad hand and also learn on these actions. Maybe I'll do that.

Can you elaborate on your suggestion? I don't understand how you would do this.

spears wrote:

Sacré d'Jeu wrote:

7) If a real showdown happens, I update the learned opponent model P (a | hc Є b, I ), based on the actions in this hand.

How do you calculate the weighting of the update?

I haven't decided on the algorithm/model I'll use. I'm thinking of using a decision tree or a neural network.
The weighting of the instances is also still up in the air.

Statistics: Posted by Sacré d'Jeu — Wed May 14, 2014 9:24 am

Re: Adaptive Opponent Modeling

2014-05-13T20:54:08+00:00

I think the general principle is pretty good. You might need a few more buckets though.

Sacré d'Jeu wrote:

I learn the opponent-specific distribution P (a | hc Є b, I ) based on the action in hands where the hole cards are shown.

I think that is biased. That's why I suggested simulation.

Sacré d'Jeu wrote:

7) If a real showdown happens, I update the learned opponent model P (a | hc Є b, I ), based on the actions in this hand.

How do you calculate the weighting of the update?

Statistics: Posted by spears — Tue May 13, 2014 8:54 pm

Re: Adaptive Opponent Modeling

2014-05-13T20:36:57+00:00

Ok, so here's my general plan for the opponent model. I encourage you to find and point out any flaws, errors, etc. in it:
'I' stands for an information set of the game state, see more below.
I learn the opponent-specific distribution P (a | hc Є b, I ) based on the action in hands where the hole cards are shown.

For each hand:
1) I divide all possible starting hands into 9 buckets based on HS and HP (eg, one bucket with low HS and medium HP).
The initial chance that the opponent has a hand from a certain bucket P(hc Є b | I ) is in relation to the number of hands in the bucket.

2) When MCTS asks for the probability of an action a of the opponent, I calculate it:
P( a | I ) = Σ P ( hc Є b | I ) * P (a | hc Є b, I ) over all buckets

3) When the opponent makes an action a, I update the belief distribution over the buckets:
P(hc Є b | a, I) = P(hc Є b | I ) * P(a | hc Є b, I ) / P ( a | I)

4) When a new round starts and new community cards are revealed, I update the belief distribution accordingly with a predefined transition function that depends on the community cards, to 9 new buckets, again based on HS and HP

5) When we get to the river, we transform the buckets into 6 buckets, based only on HS

6) When the EV of a showdown is needed in MCTS, I calculate it as follows (eg. with 2 opponents at the showdown):
EV = Σ EV(B) * P1 (hc Є b1 | I ) * P2 (hc Є b2 | I ) for all possible combinations of B = {b1, b2}
with EV(B): I win if my HS is higher than bucket, lose if it's lower. If my HS lies in the bucket, I'll try to calculate how much % I lose, draw and win. (Maybe for now, I'll just count it as a draw).

7) If a real showdown happens, I update the learned opponent model P (a | hc Є b, I ), based on the actions in this hand.

8) To start, I will learn a general model, based on many hands from many different players.
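Steps 2 and 3 of the plan can be written down directly. In the sketch below the belief is a list indexed by bucket and the learned model P(a | hc Є b, I) is a list of per-bucket action frequencies for one fixed information set I; three buckets are used instead of nine and all numbers are made up.

```python
def action_probability(belief, action_model, action):
    """Step 2: P(a | I) = sum over buckets of P(b | I) * P(a | b, I)."""
    return sum(belief[b] * action_model[b][action]
               for b in range(len(belief)))


def update_belief(belief, action_model, action):
    """Step 3: P(b | a, I) = P(b | I) * P(a | b, I) / P(a | I)."""
    p_a = action_probability(belief, action_model, action)
    if p_a == 0:
        raise ValueError("observed an action the model considers impossible")
    return [belief[b] * action_model[b][action] / p_a
            for b in range(len(belief))]


# Illustrative numbers: bucket 0 is weak, bucket 2 is strong.
belief = [0.5, 0.3, 0.2]
action_model = [
    {"fold": 0.60, "call": 0.30, "raise": 0.10},
    {"fold": 0.20, "call": 0.60, "raise": 0.20},
    {"fold": 0.05, "call": 0.35, "raise": 0.60},
]
p_raise = action_probability(belief, action_model, "raise")
after_raise = update_belief(belief, action_model, "raise")
```

After the raise, the weight of the strong bucket grows from 0.2 to about 0.52, while the weak bucket shrinks from 0.5 to about 0.22.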

The only thing left is how my abstracted information set would look. I'm still thinking about that.

Statistics: Posted by Sacré d'Jeu — Tue May 13, 2014 8:36 pm

Re: Adaptive Opponent Modeling

2014-05-06T15:06:38+00:00

I've edited my post above.

At some stage you do have to take account of the fact that villain will adapt. If you don't your strategy will be a best response and it will be too predictable. So you have at least three ways of dealing with this:
- Continually adapt your play to the latest information about his play. The problem with this is that you can't adapt quickly enough.
- Predict how he will adapt to you. This is hard. But I guess possible, just.
- Play a strategy that exploits him without being too exploitable. This is hard too, but maybe http://poker.cs.ualberta.ca/publication ... -rnash.pdf will provide some ideas.

Statistics: Posted by spears — Tue May 06, 2014 3:06 pm

Re: Adaptive Opponent Modeling

2014-05-06T14:29:07+00:00

Yes, you are right, the assumption is wrong, but the reasoning is the same, if you think of f() as a function that gives probability P( a | c, S, L).

The important conclusion is that it seems that I only need one model to give me both probabilities.

Statistics: Posted by Sacré d'Jeu — Tue May 06, 2014 2:29 pm

Re: Adaptive Opponent Modeling

2014-05-06T13:13:54+00:00

I don't think the deterministic assumption - player will act the same way in the same situation - is a good one. Bluffs and slowplays really only work because they aren't deterministic. But you can still use Bayes rule to determine the chance of a player taking an action or holding certain cards. (I haven't checked your maths)

You will need to use bucketing on hand strength to make the problem tractable. And you need to take account of hand potential or variance too. You will not be able to distinguish a particular player's preference for individual hands this way. C'est la vie. A made hand and a draw of equal strength will be played differently throughout the hand because a missed draw will usually be folded on the river.

You can observe how different strategies play against one another by simulation. Then use Bayes rule to determine the strategy from observations and observations of many games in the wild. (The details of this bit are a bit hazy at the moment, but I'm fairly sure it's possible) The best you can ever achieve will be a probability of a strategy. It might be useful to read http://www.poker-ai.org/archive/www.pok ... =64&t=4037 too

Statistics: Posted by spears — Tue May 06, 2014 1:13 pm

Re: Adaptive Opponent Modeling

2014-05-06T15:40:45+00:00

Thanks, I really appreciate the help.
I'll let the adaptive thing go for now, and concentrate on an opponent model for the no-limit multiplayer variant. So this is what I've come to so far (I've thought of all this myself, so it's very likely there are (big) faults in it. Do not hesitate to set me straight!):

THE PROBLEM
For every opponent, I need a model for two probabilities: the opponent's cards and the opponent's actions. Useful information about these probabilities is hidden in the previous actions of the opponent, as these actions are (or rather can be) based on (1) his hole cards c, (2) the public gamestate S_i (all previous actions, community cards, stack sizes, ...) and (3) the type of players he (or she, let's not discriminate) faces.

note: S_i is the gamestate right up to the moment the opponent needs to make action a_i

To summarize, I need

P(c | S_i, L) with L the collection of opponent player types
P(a_i | S_i, L) = sum over all hole cards of {P(c | S_i, L) * f(c, S_i, L)}

where f() is the opponent-specific function (read: strategy) that gives the action, given the information summed up above.

ASSUMPTION 1 - the opponent's strategy is deterministic: when faced with exactly the same situation, he will make exactly the same action. Wrong assumption: f(c, S_i, L) has to be replaced by P(a_i | c, S_i, L)

If I'm correct (which almost never happens), we can calculate (Bayes' rule):
P(c | S_i-1, a_i-1, L) = P( a_i-1 | c, S_i-1, L) * P ( c | S_i-1, L) / P( a_i-1 | S_i-1, L)

P( a_i-1 | c, S_i-1, L) is the output of the model
P ( c | S_i-1, L) we have calculated before
P( a_i-1 | S_i-1, L) has also been calculated before; it's just a normalisation constant and can be omitted

As the actions of the other players do not tell us anything about the hole cards, P(c | S_i-1, a_i-1, L) = P(c | S_i, L) if we stay in the same round. If new community cards are present, we can simply adjust the probabilities by eliminating the hole cards that aren't possible anymore.*
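Eliminating the hole cards that are no longer possible comes down to dropping every holding that shares a card with the new community cards and renormalising the rest. A minimal sketch; the function name and the three-hand distribution are made up:

```python
def remove_blocked_holes(hole_probs, new_cards):
    """Drop holdings that contain a newly revealed card, then renormalise.

    hole_probs: {(card1, card2): probability}
    new_cards:  community cards revealed this round, e.g. ["Ah", "9c", "4d"]
    """
    blocked = set(new_cards)
    kept = {hole: p for hole, p in hole_probs.items()
            if not (set(hole) & blocked)}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("no possible holdings left")
    return {hole: p / total for hole, p in kept.items()}


# Illustrative distribution over three holdings only.
hole_probs = {("As", "Ks"): 0.5, ("Ah", "Kh"): 0.3, ("7c", "2d"): 0.2}
after_flop = remove_blocked_holes(hole_probs, ["Ah", "9c", "4d"])
```

In this example the Ah Kh holding disappears because the Ah is on the board, and the remaining two holdings are rescaled to 5/7 and 2/7.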

If what's stated above is correct, the problem should come down to finding (a good approximation of) f(c, S_i, L). Is this correct?

STEP 1: simplify the problem with more assumptions, abstractions, ...
To see what simplifications are possible, let's go over each set and see how they influence the opponent's actions:

Hole cards c: There are 1225 (50*49/2) possible hole cards for every opponent. Before the flop (as everyone here should know, see security question), we can abstract these to 169 different situations, after the flop there is no abstraction possible (most of the time), due to the importance of suits.

Many bots use bucketing, where hole cards are taken together based on their handstrength. Might take this approach if it turns out to be necessary, but I have not looked into this yet.
Actions a: In no-limit, the number of different raise-amounts are a problem. Again, bucketing should be a viable option.
Player types L: This influences the meaning of a player's actions. For example, you will react differently to a call of a loose passive player than to a call of a tight aggressive player (or you should).

I have not seen this in any current opponent model. It implies that you assume your opponent is adaptive. For now, I'll ignore this too. Later, I might look into this and categorize every player as tight/loose and passive/aggressive.
Gamestate S: the big one. I will discuss this in the next reply.

Other remarks:
* different opponents can not have the same hole cards, so you could take this into consideration, but I think this would only help if the reads were very strong. Just to say: it's not for any time soon.

Statistics: Posted by Sacré d'Jeu — Tue May 06, 2014 12:14 pm

Re: Adaptive Opponent Modeling

2014-05-05T19:21:00+00:00

http://poker-ai.org/archive/pokerai.org ... 634&hilit=
http://www.poker-ai.org/archive/pokerai ... &sk=t&sd=a

Statistics: Posted by spears — Mon May 05, 2014 7:21 pm

Re: Adaptive Opponent Modeling

2014-05-05T19:13:49+00:00

I'll hopefully be able to upload it here soon, although there are similar ones to mine already in the papers section. A search for MCTS Poker & Poki Poker Paper should get you good results

Sounds like a viable solution, although a month is a short amount of time for this project - make sure you know exactly what you want to do first and then implement. Ask questions here along the way and we can try to help out - good luck!

Statistics: Posted by ibot — Mon May 05, 2014 7:13 pm

Re: Adaptive Opponent Modeling

2014-05-03T14:05:35+00:00

Interesting idea for the model. Can I somewhere read more about your project, or was it a private project?

I'll start with implementing my idea with a 'safe model' and an 'opponent model'. I find it natural that the more confidence I get that I can predict my opponent, the more risk I take to exploit him, and vice versa. It has the advantage that it's somewhat independent of the used models, so work would not be lost if I change/alter the adaptive model. Or am I wrong/is this a bad idea?

About the time-frame: the project should be finished in about a month, so there is not much time to 'linger around'. I can work on it full-time though, so progress should be made quickly. I've already done much of the 'getting familiar with the code' and the problem. It's time to test/find possible solutions.

Statistics: Posted by Sacré d'Jeu — Sat May 03, 2014 2:05 pm

Re: Adaptive Opponent Modeling

2014-05-02T21:15:56+00:00

I've recently finished my dissertation on a very similar topic, essentially what you're aiming for but without the adaptive modelling. To combat the lack of data, I recommend clustering on existing data and then building opponent models for strategies that you can discover within the data. Then assign a player to one of these strategies until you have enough data on them.

Another problem arises in the sense of what is 'enough' data. I played for 20k hands in simulations, and even then you can have <1000 hands of data on the river for opponents. A possible solution to this is to iteratively merge both the unique data and the clustered data, e.g. copy the closest cluster of data for a player, then every 500 hands replace 500 hands of cluster data with the unique data, thus gradually building a more accurate and unique opponent model while combating the lack of data problem.
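The gradual replacement described above could be sketched like this. It is only one interpretation: hands are treated as opaque items in lists, the block size of 500 comes from the post, and the function name is made up.

```python
def blended_sample(cluster_hands, player_hands, block=500):
    """Swap cluster hands for the player's own hands in whole blocks.

    Every full block of observed hands replaces the same number of hands
    copied from the closest cluster; once the player has at least as many
    hands as the cluster copy, only the player's own data is used.
    """
    usable = (len(player_hands) // block) * block
    if usable >= len(cluster_hands):
        return list(player_hands)
    return list(player_hands[:usable]) + list(cluster_hands[usable:])


# Toy data: "c" marks a hand from the cluster, "p" one from the player.
cluster = ["c"] * 2000
mixed = blended_sample(cluster, ["p"] * 1200)
own_only = blended_sample(cluster, ["p"] * 2500)
```

With 1200 observed hands, two full blocks have been swapped in, so the training sample holds 1000 of the player's hands and 1000 cluster hands.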

However, here the assumption is that the player is playing to a single strategy. There are ways I can think of to make it adaptive however..

How long do you have for your project? It does sound interesting, and was what I was initially aiming to do my project on

Statistics: Posted by ibot — Fri May 02, 2014 9:15 pm