Thanks, I really appreciate the help.
I'll let the adaptive thing go for now and concentrate on an opponent model for the no-limit multiplayer variant. This is what I've come up with so far (I've thought all of this up myself, so it's very likely there are (big) faults in it. Do not hesitate to set me straight!):
THE PROBLEM
For every opponent, I need a model for two probabilities: the probability of the opponent's cards and the probability of the opponent's actions. Useful information about these probabilities is hidden in the opponent's previous actions, as these actions are (or rather can be) based on (1) his hole cards c, (2) the public gamestate S_i (all previous actions, community cards, stack sizes, ...) and (3) the types of players he (or she, let's not discriminate) faces.
Note: S_i is the gamestate right up to the moment the opponent needs to make action a_i.
To summarize, I need
- P(c | S_i, L) with L the collection of opponent player types
- P(a_i | S_i, L) = sum over all hole cards c of {P(c | S_i, L) * f(c, S_i, L)}
where f() is the opponent-specific function (read: strategy) that gives the action, given the information summed up above.
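To make this concrete, here is a minimal sketch of that sum in Python. It assumes the hole-card probabilities are stored in a dict and that the strategy f is deterministic (as in assumption 1 below); all names (card_probs, strategy, ...) are illustrative, not from any existing library.

```python
# Minimal sketch of P(a_i | S_i, L) as a mixture over hole cards.
# card_probs: dict mapping each hole-card combo c to P(c | S_i, L)
# strategy(c, state, types): the (here deterministic) f(c, S_i, L) from above

def action_probability(action, card_probs, state, types, strategy):
    """P(a_i | S_i, L) = sum of P(c | S_i, L) over all c with f(c, S_i, L) == a_i."""
    return sum(p for c, p in card_probs.items()
               if strategy(c, state, types) == action)
```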
ASSUMPTION 1 - the opponent's strategy is deterministic: when faced with exactly the same situation, he will make exactly the same action. This is a wrong assumption, so f(c, S_i, L) has to be replaced by P(a_i | c, S_i, L).
If I'm correct (which almost never happens), we can calculate (Bayes' rule):
P(c | S_i-1, a_i-1, L) = P(a_i-1 | c, S_i-1, L) * P(c | S_i-1, L) / P(a_i-1 | S_i-1, L)
- P(a_i-1 | c, S_i-1, L) is the output of the model
- P(c | S_i-1, L) we have calculated before
- P(a_i-1 | S_i-1, L) has also been calculated before; besides, it's just a normalisation constant and can be omitted
Since the actions of the other players do not tell us anything about this opponent's hole cards, P(c | S_i-1, a_i-1, L) = P(c | S_i, L) as long as we stay in the same betting round. If new community cards appear, we can simply adjust the probabilities by eliminating the hole cards that are no longer possible.*
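As a sanity check on the update above, here is a small Python sketch of both steps: the Bayes update after an observed action, and the elimination of impossible hole cards when new community cards come down. It assumes action_model(a, c, S, L) returns P(a | c, S, L); all names are illustrative, and a hole-card combo c is assumed to be a tuple of two cards.

```python
# Bayes step: P(c | S_{i-1}, a_{i-1}, L) is proportional to
#             P(a_{i-1} | c, S_{i-1}, L) * P(c | S_{i-1}, L).
# The division by the total is the normalisation constant mentioned above.

def update_on_action(card_probs, action, state, types, action_model):
    posterior = {c: p * action_model(action, c, state, types)
                 for c, p in card_probs.items()}
    total = sum(posterior.values())
    return {c: p / total for c, p in posterior.items()} if total else posterior

def update_on_board(card_probs, new_board_cards):
    """Eliminate hole cards that clash with newly dealt community cards, then renormalise."""
    board = set(new_board_cards)
    kept = {c: p for c, p in card_probs.items() if not (set(c) & board)}
    total = sum(kept.values())
    return {c: p / total for c, p in kept.items()} if total else kept
```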
If what's stated above is correct, the problem should come down to finding (a good approximation of) f(c, S_i, L). Is this correct?
STEP 1: simplify the problem with more assumptions, abstractions, ...
To see what simplifications are possible, let's go over each set and see how it influences the opponent's actions:
- Hole cards c: There are 1225 (50*49/2) possible hole-card combinations for every opponent. Before the flop (as everyone here should know, see security question), we can abstract these to 169 different situations (see the sketch after this list); after the flop, such an abstraction is usually not possible, due to the importance of suits.
Many bots use bucketing, where hole cards are grouped together based on their hand strength. I might take this approach if it turns out to be necessary, but I have not looked into it yet.
- Actions a: In no-limit, the number of different raise amounts is a problem. Again, bucketing should be a viable option.
- Player types L: These influence the meaning of a player's actions. For example, you will react differently to a call from a loose-passive player than to a call from a tight-aggressive player (or you should).
I have not seen this in any current opponent model. It implies that you assume your opponent is adaptive. For now, I'll ignore this too. Later, I might look into it and categorize every player as tight/loose and passive/aggressive.
- Gamestate S: the big one. I will discuss this in the next reply.
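For the 169-situation preflop abstraction mentioned in the hole-cards bullet, here is a small sketch (13 pairs + 78 suited + 78 offsuit = 169 classes); the representation of a card as a (rank, suit) tuple is just an assumption for illustration.

```python
# Map two hole cards to one of the 169 strategically distinct preflop classes.
# A card is assumed to be a (rank, suit) tuple, e.g. (14, 's') for the ace of spades.

def preflop_class(card1, card2):
    (r1, s1), (r2, s2) = card1, card2
    hi, lo = max(r1, r2), min(r1, r2)
    if hi == lo:
        return (hi, lo, 'pair')
    return (hi, lo, 'suited' if s1 == s2 else 'offsuit')

# Example: preflop_class((14, 's'), (13, 's')) -> (14, 13, 'suited'), i.e. AKs.
```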
Other remarks:
* Different opponents cannot hold the same cards, so you could take this into account, but I think it would only help if the reads were very strong. In other words: it's not for any time soon.