mlatinjo wrote:
Hi, thank you for sharing your idea.
1) Can you please tell me why you decided to use a naive Bayes classifier for solving such a complex game?
Did you consider using random forests or deep neural networks?
I took an academic approach to the problem. In academic research the goal is to solve the problem one small piece at a time; this way you minimize the sunk cost if the research fails. The point of the naive Bayes model was to figure out, at an early stage, whether a simple machine-learning approach applied to the UAlberta data set could separate signal from noise. Once that was confirmed, I proceeded to better but more expensive (in time and computational resources) techniques.
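To make that first step concrete, here is a minimal sketch of such a signal check: a hand-rolled Bernoulli naive Bayes over binary hand-history features. The feature names are hypothetical, not from the original project.

```python
import math

class BernoulliNB:
    """Minimal Bernoulli naive Bayes with Laplace smoothing.

    Features are binary indicators extracted from a hand history,
    e.g. (in_position, paired_board); the names are hypothetical.
    """
    def fit(self, X, y):
        self.classes = sorted(set(y))
        n_feat = len(X[0])
        self.log_prior = {}
        self.log_p = {}   # log P(feature = 1 | class)
        self.log_q = {}   # log P(feature = 0 | class)
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            self.log_prior[c] = math.log(len(rows) / len(X))
            self.log_p[c], self.log_q[c] = [], []
            for j in range(n_feat):
                # Laplace smoothing avoids log(0) for unseen feature values.
                p = (sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                self.log_p[c].append(math.log(p))
                self.log_q[c].append(math.log(1 - p))
        return self

    def predict(self, x):
        def score(c):
            return self.log_prior[c] + sum(
                self.log_p[c][j] if x[j] else self.log_q[c][j]
                for j in range(len(x)))
        return max(self.classes, key=score)

# Toy "signal check": does position correlate with aggression?
# Features: (in_position, paired_board); labels are observed actions.
X = [(1, 0), (1, 1), (0, 0), (0, 1), (1, 0), (0, 0)]
y = ["bet", "bet", "check", "check", "bet", "check"]
clf = BernoulliNB().fit(X, y)
print(clf.predict((1, 0)))  # in position, unpaired board -> "bet"
```

If a classifier this crude beats the base rate on held-out hands, there is learnable signal in the features, which is all this stage needed to establish.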
mlatinjo wrote:
2) As I understood it, you only take into consideration board texture, position and previous actions. In poker it is also very important to incorporate players' stack sizes and opponent types. A fish is going to play a totally different strategy than a regular or tight player. So if your model learns from hand histories that a professional player c-bets a very big size on e.g. As 9d 8s with Ac2c, it is because the villain is a fish who will call with a wide range; against a tight player it would be quite wrong to bet big.
Yes, that's correct; see above. My more recent code probably does this.
mlatinjo wrote:
3) I think that in poker it is very important to adjust your strategy to the specific player, meaning that the AI should notice if a player changes strategy during the game. As far as I can see, your AI is not able to do this; it is modelling the average population player and playing the same strategy against each opponent?
This is much further than I got. My goal was to transform a problem we don't know how to solve (an n-player imperfect-information extensive-form game) into one we know how to solve with deep learning + MCTS (a 2-player perfect-information game); see AlphaGo, AlphaGo Zero, AlphaGo Master, and AlphaGo Lee. If I could do that, I'm confident the exact same techniques used by DeepMind would work.
mlatinjo wrote:
4) Did you already test your AI against real opponents? How fast does it make decisions preflop and postflop?
My goal was to solve the game in a lab setting. There I reached good convergence in preflop play, and I was working on postflop play and beyond before other responsibilities took over my schedule.
mlatinjo wrote:
5) Did you think about using equity vs. villain range in your model? If you predict the villain's range, you could use that equity information to make much more accurate decisions and generalize better. You could then teach your model how to play a high-equity hand, how to play a low-equity one, and so on.
I think my approach has two different components which might help answer this question.
1: Transform poker from an imperfect-information game to a "perfect"-information game. By "perfect" in quotes I mean that many information sets are created which encapsulate the table state in an intelligent way. I consider this "intelligent" because 9-player poker can be thought to have something like 10^300 information sets, while even the largest data set for 9-player poker has perhaps 10^11 hands. This is not to mention that those hands are at different sb/bb levels, which makes them non-IID. So the problem I was trying to solve was: with such an incredibly tiny data set compared to the number of information sets, how do I bucket these states so that each bucket has enough hands to learn from (to generalize, in ML terms)?
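A minimal sketch of what I mean by bucketing, with hypothetical features and thresholds chosen only to show the mechanics: collapse each raw table state onto a coarse key, then check whether each bucket collects enough hands to learn from.

```python
from collections import Counter

def bucket_key(state):
    """Collapse a raw table state into a coarse information bucket.

    `state` is a dict with hypothetical fields; a real feature set
    would be built from board texture, position, action history, etc.
    """
    return (
        state["street"],            # preflop / flop / turn / river
        state["position"],          # e.g. "early" / "late" / "blinds"
        min(state["raises"], 3),    # cap the raise count at 3
        state["pot_bb"] // 10,      # pot size in 10-bb steps
    )

# Count how many observed hands land in each bucket; buckets with
# too few hands cannot be learned from and need coarser features.
hands = [
    {"street": "flop", "position": "late", "raises": 1, "pot_bb": 12},
    {"street": "flop", "position": "late", "raises": 1, "pot_bb": 18},
    {"street": "flop", "position": "late", "raises": 5, "pot_bb": 14},
]
counts = Counter(bucket_key(h) for h in hands)
print(counts)
```

The design tension is exactly the one described above: coarser keys mean more hands per bucket (better generalization) but less faithful information sets.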
If I could do this, then what remains is to transform the 9-player game into a 2-player game. This, I think, is considerably easier than the previous problem, because we can always consider a k-player game as 1-vs-all for each of the k players: any gain for a player other than you is a loss for you, and, in reverse, any gain for you is a loss for all other players. There are some gotchas here that I'll get into later. The transformation from 9-player to 2-player is not perfect with this approach, but I found some approaches that deal with this in an "okay" way. In essence this simple transformation leads to extremely greedy play by all k players; I solved this by regularizing each player's risk appetite, and that seemed to do the trick.
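A sketch of that 1-vs-all reduction with a risk regularizer. The concave utility below (a scaled tanh) is purely illustrative; it is not necessarily what my code used, only one way to damp the greedy play the naive reduction induces.

```python
import math

def one_vs_all_payoffs(chip_deltas, risk_aversion=0.1):
    """Reduce a k-player payoff vector to k two-player zero-sum games.

    For player i, the "opponent" is everyone else combined: the
    opponent's payoff is -(i's payoff), since any chips i wins come
    from the rest of the table. A concave utility (here a scaled
    tanh; illustrative only) caps the value of huge wins, damping
    the extreme greed the raw reduction otherwise rewards.
    """
    def utility(x):
        # As risk_aversion -> 0 this approaches linear utility.
        return math.tanh(risk_aversion * x) / risk_aversion

    games = []
    for d in chip_deltas:
        me = utility(d)
        games.append((me, -me))  # zero-sum by construction
    return games

# 9-handed example: one big winner, small losses for the rest.
deltas = [80, -10, -10, -10, -10, -10, -10, -10, -10]
games = one_vs_all_payoffs(deltas, risk_aversion=0.05)
print(games[0])
```

Note how the winner's utility is far below the 80 raw chips: a player optimizing this regularized payoff no longer treats a huge stack grab as 8x better than a modest one.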
2: Given (1) above, I can apply time-tested policy-network, value-network, and MCTS techniques to do the heavy lifting.
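For the shape of that heavy lifting, here is a bare-bones UCT (upper-confidence-bound MCTS) loop on a toy two-player game, a race-to-10 counting game, nothing poker-specific; a real system would replace the random rollout with a value network and bias the selection with a policy network.

```python
import math
import random

# Toy 2-player zero-sum game: players alternately add 1, 2 or 3 to a
# running total; whoever reaches exactly TARGET wins.
TARGET = 10

def legal(total):
    return [m for m in (1, 2, 3) if total + m <= TARGET]

class Node:
    def __init__(self, total):
        self.total = total
        self.children = {}   # move -> Node
        self.N = 0           # visit count
        self.W = 0.0         # wins for the player who moved INTO this node

def uct_best_move(root_total, n_sims=5000, c=1.4, seed=0):
    rng = random.Random(seed)
    root = Node(root_total)
    for _ in range(n_sims):
        # Selection: descend while the node is fully expanded.
        node, path = root, [root]
        while node.total < TARGET and all(m in node.children
                                          for m in legal(node.total)):
            node = max(node.children.values(),
                       key=lambda ch: ch.W / ch.N
                       + c * math.sqrt(math.log(node.N) / ch.N))
            path.append(node)
        # Expansion: add one untried child if the node is non-terminal.
        if node.total < TARGET:
            m = rng.choice([m for m in legal(node.total)
                            if m not in node.children])
            node.children[m] = Node(node.total + m)
            node = node.children[m]
            path.append(node)
        # Rollout: random play to the end of the game.
        t = node.total
        to_move = (len(path) - 1) % 2   # player 0 moves at the root
        while t < TARGET:
            t += rng.choice(legal(t))
            to_move = 1 - to_move
        winner = 1 - to_move            # whoever just reached TARGET
        # Backprop: credit each node from its incoming player's view.
        for depth, n in enumerate(path):
            n.N += 1
            if depth > 0 and (depth - 1) % 2 == winner:
                n.W += 1.0
    # The most-visited root move is the recommendation.
    return max(root.children, key=lambda mv: root.children[mv].N)

best = uct_best_move(5)  # from total 5, only +1 is a winning move
```

With the seeded RNG and a tree this tiny, the search converges to the game-theoretically correct move; the same skeleton scales up once rollouts and priors come from learned networks.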
_________________
Are you sure you're not looking for
Donald B Gillies?