Poker-AI.org

Poker AI and Botting Discussion Forum

All times are UTC




PostPosted: Sun Sep 02, 2018 1:42 pm 
New Member

Joined: Sun Sep 02, 2018 1:13 pm
Posts: 6
Over the next few days I'll release my poker AI approach using machine learning. I think this approach is very good. I no longer have the time to work on this stuff, but I'm happy to release my code, research, and thoughts about what an n-player No Limit Texas Hold'em AI should look like and how it should work.

The release will contain source code and instructions for getting started. There's a lot of source code involved, but it's definitely worth a look. In this post I'll release some of my academic thoughts on opponent hand prediction, which is probably the most important part of a poker AI. I also have lots of code on the decision making involved in playing poker using this information. I'll release this soon and will support evaluating and building upon this work for six months.

Please do take the time to read the PDF document I've posted and evaluate whether or not my approach has merit. If you think it does, stay tuned; the source code and getting-started instructions are coming soon.

The document isn't formatted in the best way; I apologize for that.

Everything will be provided "as-is". Everything will be in the public domain.

Here's the PDF document: https://mega.nz/#!DSJ2zQ5B!KEd7m_BmupIg ... ErQZuzen-w


Last edited by donaldagillies on Mon Sep 03, 2018 4:24 pm, edited 2 times in total.

PostPosted: Sun Sep 02, 2018 1:50 pm 
New Member

Joined: Sun Sep 02, 2018 1:13 pm
Posts: 6
Update 1 (Sep 03, 2018 04:33PM Forum Time)
At the bottom you'll find links to a source code dump which contains most of the important code. To the best of my knowledge, any code I've included that was not written by me has a copyleft license and is licensed under those terms. All of my code is (as promised) in the public domain.

Let's talk hardware requirements. Since we're doing deep learning, at a minimum you're going to need two powerful GPUs (think 1080 Ti or higher). You'll need a motherboard and CPUs capable of handling server workloads; I recommend a Xeon-based workstation with two CPUs and somewhere on the order of 16-32 cores. This will be needed when we get into reinforcement learning for training the playing strategy. For memory, I recommend 256GB+ of SECDED memory; again, standard workstation stuff. For hard disks, I recommend RAID 1 or higher with weekly backups of the code. For storing all the datasets you'll need about 5TB; with a RAID 1 setup that means two 5TB disks. So in total, about 3000 USD for the hardware.

Let's talk people requirements. I'd recommend a talented, tight-knit team of 3-4 people. Between you, you'll need good skills in probability theory, combinatorics, topology, information theory, statistics, deep learning, systems programming, low-level programming, computer system organization, and probably more. I'm sure I'm forgetting something.

Tools used: C#, Java SE, NetBeans, .NET, TensorFlow, Keras, Theano. I also used a bunch of other tools while tinkering toward an ideal setup. I did my development on Windows since I wanted to leverage .NET.

You probably want to buy your own hardware, since cloud fees over the development lifetime will be prohibitively expensive. I'd recommend sharing the workstation across the team, since it's an expensive machine.

With an ideal setup and the code I've given, a talented, dedicated team working 15-20 hours a week can, I think, have a very good bot working in 6-12 months.

Datasets: I got the datasets I used from poker-ai.org. The UAlberta dataset isn't good enough for this work. See viewtopic.php?f=34&t=2728 and viewtopic.php?f=34&t=2883

I'll be posting updates explaining the code over the next month or two. There's no set timeline on this, as I have other things to do as well, but I'll try to post once a week. Please feel free to start reading through the code and, again, judge the credibility of my work. If you think it's worth investing in, then invest in it. Be your own judge of whether this approach is actually good or not.

https://mega.nz/#!WTRHxCza!N2Gem8m_Zajz ... Wql6DU-Fts
https://mega.nz/#!DHownKyT!LJTlMeXcZwQM ... xAymsv_Ehk

_________________
Are you sure you're not looking for Donald B Gillies?


Last edited by donaldagillies on Mon Sep 03, 2018 4:49 pm, edited 1 time in total.

PostPosted: Sun Sep 02, 2018 6:56 pm 
Regular Member

Joined: Fri Nov 25, 2016 10:42 pm
Posts: 95
Hi, thank you for sharing your idea.

1) Can you please tell me why you decided to use a naive Bayes classifier for solving such a complex game? Did you consider using random forests or deep neural networks?

2) As I understand it, you only take into consideration board texture, position, and previous actions. In poker it is also very important to incorporate players' stack sizes and opponent types. A fish is going to play a totally different strategy than a regular or a tight player, so if your model learns from hand histories that a professional player c-bets a very big size on e.g. As 9d 8s with Ac2c, it is because the villain is a fish who will call with a wide range; against a tight player it would be quite wrong to bet big.

3) I think that in poker it is very important to adjust the strategy to a specific player, meaning that the AI should notice if a player changes strategy during the game. As far as I can see, your AI is not able to do this; is it modelling the average population player and playing the same strategy vs. each opponent?

4) Have you already tested your AI against real opponents? How fast does it make decisions preflop and postflop?

5) Did you think about using equity vs. the villain's range in your model? If you predict the villain's range, you could use that equity information to make much more accurate decisions and generalize better. You could then teach your model how to play a high-equity hand, how to play a low-equity hand, etc.


PostPosted: Mon Sep 03, 2018 5:20 pm 
New Member

Joined: Sun Sep 02, 2018 1:13 pm
Posts: 6
mlatinjo wrote:
Hi, thank you for sharing your idea.

1) Can you please tell me why you decided to use a naive Bayes classifier for solving such a complex game? Did you consider using random forests or deep neural networks?



I took an academic approach to the problem. In academic research the goal is to solve the problem one small piece at a time; this way you minimize sunk cost if the research fails. The point of the naive Bayes was to figure out at an early stage whether, with the UAlberta data set and a simple machine learning approach, I could differentiate signal from noise. Once that was confirmed, I proceeded to better, more expensive (in time and computational resources) techniques.
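To make the signal-check idea concrete, here is a minimal hand-rolled naive Bayes sketch over a toy dataset. The features, buckets, and rows are illustrative stand-ins, not the actual UAlberta features or the released code:

```python
from collections import Counter, defaultdict

# Toy hand histories: (position, preflop action) -> hand-strength bucket.
# Features, buckets, and rows are illustrative, not from any real dataset.
DATA = [
    ("early", "raise", "strong"), ("early", "raise", "strong"),
    ("early", "call",  "medium"), ("late",  "raise", "medium"),
    ("late",  "raise", "weak"),   ("late",  "call",  "weak"),
    ("early", "call",  "weak"),   ("late",  "raise", "strong"),
]

def train(rows):
    prior = Counter(row[-1] for row in rows)   # label frequencies
    cond = defaultdict(Counter)                # (feature idx, label) -> value counts
    for *feats, label in rows:
        for i, value in enumerate(feats):
            cond[(i, label)][value] += 1
    return prior, cond

def predict(prior, cond, feats):
    total = sum(prior.values())
    best_label, best_p = None, -1.0
    for label in prior:
        p = prior[label] / total
        for i, value in enumerate(feats):
            counts = cond[(i, label)]
            # Add-one smoothing so unseen feature values don't zero out p.
            p *= (counts[value] + 1) / (sum(counts.values()) + 2)
        if p > best_p:
            best_label, best_p = label, p
    return best_label

prior, cond = train(DATA)
print(predict(prior, cond, ("early", "raise")))   # early-position raises look strong here
```

With real hand histories you'd swap in the actual features (board texture, position, action sequence) and far more data; if even a model this simple beats chance, there is signal worth chasing with heavier techniques.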

mlatinjo wrote:

2) As I understand it, you only take into consideration board texture, position, and previous actions. In poker it is also very important to incorporate players' stack sizes and opponent types. A fish is going to play a totally different strategy than a regular or a tight player, so if your model learns from hand histories that a professional player c-bets a very big size on e.g. As 9d 8s with Ac2c, it is because the villain is a fish who will call with a wide range; against a tight player it would be quite wrong to bet big.



Yes, that's correct; see above. My more recent code probably does this.

mlatinjo wrote:

3) I think that in poker it is very important to adjust the strategy to a specific player, meaning that the AI should notice if a player changes strategy during the game. As far as I can see, your AI is not able to do this; is it modelling the average population player and playing the same strategy vs. each opponent?



This is much further than I got. My goal was to transform a problem we don't know how to solve (an n-player imperfect-information extensive-form game) into one we do know how to solve with deep learning + MCTS (a 2-player perfect-information game); see AlphaGo, AlphaGo Zero, AlphaGo Master, and AlphaGo Lee. If I could do that, then I'm sure the exact same techniques used by DeepMind would work.

mlatinjo wrote:

4) Have you already tested your AI against real opponents? How fast does it make decisions preflop and postflop?



My goal was to solve the game in a lab setting. In the lab I reached good convergence in preflop play; I was working on postflop play and beyond before other responsibilities took over my schedule.

mlatinjo wrote:

5) Did you think about using equity vs. the villain's range in your model? If you predict the villain's range, you could use that equity information to make much more accurate decisions and generalize better. You could then teach your model how to play a high-equity hand, how to play a low-equity hand, etc.



I think my approach has two different components which might help answer this question.

1: Transform poker from an imperfect-information game into a "perfect"-information game. By "perfect" in quotes I mean that many information sets are created which encapsulate the table state in an intelligent way. I consider this "intelligent" because 9-player poker can be thought of as having something like 10^300 information sets, while even the largest dataset for 9-player poker has maybe 10^11 hands. And that's not to mention that those hands are played at different sb/bb levels, which makes them not IID. So the problem I was trying to solve was: with such an incredibly tiny dataset compared to the number of information sets, how do I bucket these states so that each bucket has enough hands to learn from (to generalize, in ML terms)?

If I could do this, then what remains is to transform from 9-player to 2-player. This, I think, is considerably easier than the previous problem, because we can always consider a k-player game as 1-vs-all for each of the k players: any gain for any player who is not you is a loss for you, and in reverse, any gain for you is a loss for all other players. There are some gotchas here that I'll get into later; the transformation from 9-player to 2-player is not perfect with this approach, but I found some approaches that deal with it in an "okay" way. In essence, this simple transformation leads to extremely greedy play by all k players. I solved this by regularizing each player's risk appetite, and that seemed to do the trick.
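As a rough illustration of the 1-vs-all reduction and the risk regularization described here (the payoff convention, penalty form, and lambda value are all assumptions for the sketch, not the released code):

```python
# chip_deltas: per-player chip changes for one hand; hero's utility is his
# own gain minus the average gain of everyone else (the 1-vs-all view),
# damped by a risk-appetite penalty so the policy doesn't turn maximally
# greedy. The penalty form and lambda value are illustrative assumptions.

def one_vs_all_utility(chip_deltas, hero, risk_lambda=0.1):
    hero_gain = chip_deltas[hero]
    others_gain = sum(d for i, d in enumerate(chip_deltas) if i != hero)
    raw = hero_gain - others_gain / (len(chip_deltas) - 1)
    # Risk regularization: shrink the magnitude of every payoff swing.
    return raw - risk_lambda * abs(raw)

# 3-player hand: hero (seat 0) wins 40 chips, the others lose 25 and 15.
print(one_vs_all_utility([40, -25, -15], hero=0))
```

In a full system the regularized utility would feed the learner's reward instead of the raw chip count, which is one simple way to damp the greedy all-in behavior the raw zero-sum view encourages.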

2: Given (1) above, I can apply time-tested policy-network, value-network, and MCTS techniques to do the heavy lifting.

_________________
Are you sure you're not looking for Donald B Gillies?


PostPosted: Mon Sep 03, 2018 8:50 pm 
Regular Member

Joined: Fri Nov 25, 2016 10:42 pm
Posts: 95
Thanks for your detailed answer.
What do you think: would imitation learning be suitable for poker, i.e. given a large dataset of how professional players play, trying to imitate it?
Also, could the model be updated while actually playing the game? A lost hand would give the agent a negative reward and a won hand a positive one; that way it might be able to learn which lines work well vs. an opponent while playing.


PostPosted: Mon Sep 03, 2018 9:02 pm 
Regular Member

Joined: Fri Nov 25, 2016 10:42 pm
Posts: 95
I think that deep neural networks would currently be the best approach for learning to play a default strategy which works well vs. the population.
Considering that such an AI wouldn't play GTO but would instead learn how to play well vs. the population, it would leave quite some space for being exploited. E.g. a poker professional could be c-betting the flop very bluff-heavy because it is very profitable, but he will easily notice if someone is exploiting that with bluff raises. The AI wouldn't be able to notice it and would be brutally exploited. In that case the AI developer would have to take specific opponents (player names) into consideration and retrain the model on the new hand histories; that way the AI would be adjusted to no longer c-bet bluff-heavy vs. a specific opponent who bluff-raises a lot.

For that reason I think deep neural networks are very limited, so there is a need for a model which can be updated while playing and learn from small samples.
Do you know what machine learning models would be appropriate for such a solution (an agent that is able to learn from a small sample of hands and adjust while playing)?

By the way, I understand that you started from simple models and plan to make them more complex; that makes total sense.


PostPosted: Tue Sep 04, 2018 10:25 am 
New Member

Joined: Sun Sep 02, 2018 1:13 pm
Posts: 6
mlatinjo wrote:
Thanks for your detailed answer.
What do you think: would imitation learning be suitable for poker, i.e. given a large dataset of how professional players play, trying to imitate it?
Also, could the model be updated while actually playing the game? A lost hand would give the agent a negative reward and a won hand a positive one; that way it might be able to learn which lines work well vs. an opponent while playing.


I think that since poker is a game, techniques which work on games work on poker as well. There is nothing special about poker, whether it's No Limit Texas Hold'em, Pot Limit Omaha, or anything else.

In terms of the best way to solve poker: the best way is usually to transform poker into a game that we already know how to solve. See the Cook-Levin theorem and the list of problems that have been shown to be NP-complete by reduction from SATisfiability.

Taking that analogy exactly: we have techniques which work for games of intractable size (thanks, DeepMind!). Then the only thing that remains is to start reducing every game we know to something that fits the AlphaX architecture.

_________________
Are you sure you're not looking for Donald B Gillies?


PostPosted: Tue Sep 04, 2018 10:32 am 
New Member

Joined: Sun Sep 02, 2018 1:13 pm
Posts: 6
mlatinjo wrote:
I think that deep neural networks would currently be the best approach for learning to play a default strategy which works well vs. the population.
Considering that such an AI wouldn't play GTO but would instead learn how to play well vs. the population, it would leave quite some space for being exploited. E.g. a poker professional could be c-betting the flop very bluff-heavy because it is very profitable, but he will easily notice if someone is exploiting that with bluff raises. The AI wouldn't be able to notice it and would be brutally exploited. In that case the AI developer would have to take specific opponents (player names) into consideration and retrain the model on the new hand histories; that way the AI would be adjusted to no longer c-bet bluff-heavy vs. a specific opponent who bluff-raises a lot.

For that reason I think deep neural networks are very limited, so there is a need for a model which can be updated while playing and learn from small samples.



In all sincerity, the deep neural network is only one part of the solution. MCTS is another. A third part would be a population of player strategies that are evolved during the learning process using genetic algorithms. So when a hypothetical bot sat down to play at a poker table, it would first pick a standard, very good player strategy. Then, after say 20-30 hands, once it knows the behavior patterns of the players at the table, it would pick a player strategy that has historically performed well against the patterns of behavior exhibited at this table.
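A toy sketch of that selection step follows. The strategy book, the (aggression, looseness) opponent summary, and the scoring rule are all hypothetical stand-ins for the GA-evolved population, not taken from the actual code:

```python
# Historical results (say, bb/100) of each precomputed strategy against
# opponent profiles summarized as (aggression, looseness) in [0, 1].
# Strategy names, profiles, and numbers are hypothetical stand-ins.
STRATEGY_BOOK = {
    "tight_default":  {(0.2, 0.2): 4.0, (0.8, 0.8): 9.0},
    "loose_pressure": {(0.2, 0.2): 7.0, (0.8, 0.8): 2.0},
}

def pick_strategy(observed, book):
    def score(name):
        records = book[name]
        # Rate each strategy by its record against the nearest known profile.
        nearest = min(records, key=lambda profile: sum((a - b) ** 2
                                                       for a, b in zip(profile, observed)))
        return records[nearest]
    return max(book, key=score)

# After ~20-30 hands the table profiles as passive and tight:
print(pick_strategy((0.25, 0.3), STRATEGY_BOOK))
```

A real version would summarize opponents with far richer statistics and keep many more strategies, but the shape of the decision (observe, match to history, switch) stays the same.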

mlatinjo wrote:
Do you know what machine learning models would be appropriate for such a solution (an agent that is able to learn from a small sample of hands and adjust while playing)?

By the way, I understand that you started from simple models and plan to make them more complex; that makes total sense.


I prefer not to have learning done while playing at a poker table. The reason is that it's computationally complex and difficult to do; it's always better to precompute everything. See lookup tables.
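The precompute-then-look-up point can be sketched in a few lines (the equity numbers and threshold are made up for illustration):

```python
# Offline phase: compute the expensive numbers once (in a real system this
# would be a large equity simulation); the values here are made up.
def build_table():
    return {"AA": 0.85, "KK": 0.82, "72o": 0.32}

PREFLOP_EQUITY = build_table()          # built before play, never during it

# Online phase: at the table, decisions are O(1) lookups, no learning.
def decide(hand, threshold=0.5):
    return "raise" if PREFLOP_EQUITY.get(hand, 0.0) >= threshold else "fold"

print(decide("AA"), decide("72o"))
```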

Thank you for the small compliment.

_________________
Are you sure you're not looking for Donald B Gillies?


PostPosted: Tue Sep 04, 2018 1:19 pm 
Site Admin

Joined: Sun Feb 24, 2013 9:39 pm
Posts: 529
Quote:
Taking that analogy exactly: we have techniques which work for games of intractable size (thanks, DeepMind!). Then the only thing that remains is to start reducing every game we know to something that fits the AlphaX architecture.


No. The AlphaX architecture is only suitable for full-information games, not partial-information games like poker. If you were to apply full-information techniques to a partial-information game you would get a best response. That best response would be highly exploitable; villain would change his strategy and destroy you.

I actually did this for my very first bot. It learned a best response against the Poker Academy bot using reinforcement learning over the course of a few weeks, eventually winning. Then I played it myself. It was insanely aggressive and it beat me for the first hour. Then I learned how to play it and destroyed it.


PostPosted: Tue Sep 04, 2018 3:46 pm 
New Member

Joined: Sun Sep 02, 2018 1:13 pm
Posts: 6
spears wrote:
Quote:
Taking that analogy exactly: we have techniques which work for games of intractable size (thanks, DeepMind!). Then the only thing that remains is to start reducing every game we know to something that fits the AlphaX architecture.


No. The AlphaX architecture is only suitable for full-information games, not partial-information games like poker. If you were to apply full-information techniques to a partial-information game you would get a best response. That best response would be highly exploitable; villain would change his strategy and destroy you.

I actually did this for my very first bot. It learned a best response against the Poker Academy bot using reinforcement learning over the course of a few weeks, eventually winning. Then I played it myself. It was insanely aggressive and it beat me for the first hour. Then I learned how to play it and destroyed it.


A response from the site admin in 5 posts... is that a record?

In all sincerity, I'll repeat something you might have missed:

A third part would be a population of player strategies that are evolved during the learning process using genetic algorithms. So when a hypothetical bot sat down to play at a poker table, it would first pick a standard, very good player strategy. Then, after say 20-30 hands, once it knows the behavior patterns of the players at the table, it would pick a player strategy that has historically performed well against the patterns of behavior exhibited at this table.

I imagine it would be a good idea to re-evaluate the chosen player strategy every 50-100 hands or so, in case the behavior exhibited at the poker table changes over time (which it will).

_________________
Are you sure you're not looking for Donald B Gillies?


PostPosted: Tue Sep 04, 2018 6:18 pm 
Regular Member

Joined: Fri Nov 25, 2016 10:42 pm
Posts: 95
Quote:
In all sincerity, the deep neural network is only one part of the solution. MCTS is another. A third part would be a population of player strategies that are evolved during the learning process using genetic algorithms. So when a hypothetical bot sat down to play at a poker table, it would first pick a standard, very good player strategy. Then, after say 20-30 hands, once it knows the behavior patterns of the players at the table, it would pick a player strategy that has historically performed well against the patterns of behavior exhibited at this table.


Sure, it totally makes sense to me. With this approach the bot would be able to adjust to a specific opponent. There is a very small number of distinct opponent types and lines that opponents take, so it would be enough to recognize what player type it is and what lines it takes, and then pick the strategy that works best against it. The only difficulty is if the opponent changes his strategy a lot after some time; the bot would have a hard time realizing this, so you would have to observe a large sample of hand histories against the specific opponent, but also the last N hands, in order to conclude whether he has changed strategy.


PostPosted: Tue Sep 04, 2018 6:25 pm 
Regular Member

Joined: Fri Nov 25, 2016 10:42 pm
Posts: 95
Quote:
I actually did this for my very first bot. It learned a best response against the Poker Academy bot using reinforcement learning over the course of a few weeks, eventually winning. Then I played it myself. It was insanely aggressive and it beat me for the first hour. Then I learned how to play it and destroyed it.


Yes, that was also my point. It might sound like a small detail (that the bot recognizes when an opponent changes strategy), but it is actually very important; this is, in my opinion, the biggest challenge for poker AI. E.g. if I open-raise from UTG and see that MP 3-bets me with e.g. 79o, I would conclude just from seeing that hand that he is super aggro vs. me and exploits me with 3-bets (that hand is totally non-standard to 3-bet, and there are many better bluffs, meaning he is most likely bluffing me super often). It is similar when I see a villain c-bet the river with a totally non-standard bluff, meaning that he is super aggressive. That is how humans make quick, logical conclusions about a villain's tendencies.


PostPosted: Sun Oct 07, 2018 7:20 pm 
Regular Member

Joined: Tue Mar 05, 2013 9:19 pm
Posts: 50
Thanks for releasing the code and some details on the project; it's always great to see :) One note is that similar machine learning approaches have been undertaken before, and from what I can understand, in at least similar ways. However, deep learning is fairly new and I haven't been keeping up to date with progress over the past years.

There's been previous talk on this forum about using machine learning to predict actions, which can be seen as a problem similar to predicting cards, given that cards and actions are probably quite tightly connected. I was also wondering about the hardware requirements you've posted: do you have any more details on the deep learning approach you've taken? The requirements seem quite high, so it would be great to hear more about what's going on on the deep learning side.


Powered by phpBB® Forum Software © phpBB Group