Poker-AI.org

Poker AI and Botting Discussion Forum

All times are UTC




PostPosted: Mon Jun 20, 2016 1:37 am 
New Member

Joined: Mon Jun 20, 2016 1:22 am
Posts: 1
Poker-CNN: A Pattern Learning Strategy for Making Draws and Bets in Poker Games Using Convolutional Networks

by:
Nikolai Yakovenko
Liangliang Cao
Colin Raffel
James Fan

Abstract:
Poker is a family of card games that includes many variations. We hypothesize that most poker games can be solved as a pattern matching problem, and propose creating a strong poker playing system based on a unified poker representation. Our poker player learns through iterative self-play, and improves its understanding of the game by training on the results of its previous actions without sophisticated domain knowledge. We evaluate our system on three poker games: single player video poker, two-player Limit Texas Hold’em, and finally two-player 2-7 triple draw poker. We show that our model can quickly learn patterns in these very different poker games while it improves from zero knowledge to a competitive player against human experts. The contributions of this paper include: (1) a novel representation for poker games, extendable to different poker variations, (2) a Convolutional Neural Network (CNN) based learning model that can effectively learn the patterns in three different games, and (3) a self-trained system that significantly beats the heuristic-based program on which it is trained, and our system is competitive against human expert players.

http://colinraffel.com/publications/aaai2016poker.pdf

I hope this paper is of interest.
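The abstract does not spell out the unified representation. As a rough illustration only, here is one plausible encoding, assumed for this sketch and not taken from the text above: each hand becomes a 4x13 binary grid (suits by ranks), the kind of image-like input a convolutional network can consume. The names `RANKS`, `SUITS` and `encode_hand` are made up for this example.

```python
RANKS = "23456789TJQKA"   # column order of the grid (assumption)
SUITS = "cdhs"            # row order of the grid (assumption)

def encode_hand(cards):
    """Return a 4x13 grid (suits x ranks) with a 1 for every card held."""
    grid = [[0] * len(RANKS) for _ in SUITS]
    for card in cards:
        rank, suit = card[0], card[1]
        grid[SUITS.index(suit)][RANKS.index(rank)] = 1
    return grid

# Example: a five-card draw hand containing a pair of sevens.
hand = encode_hand(["As", "Kd", "7h", "7c", "2s"])
for suit, row in zip(SUITS, hand):
    print(suit, "".join(str(bit) for bit in row))
```

A grid like this makes pairs show up as vertical patterns and flush draws as horizontal ones, which is the sort of structure convolution filters can pick up.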


PostPosted: Sat Jun 25, 2016 8:51 pm 
Site Admin

Joined: Sun Feb 24, 2013 9:39 pm
Posts: 642
See also http://www.pokernews.com/strategy/poker ... -24246.htm

It's good to see something new that is not yet another version of CFR, and also that huge models are not required.

I don't understand how the training process leads to a mixed strategy. Does it move the action frequency a little in the direction of greatest reward in each game using this Nesterov momentum thingy?


PostPosted: Thu Aug 25, 2016 4:27 pm 
Junior Member

Joined: Mon Aug 08, 2016 9:37 pm
Posts: 13
spears wrote:
See also http://www.pokernews.com/strategy/poker ... -24246.htm

It's good to see something new that is not yet another version of CFR, and also that huge models are not required.

I don't understand how the training process leads to a mixed strategy. Does it move the action frequency a little in the direction of greatest reward in each game using this Nesterov momentum thingy?


Where have you read that this method leads to a mixed strategy? The goal of this neural network is to find the best move; after that, the most reasonable thing is to take it 100% of the time, I believe.
Nesterov momentum is simply a variant of gradient descent, technical stuff used to train the network.

CFR leads to an equilibrium strategy, that is, a mixed strategy: a probability distribution over a support of strategies that all have the same utility. This exploitative method leads to a pure best-response strategy. Trained against an equilibrium agent, it should lead to a pure strategy within the support of the equilibrium response, getting the same utility as the equilibrium response but still being a pure strategy.
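To make the point about Nesterov momentum concrete, here is a minimal sketch of the update rule on a toy one-dimensional problem. The function name, learning rate, momentum value and the quadratic objective are all assumptions chosen for illustration; nothing here is taken from the paper's training setup.

```python
def nesterov_minimize(grad, x0, lr=0.1, momentum=0.9, steps=200):
    """Minimize a 1-D function given its gradient, using Nesterov momentum."""
    x, velocity = x0, 0.0
    for _ in range(steps):
        # Evaluate the gradient at the look-ahead point, not at x itself.
        lookahead = x + momentum * velocity
        velocity = momentum * velocity - lr * grad(lookahead)
        x = x + velocity
    return x

# Toy objective f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_min = nesterov_minimize(lambda x: 2.0 * (x - 3.0), x0=0.0)
print(x_min)
```

The optimizer only decides how the weights move toward lower loss. Whether the resulting player is pure or mixed depends on what the network outputs and how actions are picked from it.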


PostPosted: Thu Aug 25, 2016 7:24 pm 
Junior Member

Joined: Sat Apr 26, 2014 7:29 am
Posts: 34
Thanks a lot for that link.

I am currently playing with deep Q-learning (see Google's DeepMind: DQN, dual DQN, dueling DQN, ...).

However, while all the DeepMind approaches use convolutions, I have not used them so far, as I thought they would not help much for poker. But I think it is time to try this too...


PostPosted: Thu Aug 25, 2016 7:48 pm 
Site Admin

Joined: Sun Feb 24, 2013 9:39 pm
Posts: 642
AlephZero wrote:

Where have you read that this method leads to a mixed strategy? The goal of this neural network is to find the best move; after that, the most reasonable thing is to take it 100% of the time, I believe.
Nesterov momentum is simply a variant of gradient descent, technical stuff used to train the network.

CFR leads to an equilibrium strategy, that is, a mixed strategy: a probability distribution over a support of strategies that all have the same utility. This exploitative method leads to a pure best-response strategy. Trained against an equilibrium agent, it should lead to a pure strategy within the support of the equilibrium response, getting the same utility as the equilibrium response but still being a pure strategy.


I was assuming and hoping it produces a mixed strategy. I believe a pure best response strategy would be highly exploitable.
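A toy way to see the exploitability worry, using rock-paper-scissors as a stand-in for poker (an assumption made purely for illustration): measure how much a best-responding opponent can win against a pure strategy versus a mixed one. The payoff table and the function name are made up for this sketch.

```python
# PAYOFF[i][j] is our payoff when we play i and the opponent plays j.
# Order of actions: rock, paper, scissors.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

def exploitability(strategy):
    """Best expected payoff an opponent can get against our strategy (zero-sum)."""
    return max(
        sum(prob * -PAYOFF[i][j] for i, prob in enumerate(strategy))
        for j in range(3)
    )

pure_rock = [1.0, 0.0, 0.0]
uniform = [1 / 3, 1 / 3, 1 / 3]
print(exploitability(pure_rock))
print(exploitability(uniform))
```

Against pure rock the opponent simply always plays paper, while the uniform mix gives the opponent nothing to gain.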


PostPosted: Fri Aug 26, 2016 8:18 pm 
Junior Member

Joined: Sat Apr 26, 2014 7:29 am
Posts: 34
spears wrote:
I was assuming and hoping it produces a mixed strategy. I believe a pure best response strategy would be highly exploitable.

Not sure, but I think you can just train an additional net as an explicit policy net. So you have one net that tells you the value of states, and one that tells you the probabilities with which to take each action.
old: I would probably still use the action that leads to the maximum state value by default, and only use the policy if the state values are close to each other (that is at play time; in training/self-play you always use the policy, apart from epsilon-greedy exploration)... edit: stupid me: you make the opponent indifferent between calling and folding, not yourself, doh
new: just trust your nets

But I am not sure whether training the net would require you to evaluate all possible actions (more expensive to train, but no problem for HU-Limit, more of a problem for my case: 6-max-NL...). But we can use the state value estimation network for that.
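A minimal sketch of the two-net idea, with hard-coded numbers standing in for real network outputs (all values, names and the fold/call/raise action set are assumptions for illustration): the value net supports a pure argmax choice, while the policy net gives a distribution you sample from.

```python
import math
import random

ACTIONS = ["fold", "call", "raise"]

def softmax(logits):
    """Turn raw policy-net outputs into action probabilities."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_action(action_values):
    """Pure strategy: always pick the action with the highest estimated value."""
    return max(range(len(action_values)), key=lambda a: action_values[a])

def sample_action(policy_probs, rng):
    """Mixed strategy: sample an action according to the policy net."""
    return rng.choices(range(len(policy_probs)), weights=policy_probs, k=1)[0]

# Hypothetical outputs for one decision point.
value_net_output = [0.10, 0.12, -0.50]
policy_net_logits = [0.2, 0.5, -1.0]

rng = random.Random(0)
probs = softmax(policy_net_logits)
pure_choice = ACTIONS[greedy_action(value_net_output)]
mixed_choice = ACTIONS[sample_action(probs, rng)]
print(pure_choice, mixed_choice, probs)
```

Note that fold and call have nearly equal values here, yet the argmax always calls; only the sampled policy actually mixes.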


PostPosted: Thu Sep 08, 2016 2:50 am 
Offline
Junior Member

Joined: Sat Apr 26, 2014 7:29 am
Posts: 34
For details on how to connect the 2 nets to get a mixed strategy profile, read this: http://arxiv.org/abs/1603.01121
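As I understand the linked paper, the rough idea is to act from the value (Q) net with some probability and from the average-policy net otherwise. The sketch below is my own simplification under that assumption; the function name and the `eta` / `epsilon` parameters are illustrative and the numbers are made up, so check the paper for the actual procedure.

```python
import random

def choose_action(q_values, avg_policy, eta, epsilon, rng):
    """Mix a best-response net and an average-policy net when picking an action."""
    n = len(q_values)
    if rng.random() < eta:
        # Best-response mode: epsilon-greedy over the Q-values.
        if rng.random() < epsilon:
            return rng.randrange(n)
        return max(range(n), key=lambda a: q_values[a])
    # Average-policy mode: sample from the policy net's distribution.
    return rng.choices(range(n), weights=avg_policy, k=1)[0]

rng = random.Random(0)
q_values = [0.10, 0.12, -0.50]      # hypothetical value-net output
avg_policy = [0.45, 0.50, 0.05]     # hypothetical policy-net output
counts = [0, 0, 0]
for _ in range(1000):
    counts[choose_action(q_values, avg_policy, eta=0.1, epsilon=0.05, rng=rng)] += 1
print(counts)
```

With a small `eta` most actions come from the average policy, so the observed frequencies are mixed rather than always the argmax.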

