Poker-AI.org Poker AI and Botting Discussion Forum 2016-09-08T02:50:03+00:00 http://poker-ai.org/phpbb/feed.php?f=25&t=2982 2016-09-08T02:50:03+00:00 2016-09-08T02:50:03+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2982&p=7015#p7015 <![CDATA[Re: A Pattern Learning Strategy Using Convolutional Networks]]> http://arxiv.org/abs/1603.01121

Statistics: Posted by SkyBot — Thu Sep 08, 2016 2:50 am


]]>
2016-08-26T20:18:55+00:00 2016-08-26T20:18:55+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2982&p=7002#p7002 <![CDATA[Re: A Pattern Learning Strategy Using Convolutional Networks]]> spears wrote:

I was assuming and hoping it produces a mixed strategy. I believe a pure best-response strategy would be highly exploitable.

Not sure, but I think you can just train an additional net as an explicit policy net. So you have one net that tells you the value of states, and one that tells you the percentages with which to take each action.
old: I would probably still use the action that leads to the maximum-value state by default, and only use the policy if the state values are close to each other (during play, that is; in training/self-play you always use the policy, except for epsilon-greedy exploration)... edit: stupid me: you make the opponent indifferent to calling/folding, not yourself, doh
new: just trust your nets

But I am not sure whether training the net would require you to evaluate all possible actions (more expensive to train; no problem for HU Limit, more of a problem for my case, 6-max NL...). But we can use the state-value estimation network for that.
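A minimal sketch of the two-net idea, with tiny numpy MLPs standing in for real networks. All names, shapes, and the greedy selection rule here are my illustrative assumptions, not anything from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, N_ACTIONS, HIDDEN = 8, 3, 16

# value net weights: state -> scalar state value
Wv1 = rng.normal(size=(STATE_DIM, HIDDEN)) * 0.1
Wv2 = rng.normal(size=(HIDDEN, 1)) * 0.1
# policy net weights: state -> action probabilities
Wp1 = rng.normal(size=(STATE_DIM, HIDDEN)) * 0.1
Wp2 = rng.normal(size=(HIDDEN, N_ACTIONS)) * 0.1

def value(state):
    """Scalar estimate of how good a state is (the 'value of states' net)."""
    return float(np.tanh(state @ Wv1) @ Wv2)

def policy(state):
    """Mixed strategy: one probability per action (the 'percentages' net)."""
    logits = np.tanh(state @ Wp1) @ Wp2
    exp = np.exp(logits - logits.max())   # softmax, numerically stable
    return exp / exp.sum()

def act_greedy(successor_states):
    """Value-net action selection: score every legal action's successor
    state and pick the best. This is the step that needs all actions
    evaluated, which is the cost concern for 6-max NL."""
    return int(np.argmax([value(s) for s in successor_states]))

state = rng.normal(size=STATE_DIM)
succ = [rng.normal(size=STATE_DIM) for _ in range(N_ACTIONS)]
probs = policy(state)
chosen = act_greedy(succ)
```

The policy net gives a mixed strategy directly, while the value net only ranks actions; the post's point is that you can keep both and decide at play time which one drives the decision.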

Statistics: Posted by SkyBot — Fri Aug 26, 2016 8:18 pm


]]>
2016-08-25T19:48:04+00:00 2016-08-25T19:48:04+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2982&p=7001#p7001 <![CDATA[Re: A Pattern Learning Strategy Using Convolutional Networks]]> AlephZero wrote:

Where have you read that this method leads to a mixed strategy? The goal of these neural networks is to find the best move, so the most reasonable thing is to take it 100% of the time, I believe.
Nesterov momentum is simply a gradient descent algorithm, technical machinery for training the network.

CFR leads to an equilibrium strategy, i.e. a mixed strategy: a probability distribution over a support of strategies with equal utility. This exploitative method instead leads to a pure best-response strategy. Trained against an equilibrium agent, it should lead to a pure strategy within the support of the equilibrium response, obtaining the same utility as the equilibrium response while still being a pure strategy.


I was assuming and hoping it produces a mixed strategy. I believe a pure best-response strategy would be highly exploitable.

Statistics: Posted by spears — Thu Aug 25, 2016 7:48 pm


]]>
2016-08-25T19:24:16+00:00 2016-08-25T19:24:16+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2982&p=7000#p7000 <![CDATA[Re: A Pattern Learning Strategy Using Convolutional Networks]]>
I am atm playing with deep Q-learning (see Google DeepMind's DQN, double DQN, dueling DQN, ...).

However, while all the DeepMind approaches use convolutions, I have not used them so far; I thought they would not help much for poker. But I think it is time to try them too...
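For anyone unfamiliar, the Q-learning core shared by all those DQN variants fits in a few lines; DQN replaces the table below with a neural net, and dueling DQN splits it into value and advantage streams. A toy tabular sketch, where the game, rewards, and hyperparameters are all made up for illustration:

```python
import random

random.seed(1)
N_STATES, N_ACTIONS = 4, 2
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1   # learning rate, discount, exploration

# Q-table: estimated return for each (state, action) pair
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]

def reward(s, a):
    """Toy deterministic reward: action s % 2 is optimal in state s."""
    return 1.0 if a == s % 2 else 0.0

for episode in range(500):
    s = random.randrange(N_STATES)
    # epsilon-greedy action selection
    if random.random() < EPSILON:
        a = random.randrange(N_ACTIONS)
    else:
        a = max(range(N_ACTIONS), key=lambda x: Q[s][x])
    r = reward(s, a)
    s2 = random.randrange(N_STATES)     # toy random transition
    # Q-learning update: bootstrap from the best next-state value
    Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])
```

After training, the greedy policy `argmax(Q[s])` recovers the optimal action in each toy state; the deep variants learn the same quantity from raw inputs instead of a lookup table.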

Statistics: Posted by SkyBot — Thu Aug 25, 2016 7:24 pm


]]>
2016-08-25T16:27:46+00:00 2016-08-25T16:27:46+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2982&p=6999#p6999 <![CDATA[Re: A Pattern Learning Strategy Using Convolutional Networks]]> spears wrote:

See also http://www.pokernews.com/strategy/poker ... -24246.htm

It's good to see something new that is not yet another version of CFR, and also that huge models are not required.

I don't understand how the training process leads to a mixed strategy. Does it move the action frequency a little in the direction of greatest reward in each game using this Nesterov momentum thingy?


Where have you read that this method leads to a mixed strategy? The goal of these neural networks is to find the best move, so the most reasonable thing is to take it 100% of the time, I believe.
Nesterov momentum is simply a gradient descent algorithm, technical machinery for training the network.

CFR leads to an equilibrium strategy, i.e. a mixed strategy: a probability distribution over a support of strategies with equal utility. This exploitative method instead leads to a pure best-response strategy. Trained against an equilibrium agent, it should lead to a pure strategy within the support of the equilibrium response, obtaining the same utility as the equilibrium response while still being a pure strategy.
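To show what Nesterov momentum actually is: just gradient descent with a velocity term whose gradient is evaluated at a look-ahead point, so it has nothing to do with producing mixed strategies. A minimal sketch on a toy quadratic objective (learning rate and momentum values are illustrative, not the paper's):

```python
def grad(w):
    """Gradient of the toy objective f(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)

w, v = 0.0, 0.0
lr, mu = 0.1, 0.9            # learning rate, momentum coefficient
for _ in range(200):
    lookahead = w + mu * v   # peek ahead along the current velocity
    v = mu * v - lr * grad(lookahead)
    w += v
# w converges toward the minimum at 3
```

The look-ahead evaluation is the only difference from classical (heavy-ball) momentum, where the gradient is taken at `w` itself.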

Statistics: Posted by AlephZero — Thu Aug 25, 2016 4:27 pm


]]>
2016-06-25T20:51:29+00:00 2016-06-25T20:51:29+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2982&p=6967#p6967 <![CDATA[Re: A Pattern Learning Strategy Using Convolutional Networks]]> http://www.pokernews.com/strategy/poker ... -24246.htm

It's good to see something new that is not yet another version of CFR, and also that huge models are not required.

I don't understand how the training process leads to a mixed strategy. Does it move the action frequency a little in the direction of greatest reward in each game using this Nesterov momentum thingy?

Statistics: Posted by spears — Sat Jun 25, 2016 8:51 pm


]]>
2016-06-20T01:37:26+00:00 2016-06-20T01:37:26+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2982&p=6966#p6966 <![CDATA[A Pattern Learning Strategy Using Convolutional Networks]]>
by:
Nikolai Yakovenko
Liangliang Cao
Colin Raffel
James Fan

Abstract:
Poker is a family of card games that includes many variations. We hypothesize that most poker games can be solved as a pattern matching problem, and propose creating a strong poker playing system based on a unified poker representation. Our poker player learns through iterative self-play, and improves its understanding of the game by training on the results of its previous actions without sophisticated domain knowledge. We evaluate our system on three poker games: single player video poker, two-player Limit Texas Hold’em, and finally two-player 2-7 triple draw poker. We show that our model can quickly learn patterns in these very different poker games while it improves from zero knowledge to a competitive player against human experts. The contributions of this paper include: (1) a novel representation for poker games, extendable to different poker variations, (2) a Convolutional Neural Network (CNN) based learning model that can effectively learn the patterns in three different games, and (3) a self-trained system that significantly beats the heuristic-based program on which it is trained, and our system is competitive against human expert players.

http://colinraffel.com/publications/aaai2016poker.pdf

I hope this paper is of interest.
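To picture the "unified poker representation" the abstract mentions: a hand can be encoded as a binary suit-by-rank image that a CNN can convolve over, so rank runs and suit groupings become local spatial patterns. This is only a guess at the spirit of the representation; the paper's actual tensor layout may differ:

```python
import numpy as np

RANKS = "23456789TJQKA"
SUITS = "cdhs"

def hand_to_tensor(hand):
    """hand: list of two-char card strings like 'As' (ace of spades).
    Returns a 4 x 13 binary image, one row per suit, one column per rank."""
    img = np.zeros((len(SUITS), len(RANKS)), dtype=np.float32)
    for card in hand:
        rank, suit = card[0], card[1]
        img[SUITS.index(suit), RANKS.index(rank)] = 1.0
    return img

t = hand_to_tensor(["As", "Ks", "Qs", "Js", "Ts"])  # royal flush in spades
```

In such an encoding a flush lights up a single row and a straight lights up adjacent columns, which is exactly the kind of local pattern convolutions are built to detect.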

Statistics: Posted by Orac — Mon Jun 20, 2016 1:37 am


]]>