Poker-AI.org
http://poker-ai.org/phpbb/

CMU Libratus wins by 15bb/100 & uses reinforcement learning
http://poker-ai.org/phpbb/viewtopic.php?f=24&t=3017
Page 1 of 1

Author:  botishardwork [ Tue Jan 31, 2017 3:58 pm ]
Post subject:  CMU Libratus wins by 15bb/100 & uses reinforcement learning

AI Decisively Defeats Human Poker Players
http://spectrum.ieee.org/automaton/robo ... er-players

Noam Brown, the PhD student who worked on Libratus, also mentioned: The basis for the bot is reinforcement learning using a special variant of Counterfactual Regret Minimization. We use a form of Monte Carlo CFR distributed over about 200 nodes. We also incorporate a sampled form of Regret-Based Pruning which speeds up the computation quite a bit.

https://www.reddit.com/r/IAmA/comments/ ... y/dczfvej/
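For readers new to CFR: the core update inside every CFR variant (including the Monte Carlo form Noam mentions) is regret matching, which sets the strategy proportional to the positive cumulative regrets. A minimal sketch, purely illustrative since the actual Libratus code is not public:

```python
# Regret matching: the update rule at the heart of CFR variants.
# Given cumulative regrets per action, play each action with probability
# proportional to its positive regret; if none are positive, play uniformly.

def regret_matching(cumulative_regrets):
    """Return a mixed strategy from a list of cumulative regrets."""
    positive = [max(r, 0.0) for r in cumulative_regrets]
    total = sum(positive)
    n = len(cumulative_regrets)
    if total <= 0.0:
        return [1.0 / n] * n            # no positive regret: uniform fallback
    return [p / total for p in positive]

# Example: regrets (3, 1, -2) give strategy (0.75, 0.25, 0.0).
strategy = regret_matching([3.0, 1.0, -2.0])
```

Regret-Based Pruning builds on this by temporarily skipping actions whose regret is so negative they cannot affect the strategy, which is where the speedup comes from.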

Author:  mlatinjo [ Wed Feb 01, 2017 9:15 am ]
Post subject:  Re: CMU Libratus wins by 15bb/100 & uses reinforcement learn

Would that method be applicable to online poker, given the short time to react and only an average computer?

Author:  Code-Monkey [ Sat Feb 04, 2017 9:21 am ]
Post subject:  Re: CMU Libratus wins by 15bb/100 & uses reinforcement learn

It'll work fine if you use the $10 million supercomputer they have. Think I need to upgrade my little laptop :roll:

Author:  SkyBot [ Fri Feb 10, 2017 7:23 pm ]
Post subject:  Re: CMU Libratus wins by 15bb/100 & uses reinforcement learn

[edit:] Sorry, the following post was written under the assumption that they use neural nets (I mixed it up with DeepStack, which does use neural nets). So you may not want to look at the GPU instance prices, but at prices for normal instances.

You can train on Amazon and then just run fewer steps than them for real play on local servers. You don't have to be as good as them to beat online players. I currently train my deep nets on 3 GPUs at home, but plan on using Amazon for the training. The spot price for a GPU is 10-15 cents an hour, so you can easily train on many GPUs for some days for reasonable money...

Note: I don't follow that paper, I follow the main ideas of another reinforcement learning poker paper, but with some significant changes that reduce the effort greatly.

Training is what needs the most resources (at least for my bot); evaluation is cheap compared to that, especially if you batch smartly. The cost is not linear: doing thousands of evals in one batch is much, much cheaper than thousands of single evals.
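The batching point can be demonstrated directly: a whole batch goes through a network in one matrix multiply instead of one call per input, so the per-call overhead is amortized. The "net" below is a stand-in single linear layer, not SkyBot's actual model:

```python
# Batched evaluation vs. looping over single inputs.
# One matrix multiply over the whole batch produces the same numbers as
# evaluating each input separately, but in a single vectorized call.
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 8))    # hypothetical weights: 64 features -> 8 outputs

def eval_single(x):
    """Evaluate one input vector (shape (64,))."""
    return x @ W

def eval_batch(X):
    """Evaluate a whole batch (shape (n, 64)) in one matrix multiply."""
    return X @ W

X = rng.standard_normal((1000, 64))
batched = eval_batch(X)                           # one call
looped = np.stack([eval_single(x) for x in X])    # 1000 calls, same result
```

On a GPU the gap is even larger, since each kernel launch has fixed overhead that the batched call pays only once.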

[edit:]
The problem with Amazon is that single GPUs are at that price; a machine with 16 GPUs is very expensive. And you have to send a lot of data around (at least for what I am doing). I am currently optimizing my data transfers to make sure I stay below what a cheap GPU instance offers (p2.xlarge, assuming a worst case of 800 Mbps; at the moment I am way above that with the scaling I plan to run).

Author:  HontoNiBaka [ Tue Feb 14, 2017 1:19 pm ]
Post subject:  Re: CMU Libratus wins by 15bb/100 & uses reinforcement learn

SkyBot wrote:
[edit:]
Note: I don't follow that paper, I follow the main ideas of another reinforcement learning poker paper, but with some significant changes that reduce the effort greatly.


Which paper do you follow in your training? Would be great if you could provide the name or a link.

Author:  SkyBot [ Tue Feb 14, 2017 10:41 pm ]
Post subject:  Re: CMU Libratus wins by 15bb/100 & uses reinforcement learn

HontoNiBaka wrote:
SkyBot wrote:
[edit:]
Note: I don't follow that paper, I follow the main ideas of another reinforcement learning poker paper, but with some significant changes that reduce the effort greatly.


Which paper do you follow in your training? Would be great if you could provide the name or a link.

Deep Reinforcement Learning from Self-Play in Imperfect-Information Games
https://arxiv.org/abs/1603.01121

I use the main idea of having an average-strategy and a best-response neural net and mixing them for training. However, while their approach should converge to a Nash equilibrium, I use some brutal optimizations where I may lose those guarantees.
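The mixing scheme from that paper (NFSP, Heinrich & Silver, arXiv:1603.01121) can be sketched roughly as follows: with anticipatory probability eta the agent acts from the best-response net, otherwise from the average-strategy net. The policies here are placeholder callables, not real networks:

```python
# NFSP-style policy mixing sketch. With probability ETA, act from the
# best-response policy; otherwise act from the average-strategy policy.
# In the full algorithm the best-response net is trained by RL and the
# average net by supervised learning on the agent's own behavior.
import random

ETA = 0.1  # anticipatory parameter used in the paper

def choose_action(state, best_response_policy, average_policy, rng=random):
    """Return (action, which_policy) mixed in NFSP fashion."""
    if rng.random() < ETA:
        return best_response_policy(state), "best_response"
    return average_policy(state), "average"
```

The mix is what makes the average net converge toward the time-averaged behavior, which is the component that approaches Nash; dropping or shortcutting either net (as SkyBot says he does) is where the convergence guarantees can be lost.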

Author:  brans [ Tue Mar 21, 2017 1:43 pm ]
Post subject:  Re: CMU Libratus wins by 15bb/100 & uses reinforcement learn

SkyBot, we can cooperate. I am also investigating this topic. I have experience in RL and deep learning. Now I am trying another approach, but cooperation can be helpful. Looks like a lot of work here.
