Poker-AI.org

Re: CMU Libratus wins by 15bb/100 & uses reinforcement learn

2017-03-21T13:43:20+00:00

Sky bot.. we can cooperate. I am also investigating this topic. I have experience in RL and deep learning. Now I am trying an other approach. But cooperation with can be helpful. Looks like a lot of work here.

Statistics: Posted by brans — Tue Mar 21, 2017 1:43 pm

Re: CMU Libratus wins by 15bb/100 & uses reinforcement learn

2017-02-14T22:41:06+00:00

HontoNiBaka wrote:

SkyBot wrote:

[edit:]
Note: I don't follow that paper, I follow the main ideas of another reinforced learning poker paper, but with some significant changes that reduce the effort greatly.

Which paper do you follow in your training? Would be great if you could provide the name or a link.

Deep Reinforcement Learning from Self-Play in Imperfect-Information Games
https://arxiv.org/abs/1603.01121

I use the main idea of having a average and best response neural net and mix them for training. However, while their stuff should converge to Nash, I use some brutal optimizations where I may lose those properties.

Statistics: Posted by SkyBot — Tue Feb 14, 2017 10:41 pm

Re: CMU Libratus wins by 15bb/100 & uses reinforcement learn

2017-02-14T13:19:27+00:00

SkyBot wrote:

[edit:]
Note: I don't follow that paper, I follow the main ideas of another reinforced learning poker paper, but with some significant changes that reduce the effort greatly.

Which paper do you follow in your training? Would be great if you could provide the name or a link.

Statistics: Posted by HontoNiBaka — Tue Feb 14, 2017 1:19 pm

Re: CMU Libratus wins by 15bb/100 & uses reinforcement learn

2017-02-10T19:23:23+00:00

[edit:] Sorry, the following post was written under the assumption they use neural nets (I mix it up with DeepStack, they use neural nets). So you may not want to look at the GPU instance prices, but at prices for normal instances.

You can train on amazon and then just run fewer steps than them for real play on local servers. You don't have to be as good as them to beat online players. I currently train my deep nets on 3 GPUs at home, but plan on using Amazon for the training. Spot price for GPU is 10-15 cents an hour. So you can easily train on many GPUs for some days for reasonable money...

Note: I don't follow that paper, I follow the main ideas of another reinforced learning poker paper, but with some significant changes that reduce the effort greatly.

Training is what needs much resources (at least for my bot), eval is cheap compared to that (especially if you batch smartly, cost is not linear, if you do thousands of evals in one batch it is much much cheaper than thousands of single evals).

[edit:]
The problem with amazon is that single GPUs are that price, a machine with 16 GPUs is very expensive. And you have to send a lot of data around (at least for what I am doing). I am currently optimizing my data transfers to be sure I am below what a cheap GPU instance has (p2.xlarge, assuming worst case of 800Mbps, atm I am way above that with the scaling I plan to run).

Statistics: Posted by SkyBot — Fri Feb 10, 2017 7:23 pm

Re: CMU Libratus wins by 15bb/100 & uses reinforcement learn

2017-02-04T09:21:10+00:00

it'll work fine if you use the $10million dollar super computer they have. think i need to upgrade my little laptop

Statistics: Posted by Code-Monkey — Sat Feb 04, 2017 9:21 am

Re: CMU Libratus wins by 15bb/100 & uses reinforcement learn

2017-02-01T09:15:21+00:00

would that method be applicable in online poker due to short time to react with some average computer?

Statistics: Posted by mlatinjo — Wed Feb 01, 2017 9:15 am

CMU Libratus wins by 15bb/100 & uses reinforcement learning

2017-01-31T15:58:07+00:00

AI Decisively Defeats Human Poker Players
http://spectrum.ieee.org/automaton/robo ... er-players

Noam, the PhD who worked on Libratus, also mentioned that: The basis for the bot is reinforcement learning using a special variant of Counterfactual Regret Minimization. We use a form of Monte Carlo CFR distributed over about 200 nodes. We also incorporate a sampled form of Regret-Based Pruning which speeds up the computation quite a bit.

https://www.reddit.com/r/IAmA/comments/ ... y/dczfvej/

Statistics: Posted by botishardwork — Tue Jan 31, 2017 3:58 pm