Poker-AI.org Poker AI and Botting Discussion Forum
http://poker-ai.org/phpbb/feed.php?f=24&t=3017

Re: CMU Libratus wins by 15bb/100 & uses reinforcement learning
http://poker-ai.org/phpbb/viewtopic.php?t=3017&p=7114#p7114
Statistics: Posted by brans — Tue Mar 21, 2017 1:43 pm


Re: CMU Libratus wins by 15bb/100 & uses reinforcement learning
http://poker-ai.org/phpbb/viewtopic.php?t=3017&p=7105#p7105

HontoNiBaka wrote:

SkyBot wrote:
[edit:]
Note: I don't follow that paper; I follow the main ideas of another reinforcement learning poker paper, but with some significant changes that reduce the effort greatly.


Which paper do you follow in your training? It would be great if you could provide the name or a link.

Deep Reinforcement Learning from Self-Play in Imperfect-Information Games
https://arxiv.org/abs/1603.01121

I use the main idea of having an average-policy and a best-response neural net and mixing them for training. However, while their method should converge to a Nash equilibrium, I use some brutal optimizations that may lose those properties.
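For context, the core idea of that paper (Neural Fictitious Self-Play): each agent keeps a best-response net trained by Q-learning and an average-policy net trained by supervised learning on its own best-response actions, and mixes the two with an anticipatory parameter. Below is a minimal sketch of the action-selection side, with toy linear "nets" standing in for real networks; the state size, action count, and mixing constant are illustrative, not SkyBot's setup.

Code:
import numpy as np

rng = np.random.default_rng(0)
ETA = 0.1  # anticipatory parameter; the paper uses values around 0.1

# Toy stand-ins for the two nets: linear maps from a 10-dim state to 3 actions.
q_weights = rng.normal(size=(10, 3))       # best-response (Q) net
policy_weights = rng.normal(size=(10, 3))  # average-policy net

supervised_memory = []  # (state, action) pairs the average net later imitates

def choose_action(state):
    """NFSP-style mixing of best response and average policy."""
    if rng.random() < ETA:
        # Best-response mode: act greedily on Q-values and log the choice
        # so the average-policy net can be trained to imitate it.
        action = int(np.argmax(state @ q_weights))
        supervised_memory.append((state, action))
    else:
        # Average-policy mode: sample from the softmax of the policy net.
        logits = state @ policy_weights
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        action = int(rng.choice(3, p=probs))
    return action

action = choose_action(rng.normal(size=10))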

Statistics: Posted by SkyBot — Tue Feb 14, 2017 10:41 pm


Re: CMU Libratus wins by 15bb/100 & uses reinforcement learning
http://poker-ai.org/phpbb/viewtopic.php?t=3017&p=7104#p7104

SkyBot wrote:

[edit:]
Note: I don't follow that paper; I follow the main ideas of another reinforcement learning poker paper, but with some significant changes that reduce the effort greatly.


Which paper do you follow in your training? It would be great if you could provide the name or a link.

Statistics: Posted by HontoNiBaka — Tue Feb 14, 2017 1:19 pm


Re: CMU Libratus wins by 15bb/100 & uses reinforcement learning
http://poker-ai.org/phpbb/viewtopic.php?t=3017&p=7102#p7102
You can train on Amazon, running fewer steps than they did, and then do real play on local servers. You don't have to be as good as them to beat online players. I currently train my deep nets on 3 GPUs at home, but I plan on using Amazon for the training. The spot price for a GPU is 10-15 cents an hour, so you can easily train on many GPUs for some days for reasonable money...
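To put that in numbers, a rough cost estimate; the cluster size and duration below are made-up assumptions, only the per-GPU price comes from the post above.

Code:
# Back-of-the-envelope spot-training cost.
gpus = 8                   # assumed cluster size
hours = 4 * 24             # "some days" of training
price_per_gpu_hour = 0.12  # mid-range of the quoted 10-15 cents

print(f"${gpus * hours * price_per_gpu_hour:.2f}")  # $92.16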

Note: I don't follow that paper; I follow the main ideas of another reinforcement learning poker paper, but with some significant changes that reduce the effort greatly.

Training is what needs the big resources (at least for my bot); evaluation is cheap compared to that, especially if you batch smartly. Cost is not linear: thousands of evals in one batch are much, much cheaper than thousands of single evals.
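To make the batching point concrete, here is a minimal sketch with a hypothetical PyTorch net (not the actual bot's architecture or framework):

Code:
import torch
import torch.nn as nn

# Hypothetical evaluation net, just to illustrate batching.
net = nn.Sequential(nn.Linear(128, 512), nn.ReLU(), nn.Linear(512, 3))
net.eval()

states = torch.randn(4096, 128)  # 4096 game states to evaluate

with torch.no_grad():
    # Slow: one forward pass per state -- per-call overhead dominates.
    singles = [net(s.unsqueeze(0)) for s in states]

    # Fast: one forward pass for the whole batch; kernel-launch and memory
    # costs are amortized, so total cost grows far less than linearly.
    batched = net(states)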

[edit:]
The problem with Amazon is that only single GPUs go for that price; a machine with 16 GPUs is very expensive. And you have to send a lot of data around (at least for what I am doing). I am currently optimizing my data transfers to make sure I stay below what a cheap GPU instance offers (p2.xlarge, assuming a worst case of 800 Mbps; at the moment I am way above that with the scaling I plan to run).
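As a sanity check on that 800 Mbps budget, the traffic can be estimated like this; the parameter count and sync rate are made-up illustrative numbers, not the actual bot's.

Code:
# Rough bandwidth estimate for shipping full net weights between machines.
params = 5_000_000     # weights in the net (assumed)
bytes_per_param = 4    # float32
syncs_per_second = 4   # how often the full weights are sent (assumed)

mbps = params * bytes_per_param * 8 * syncs_per_second / 1e6
print(f"{mbps:.0f} Mbps")                    # 640 Mbps
print("fits" if mbps < 800 else "too much")  # under the 800 Mbps budget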

Statistics: Posted by SkyBot — Fri Feb 10, 2017 7:23 pm


Re: CMU Libratus wins by 15bb/100 & uses reinforcement learning
http://poker-ai.org/phpbb/viewtopic.php?t=3017&p=7098#p7098
Statistics: Posted by Code-Monkey — Sat Feb 04, 2017 9:21 am


Re: CMU Libratus wins by 15bb/100 & uses reinforcement learning
http://poker-ai.org/phpbb/viewtopic.php?t=3017&p=7095#p7095
Statistics: Posted by mlatinjo — Wed Feb 01, 2017 9:15 am


CMU Libratus wins by 15bb/100 & uses reinforcement learning
http://poker-ai.org/phpbb/viewtopic.php?t=3017&p=7094#p7094

http://spectrum.ieee.org/automaton/robo ... er-players

Noam Brown, the PhD student who worked on Libratus, also mentioned: "The basis for the bot is reinforcement learning using a special variant of Counterfactual Regret Minimization. We use a form of Monte Carlo CFR distributed over about 200 nodes. We also incorporate a sampled form of Regret-Based Pruning, which speeds up the computation quite a bit."

https://www.reddit.com/r/IAmA/comments/ ... y/dczfvej/
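For readers unfamiliar with the ingredients: CFR accumulates per-infoset regrets and picks actions by regret matching, and regret-based pruning skips exploring actions whose cumulative regret is deeply negative. Below is a toy sketch of those two pieces only; this is not Libratus's code, and the action count and pruning threshold are illustrative.

Code:
from collections import defaultdict

NUM_ACTIONS = 3
regret = defaultdict(lambda: [0.0] * NUM_ACTIONS)  # cumulative regret per infoset
PRUNE_BELOW = -1e6  # illustrative threshold; Libratus's actual criterion differs

def regret_matching(infoset):
    """Current strategy: normalized positive regrets (uniform if none)."""
    pos = [max(r, 0.0) for r in regret[infoset]]
    total = sum(pos)
    if total > 0:
        return [p / total for p in pos]
    return [1.0 / NUM_ACTIONS] * NUM_ACTIONS

def prunable(infoset, action):
    """Regret-based pruning test: actions with hugely negative cumulative
    regret get skipped during traversal, saving most of the work."""
    return regret[infoset][action] < PRUNE_BELOW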

Statistics: Posted by botishardwork — Tue Jan 31, 2017 3:58 pm

