Poker-AI.org Poker AI and Botting Discussion Forum 2018-07-26T14:33:06+00:00 http://poker-ai.org/phpbb/feed.php?f=24&t=3107 2018-07-26T14:33:06+00:00 2018-07-26T14:33:06+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=3107&p=7686#p7686 <![CDATA[Re: Neural net based approach inspired by CFR]]> happypepper wrote:

I think CFR has a chance against 6max.

Using the DeepStack method, the solving can be limited to one street or even half a street.

Furthermore, postflop situations involve only 2-3 players most of the time. We'd only have to worry about solving 6max for a single street, preflop.


Perhaps I was being too strong, but at the same time the tree does get larger, even if many of those scenarios don't play out in reality. I guess that if you do have all players in, the pot gets larger and things potentially simplify as well. It's possible that there are abstractions that don't lose much.

Have you experimented at all in any such fashion?

Statistics: Posted by PassiveBot — Thu Jul 26, 2018 2:33 pm


]]>
2018-07-25T17:54:10+00:00 2018-07-25T17:54:10+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=3107&p=7684#p7684 <![CDATA[Re: Neural net based approach inspired by CFR]]>
Using the DeepStack method, the solving can be limited to one street or even half a street.

Furthermore, postflop situations involve only 2-3 players most of the time. We'd only have to worry about solving 6max for a single street, preflop.

Statistics: Posted by happypepper — Wed Jul 25, 2018 5:54 pm


]]>
2018-07-25T15:41:03+00:00 2018-07-25T15:41:03+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=3107&p=7683#p7683 <![CDATA[Re: Neural net based approach inspired by CFR]]> cantina wrote:

I think it's feasible with a robust enough network. These days it's not unusual for DBNs to have many thousands of inputs for image recognition.

I tried this a while back with HUL and SFF NNs. It converged up to a point but then began favoring the more common hands (center of the bell curve).


It's definitely more than feasible. DeepStack is a neural net approach inspired by CFR.

I've had moderate success in testing with some non-CFR-based neural networks. I experimented with both ANNs and CNNs (and also a hybrid of a CNN and an ANN); however, I didn't bother with CFR, as I was looking for something that stood a chance at 6max.

Statistics: Posted by PassiveBot — Wed Jul 25, 2018 3:41 pm


]]>
2018-07-24T11:05:40+00:00 2018-07-24T11:05:40+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=3107&p=7681#p7681 <![CDATA[Re: Neural net based approach inspired by CFR]]>
I tried this a while back with HUL and SFF NNs. It converged up to a point but then began favoring the more common hands (center of the bell curve).

Statistics: Posted by cantina — Tue Jul 24, 2018 11:05 am


]]>
2018-01-13T23:20:01+00:00 2018-01-13T23:20:01+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=3107&p=7519#p7519 <![CDATA[Re: Neural net based approach inspired by CFR]]> menc wrote:

Quote:
So after the update, it might output "91%" for example, depending on the learning rate.


Yep, that's where it could be hard for an NN approach.

How an NN updates its weights in one iteration:
1. You set a differentiable loss function loss(x).
2. For each weight wi, calculate the partial derivative dloss/dwi (its gradient).
3. Update wi = wi - learning_rate * dloss/dwi.


For NN or DL approaches, the forward pass is like:
one_layer(x) = activate(w*x + b)
pred = softmax(one_layer(one_layer(...)))

How would you design your loss function and one iteration?


If you're generating examples of State & Action and the Regret Value, then all you would need to do is use mean squared error as a loss function and train the network on those values.

Of course you're going to need a lot of examples and/or to represent the state in a way that buckets things enough.
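
To make the mean squared error idea concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: a linear model stands in for the network, and the one-hot (bucket, action) features and the regret values are made up for the example, not taken from any real solver.

```python
def predict(weights, features):
    """Predicted regret for one (state, action) feature vector."""
    return sum(w * x for w, x in zip(weights, features))


def train_mse(examples, n_features, lr=0.1, epochs=200):
    """Fit a linear model to (features, regret) pairs by SGD on squared error."""
    weights = [0.0] * n_features
    for _ in range(epochs):
        for features, regret in examples:
            error = predict(weights, features) - regret
            # Gradient of 0.5 * error**2 with respect to each weight is error * x.
            weights = [w - lr * error * x for w, x in zip(weights, features)]
    return weights


# Hypothetical data: one-hot (bucket, action) features -> sampled regret value.
examples = [
    ([1, 0, 0], 2.0),
    ([0, 1, 0], -1.0),
    ([0, 0, 1], 0.5),
]
weights = train_mse(examples, n_features=3)
print([round(w, 2) for w in weights])
```

With real data the features would not be one-hot per example, which is where the bucketing question above comes in.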

Statistics: Posted by PassiveBot — Sat Jan 13, 2018 11:20 pm


]]>
2018-01-13T21:09:22+00:00 2018-01-13T21:09:22+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=3107&p=7518#p7518 <![CDATA[Re: Neural net based approach inspired by CFR]]> Quote:

So after the update, it might output "91%" for example, depending on the learning rate.


Yep, that's where it could be hard for an NN approach.

How an NN updates its weights in one iteration:
1. You set a differentiable loss function loss(x).
2. For each weight wi, calculate the partial derivative dloss/dwi (its gradient).
3. Update wi = wi - learning_rate * dloss/dwi.


For NN or DL approaches, the forward pass is like:
one_layer(x) = activate(w*x + b)
pred = softmax(one_layer(one_layer(...)))

How would you design your loss function and one iteration?
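
For reference, the three update steps and the forward pass above can be sketched in plain Python as follows. This is one possible answer to the loss question, not the only one: it assumes a single softmax layer and a cross-entropy loss against a target action distribution, and all names (`sgd_step`, the toy input and target) are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sgd_step(weights, bias, x, target, lr=0.5):
    """One gradient descent step for a single softmax layer with cross-entropy loss."""
    logits = [sum(w * xi for w, xi in zip(row, x)) + b
              for row, b in zip(weights, bias)]
    pred = softmax(logits)
    # For softmax + cross-entropy, dloss/dlogit_k = pred_k - target_k.
    delta = [p - t for p, t in zip(pred, target)]
    new_weights = [[w - lr * d * xi for w, xi in zip(row, x)]
                   for row, d in zip(weights, delta)]
    new_bias = [b - lr * d for b, d in zip(bias, delta)]
    return new_weights, new_bias, pred


# Toy example: two inputs, two actions, target says "always take action 0".
weights = [[0.0, 0.0], [0.0, 0.0]]
bias = [0.0, 0.0]
x, target = [1.0, 0.0], [1.0, 0.0]
weights, bias, pred_before = sgd_step(weights, bias, x, target)
_, _, pred_after = sgd_step(weights, bias, x, target)
print(pred_before, pred_after)
```

The second prediction puts more weight on action 0 than the first, which is the "90% becomes 91%" behaviour discussed in the quote.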

Statistics: Posted by menc — Sat Jan 13, 2018 9:09 pm


]]>
2018-01-13T00:25:46+00:00 2018-01-13T00:25:46+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=3107&p=7517#p7517 <![CDATA[Re: Neural net based approach inspired by CFR]]> Quote:

I wonder how you would set a differentiable loss function so that SGD-based optimizers could work? The original CFR algorithm seems non-differentiable.


Not sure I entirely follow you. I imagine it could work like this. In every iteration:

1. Sample from the game tree, calculate regrets for each information set, calculate strategies matching the regrets. For example, we can have "with AA preflop, raise 100% of the time".

2. Do one training backpropagation step on the neural network(s). Let's say the neural network currently predicts "with AA preflop, raise 90% of the time". We say, "no, 90% is incorrect, the correct answer is 100%" and perform an update in that direction. So after the update, it might output "91%" for example, depending on the learning rate.

An important note is that the learning rate would be proportional to the counterfactual reach probability. This is similar to how regrets are weighted in CFR. That's the main idea behind this neural network approach. Intuitively, it seems that this could possibly work well, because it works for CFR.

Another note, the strategies from step one are obviously incorrect, because they're based on just one sample. But on average, they should be correct (or not? not sure here).
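
A rough sketch of step 2 with the reach-weighted learning rate, in plain Python. To keep it tiny, the "network" is assumed to be a single logit behind a sigmoid (probability of raising with AA), and the loss is assumed to be cross-entropy; `weighted_update`, `base_lr` and the reach values are hypothetical names and numbers chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def weighted_update(logit, target, reach, base_lr=1.0):
    """One gradient step on cross-entropy loss, scaled by the reach probability."""
    pred = sigmoid(logit)
    grad = pred - target  # d(cross-entropy)/d(logit) for a sigmoid output
    return logit - base_lr * reach * grad


logit = math.log(0.9 / 0.1)  # the model currently says: raise 90% of the time
after_full = sigmoid(weighted_update(logit, target=1.0, reach=1.0))
after_rare = sigmoid(weighted_update(logit, target=1.0, reach=0.1))
print(round(after_full, 3), round(after_rare, 3))
```

Both updates move the output toward 100%, but the information set that is rarely reached moves it much less, which mirrors how regrets are weighted in CFR.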

Statistics: Posted by listerofsmeg — Sat Jan 13, 2018 12:25 am


]]>
2018-01-11T20:48:45+00:00 2018-01-11T20:48:45+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=3107&p=7516#p7516 <![CDATA[Re: Neural net based approach inspired by CFR]]> But in my opinion, the CFR algorithm and value-based Q-learning are nearly equivalent.
CFR focuses on "how much you didn't get" while Q-learning focuses on "how much you get"; they are almost the same.

I wonder how you would set a differentiable loss function so that SGD-based optimizers could work? The original CFR algorithm seems non-differentiable.

Statistics: Posted by menc — Thu Jan 11, 2018 8:48 pm


]]>
2018-01-06T15:25:04+00:00 2018-01-06T15:25:04+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=3107&p=7514#p7514 <![CDATA[Re: Neural net based approach inspired by CFR]]>
Anyway, I'm probably gonna go ahead with the approach of making a formula- or rule-based decision for every hand in my range. First form my raise range, then form a call range based on betsize, and finally the bluff range based on the number of value bets.

Statistics: Posted by listerofsmeg — Sat Jan 06, 2018 3:25 pm


]]>
2018-01-06T09:44:35+00:00 2018-01-06T09:44:35+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=3107&p=7513#p7513 <![CDATA[Re: Neural net based approach inspired by CFR]]>
The inputs to the NN would have to encode the current hand, current and past boards, and all previous actions (of the current hand) of all players. What would the encoding be? If there are too many inputs there would be no generalisation so learning would be very slow. If there are too few inputs there would be error. I think the NN would find it difficult to extract the important features of unencoded inputs in any reasonable timescale.
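
To give a feel for the size of the problem, here is one naive encoding sketched in Python. It is purely an assumption for illustration (multi-hot cards plus a fixed-length list of bet sizes as pot fractions); it ignores player positions and street boundaries, and none of the names come from an existing library.

```python
RANKS = "23456789TJQKA"
SUITS = "cdhs"


def encode_cards(cards):
    """52-dimensional multi-hot vector, one slot per card such as 'As' or 'Td'."""
    vec = [0] * 52
    for card in cards:
        rank, suit = card[0], card[1]
        vec[RANKS.index(rank) * 4 + SUITS.index(suit)] = 1
    return vec


def encode_state(hole, board, pot_fraction_bets, max_actions=8):
    """Concatenate hole cards, board cards, and a fixed-length betting history."""
    # Pad with zeros or truncate so the history always has max_actions entries.
    history = (list(pot_fraction_bets) + [0.0] * max_actions)[:max_actions]
    return encode_cards(hole) + encode_cards(board) + history


state = encode_state(["As", "Ah"], ["Kd", "7c", "2s"], [0.5, 1.0])
print(len(state))  # 112
```

Even this stripped-down version has 112 inputs, and it treats suit-isomorphic hands as unrelated, which is the kind of missing generalisation I mean.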

Statistics: Posted by spears — Sat Jan 06, 2018 9:44 am


]]>
2018-01-05T14:37:59+00:00 2018-01-05T14:37:59+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=3107&p=7512#p7512 <![CDATA[Neural net based approach inspired by CFR]]>
Basically, have a neural net that has information visible to the current player as an input and action probabilities as its output.

The training would work like this. Play a batch of, say, 1000 games and calculate regrets and regret-matched strategies for each information set encountered, similarly to outcome-sampling CFR. Then, take these information sets and their strategies and use them to update the neural network, so that it plays closer to the computed strategies.
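
For clarity, the "regret-matched strategy" that would serve as the training target can be sketched like this in Python. The regret numbers are hypothetical; the function is standard regret matching (normalise the positive regrets, fall back to uniform if there are none).

```python
def regret_matching(regrets):
    """Turn cumulative regrets into a strategy (action probabilities)."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0.0:
        return [p / total for p in positive]
    # No positive regret: fall back to a uniform strategy.
    return [1.0 / len(regrets)] * len(regrets)


# Hypothetical regrets for (fold, call, raise) at one information set.
target = regret_matching([-2.0, 1.0, 3.0])
print(target)  # [0.0, 0.25, 0.75]
```

Each (information set, target) pair from the batch would then be one training example for the network.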

Thoughts?

Statistics: Posted by listerofsmeg — Fri Jan 05, 2018 2:37 pm


]]>