Poker-AI.org

Poker AI and Botting Discussion Forum

All times are UTC




Posted: Fri Jan 05, 2018 2:37 pm
Junior Member

Joined: Mon Sep 11, 2017 8:01 pm
Posts: 16
Do you think the following approach might work for 6-max no limit?

Basically, have a neural net that takes the information visible to the current player as its input and produces action probabilities as its output.

The training would work like this. Play a batch of, say, 1000 games and calculate regrets and regret-matched strategies for each information set encountered, similarly to outcome-sampling CFR. Then take these information sets and their strategies and use them to update the neural network, so that it plays closer to the computed strategies.

Thoughts?
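
For reference, the "regret-matched strategy" step could look roughly like the sketch below. This is only an illustration of standard regret matching, not code from the thread; the function name and the fold/call/raise example are made up.

```python
def regret_matching(cumulative_regrets):
    """Turn cumulative regrets into a strategy (action probabilities).

    Actions with positive regret get probability proportional to that regret;
    if no action has positive regret, fall back to the uniform strategy.
    """
    positive = [max(r, 0.0) for r in cumulative_regrets]
    total = sum(positive)
    if total > 0.0:
        return [p / total for p in positive]
    n = len(cumulative_regrets)
    return [1.0 / n] * n


# Hypothetical information set with actions (fold, call, raise).
strategy = regret_matching([-2.0, 1.0, 3.0])
print(strategy)  # [0.0, 0.25, 0.75]
```

Each (information set, strategy) pair produced this way would then serve as one training target for the network.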


Posted: Sat Jan 06, 2018 9:44 am
Site Admin

Joined: Sun Feb 24, 2013 9:39 pm
Posts: 528
I think it would be hard.

The inputs to the NN would have to encode the current hand, current and past boards, and all previous actions (of the current hand) of all players. What would the encoding be? If there are too many inputs there would be no generalisation so learning would be very slow. If there are too few inputs there would be error. I think the NN would find it difficult to extract the important features of unencoded inputs in any reasonable timescale.


Posted: Sat Jan 06, 2018 3:25 pm
Junior Member

Joined: Mon Sep 11, 2017 8:01 pm
Posts: 16
My thinking was to encode hand type (e.g. middle pair + flush draw), hole cards, our stack size, opponent stack sizes summary, pot size and very summarized betting history. For example, on the flop, preflop betting can be summarized as "3bet, we were the aggressor". The pot and stacks are needed because they can't be inferred from the betting history, because it's summarized.
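
A rough sketch of what such a summarized encoding might look like as a fixed-length input vector. Everything here is an assumption for illustration: the category lists, the field names and the choice to express chips in big blinds are invented, and hole cards are left out for brevity.

```python
# Hypothetical category lists; a real bot would need far finer hand types.
HAND_TYPES = ["air", "draw", "pair", "pair_plus_draw", "two_pair_or_better"]
PREFLOP_SUMMARIES = ["limped", "single_raise", "3bet", "4bet_plus"]


def one_hot(value, choices):
    """Return a 0/1 list with a single 1 at the position of `value`."""
    return [1.0 if value == c else 0.0 for c in choices]


def encode_state(hand_type, preflop_summary, we_were_aggressor,
                 our_stack, max_opp_stack, pot, big_blind):
    """Flatten a summarized game state into a fixed-length feature vector."""
    features = []
    features += one_hot(hand_type, HAND_TYPES)
    features += one_hot(preflop_summary, PREFLOP_SUMMARIES)
    features.append(1.0 if we_were_aggressor else 0.0)
    # Express chip amounts in big blinds so the scale does not depend on stakes.
    features.append(our_stack / big_blind)
    features.append(max_opp_stack / big_blind)
    features.append(pot / big_blind)
    return features


# "Middle pair + flush draw on the flop, 3bet pot, we were the aggressor".
x = encode_state("pair_plus_draw", "3bet", True,
                 our_stack=950, max_opp_stack=1200, pot=210, big_blind=10)
print(len(x), x)
```

The pot and stack entries are there for the reason given above: they cannot be recovered from a summarized betting history.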

Anyway, I'm probably gonna go ahead with the approach of making a formula- or rule-based decision for every hand in my range. First form my raise range, then form a call range based on betsize, and finally the bluff range based on the number of value bets.


Posted: Thu Jan 11, 2018 8:48 pm
New Member

Joined: Thu Jan 11, 2018 8:35 pm
Posts: 3
maybe it could be possible.
but in my opinion, the CFR algo and value-based Q-learning are roughly equivalent.
CFR focuses on "how much you didn't get" while Q-learning focuses on "how much you get"; they are almost the same.

i wonder how you would define a differentiable loss function so that SGD-based optimizers could work? the original CFR algo doesn't seem to be differentiable.


Posted: Sat Jan 13, 2018 12:25 am
Junior Member

Joined: Mon Sep 11, 2017 8:01 pm
Posts: 16
Quote:
i wonder how you would define a differentiable loss function so that SGD-based optimizers could work? the original CFR algo doesn't seem to be differentiable.


Not sure I entirely follow you. I imagine it could work like this. In every iteration:

1. Sample from the game tree, calculate regrets for each information set, calculate strategies matching the regrets. For example, we can have "with AA preflop, raise 100% of the time".

2. Do one training backpropagation step on the neural network(s). Let's say the neural network currently predicts "with AA preflop, raise 90% of the time". We say, "no, 90% is incorrect, the correct answer is 100%" and perform an update in that direction. So after the update, it might output "91%" for example, depending on the learning rate.

An important note is that the learning rate would be proportional to the counterfactual reach probability. This is similar to how regrets are weighted in CFR. That's the main idea behind this neural network approach. Intuitively, it seems that this could possibly work well, because it works for CFR.

Another note, the strategies from step one are obviously incorrect, because they're based on just one sample. But on average, they should be correct (or not? not sure here).
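
The two steps plus the reach-weighted learning rate can be sketched as follows. This is a toy, not a tested training scheme: the "network" is reduced to one vector of logits for a single information set, the loss is assumed to be cross-entropy against the sampled strategy, and the names `weighted_update`, `reach_weight` and `base_lr` are invented here.

```python
import math


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_update(logits, target, reach_weight, base_lr=0.5):
    """One gradient step on cross-entropy(target, softmax(logits)).

    The gradient of that loss with respect to the logits is (pred - target).
    The step is scaled by reach_weight, the counterfactual reach probability
    of the sampled information set (the weighting proposed in the post).
    """
    pred = softmax(logits)
    step = base_lr * reach_weight
    return [z - step * (p - t) for z, p, t in zip(logits, pred, target)]


# Hypothetical "AA preflop" information set, actions (fold, call, raise).
logits = [0.0, 0.0, math.log(18.0)]   # softmax gives raise = 0.9
target = [0.0, 0.0, 1.0]              # sampled regret-matched strategy
before = softmax(logits)[2]
after = softmax(weighted_update(logits, target, reach_weight=0.2))[2]
print(before, after)  # raise probability moves from 0.9 towards 1.0
```

How far the output moves in one step depends on `base_lr * reach_weight`, so rarely reached information sets change the prediction less.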


Posted: Sat Jan 13, 2018 9:09 pm
New Member

Joined: Thu Jan 11, 2018 8:35 pm
Posts: 3
Quote:
So after the update, it might output "91%" for example, depending on the learning rate.


yep, that's where it could be hard for an NN approach.

how an NN updates its weights in one iteration:
1. you define a differentiable loss function loss(x)
2. for each weight wi, calculate the partial derivative dloss/dwi, i.e. its component of the gradient.
3. update wi = wi - learning_rate * dloss/dwi.


for NN or DL approaches, the forward pass is like:
one_layer = activate(wx + b)
pred = softmax(one_layer(one_layer(...)))

how would you design your loss function and one iteration?
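
One possible answer, sketched for a single linear layer followed by softmax: use cross-entropy between the regret-matched target strategy and the prediction as the loss. That loss choice is an assumption here, not something settled in the thread, and the tiny 2-feature, 3-action setup is made up.

```python
import math


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(W, b, x):
    """pred = softmax(W x + b) for one linear layer."""
    logits = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
              for row, b_i in zip(W, b)]
    return softmax(logits)


def cross_entropy(pred, target):
    """loss = -sum_a target[a] * log(pred[a])."""
    return -sum(t * math.log(p) for p, t in zip(pred, target) if t > 0.0)


def sgd_step(W, b, x, target, learning_rate):
    """Update W and b in place with one gradient step.

    For softmax + cross-entropy, dloss/dlogit_i = pred_i - target_i, so
    dloss/dW[i][j] = (pred_i - target_i) * x_j and dloss/db[i] = pred_i - target_i.
    """
    pred = forward(W, b, x)
    for i, row in enumerate(W):
        delta = pred[i] - target[i]
        for j in range(len(row)):
            row[j] -= learning_rate * delta * x[j]
        b[i] -= learning_rate * delta


# Hypothetical 2-feature state and 3 actions; weights start at zero.
W = [[0.0, 0.0] for _ in range(3)]
b = [0.0, 0.0, 0.0]
x = [1.0, 0.5]
target = [0.0, 0.25, 0.75]   # e.g. a regret-matched strategy

loss_before = cross_entropy(forward(W, b, x), target)
for _ in range(200):
    sgd_step(W, b, x, target, learning_rate=0.5)
loss_after = cross_entropy(forward(W, b, x), target)
print(loss_before, loss_after)
```

With more layers the same loss applies; only the backpropagation through the hidden layers is added.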


Posted: Sat Jan 13, 2018 11:20 pm
New Member

Joined: Mon Nov 06, 2017 7:19 pm
Posts: 4
menc wrote:
Quote:
So after the update, it might output "91%" for example, depending on the learning rate.


yep, that's where it could be hard for an NN approach.

how an NN updates its weights in one iteration:
1. you define a differentiable loss function loss(x)
2. for each weight wi, calculate the partial derivative dloss/dwi, i.e. its component of the gradient.
3. update wi = wi - learning_rate * dloss/dwi.


for NN or DL approaches, the forward pass is like:
one_layer = activate(wx + b)
pred = softmax(one_layer(one_layer(...)))

how would you design your loss function and one iteration?


If you're generating examples of State & Action and the Regret Value, then all you would need to do is use mean squared error as a loss function and train the network on those values.

Of course you're going to need a lot of examples and/or to represent the state in a way that buckets things enough.
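
A minimal sketch of that idea, with a linear model standing in for the network and full-batch gradient descent on the mean squared error. The four training examples and the feature layout are invented purely for illustration.

```python
def predict(weights, features):
    """Linear regret estimate for one (state, action) feature vector."""
    return sum(w * f for w, f in zip(weights, features))


def mse(weights, examples):
    """Mean squared error over (features, regret) pairs."""
    return sum((predict(weights, f) - r) ** 2 for f, r in examples) / len(examples)


def gradient_step(weights, examples, learning_rate):
    """One full-batch gradient step on the mean squared error."""
    n = len(examples)
    grad = [0.0] * len(weights)
    for features, regret in examples:
        error = predict(weights, features) - regret
        # d/dw_j of (pred - regret)^2 is 2 * error * f_j.
        for j, f in enumerate(features):
            grad[j] += 2.0 * error * f / n
    return [w - learning_rate * g for w, g in zip(weights, grad)]


# Made-up examples.
# features = [bias, hand_strength, is_raise, hand_strength * is_raise]
examples = [
    ([1.0, 0.9, 1.0, 0.9], 2.0),    # strong hand, raise: positive regret
    ([1.0, 0.9, 0.0, 0.0], -1.0),   # strong hand, no raise
    ([1.0, 0.2, 1.0, 0.2], -1.5),   # weak hand, raise
    ([1.0, 0.2, 0.0, 0.0], 0.5),    # weak hand, no raise
]
weights = [0.0, 0.0, 0.0, 0.0]
error_before = mse(weights, examples)
for _ in range(2000):
    weights = gradient_step(weights, examples, learning_rate=0.05)
error_after = mse(weights, examples)
print(error_before, error_after)
```

The predicted regrets for a state would then be passed through regret matching to get the strategy to play.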


Posted: Tue Jul 24, 2018 11:05 am
Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 438
I think it's feasible with a robust enough network. These days it's not unusual for DBNs to have many thousands of inputs for image recognition.

I tried this a while back with HUL and SFF NNs. It converged to a point but began favoring more common hands (center of the bell curve).


Posted: Wed Jul 25, 2018 3:41 pm
New Member

Joined: Mon Nov 06, 2017 7:19 pm
Posts: 4
cantina wrote:
I think it's feasible with a robust enough network. These days it's not unusual for DBNs to have many thousands of inputs for image recognition.

I tried this a while back with HUL and SFF NNs. It converged to a point but began favoring more common hands (center of the bell curve).


It's definitely more than feasible. DeepStack is a neural net approach inspired by CFR.

I've had some moderate success in testing with some non-CFR-based neural networks. I experimented with both ANNs and CNNs (and also a hybrid of a CNN and an ANN); however, I didn't bother with CFR as I was looking for something that stood a chance at 6-max.


Posted: Wed Jul 25, 2018 5:54 pm
New Member

Joined: Sun Jun 03, 2018 11:57 pm
Posts: 7
I think CFR has a chance against 6max.

Using the DeepStack method, the solving can be limited to 1 street or even half streets.

Furthermore, postflop situations are most of the time 2-3 player only. We'd only have to worry about solving 6max for single street preflop.


Posted: Thu Jul 26, 2018 2:33 pm
New Member

Joined: Mon Nov 06, 2017 7:19 pm
Posts: 4
happypepper wrote:
I think CFR has a chance against 6max.

Using the DeepStack method, the solving can be limited to 1 street or even half streets.

Furthermore, postflop situations are most of the time 2-3 player only. We'd only have to worry about solving 6max for single street preflop.


Perhaps I was being too strong, but at the same time the tree does get larger, even if many of those scenarios don't play out in reality. I guess in reality, if you do have all players in, the pot gets larger and things potentially simplify as well. It's possible that there are abstractions that don't lose much.

Have you experimented at all in any such fashion?

