Poker-AI.org

Poker AI and Botting Discussion Forum
It is currently Mon Nov 13, 2023 2:02 pm

All times are UTC




Post new topic Reply to topic  [ 35 posts ]  Go to page Previous  1, 2
Author Message
PostPosted: Wed Jul 03, 2019 8:04 pm 
Offline
Junior Member

Joined: Wed Jun 26, 2019 4:02 am
Posts: 28
I figured out one of the issues. In my best response calculations, when I dealt a river card, I was dividing the EV of each hand by 48 (52 - 4 cards on board), but you need to divide it by 44 for some reason (52 - 4 cards on board - 4 cards in player's hands). Sort of makes sense but at the same time it doesn't.


Top
 Profile  
 
PostPosted: Thu Jul 04, 2019 9:58 am 
Offline
Site Admin
User avatar

Joined: Sun Feb 24, 2013 9:39 pm
Posts: 642
Fossana wrote:
I figured out one of the issues. In my best response calculations, when I dealt a river card, I was dividing the EV of each hand by 48 (52 - 4 cards on board), but you need to divide it by 44 for some reason (52 - 4 cards on board - 4 cards in player's hands). Sort of makes sense but at the same time it doesn't.


Does that solve the (lack of) convergence problem?


Top
 Profile  
 
PostPosted: Thu Jul 04, 2019 4:23 pm 
Offline
Junior Member

Joined: Wed Jun 26, 2019 4:02 am
Posts: 28
spears wrote:
Fossana wrote:
I figured out one of the issues. In my best response calculations, when I dealt a river card, I was dividing the EV of each hand by 48 (52 - 4 cards on board), but you need to divide it by 44 for some reason (52 - 4 cards on board - 4 cards in player's hands). Sort of makes sense but at the same time it doesn't.


Does that solve the (lack of) convergence problem?


It converges to a lower level of exploitability now but it still gets stuck pretty quickly. I’ve compared my best response calculations with piosolver and I seem to be getting the same EVs, but dividing by 44 instead of 48 when dealing the river can’t possibly be right. Dividing by 44 must be making up for some other mistake in my best response code.

I may implement vanilla cfr just to test my best response when the opponent’s strategy isn’t uniform mixed like it is before any cfr iterations.

When calculating best response, do you need to weight the probability of cards being dealt by the opponent’s strategy? Like if you get to a node where the opponent gets to that node with the majority of their Ax combos, then having an equal chance of dealing an A as other cards doesn’t make much sense.


Top
 Profile  
 
PostPosted: Thu Jul 04, 2019 5:35 pm 
Offline
Site Admin
User avatar

Joined: Sun Feb 24, 2013 9:39 pm
Posts: 642
Fossana wrote:
When calculating best response, do you need to weight the probability of cards being dealt by the opponent’s strategy? Like if you get to a node where the opponent gets to that node with the majority of their Ax combos, then having an equal chance of dealing an A as other cards doesn’t make much sense.


Consider the simple but slow method where you pass a single best response player's hand and the equilibrium player's reach vector down the tree and you pass a single ev back up the tree. In that case you can modify the equilibrium player's reach vector at the root to reflect the collisions between the br player's hand and eq players hand.

Consider the faster method where you just pass the equilibrium player's reach vector down the tree and pass a vector of evs back, each ev representing alternative br players hand holding. In that case I guess you have to account for the collisions at the terminal nodes instead. (Have you not figured this out already in your PCS solution?) I'm not sure if I ever figured that out properly, because I disappeared down a rabbit hole of complexity looking at hand abstraction.

It might be worth looking at Amax's code. http://www.poker-ai.org/archive/www.pok ... &sk=t&sd=a I see there are other sources available online now too.


Top
 Profile  
 
PostPosted: Thu Jul 04, 2019 6:15 pm 
Offline
Junior Member

Joined: Wed Jun 26, 2019 4:02 am
Posts: 28
spears wrote:
Fossana wrote:
When calculating best response, do you need to weight the probability of cards being dealt by the opponent’s strategy? Like if you get to a node where the opponent gets to that node with the majority of their Ax combos, then having an equal chance of dealing an A as other cards doesn’t make much sense.


Consider the simple but slow method where you pass a single best response player's hand and the equilibrium player's hand distribution down the tree and you pass a single ev back up the tree. In that case you can modify the equilibrium player's hand distribution at the root to reflect the collisions between the br player's

Consider the faster method where you just pass the equilibrium player's hand distribution down the tree and pass a vector of evs back, each ev representing alternative br players hand holding. In that case I guess you have to account for the collisions at the terminal nodes instead. (Have you not figured this out already in your PCS solution?) I'm not sure if I ever figured that out properly, because I disappeared down a rabbit hole of complexity looking at hand abstraction.

It might be worth looking at Amax's code. http://www.poker-ai.org/archive/www.pok ... &sk=t&sd=a I see there are other sources available now too.


My code is heavily based on Amax's code. I have implemented the best response calculations both the fast and slow way, but I need to divide the EVs I get from chance nodes by 44 rather than some other number to get the right answer.


Top
 Profile  
 
PostPosted: Thu Jul 04, 2019 9:28 pm 
Offline
Junior Member

Joined: Wed Jun 26, 2019 4:02 am
Posts: 28
I figured out why dividing the reach probabilities for each of villlain's hand by 44 when dealing the river makes sense now. Think about this way: how many rivers are possible for one of villain's hands? There are 4 cards on the board and 2 cards in hero's hand, so that leaves 46 rivers. Villain's hand will block 2 of of those rivers, leaving only 44. If you divide by 46, then you will mistakenly get 44/46 + 0 + 0 = 44/46, when the probability needs to add up to 1.


Top
 Profile  
 
PostPosted: Fri Jul 05, 2019 1:23 am 
Offline
Junior Member

Joined: Wed Jun 26, 2019 4:02 am
Posts: 28
I think my PCS implementation is actually correct. It does converge to 0 after a million iterations. Just kind of disturbing how it can go from 22% exploitability to less than 1% exploitability in 10k iterations and then it takes another 50k iterations to drop below 0.1% exploitability and then another 750k iterations to drop below 0.01% exploitability. Perhaps that's the nature of CFR, but I don't understand how commercial solvers work so quickly.


Top
 Profile  
 
PostPosted: Sat Jul 06, 2019 6:07 am 
Offline
Junior Member

Joined: Wed Jun 26, 2019 4:02 am
Posts: 28
Implemented CFR+ and it seems to be a lot faster than PCS. I compared the speed of my CFR+ implementation to piosolver on the following turn subgame:

OOP range: 100% of hands
IP range: 100% of hands
board: KdJdTd5s
pot size: 100
effective stack size: 1000
turn bet sizes: 0.5pot, 1pot
turn raise sizes: 0.5pot
river bet sizes: 0.25pot, 0.5pot, 1pot
river raise sizes: 0.5pot
donking is allowed
allin threshold = 0.67

It took piosolver 13.5s to solve this subgame down to an exploitability of 0.413% of the pot per hand. It took my CFR+ implementation 54.8s (300 iterations) to solve the same subgame down to an exploitability of 0.404% of the pot per hand. I suspect if I exploit isomorphisms I can halve this time, especially since the board I used has three cards of the same suit.

Another thing I did to speed things up was convert all 2d arrays to 1d arrays. In languages like C++ and C#, 2d arrays are actually stored as 1d arrays internally, but in Java, 2d arrays are an array of arrays. This has a lot of overhead and made my program 33% slower before.

Not sure why PCS is so slow, maybe because it there's a lot of overhead involved in traversing the tree and variance. I found that PCS+ converges a lot faster. For PCS+, every time the number of iterations was divisible by the number of possible runouts, I reset negative cumulative regret to 0, and I also updated cumulative strategy like so: cumulativeStrategy += (1 + iterationCount/numRunouts) * strategy * reachProbability. For a turn subgame, the number of possible runouts is 48, and for flop subgames it would be 49*48=2352.

Here's how to deal with board card chance nodes in CFR+:
1. Set the reach probability for hands that conflict with the turn/river card to 0.
2. If you're dealing the river, divide the utilities that propagate back up by 44, and if you're dealing the turn, divide the utilities by 45. The reason you do this is because let's say you have a hand AcAh and your opponent has 460 combos that don't conflict with this hand. For each of the 46 possible rivers (52 - 4 cards on board - 2 cards contained in AcAh), on average, 20 of your opponents combos will be blocked by that river card, so you get a utility for AcAh that was computed against 440 combos on average rather than 460.
3. Divide hero's reach probabilities by 46 if dealing the river and by 45 if dealing the turn. If hero holds AcAh and there's 4 cards on the board, then there are 46 different rivers that hero can see with this hand, so the reach probability for each of these rivers is 1/46 for AcAh. This is actually not completely accurate because if your opponent's strategy is one in which they mostly have KK in a spot, then it's less likely your AcAh will see a K river. Though I find in practice just using 1/46 is fine instead of manually calculating the actual reach probability for each river.

Here's how to deal with board card chance nodes in PCS:
1. sample board cards ahead of time
2. set the reach probability for hands that conflict with the board (including the sampled cards) to 0 before starting the iteration
3. at terminal nodes, make sure that the utility for hands that conflict with the board is 0, otherwise nonzero utility will be propagated up and you'll have nonzero regret for hands that you can't even have on the board (do the same with CFR+)


Top
 Profile  
 
PostPosted: Sat Jul 06, 2019 7:26 am 
Offline
Site Admin
User avatar

Joined: Sun Feb 24, 2013 9:39 pm
Posts: 642
Great work. Thank you for the contribution.


Top
 Profile  
 
PostPosted: Sat Jul 06, 2019 10:02 am 
Offline
Junior Member

Joined: Wed Jun 26, 2019 4:02 am
Posts: 28
Just implemented DCFR (discounted CFR) which converges faster than CFR+. Only 2x slower than piosolver now.


Top
 Profile  
 
PostPosted: Fri Jul 19, 2019 1:39 pm 
Offline
Junior Member

Joined: Wed Jun 26, 2019 4:02 am
Posts: 28
This is rather preliminary, but I converted all of my Java code to C++ and did a lot of optimizations (mainly loop unrolling and taking advantage of the fact that reach probabilities/strategy sums/regret sums can be smaller on the river due to the river card blocking a bunch of hands), and my solver is currently 10% faster than piosolver, at least for turn subgames.


Top
 Profile  
 
PostPosted: Fri Jul 19, 2019 2:09 pm 
Offline
Site Admin
User avatar

Joined: Sun Feb 24, 2013 9:39 pm
Posts: 642
Is bounds checking on or off?


Top
 Profile  
 
PostPosted: Fri Jul 19, 2019 2:17 pm 
Offline
Junior Member

Joined: Wed Jun 26, 2019 4:02 am
Posts: 28
spears wrote:
Is bounds checking on or off?


I think my compiler is turning bounds checking off since I was getting some illegal read operations earlier.


Top
 Profile  
 
PostPosted: Tue Jul 30, 2019 5:36 pm 
Offline
Junior Member

Joined: Wed Jun 26, 2019 4:02 am
Posts: 28
I have a new idea for speeding things up. If P1’s range is AA,KK and P2’s range is QQ,JJ, then the reach probabilities for each player would consist of two floats, but I think it would be better to merge the ranges and pretend both players have the same range. This would result in reach probabilities consisting of four floats with some of the reach probabilities initialized to zero for both players. This makes the showdown utility calculations faster because you don’t have to compare the current player’s hand ranks to the opponent’s hand ranks. You just have to iterate over the hands that have the same rank in the current player’s range. This saves like 1326 condition checks for 100% ranges.

I think this only speeds things up if the ranges have a decent amount of overlap though.

By the way, another way to speed things up is to update the opponent’s strategy sum rather than the current player’s strategy sum on each iteration. This way you don’t have to keep track of or calculate the current player’s reach probabilities at all (just the opponent’s). The convergence rate is the same.


Top
 Profile  
 
PostPosted: Wed Aug 14, 2019 10:52 am 
Offline
New Member

Joined: Wed Aug 14, 2019 10:49 am
Posts: 3
You need to be curious,to do it


Top
 Profile  
 
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 35 posts ]  Go to page Previous  1, 2

All times are UTC


Who is online

Users browsing this forum: No registered users and 1 guest


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Powered by phpBB® Forum Software © phpBB Group