Poker-AI.org

Poker AI and Botting Discussion Forum
PostPosted: Tue Mar 05, 2013 9:51 pm
Coffee4tw (Site Admin)

Using Sliding Windows to Generate
Action Abstractions in Extensive-Form Games

by John Hawkin, Robert C. Holte, and Duane Szafron

Abstract
In extensive-form games with a large number of actions, careful
abstraction of the action space is critically important to
performance. In this paper we extend previous work on action
abstraction using no-limit poker games as our test domains.
We show that in such games it is no longer necessary
to choose, a priori, one specific range of possible bet sizes.
We introduce an algorithm that adjusts the range of bet sizes
considered for each bet individually in an iterative fashion.
This flexibility results in a substantially improved game value
in no-limit Leduc poker. When applied to no-limit Texas
Hold’em our algorithm produces an action abstraction that
is about one third the size of a state-of-the-art hand-crafted
action abstraction, yet has a better overall game value.

http://poker.cs.ualberta.ca/publication ... sizing.pdf

_________________
Cheers.


PostPosted: Mon Mar 11, 2013 4:34 am
Coffee4tw (Site Admin)

I posted this mainly to get an idea of whether anybody has done research in this area.

Basically, my idea is this:
Build a game tree with only F/C/R nodes and then fill the raise nodes with observed raise sizes (not absolute, but relative to the pot and/or stack). Then, when running CFRM or any variant of it, you pick raise sizes based on those observations. If the memory to keep these observations is an issue, one could probably just keep a distribution (or approximate a function for the distribution) and sample from that.
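A minimal sketch of the bookkeeping I have in mind, in Python (the class and all the names are just illustrative, not a finished design):

Code:
import random
from collections import defaultdict

class RaiseSizeModel:
    """Empirical distribution of observed raise sizes, keyed by the
    F/C/R action sequence that leads to the raise node. Sizes are
    stored as fractions of the pot, not absolute amounts."""

    def __init__(self):
        self.observations = defaultdict(list)

    def record(self, action_sequence, raise_amount, pot):
        # Store pot-relative sizes so one node generalizes across stacks.
        self.observations[action_sequence].append(raise_amount / pot)

    def sample(self, action_sequence, default=1.0):
        obs = self.observations.get(action_sequence)
        # Fall back to a pot-sized raise for never-observed nodes.
        return random.choice(obs) if obs else default

# Mining: model.record(('r', 'c', 'r'), raise_amount=6.0, pot=8.0) stores 0.75.
# During CFRM: bet = model.sample(('r', 'c', 'r')) * current_pot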

I'm not sure this would be a stronger agent in bot vs. bot match-ups, but I think it would make a lot more sense for real botting. And the game tree is amazingly small (basically like Limit poker instead of NL), so you can focus a lot more on card abstraction. Actually, the tree should be smaller than the Limit tree, because there are rarely more than 7 raises across all streets in one hand; I know that for a fact from hand histories from a couple of years ago. Granted, that might be a little different today with raise sizes getting smaller, but not significantly: 8, maybe 9 tops, and in most cases it's probably a lot less.

Sooo, now that I've spilled the beans, does anybody have thoughts/numbers on this?

_________________
Cheers.


PostPosted: Mon Mar 11, 2013 10:35 pm
proud2bBot (Senior Member)

Hi coffee,

What do you mean by "and then fill the raise nodes with observed raise sizes"? Where did you observe them?
And how does your idea differ from the Holte paper?
I was thinking about implementing the sliding-window approach, as it seems pretty nice (better game value + smaller game tree), but I haven't done it so far.


PostPosted: Mon Mar 11, 2013 11:00 pm
Coffee4tw (Site Admin)

I meant just keeping a record of raise sizes for each unique raise action in the tree. How you store it is essentially irrelevant.
You can observe them wherever you want: past ACPC results, purchased hand histories, your own played games, etc. That depends on your goal and your target villain, I guess.

The idea differs in two ways (without rereading the paper, so I hope I'm right):
1) It uses one raise action instead of multiple ones (FCR vs. fchpa, e.g.).
2) It doesn't randomize or solve for bet sizes, but rather uses historical data to pick them.

It's definitely next on my list as well, so I wanted to get some input from people who might have played around with this already.

_________________
Cheers.


PostPosted: Tue Mar 12, 2013 3:43 pm
proud2bBot (Senior Member)

I think I still didn't get your approach: let's assume we start with a newly created game tree. According to your approach, it has one betting action, right?
1. What information is stored in it initially, apart from the regular regret/cumulative strategy?
2. How does the decision node decide how much to bet?
3. How do you determine how deep the tree is, given that the bet sizes are not fixed?

I just reread the paper, and they also use a single betting node (plus an all-in), which may have two bet sizes (low and high) that change dynamically based on regret. However, even though they discuss it briefly in the paper, I'm not sure how they cope with the betting tree size when the bet sizes keep changing.

For example: we initialize high = 1.0x pot, low = 0.8x pot and run the algorithm. After the first short run we find that P_low = 1, so we adjust to the new bet sizes high = 0.9, low = 0.7. But do we then basically build a completely new tree (and forget about all regrets/cumulative strategies)? If not, how can we transfer the information for the information sets below that node? Given that we changed the parameters of the bet, that information might not be accurate anymore, and even the structure of the tree can change significantly (getting deeper or shallower depending on whether we decreased or increased the range).
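Here is how I read the window adjustment from that example, as a sketch (the step size and threshold are my own placeholders, not values from the paper):

Code:
def slide_window(low, high, p_low, step=0.1, threshold=0.95, min_bet=0.1):
    """One adjustment as in the example above: if the current strategy
    puts (nearly) all its probability on the low bet, shift the whole
    window down; if it prefers the high bet, shift it up. p_low is the
    probability assigned to the low bet size (sizes in pot fractions)."""
    if p_low >= threshold:
        low, high = max(min_bet, low - step), high - step
    elif p_low <= 1.0 - threshold:
        low, high = low + step, high + step
    return low, high

# Example from above: slide_window(0.8, 1.0, p_low=1.0) -> (0.7, 0.9)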


PostPosted: Tue Mar 12, 2013 4:31 pm
proud2bBot (Senior Member)

Another note: given the system they propose in the paper, I'm not sure how it performs in terms of unabstracted-game exploitability. For instance, if we use their approach and find that we should only make low bets of 0.2x pot in a certain spot, then when villain pots into us (or overbets), there is a much larger gap between the unabstracted action and the actions available in the abstracted model, probably leading to greater exploitation compared to a model where we have, say, 4 fixed bet sizes.


PostPosted: Tue Mar 12, 2013 7:44 pm
Coffee4tw (Site Admin)

I guess my (initial) approach is different in that it doesn't try to find the perfect bet size; instead it uses observed hand histories to create a distribution of possible bet sizes, and then, while running the algorithm, you pick bet sizes randomly from that distribution.
Let's see if I can answer your questions:
1) Nothing; you have a separate data structure (probably a tree as well) to save observed bet sizes and to pick from them (similar to the bet-size game from the paper).
2) Randomly, based on the aforementioned data structure, i.e. the observed bet-size distribution. So if we see 70% pot bets and 30% half-pot bets in a certain game state, we pick those values at the same rates.
3) In terms of tree size, you can use observed hand histories as well, to find the maximum number of raises in any given game (or a number that covers 99.9% of observed games), and then prune the tree to that depth, only allowing the last raise to be an all-in bet (see the sketch below).
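To make 2) and 3) concrete, a rough sketch of how a single iteration could realize a bet (all names assumed; the capping rule is my reading, nothing I've tested):

Code:
import random

def realize_bet(observed_fracs, pot, stack, default_frac=1.0):
    """Pick this iteration's bet at a raise node: sample a pot fraction
    from the node's observed distribution, then cap it at the remaining
    stack. A capped bet is an all-in, so later streets are forced to
    check/check and utilities are computed from the realized amounts."""
    frac = random.choice(observed_fracs) if observed_fracs else default_frac
    bet = frac * pot
    if bet >= stack:
        return stack, True    # all-in on this street
    return bet, False

# 70% pot bets, 30% half-pot bets observed at this node:
# realize_bet([1.0]*7 + [0.5]*3, pot=8.0, stack=20.0)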

I don't have mathematical proof that this performs well at all, but in my mind it makes sense, and it kind of blurs the exploitability that comes with fixed bet sizes. I wouldn't be surprised if this isn't enough yet and you had to split that one raise node into two to account for different game scenarios (a lower bet needs to be handled differently than a higher bet), but the idea is the same.

Alternatively, you could initialize the low/high values of the paper's approach using observed hand histories and then just run it as described in the paper.

_________________
Cheers.


PostPosted: Tue Mar 12, 2013 8:48 pm
Nasher (Veteran Member)

I like C4tW's idea. You're essentially trading utility accuracy for hand accuracy in the f/c nodes, yes? I wonder if the trade-off will increase value. You could actually crunch multiple EQs with different raise functions. Then, during live play, you could pick (via simulation and an opponent model) which raise function would most likely have played the actual game state, and use the EQ generated with it, thus narrowing the utility inaccuracy. Or you could use something like Importance Sampling to select which strategy has the highest value versus a given opponent. These EQs would also converge much faster (faster than HUL) because only a small portion of the game is being computed.
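A bare-bones version of the importance-sampling selection I have in mind (my simplification: I assume we log, per hand, the product of hero's action probabilities under each candidate EQ and under the strategy that actually played, plus the hand's payoff):

Code:
def estimate_value(candidate_probs, played_probs, payoffs):
    """Off-policy estimate of one candidate strategy's value from hands
    played under another strategy: each observed payoff is weighted by
    the likelihood ratio of hero's actions under the candidate vs. the
    strategy that actually played them."""
    n = len(payoffs)
    return sum((c / p) * u
               for c, p, u in zip(candidate_probs, played_probs, payoffs)) / n

# Score every precomputed EQ this way and play the one with the highest
# estimate against this opponent.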


PostPosted: Tue Mar 12, 2013 9:55 pm
proud2bBot (Senior Member)

I think it won't work - if I understood it correctly - and here's why:

1. Let's assume we have a node and found with data mining that most bets are 30% or 70% of the pot. Then the expected bet size, given we perform the selection randomly, is 50% pot, hence our algorithm would lead to exactly the same EV as a fixed bet size of half pot. What might be valuable, however, is to mine the observed bet sizes and use that information when creating our game tree, e.g. by using the average (if we only want one betting node).
2. The problem with the tree size is not that easy. Assume we have observed that players bet postflop either 50% or 100% pot. We are in the first postflop node with a pot of 4bb and have 18bb stacks. The possible final pot sizes if P2 can only call are (see the sketch after this list):
- b0.5, b0.5, b0.5: 32 - valid
- b0.5, b0.5, b1: 48 - valid, but we are all-in with the last bet
- b1, b1, b1: 108 - invalid, we would be all-in on the turn already!
Given there are multiple raises possible, the range of possible final pot sizes in practice increases far more...
3. Regarding exploitability: I don't see what we gain if we follow your approach. Let's assume we have a betting node and the model says it's either a 50% or a 100% pot bet. Villain wants to exploit us and bets 50% with all his draws/weaker hands and 100% purely for value. I'd guess that's still possible, right?
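To spell out point 2, a quick enumeration (my own sketch; each bet is capped at the remaining stack, which is why the b0.5, b0.5, b1 line comes out below the uncapped 48):

Code:
from itertools import product

def final_pots(start_pot=4.0, stack=18.0, fracs=(0.5, 1.0), streets=3):
    """Enumerate bet-and-call lines over three postflop streets for the
    18bb example above, capping each bet at the remaining stack."""
    for line in product(fracs, repeat=streets):
        pot, invested, allin = start_pot, 0.0, False
        for frac in line:
            bet = min(frac * pot, stack - invested)
            invested += bet
            pot += 2 * bet            # both players match the bet
            if invested >= stack:     # nothing behind: hand is all-in
                allin = True
                break
        print(line, pot, "all-in" if allin else "valid")

# (0.5, 0.5, 0.5) -> 32.0 valid
# (0.5, 0.5, 1.0) -> 40.0 all-in (48 uncapped)
# (1.0, 1.0, 1.0) -> 40.0 all-in (108 uncapped: does not fit 18bb stacks)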

Let me know if I misinterpreted something in your idea. And sorry for playing the devil's advocate - I appreciate the discussion a lot ;-)


PostPosted: Tue Mar 12, 2013 10:35 pm
Coffee4tw (Site Admin)

Nasher wrote:
I like C4tW's idea. You're essentially trading utility accuracy for hand accuracy in the f/c nodes, yes?

Yeah, I believe that would be the case. I also believe it would add a kind of non-exploitability effect, like Restricted Nash Response (RNR) does, just based on raise sizes instead of hands.

proud2bBot wrote:
I think it won't work - if I understood it correctly - and here's why

1) While the average is the same, I do think the result will be different if we pick random values at each node, because that essentially changes the utility of each action sequence. If you only have fixed sizes, then one action sequence will always have greater utility than the other, which might not be true in real life. If we randomize, the utility changes each time and will eventually lead to a different result. If you can mathematically prove that I'm wrong, I'll believe you; you might actually be on to something there, but I'd like to believe it's better. Maybe I'm mixing up two inherently different things here, but why else would RNR work, for example?
2) This is no problem at all, I think. For each game or iteration you make different sizes and actions valid or invalid. If we are not all-in by the river, fine. If we are all-in by the river, okay. If we are all-in by the turn, force check/check on the river. The resulting pot size, and therefore the utility, needs to be determined dynamically for each iteration.
3) Yeah, that's why I was saying we might need to split up the one node into two. This is opponent dependent, though: if we have a model that does well against the average opponent, we are in a pretty good position. For exploiting opponents we can use what Nasher said and precompute multiple ones, incorporating RNR into this approach. Essentially, what I'm trying to gain is a small tree, not necessarily a less exploitable abstraction, but I still believe it is a lot better than anything with a single fixed-size raise node would be. Is it better than two fixed-size raise nodes? Hard to say, but that's kind of an unfair comparison.

_________________
Cheers.


PostPosted: Tue Mar 12, 2013 11:23 pm
Nasher (Veteran Member)

1. You're right. Though I was thinking of using a model where hand value is an attribute.

2. This is what I meant by trading utility accuracy for hand accuracy. In a situation where f/c is the only option, you'd probably have to ignore the raise probability and re-normalize.

3. My approach would help reduce the state error associated with #2, yes? Not completely, but it would give you an abstraction more closely matching the real game state. You'd want to use the EQ whose raise function most closely matches the opponent model. I'm not sure I follow the point you're making about exploitability. We could increase our value against villain either by using an EQ with a game state that resembles his behavior, if only in frequency, or by Importance Sampling, which scores all strategies based on past gains/losses.

All this assumes we have a raise function and an opponent model with hand-value attributes, so maybe this is where we're diverging.


PostPosted: Wed Mar 13, 2013 12:34 am
Coffee4tw (Site Admin)

Right, I actually wasn't considering hand value as an attribute for the raise function, because that adds a whole new level of complexity if you are using observed hand histories as your base, where you won't be able to see hero's hands.

But that's fine in my opinion, because we aren't trying to come up with a perfect response to an opponent who gives away hand strength by selecting certain bet sizes. A good opponent will choose raise sizes based only on globally observable attributes such as pot size and stack size, which is why I want to measure bet sizes as fractions of those instead of using absolute values.

The point I was trying to make about exploitability is that a fixed-size bet abstraction is always heavily exploitable in the real game once you figure out the thresholds and the state translation function. Randomizing bet sizes should negate that, or at least make it a LOT harder to figure out.
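To illustrate the threshold problem from the other side, a toy randomized translation (the linear weighting is purely illustrative; smarter mappings surely exist):

Code:
import random

def soft_translate(x, sizes):
    """Map a real-game bet x (as a pot fraction) into an abstraction with
    fixed sizes, randomizing between the two neighbours instead of
    snapping to the nearest one. A hard threshold lets villain bet just
    past the cutoff on purpose; randomizing removes the sharp boundary."""
    sizes = sorted(sizes)
    if x <= sizes[0]:
        return sizes[0]
    if x >= sizes[-1]:
        return sizes[-1]
    for a, b in zip(sizes, sizes[1:]):
        if a <= x <= b:
            p_a = (b - x) / (b - a)   # linear weight; illustration only
            return a if random.random() < p_a else b

# soft_translate(0.6, [0.5, 1.0]) -> 0.5 with p=0.8, 1.0 with p=0.2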

_________________
Cheers.


PostPosted: Wed Mar 13, 2013 1:09 pm
proud2bBot (Senior Member)

Just to make it clear: my points are all based on the assumption that we develop a general method for different bet sizes, independent of our hole cards and without an opponent model (I think Nasher is going in that direction).


Coffee4tw wrote:
1) While the average is the same, I do think the result will be different if we pick random values at each node, because that essentially changes the utility of each action sequence. If you only have fixed sizes, then one action sequence will always have greater utility than the other, which might not be true in real life. If we randomize, the utility changes each time and will eventually lead to a different result. If you can mathematically prove that I'm wrong, I'll believe you; you might actually be on to something there, but I'd like to believe it's better. Maybe I'm mixing up two inherently different things here, but why else would RNR work, for example?

As far as I understand, the authors of the paper proved it here: http://poker.cs.ualberta.ca/publications/AAAI11.pdf

Coffee4tw wrote:
3) Yeah, that's why I was saying we might need to split up the one node into two. This is opponent dependent, though: if we have a model that does well against the average opponent, we are in a pretty good position. For exploiting opponents we can use what Nasher said and precompute multiple ones, incorporating RNR into this approach. Essentially, what I'm trying to gain is a small tree, not necessarily a less exploitable abstraction, but I still believe it is a lot better than anything with a single fixed-size raise node would be. Is it better than two fixed-size raise nodes? Hard to say, but that's kind of an unfair comparison.


I still don't get why it should be less exploitable. Let's assume we are in the learning phase. Our data suggests that we should bet 1/3 or 2/3 of the pot, each half of the time, so on average we make a 50% pot bet. As the algorithm isn't aware of our cards and bets randomly, the resulting strategy at this node would be the same as for a fixed bet size of 1/2 pot.
Now we apply your algorithm in a real game and our opponent bets into us. He value-bets 2/3 pot and bluffs 1/3 pot. As we just see "a bet", we still play the same way against both actions, which leads to us getting exploited.
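In symbols, the learning-phase claim is (my notation, with v_b(I, r) the value of raising bet size b at information set I):

Code:
\bar{v}(I, r) = \tfrac{1}{2}\, v_{b=1/3}(I, r) + \tfrac{1}{2}\, v_{b=2/3}(I, r)

and this coincides with the value of a single fixed half-pot bet exactly when v_b is (approximately) linear in b, which I think is the crux of the disagreement above.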


Thinking a bit more about your idea, I think it has some value when our abstraction does not use board textures: on boards like 552r bets are typically small, while bets on JT8ss are typically large, for obvious reasons. Even if we have different bet-size nodes, the algorithm can't exploit those differences if our abstraction isn't aware of the public cards. With your approach, however, we could bet differently depending on the board texture. This should increase our EV a bit, but not change our exploitability.


PostPosted: Thu Mar 14, 2013 12:38 am
Coffee4tw (Site Admin)

Okay, I can't come up with a compelling counter anymore for why this should work. For some reason I have a feeling that there is value in incorporating the variance of poker without adding more nodes to the tree. There HAS to be a way... This isn't limited to bet/pot sizes; it also applies to opponent hand ranges.

Maybe we do need to create an opponent model and incorporate it into this approach before it actually becomes useful.

_________________
Cheers.


PostPosted: Thu Mar 14, 2013 4:22 am
proud2bBot (Senior Member)

Opponent-specific, there are for sure spots where this is relevant. For instance, say you take over the mouse from your bot, play a session, and have a nut-like hand on the river. If you assume your opponent's range is full of very weak hands/air, you'd bet something like a quarter of the pot, to either get value from weak second-pair/third-pair type hands or induce bluff raises; if instead your opponent's range is very strong but still weaker than your hand, you would overbet, as he "can't" fold. But this is an exploit, i.e. an unbalanced strategy (we could balance it, e.g. by overbetting nuts and air at the right frequencies, but that would lead to a lower EV vs. an ε-Nash opponent, as it also affects all of our other hands).

