Poker-AI.org

Poker AI and Botting Discussion Forum
It is currently Mon Nov 13, 2023 5:30 pm

All times are UTC




Post new topic Reply to topic  [ 59 posts ]  Go to page 1, 2, 3  Next
Author Message
 Post subject: Dynamic Bucketing
PostPosted: Fri Mar 15, 2013 12:23 am 
Offline
Site Admin
User avatar

Joined: Thu Feb 28, 2013 5:24 pm
Posts: 230
I very much like that idea Nasher and it goes along the lines of what I am thinking about.
As for your player clones, do you have/consider holecard data or are you just using public information?

As for the hand abstraction problem, I could never wrap my head around the idea that hundreds of buckets are needed to get this done right. If you think about a pro player and his thought process he is going to bucket different hands into ranges and play all hands in those ranges the same to avoid being read and exploitable. So I have this idea in my head that we need only a few betting sequences based on historic data, then bucket the public information aka boards into reasonable buckets that require different strategy approaches and then we have 3 types of hands that we need to consider: bluffing hands, semi-bluffs, value. Maybe a forth: thin value.

The problem there is that individual hands might have to change buckets during the learning phase because our opponent's strategy changes and therefore the line between the categories shifts.

AFAIK no academic approach ever has used dynamic bucketing, am I right?

_________________
Cheers.


Top
 Profile  
 
 Post subject: Re: The memory dilemma
PostPosted: Fri Mar 15, 2013 12:48 am 
Offline
Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 437
Yes, they use hole card data. However, it's not necessary, as you could probably just as easily make decisions with a different strategy and just query the player clone based on public info alone, or deduce the bet nodes in some other fashion. It all works out the same in the end: player X will bet according to some distribution at game state Y. The reason for doing so is irrelevant (to this purpose).

I don't recall anybody using dynamic bucketing as a means of opponent exploitation. Maybe the same thing is accomplished offline via mixed RNR/DBR strategies, but I don't recall it mentioned explicitly. It would be another interesting thing to try.


Top
 Profile  
 
 Post subject: Re: The memory dilemma
PostPosted: Fri Mar 15, 2013 1:35 am 
Offline
Site Admin
User avatar

Joined: Thu Feb 28, 2013 5:24 pm
Posts: 230
I mean you would have to do it to converge to an equilibrium as well, not only to exploit a certain player. I meant exploiting the player as a means to get to the equilibrium.
As part of calculating a best response basically.

_________________
Cheers.


Top
 Profile  
 
 Post subject: Re: The memory dilemma
PostPosted: Fri Mar 15, 2013 1:48 am 
Offline
Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 437
How would you calculate that? And, how would you store the information if the buckets are constantly changing?


Top
 Profile  
 
 Post subject: Re: The memory dilemma
PostPosted: Fri Mar 15, 2013 6:14 am 
Offline
Site Admin
User avatar

Joined: Thu Feb 28, 2013 5:24 pm
Posts: 230
1) You have full knowledge of your opponents possible hands in CFRM so you define your buckets something like this:
- Value: Everything that's +EV against any hand that villain might showdown
- Semi-Bluff: Everything that's -EV against villains showdown hands but +EV considering villains folds
- Bluff: Hands that are -EV regardless.

2) You'll just keep the number of buckets the same and pretend they are not changing. Keep accumulating regret for each bucket and there you go.

_________________
Cheers.


Top
 Profile  
 
 Post subject: Re: The memory dilemma
PostPosted: Fri Mar 15, 2013 1:40 pm 
Offline
Senior Member

Joined: Mon Mar 11, 2013 10:24 pm
Posts: 216
Regarding the bucketing decision, I always have the thought/thesis that our current approaches are just not efficient and thats why we need lots of different buckets. A simple example and an idea how to overcome it:
Lets assume we are on the river and use EHS bucketing with 50 buckets in a 1% range - very simple bucketing scheme. Now we run CFRM for xM iterations and look at the river nodes and will find out that the strategy is basically FOLD 100% for the bottom x buckets. Hence, we could merge those buckets into a single one and make space for other buckets and rerun CFRM with the updated scheme again. Hence, we have different bucket sizes based on the relevance in the strategy. Maybe it's smart to incorporate the EV of a node (higher EVs are more relevant to our strategy and thus should get more attention, i.e., finer grained abstractions). I didn't implement something like this yet, but still think it could be worth it.

Edit: an even better measure for spots where we want to have more detailed buckets would be a combination of EV and variance, though I'm not sure if thats feasible given restricted memory.


Top
 Profile  
 
 Post subject: Re: The memory dilemma
PostPosted: Fri Mar 15, 2013 7:52 pm 
Offline
Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 437
Coffee4tw wrote:
1) You have full knowledge of your opponents possible hands in CFRM so you define your buckets something like this:
- Value: Everything that's +EV against any hand that villain might showdown
- Semi-Bluff: Everything that's -EV against villains showdown hands but +EV considering villains folds
- Bluff: Hands that are -EV regardless.

2) You'll just keep the number of buckets the same and pretend they are not changing. Keep accumulating regret for each bucket and there you go.


How would you apply that to the real game?


Top
 Profile  
 
 Post subject: Re: The memory dilemma
PostPosted: Sat Mar 16, 2013 12:10 am 
Offline
Site Admin
User avatar

Joined: Thu Feb 28, 2013 5:24 pm
Posts: 230
If you start changing the number of buckets during your iterations that might screw with the regret values. combining buckets probably is okay but splitting them up might be a problem. Or you just rerun CFRM from scratch but then you are really screwing yourself on time spent.

What do you mean apply that to the real game? In the end you'll have a strategy with ranges for each bucket so you can just play according to those without changing them anymore.
It's just during your iterations that you are using private information available to the algorithm to determine strength or hands versus the strategy of the opponent at the time.

Actually I have an addition about the bucket distribution. I don't think Value hands should only be against showndown hands but maybe against all hands that villain is willing to continue with (aka not fold) in the next node. Not sure which one is going to be more effective in the end but something like that would be good.
OR we just define buckets somewhat arbitrarily into:
- either a percentage of our range, ordered by strength against villains hand (5 buckets, 20% of our hands in each)
- or distributed by strength (highest-lowest EV / 5).

That's similar to what people were doing before but they were doing it static and against a 100% range I think, that's basically what EHS/EHS2 is, isn't it?

_________________
Cheers.


Top
 Profile  
 
 Post subject: Re: The memory dilemma
PostPosted: Sat Mar 16, 2013 12:41 am 
Offline
Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 437
Some things that come to mind:

- How do you translate hand X into EV during live play? And, how do get the current regret to pass forward in CFRM if the EV is going to dynamically decide your bucket?
- I like the idea of splitting up buckets like this, but I still think the granularity matters when facing a random opponent.


Top
 Profile  
 
 Post subject: Re: The memory dilemma
PostPosted: Sat Mar 16, 2013 12:48 am 
Offline
Site Admin
User avatar

Joined: Thu Feb 28, 2013 5:24 pm
Posts: 230
- You don't need to translate it to EV in live play, you just use the last hand to bucket distribution from your learning run. I expect this to converge towards the end.
- You update the range buckets on your way down based on EV (starting from the leafs to the root) and get the regret on the way back (root to leaf).
- Sure the granularity might not be optimal with 3-5 buckets but I think this is going to be superior to other, static 3-5 bucket abstractions. A LOT superior.

_________________
Cheers.


Top
 Profile  
 
 Post subject: Re: The memory dilemma
PostPosted: Sat Mar 16, 2013 1:05 am 
Offline
Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 437
So, each bucket represents a range of hands. A range, meaning A to B. Are A and B numbers? How do you define what hands are > A and < B? How do you store that, in an array for example?

To calculate the EV you need to pass the algorithm (or just "use") the previous regret first. Are you meaning you pass it the old regret, get the EV, then update the range lines with the new EV?

How do you update the range lines? Some hands may be mostly +EV, but occasionally -EV, assuming you're doing monte-carlo sampling. You could maybe adjust the line by some weighted value for each hand that switches buckets, based on how far it is from the center.


Top
 Profile  
 
 Post subject: Re: The memory dilemma
PostPosted: Sat Mar 16, 2013 1:34 am 
Offline
Site Admin
User avatar

Joined: Thu Feb 28, 2013 5:24 pm
Posts: 230
How you store the range is up to you. A list of hands should do but if you want to save space (which is the point of this thread) you can probably come up with a different scheme for that.
Easiest is having a boolean array with 1326 elements and storing true or false whether that holding is part of the range or not.

I think we are talking about two different EVs here. For the purposes of the regret algorithm you need EVs for the buckets and each action. For the purposes of adjusting the hand to bucket mapping (aka range adjusting) you need per hand EVs. So to make this less confusing I think you need to do the updating of the buckets in between regret iterations but then you continue with the old regret, get the new EV for the buckets and update the cumulative regret.

_________________
Cheers.


Top
 Profile  
 
 Post subject: Re: The memory dilemma
PostPosted: Sat Mar 16, 2013 1:46 am 
Offline
Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 437
Coffee4tw wrote:
How you store the range is up to you. A list of hands should do but if you want to save space (which is the point of this thread) you can probably come up with a different scheme for that.
Easiest is having a boolean array with 1326 elements and storing true or false whether that holding is part of the range or not.

I don't understand. You're talking about the preflop? What about the other rounds?


Top
 Profile  
 
 Post subject: Re: The memory dilemma
PostPosted: Sun Mar 17, 2013 12:52 am 
Offline
Site Admin
User avatar

Joined: Thu Feb 28, 2013 5:24 pm
Posts: 230
No, all rounds. Boards are their own nodes so you only need to keep track of the (possible) two cards in your hand.

_________________
Cheers.


Top
 Profile  
 
 Post subject: Re: The memory dilemma
PostPosted: Sun Mar 17, 2013 1:37 am 
Offline
Site Admin
User avatar

Joined: Thu Feb 28, 2013 5:24 pm
Posts: 230
If you want to store 5 or 7 card hands then you'll probably need to keep a list instead of an array. That could become pretty memory costly I guess.

_________________
Cheers.


Top
 Profile  
 
 Post subject: Re: Dynamic Bucketing
PostPosted: Sun Mar 17, 2013 1:59 am 
Offline
Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 437
Yeah, I'm thinking in terms of a typical CFRM setup, where at each action node there is an array containing the AS and CR for the entire hand, however it's formatted. Something like this on the turn:

regret[flop_texture, turn_texture, EHS2] += something;

How to dynamically partition that isn't clear to me, but I'm sure there are other ways of thinking about the problem that I'm not accustom to.


Top
 Profile  
 
 Post subject: Re: Dynamic Bucketing
PostPosted: Sun Mar 17, 2013 2:30 am 
Offline
Site Admin
User avatar

Joined: Thu Feb 28, 2013 5:24 pm
Posts: 230
For the regrets you would just do:
regret[bucket x] += something

How you keep track of what belongs into which bucket is orthogonal to that.

_________________
Cheers.


Top
 Profile  
 
 Post subject: Re: Dynamic Bucketing
PostPosted: Sun Mar 17, 2013 3:03 am 
Offline
Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 437
Right. "How" is the question, for all hands. :) How would you do it with MC CFRM where the hand is in different buckets depending on chance?

I guess you could keep some sort of LUT, but that would get ridiculously large for all hands, even if you just used a byte to represent the bucket for each one. Otherwise you would need to encode the hand in some way, into EHS^2 for example, and then decide somehow, dynamically, what bucket it would land in.

:idea:

It's a clustering problem. You use EV to decide/adjust your bucket centroids. The centroids representing EHS^2 and HP, or whatever you want. In practice, you just find the bucket with the least distance to your current hand. Problem with that is you would need to keep another array representing your centroids, which would take 1/3 more memory.


Top
 Profile  
 
 Post subject: Re: Dynamic Bucketing
PostPosted: Sun Mar 17, 2013 3:35 am 
Offline
Site Admin
User avatar

Joined: Thu Feb 28, 2013 5:24 pm
Posts: 230
I like the clustering idea. I hadn't thought of that before.

However, I don't think it's going to work if we want to get our hand's EV against villain's actual range because we can't get villain's range anymore if we just store the centroids for buckets. It could probably be estimated by sampling or something but that would take too long to actually be useful in the end I think.

The LUT would essentially have to be of size = nodes * buckets * x, with x being either all hands or some other better approach. A string in the form A2o+, KTs+ etc maybe.

I think saving the one hand at the bottom limit of each bucket might be enough, e.g. using EHS.
For that to work though the following has to be true:
Consider two holdings h1 and h2 against a set villain range (e.g. top 10% of holdings). If EHS(h1) > EHS(h2) then EV(h1) > EV(h2), where EV is the expected value against villain's range.
Is that true for all hands? Hmm probably not ...

_________________
Cheers.


Top
 Profile  
 
 Post subject: Re: Dynamic Bucketing
PostPosted: Sun Mar 17, 2013 3:45 am 
Offline
Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 437
Why do we need to know villains range? We just need to know the EV after the CFRM iteration, right?

When getting the regret on the way down the tree:

Code:
bucket = find(centroids, EHS2, HP);

for all i ..
s[i] =  positiveNormalized(regret[bucket, i]);
u[i] = doCFRM(s[i], ... );
ev += u[i] * s[i];

const double weight = 0.999995;
regret_bucket = divideWithBounds(u[i] - ev, centroids.length);
centroids[regret_bucket] = centroids[regret_bucket] * weight + new_centroid;

for all i ...
regret[regret_bucket] += u[i] - ev;


Updating multiple centroids is probably more complicated, but you get the idea. And, you're right, it would take more time to search for the regret bucket. But, if it's ordered, you could use a binary search. Question is: would the improved hand bucketing be worth the memory trade-off?


Top
 Profile  
 
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 59 posts ]  Go to page 1, 2, 3  Next

All times are UTC


Who is online

Users browsing this forum: No registered users and 1 guest


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Powered by phpBB® Forum Software © phpBB Group