Poker-AI.org

Poker AI and Botting Discussion Forum
All times are UTC




Posted: Mon Apr 22, 2013 4:49 am
Senior Member

Joined: Mon Mar 11, 2013 10:24 pm
Posts: 216
CFRM has been used to construct models that play reasonably robustly while still being exploitive (RNR, DBRCS). However, these techniques require a basic understanding of how villain plays, e.g. with which kinds of hands/buckets he takes each action. Obviously, this information is hard to get outside of the academic setting, where we could see the hole cards even if villain folded. In real games, however, we might have hole card information for maybe 5% of all hands, and even if we have a lot of hands, we cannot use them directly because they are biased towards the stronger hands or draws that got there; weaker hands and busted draws are typically folded before SD.
However, one unbiased piece of information we can get is the action frequencies, i.e., how often he folds/calls/raises. Assuming this information is correct and a good predictor of future actions, does anyone have an idea how to force the MCCFRM algorithm to obey these restrictions?
For instance, we know that we should fold/call/raise 10%/50%/40% of hands in a certain spot according to GTO, but our opponent only plays a raise-or-fold game with fold/call/raise frequencies like 40%/0%/60%. The basic idea is to force the algorithm to keep villain's frequencies matching our opponent model, while leaving it free to choose which hands to play with which action (so it would find the best mix given the frequencies).
Has anybody worked on this, or does anyone have an idea how to implement the restriction?


Posted: Mon Apr 22, 2013 11:48 pm
Site Admin

Joined: Thu Feb 28, 2013 5:24 pm
Posts: 230
I think to achieve this you would need to add a normalizing step after each iteration (or after every x iterations) and kind of shovel the regrets for hands around so that they end up producing the action frequencies that you want to have.

This is just speculation, but you'll probably want some kind of ordering of hands by their regret values, and then move over the ones that are closest to the bucket you need to redistribute to.

For example:
You have three different hands that have the following fold/call/raise probabilities based on accumulated regret:
Code:
A: 0/0/100
B: 0/50/50
C: 50/50/0
==========
17/33/50


But you want to achieve:
Code:
50/0/50


Intuitively I would shift all of C's call actions and all of B's call actions to fold to achieve this. This example is probably too easy to solve, but essentially we need more folds and fewer calls, so at first we move percentage points from the action triplet that's closest to 100/0/0.

If I had to put this in an algorithm then it would look something like this:
Code:
1) Calculate the action frequency for each action over all hands (in integer percentages for simplicity)
2) Decide which action we need to move from and which we need to move to (constraint: only neighboring moves are allowed, not fold->raise or raise->fold)
3) Calculate which hand's frequency is closest to (but different from) the desired 100% action that we want to move to (or use any other metric)
4) Move one percentage point over (or all, or a fraction)
5) Recalculate the overall action frequency
6) If we still need to adjust, repeat from step 2; otherwise stop.
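
The steps above could be sketched roughly as follows in Python. This is only an illustration, not a tested part of any solver: the function name `redistribute`, the [fold, call, raise] ordering, integer percentages and the assumption that every hand is equally likely are all choices made for this sketch. It works on the final action probabilities rather than on the regrets themselves.

```python
def redistribute(hands, target, step=1):
    """Greedily shift percentage points between neighboring actions.

    hands:  hand name -> [fold, call, raise] in integer percent (sums to 100)
    target: desired overall [fold, call, raise] in integer percent
    Assumes every hand is equally likely (an assumption of this sketch).
    """
    assert sum(target) == 100
    hands = {name: list(probs) for name, probs in hands.items()}
    goal = [t * len(hands) for t in target]  # target in total points
    while True:
        # Steps 1 and 5: overall frequency as surplus (+) or deficit (-)
        total = [sum(p[a] for p in hands.values()) for a in range(3)]
        diff = [total[a] - goal[a] for a in range(3)]
        # Step 2: neighboring moves only (fold<->call, call<->raise)
        moves = []
        if diff[0] > 0:
            moves.append((0, 1, diff[0]))
        if diff[0] < 0:
            moves.append((1, 0, -diff[0]))
        if diff[2] > 0:
            moves.append((2, 1, diff[2]))
        if diff[2] < 0:
            moves.append((1, 2, -diff[2]))
        for src, dst, needed in moves:
            # Step 3: among hands that still play src, take the one
            # that is already closest to playing dst 100% of the time
            movable = [h for h, p in hands.items() if p[src] > 0]
            if movable:
                best = max(movable, key=lambda h: hands[h][dst])
                # Step 4: move a small amount over
                amount = min(step, hands[best][src], needed)
                hands[best][src] -= amount
                hands[best][dst] += amount
                break
        else:
            # Step 6: nothing left to move, frequencies match the target
            return hands


example = {"A": [0, 0, 100], "B": [0, 50, 50], "C": [50, 50, 0]}
print(redistribute(example, [50, 0, 50]))
```

On the three-hand example this ends with A at 0/0/100, B at 50/0/50 and C at 100/0/0, i.e. the same result as shifting all of C's and B's calls to fold by hand.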


There probably is a way to do this "online" while doing your regular CFRM iterations, possibly by filtering the regret into the action frequency that you want (i.e. dropping the regret for actions that are beyond the desired threshold), but I currently can't think of a way to do that which wouldn't lead to uniform action frequencies across all hands.

_________________
Cheers.


Posted: Tue Apr 23, 2013 5:28 am
Site Admin

Joined: Sun Feb 24, 2013 9:39 pm
Posts: 642
@p2bb Clever :!: - and maybe you could force the showdowns to be accurate too.


Posted: Fri May 03, 2013 5:28 pm
Junior Member

Joined: Mon Apr 22, 2013 11:46 am
Posts: 34
not sure how this can be done, but a simple approach which could work reasonably well would be to simply alter the ev of the actions. if our equilibrium strategy 3bets 10% and we want to force 20%, we increase the ev of that action by some percentage.

if we find the percentages which generate an equilibrium with the right frequencies (not sure how hard these percentages are to find), then that equilibrium would probably be close to the optimal one with the forced frequencies.


Posted: Sat May 04, 2013 9:39 am
Site Admin

Joined: Sun Feb 24, 2013 9:39 pm
Posts: 642
somehomelessguy wrote:
not sure how this can be done, but a simple approach which could work reasonably well would be to simply alter the ev of the actions. if our equilibrium strategy 3bets 10% and we want to force 20%, we increase the ev of that action by some percentage.

if we find the percentages which generate an equilibrium with the right frequencies (not sure how hard these percentages are to find), then that equilibrium would probably be close to the optimal one with the forced frequencies.


How do we choose the strength bucket in the action whose frequency we want to modify?


Posted: Sat May 04, 2013 5:04 pm
Junior Member

Joined: Mon Apr 22, 2013 11:46 am
Posts: 34
spears wrote:
somehomelessguy wrote:
not sure how this can be done, but a simple approach which could work reasonably well would be to simply alter the ev of the actions. if our equilibrium strategy 3bets 10% and we want to force 20%, we increase the ev of that action by some percentage.

if we find the percentages which generate an equilibrium with the right frequencies (not sure how hard these percentages are to find), then that equilibrium would probably be close to the optimal one with the forced frequencies.


How do we choose the strength bucket in the action whose frequency we want to modify?


we increase the payoff for the action we want to increase the frequency of (regardless of strength bucket). then, automagically, the optimal strength buckets should switch action. it's just a matter of finding the payoff increase which corresponds to the correct frequency increase.

i was thinking something like this:

1. run for n iterations
2. alter the payoff for actions whose frequency is off target
3. goto 1. maybe reset all regrets or at least make the regrets from the previous iterations less influential.
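
A toy sketch of the "find the payoff increase that gives the right frequency" part, in Python. Loud assumption: instead of running n CFRM iterations in step 1, this stand-in just lets each hand pick its highest-ev action from a fixed ev table, so it only shows the search for the bonus and says nothing about how the opponent would re-adjust in a real solve. The names `action_frequency` and `find_bonus`, the ev numbers and the use of bisection are all made up for illustration.

```python
def action_frequency(evs, action, bonus):
    """Fraction of hands whose best action is `action` after adding `bonus` to its ev."""
    hits = 0
    for hand_evs in evs:
        adjusted = list(hand_evs)
        adjusted[action] += bonus
        best = max(range(len(adjusted)), key=lambda a: adjusted[a])
        hits += best == action
    return hits / len(evs)


def find_bonus(evs, action, target, hi=10.0, rounds=50):
    """Bisect for the smallest bonus that pushes `action` up to `target` frequency.

    Relies on the frequency never decreasing as the bonus grows, and assumes
    the target is reached at `hi`.
    """
    lo = 0.0
    for _ in range(rounds):
        mid = (lo + hi) / 2
        if action_frequency(evs, action, mid) < target:
            lo = mid   # still too rare, increase the payoff
        else:
            hi = mid   # frequent enough, try a smaller increase
    return hi


# Hypothetical [fold, call, raise] evs for five equally likely hands
evs = [
    [0.0, 1.0, 2.0],
    [0.0, 1.0, 0.75],
    [0.0, 0.5, 0.0],
    [0.0, -0.5, -1.0],
    [0.0, -1.0, -2.5],
]
bonus = find_bonus(evs, action=2, target=0.4)
print(bonus, action_frequency(evs, 2, bonus))
```

With these made-up numbers only one hand in five raises without a bonus, and a bonus just above 0.25 flips the second hand from call to raise. In a real run each probe of the bonus would cost a (partial) re-solve, which is where the overhead comes from.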


Posted: Sat May 04, 2013 6:46 pm
Senior Member

Joined: Mon Mar 11, 2013 10:24 pm
Posts: 216
Short update: I tried it out and it works, but it adds a huge overhead to the general process. First, we need to calculate SB's and BB's strategies separately, which doubles the time. Second, we need to find the fixed strategy every xM iterations (we need to do it frequently as it has a big impact on the EV), which also slows down the learning significantly. Still thinking about how to make it faster...


Posted: Sat May 04, 2013 9:13 pm
Site Admin

Joined: Sun Feb 24, 2013 9:39 pm
Posts: 642
OK, I think I get it. You want to change the payoff at the leaves, not the ev in the branches. So increasing the payoff at a leaf increases the frequency for the branches leading to that leaf.

I wonder if there is a simple relationship between delta payoff and delta frequency.


Posted: Sun May 05, 2013 12:22 am
Junior Member

Joined: Mon Apr 22, 2013 11:46 am
Posts: 34
spears wrote:
OK, I think I get it. You want to change the payoff at the leaves, not the ev in the branches.


does it make a difference?


Posted: Sun May 05, 2013 7:27 am
Site Admin

Joined: Sun Feb 24, 2013 9:39 pm
Posts: 642
I don't have payoffs at internal nodes. I have evs and they are a function of strength.


Posted: Thu May 16, 2013 10:11 am
Junior Member

Joined: Mon Apr 08, 2013 1:13 pm
Posts: 15
Has anyone tried to strictly define the problem?

If you try to pose the problem as "find the Nash equilibrium", then there is an issue, I think.

Let's say we are fixing player2's action probabilities. As soon as player1 changes his strategy we find out that constraints on player2's probabilities are broken (because probabilities of getting to this node for different information sets will change). So the set of strategies available for player2 to choose from depends on player1's strategy. The definition of NE hinges on one player's deviation from his current strategy being profitable. But this setting adds the issue of a deviation being valid... :|
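
A small Python illustration of that dependence, with made-up numbers. The function name `node_frequencies`, the two information sets and the reach weights are assumptions for this sketch only. Player2's strategy is held fixed; only the weights with which his information sets arrive at the node change, as they would when player1 changes his strategy.

```python
def node_frequencies(strategy, reach):
    """Overall [fold, call, raise] frequencies at one decision node.

    strategy: information set -> [fold, call, raise] probabilities
    reach:    information set -> (unnormalized) weight of arriving at the
              node holding that information set
    """
    total = sum(reach.values())
    return [
        sum(reach[i] * strategy[i][a] for i in strategy) / total
        for a in range(3)
    ]


# Player2 always raises his strong hands and always folds his weak ones
strategy = {"strong": [0.0, 0.0, 1.0], "weak": [1.0, 0.0, 0.0]}

# Same strategy, two different reach distributions
even = node_frequencies(strategy, {"strong": 1, "weak": 1})
skewed = node_frequencies(strategy, {"strong": 1, "weak": 4})
print(even, skewed)
```

With even reach the node shows 50% folds and 50% raises; with the skewed reach the same strategy shows 80% folds and 20% raises, so a constraint that was satisfied before is now violated without player2 changing anything.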


Posted: Wed May 22, 2013 4:28 am
Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 437
p2bb mysteriously disappeared after he said he got this working. I wonder if the government came to silence him? :shock:


Posted: Wed May 22, 2013 4:38 am
Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 437
spears wrote:
I wonder if there is a simple relationship between delta payoff and delta frequency

By delta, you mean the rate of change? Or, probability alone?


Posted: Thu May 23, 2013 9:50 pm
Site Admin

Joined: Sun Feb 24, 2013 9:39 pm
Posts: 642
This is just a vague idea. At a decision node in an equilibrium strategy, the evs of all actions with probabilities other than 0 or 1 are equal. This means the action probability and the ev of the successor node are inversely proportional. So it might be that, close to equilibrium, if you increase the ev of the successor node by some small amount, the action frequency decreases proportionally.


Posted: Sun Jun 30, 2013 9:35 pm
Junior Member

Joined: Mon Jun 03, 2013 9:06 pm
Posts: 20
somehomelessguy wrote:
not sure how this can be done, but a simple approach which could work reasonably well would be to simply alter the ev of the actions. if our equilibrium strategy 3bets 10% and we want to force 20%, we increase the ev of that action by some percentage.

if we find the percentages which generate an equilibrium with the right frequencies (not sure how hard these percentages are to find), then that equilibrium would probably be close to the optimal one with the forced frequencies.


Does this algorithm produce the least dominated strategy with a 20% 3bet frequency?

