Poker-AI.org • View topic - MCCFRM: Forcing action frequencies in learning process?

All times are UTC

MCCFRM: Forcing action frequencies in learning process?

proud2bBot

Post subject: MCCFRM: Forcing action frequencies in learning process?

Posted: Mon Apr 22, 2013 4:49 am

Senior Member

Joined: Mon Mar 11, 2013 10:24 pm
Posts: 216

CFRM has been used to construct models that play robustly yet exploitively at the same time (RNR, DBRCS). However, these techniques require a basic understanding of how villain plays, e.g. with which kinds of hands/buckets he takes an action. Obviously, this information is hard to get outside an academic setting (where we could see the hole cards even when villain folded). In real games, we might see the hole cards in maybe 5% of all hands, and even with a large sample we cannot use them directly, because they are biased towards the stronger hands or draws that got there; weaker hands and busted draws are typically folded before showdown.
However, one unbiased piece of information we can get is the action frequencies, i.e., how often he folds/calls/raises. Assuming these frequencies are correct and a good prediction of future actions, I wonder if anyone has an idea how to force the MCCFRM algorithm to obey these restrictions?
For instance, we know that we should fold/call/raise 10%/50%/40% of hands in a certain spot according to GTO, but our opponent plays only a raise/fold game with fold/call/raise frequencies of, say, 40%/0%/60%. The basic idea is to force the algorithm to make villain's frequencies match our opponent model, while leaving it free to choose which hands to play with which action (so it would find the best mix given the frequencies).
Has anybody worked on this, or does anybody have an idea how to implement the restriction?
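
To make the idea concrete, here is a minimal Python sketch of the simplest possible case: a pure raise/fold spot where the EVs per hand are fixed and known. Both are assumptions for illustration only; in a real CFRM run the EVs depend on both players' strategies and keep changing. The function name `assign_by_frequency` and all numbers are hypothetical. Under these assumptions, the best mix for a forced raise frequency is to raise the hands that gain the most from raising.

```python
def assign_by_frequency(ev_fold, ev_raise, raise_freq):
    """Return a raise probability per hand so that the overall raise
    frequency equals raise_freq (hands are assumed equally likely)."""
    n = len(ev_fold)
    # Hands that gain the most from raising instead of folding come first.
    order = sorted(range(n), key=lambda i: ev_raise[i] - ev_fold[i], reverse=True)
    budget = raise_freq * n  # total raise probability mass to hand out
    raise_prob = [0.0] * n
    for i in order:
        if budget <= 0:
            break
        take = min(1.0, budget)  # the last hand may be mixed
        raise_prob[i] = take
        budget -= take
    return raise_prob


# Hypothetical EVs for five equally likely hands.
ev_fold = [0.0, 0.0, 0.0, 0.0, 0.0]
ev_raise = [2.0, -1.0, 0.5, -0.5, 1.0]
probs = assign_by_frequency(ev_fold, ev_raise, raise_freq=0.6)
print(probs)
```

The forced frequency decides how many hands raise; the EV differences decide which ones. With three actions and EVs that react to the strategies, this simple sort no longer applies directly.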

Coffee4tw

Post subject: Re: MCCFRM: Forcing action frequencies in learning process?

Posted: Mon Apr 22, 2013 11:48 pm

Site Admin

Joined: Thu Feb 28, 2013 5:24 pm
Posts: 230

I think to achieve this you would need to add a normalizing step after each iteration (or after every x iterations) and shift the regrets between hands so that they end up producing the action frequencies that you want to have.

This is just speculation, but you'll probably want to order the hands by their regret values and then move over the ones that are closest to the action you need to redistribute to.

For example:
You have three different hands that have the following action probabilities based on accumulated regret:

Code:

A: 0/0/100 
B: 0/50/50
C: 50/50/0
==========
17/33/50

But you want to achieve:

Code:

50/0/50

Intuitively I would shift all of C's call actions and all of B's call actions to fold to achieve this. This example is probably too easy to solve, but essentially we need more folds and fewer calls, so we first move percentage points from the action triplet that's closest to 100/0/0.
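
The arithmetic of this example can be checked with a few lines of Python. This is only a sketch: it assumes the three hands are equally likely (which the 17/33/50 average above implies), and the helper name `aggregate` is made up for illustration.

```python
def aggregate(strategies):
    """Average fold/call/raise percentages over equally likely hands."""
    n = len(strategies)
    return [sum(s[a] for s in strategies.values()) / n for a in range(3)]


# Fold/call/raise percentages per hand, as in the example above.
before = {"A": [0, 0, 100], "B": [0, 50, 50], "C": [50, 50, 0]}

# Shift every hand's call percentage to fold.
after = {hand: [f + c, 0, r] for hand, (f, c, r) in before.items()}

print([round(x) for x in aggregate(before)])  # [17, 33, 50]
print([round(x) for x in aggregate(after)])   # [50, 0, 50]
```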

If I had to put this in an algorithm then it would look something like this:

Code:

1) Calculate the action frequency for each action over all hands (in integer percentages for simplicity)
2) Decide which action we need to move from and which we need to move to (constraint: only neighboring moves are allowed, not fold->raise or raise->fold)
3) Calculate which hand's frequency is closest to (but different from) the desired 100% for the action that we want to move to (or use any other metric)
4) Move one percentage point over (or all, or a fraction)
5) Recalculate the overall action frequency
6) If we still need to adjust, repeat from step 2, otherwise stop.
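
The steps above can be sketched in Python roughly as follows. This is one possible reading, not a tested part of any CFRM implementation: it assumes equally likely hands, integer percentages, a target that sums to 100, and "closest to 100%" as the metric in step 3. The name `rebalance` is hypothetical.

```python
FOLD, CALL, RAISE = 0, 1, 2


def rebalance(hands, target):
    """Move single percentage points between neighbouring actions until the
    average over all hands matches target (fold/call/raise in percent)."""
    hands = {name: list(p) for name, p in hands.items()}  # work on a copy
    n = len(hands)
    while True:
        # Steps 1 and 5: surplus (+) or deficit (-) per action, summed over hands.
        diff = [sum(h[a] for h in hands.values()) - target[a] * n for a in range(3)]
        # Step 2: only neighbouring moves (fold<->call, call<->raise).
        moves = []
        if diff[FOLD] <= -1:
            moves.append((CALL, FOLD))
        if diff[FOLD] >= 1:
            moves.append((FOLD, CALL))
        if diff[RAISE] <= -1:
            moves.append((CALL, RAISE))
        if diff[RAISE] >= 1:
            moves.append((RAISE, CALL))
        for src, dst in moves:
            donors = [h for h in hands.values() if h[src] > 0]
            if donors:
                # Step 3: the hand already closest to 100% in the target action.
                best = max(donors, key=lambda h: h[dst])
                # Step 4: move one percentage point.
                best[src] -= 1
                best[dst] += 1
                break
        else:
            # Step 6: nothing left to move, so stop.
            return hands


result = rebalance({"A": [0, 0, 100], "B": [0, 50, 50], "C": [50, 50, 0]}, [50, 0, 50])
print(result)
```

On the three-hand example this first pushes C to 100/0/0 and then B to 50/0/50, which is the shift suggested above.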

There probably is a way to do this "online" while doing your regular CFRM iterations, possibly by filtering the regret into the action frequency that you want (aka dropping the regret for actions that are beyond the desired threshold) but I currently can't think of a way that wouldn't lead to uniform action frequencies across all hands that way.

_________________
Cheers.

spears

Post subject: Re: MCCFRM: Forcing action frequencies in learning process?

Posted: Tue Apr 23, 2013 5:28 am

Site Admin

Joined: Sun Feb 24, 2013 9:39 pm
Posts: 642

@p2bb Clever :!:

- and maybe you could force the showdowns to be accurate too.

somehomelessguy

Post subject: Re: MCCFRM: Forcing action frequencies in learning process?

Posted: Fri May 03, 2013 5:28 pm

Junior Member

Joined: Mon Apr 22, 2013 11:46 am
Posts: 34

not sure how this can be done, but a simple approach which could work reasonably well would be to simply alter the ev of the actions. if our equilibrium strategy 3bets 10% and we want to force 20%, we increase the ev of that action by some percentage.

if we find the percentages which generate an equilibrium with the right frequencies (not sure how hard these percentages are to find), then that equilibrium would probably be close to the optimal one with the forced frequencies.

spears

Post subject: Re: MCCFRM: Forcing action frequencies in learning process?

Posted: Sat May 04, 2013 9:39 am

Site Admin

Joined: Sun Feb 24, 2013 9:39 pm
Posts: 642

somehomelessguy wrote:

How do we choose the strength bucket in the action whose frequency we want to modify?

somehomelessguy

Post subject: Re: MCCFRM: Forcing action frequencies in learning process?

Posted: Sat May 04, 2013 5:04 pm

Junior Member

Joined: Mon Apr 22, 2013 11:46 am
Posts: 34

spears wrote:

somehomelessguy wrote:

How do we choose the strength bucket in the action whose frequency we want to modify?

we increase the payoff for the action we want to increase the frequency of (regardless of strength bucket). then, automagically, the optimal strength buckets should switch action. it's just a matter of finding the payoff increase which corresponds to the correct frequency increase.

i was thinking something like this:

1. run for n iterations
2. alter payoff for misrepresented actions
3. goto 1. maybe reset all regrets or at least make the regrets from the previous iterations less influential.
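
a rough Python sketch of the "find the payoff increase" part, under a strong simplifying assumption: the per-hand EVs are fixed, so each hand just picks its best action and nothing is re-solved between adjustments. in the loop above the EVs would instead come from running more iterations after each change. the names `raise_frequency` and `find_bonus` and all numbers are made up for illustration.

```python
def raise_frequency(evs, bonus):
    """Share of hands whose best action is raise once bonus is added to the raise EV."""
    count = 0
    for ev_fold, ev_call, ev_raise in evs:
        if ev_raise + bonus > max(ev_fold, ev_call):
            count += 1
    return count / len(evs)


def find_bonus(evs, target, lo=0.0, hi=10.0, iters=50):
    """Bisection for the smallest bonus that reaches the target raise frequency.
    Assumes the frequency at bonus=hi is already at or above the target."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if raise_frequency(evs, mid) >= target:
            hi = mid
        else:
            lo = mid
    return hi


# Hypothetical (fold, call, raise) EVs for five equally likely hands.
evs = [(0, -1, 2), (0, 1, 0.5), (0, 0.5, -1), (0, -2, -3), (0, 2, 1)]
bonus = find_bonus(evs, target=0.4)
print(raise_frequency(evs, 0.0), raise_frequency(evs, bonus), bonus)
```

bisection works here because the raise frequency can only go up as the bonus grows. the hand that switches first is the one that was closest to raising anyway, which is the "automagic" part.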

proud2bBot

Post subject: Re: MCCFRM: Forcing action frequencies in learning process?

Posted: Sat May 04, 2013 6:46 pm

Senior Member

Joined: Mon Mar 11, 2013 10:24 pm
Posts: 216

Short update: I tried it out and it works, but it adds a huge overhead to the general process. First, we need to calculate the SB's and BB's strategies separately, which doubles the time. Second, we need to find the fixed strategy every xM iterations (we need to do it frequently as it has a big impact on the EV), which also slows down the learning significantly. Still thinking about how to make it faster...

spears

Post subject: Re: MCCFRM: Forcing action frequencies in learning process?

Posted: Sat May 04, 2013 9:13 pm

Site Admin

Joined: Sun Feb 24, 2013 9:39 pm
Posts: 642

OK, I think I get it. You want to change the payoff at the leaves, not the ev in the branches. So increasing the payoff at a leaf increases the frequency for the branches leading to that leaf.

I wonder if there is a simple relationship between delta payoff and delta frequency.

somehomelessguy

Post subject: Re: MCCFRM: Forcing action frequencies in learning process?

Posted: Sun May 05, 2013 12:22 am

Junior Member

Joined: Mon Apr 22, 2013 11:46 am
Posts: 34

spears wrote:

OK, I think I get it. You want to change the payoff at the leaves, not the ev in the branches.

does it make a difference?

spears

Post subject: Re: MCCFRM: Forcing action frequencies in learning process?

Posted: Sun May 05, 2013 7:27 am

Site Admin

Joined: Sun Feb 24, 2013 9:39 pm
Posts: 642

I don't have payoffs at internal nodes. I have evs and they are a function of strength.

alex

Post subject: Re: MCCFRM: Forcing action frequencies in learning process?

Posted: Thu May 16, 2013 10:11 am

Junior Member

Joined: Mon Apr 08, 2013 1:13 pm
Posts: 15

Has anyone tried to strictly define the problem?

If you try to pose the problem as "find the Nash equilibrium" then there is an issue, I think.

Let's say we are fixing player2's action probabilities. As soon as player1 changes his strategy we find out that constraints on player2's probabilities are broken (because probabilities of getting to this node for different information sets will change). So the set of strategies available for player2 to choose from depends on player1's strategy. The definition of NE hinges on one player's deviation from his current strategy being profitable. But this setting adds the issue of a deviation being valid...
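
A small Python sketch of this effect, with made-up numbers: player2's per-bucket raise probabilities stay fixed, but the raise frequency observed at the node changes as soon as the probabilities of reaching the node with each bucket change. The bucket names, reach weights and the helper `node_raise_frequency` are hypothetical.

```python
def node_raise_frequency(reach, raise_prob):
    """Raise frequency seen at a node: per-bucket raise probabilities
    weighted by how often each bucket reaches the node."""
    total = sum(reach.values())
    return sum(reach[b] * raise_prob[b] for b in reach) / total


# Player2's strategy at the node, unchanged in both scenarios.
raise_prob = {"weak": 0.2, "strong": 0.8}

# Reach weights of player2's buckets at this node in two scenarios.
reach_a = {"weak": 0.5, "strong": 0.5}
reach_b = {"weak": 0.8, "strong": 0.2}

freq_a = node_raise_frequency(reach_a, raise_prob)
freq_b = node_raise_frequency(reach_b, raise_prob)
print(round(freq_a, 2), round(freq_b, 2))  # 0.5 0.32
```

So a strategy that satisfies a 50% raise constraint in the first scenario violates it in the second without player2 changing anything.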

cantina

Post subject: Re: MCCFRM: Forcing action frequencies in learning process?

Posted: Wed May 22, 2013 4:28 am

Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 437

p2bb mysteriously disappeared after he said he got this working. I wonder if the government came to silence him? :shock:

cantina

Post subject: Re: MCCFRM: Forcing action frequencies in learning process?

Posted: Wed May 22, 2013 4:38 am

Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 437

spears wrote:

I wonder if there is a simple relationship between delta payoff and delta frequency

By delta, you mean the rate of change? Or, probability alone?

spears

Post subject: Re: MCCFRM: Forcing action frequencies in learning process?

Posted: Thu May 23, 2013 9:50 pm

Site Admin

Joined: Sun Feb 24, 2013 9:39 pm
Posts: 642

This is just a vague idea. At a decision node in an equilibrium strategy the evs of actions with probabilities other than 0 or 1 are all equal. This means the action probability and the ev of the successor node are inversely proportional. So it might be, that close to equilibrium, if you increase the ev of the successor node by some small amount, then the action frequency might decrease proportionally.

Isildur11

Post subject: Re: MCCFRM: Forcing action frequencies in learning process?

Posted: Sun Jun 30, 2013 9:35 pm

Junior Member

Joined: Mon Jun 03, 2013 9:06 pm
Posts: 20

somehomelessguy wrote:

Does this algorithm produce the least dominated strategy with a 20% 3bet?
