Poker-AI.org
http://poker-ai.org/phpbb/

MCCFRM: Forcing action frequencies in learning process?
http://poker-ai.org/phpbb/viewtopic.php?f=24&t=2460
Page 1 of 1

Author:  proud2bBot [ Mon Apr 22, 2013 4:49 am ]
Post subject:  MCCFRM: Forcing action frequencies in learning process?

CFRM has been used to construct strategies that are sufficiently robust but exploitative at the same time (RNR = Restricted Nash Response, DBRCS = Data-Biased Robust Counter-Strategies). However, these techniques require a basic understanding of how villain plays, e.g. which hands/buckets he takes a given action with. Obviously, this information is hard to get outside an academic setting, where we could see the hole cards even when villain folded. In real games, however, we might see villain's hole cards in maybe 5% of all hands, and even if we have a lot of hands, we cannot use them directly, as they are biased towards the stronger hands or draws that got there: weaker hands and busted draws are typically folded before showdown.
However, one unbiased piece of information we can get is his action frequencies, i.e., how often he folds/calls/raises. Assuming this information is correct and a good predictor of future actions, does anyone have an idea how to force the MCCFRM algorithm to obey these constraints?
For instance, suppose GTO says we should fold/call/raise 10%/50%/40% of hands in a certain spot, but our opponent only plays a raise/fold game there, with fold/call/raise frequencies of, say, 40%/0%/60%. The basic idea is to force the algorithm to match the frequencies of our opponent model while leaving it free to choose which hands take which action (so it would find the best mix given those frequencies).
Has anybody worked on this, or does anyone have an idea how to implement the restriction?
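The constraint described in the opening post can be stated as: the reach-weighted average of the per-bucket strategies at a node must equal the observed fold/call/raise frequencies. A minimal sketch of that check, assuming the bucket weights (how often villain arrives at the node with each bucket) are known; `aggregate_frequencies` and `violates` are hypothetical helpers, not part of any CFR library.

```python
from typing import Sequence


def aggregate_frequencies(bucket_weights: Sequence[float],
                          strategies: Sequence[Sequence[float]]) -> list:
    """Overall action frequencies at one decision node.

    bucket_weights[i]: how often villain reaches this node holding bucket i
                       (hypothetical input; unknown in practice).
    strategies[i]:     per-bucket distribution over [fold, call, raise].
    """
    total = sum(bucket_weights)
    n_actions = len(strategies[0])
    return [sum(w * s[a] for w, s in zip(bucket_weights, strategies)) / total
            for a in range(n_actions)]


def violates(freqs, observed, tol=0.01):
    """True if any aggregate frequency differs from the observed one by more than tol."""
    return any(abs(f - o) > tol for f, o in zip(freqs, observed))


# Three equally likely buckets playing a pure raise/fold game (fold/call/raise).
weights = [1.0, 1.0, 1.0]
strategies = [[1.0, 0.0, 0.0], [0.2, 0.0, 0.8], [0.0, 0.0, 1.0]]
freqs = aggregate_frequencies(weights, strategies)
print(freqs)                             # approximately [0.4, 0.0, 0.6]
print(violates(freqs, [0.4, 0.0, 0.6]))  # False
```

Many different per-bucket strategies give the same aggregate, which is exactly the freedom the post wants the solver to exploit.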

Author:  Coffee4tw [ Mon Apr 22, 2013 11:48 pm ]
Post subject:  Re: MCCFRM: Forcing action frequencies in learning process?

I think to achieve this you would need to add a normalizing step after each iteration (or after every x iterations) and shift the regrets for hands around so that they end up producing the action frequencies you want.

This is just speculation, but you'll probably want to order the hands by their regret values and then move over the ones that are closest to the bucket you need to redistribute to.

For example:
You have three different hands with the following fold/call/raise probabilities (in %) based on accumulated regret:
Code:
A: 0/0/100
B: 0/50/50
C: 50/50/0
==========
17/33/50


But you want to achieve:
Code:
50/0/50


Intuitively, I would shift all of C's calls and all of B's calls to fold to achieve this. This example is probably too easy, but essentially we need more folds and fewer calls, so we first move percentage points from the action triplet that's closest to 100/0/0.
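A quick check of the arithmetic in this example, assuming all three hands are equally likely (as the plain averaging in the post implies): before the shift the average is 17/33/50 after rounding, and moving B's and C's calls to fold gives exactly 50/0/50.

```python
def average(strategies):
    """Unweighted average of fold/call/raise percentages over hands."""
    n = len(strategies)
    return [sum(s[a] for s in strategies) / n for a in range(3)]


before = {"A": [0, 0, 100], "B": [0, 50, 50], "C": [50, 50, 0]}
after = {"A": [0, 0, 100], "B": [50, 0, 50], "C": [100, 0, 0]}  # calls moved to fold

print([round(x) for x in average(list(before.values()))])  # [17, 33, 50]
print([round(x) for x in average(list(after.values()))])   # [50, 0, 50]
```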

If I had to put this in an algorithm then it would look something like this:
Code:
1) Calculate the action frequency for each action over all hands (in integer percentages for simplicity)
2) Decide which action we need to move from and which we need to move to (constraint: only neighboring moves are allowed, not fold->raise or raise->fold)
3) Calculate which hand's frequency is closest to (but different from) the desired 100% action that we want to move to (or any other metric)
4) Move one percentage point over (or all, or a fraction)
5) Recalculate the overall action frequency
6) If we still need to adjust, repeat from step 2; otherwise stop.
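The six steps above can be sketched in Python as below. This is one possible reading, not Coffee4tw's implementation: it assumes equally weighted hands, and the rule for step 2 (look at the cumulative surplus across the fold|call and call|raise boundaries and fix the largest one) is an assumption chosen so that a fold<->raise shift happens via call and the loop always terminates. Step 3 uses "highest share of the destination action" as the metric. `rebalance` is a hypothetical helper.

```python
def rebalance(strategies, target, max_steps=1_000_000):
    """Greedy sketch of steps 1-6 (hypothetical helper, not from a library).

    strategies: per-hand [fold, call, raise] integer percentages, each summing to 100;
                all hands are assumed equally likely.
    target:     desired overall [fold, call, raise] integer percentages, summing to 100.
    Points only move between neighbouring actions (fold<->call, call<->raise).
    """
    if sum(target) != 100 or any(sum(s) != 100 for s in strategies):
        raise ValueError("strategies and target must each sum to 100")
    strats = [list(s) for s in strategies]
    n_hands, n_actions = len(strats), len(target)
    goal = [t * n_hands for t in target]  # target summed over all hands
    for _ in range(max_steps):
        # Steps 1/5: current totals per action.
        totals = [sum(s[a] for s in strats) for a in range(n_actions)]
        # Step 2: running > 0 means too much mass in actions 0..k (move k -> k+1),
        # running < 0 means too little (move k+1 -> k).
        moves, running = [], 0
        for k in range(n_actions - 1):
            running += totals[k] - goal[k]
            src, dst = (k, k + 1) if running > 0 else (k + 1, k)
            if running != 0 and totals[src] > 0:
                moves.append((abs(running), src, dst))
        if not moves:
            return strats  # Step 6: frequencies match the target
        _, src, dst = max(moves)
        # Step 3: among hands still using `src`, pick the one closest to 100% `dst`.
        hand = max((h for h in range(n_hands) if strats[h][src] > 0),
                   key=lambda h: strats[h][dst])
        # Step 4: move one percentage point.
        strats[hand][src] -= 1
        strats[hand][dst] += 1
    raise RuntimeError("did not converge within max_steps")


print(rebalance([[0, 0, 100], [0, 50, 50], [50, 50, 0]], [50, 0, 50]))
# [[0, 0, 100], [50, 0, 50], [100, 0, 0]]
```

On the example from the post this reproduces the intuitive answer: C's calls go to fold first, then B's.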


There is probably a way to do this "online" during the regular CFRM iterations, possibly by filtering the regret into the action frequency that you want (i.e. dropping the regret for actions that are beyond the desired threshold), but I can't currently think of a way to do that which wouldn't lead to uniform action frequencies across all hands.

Author:  spears [ Tue Apr 23, 2013 5:28 am ]
Post subject:  Re: MCCFRM: Forcing action frequencies in learning process?

@p2bb Clever :!: - and maybe you could force the showdowns to be accurate too.

Author:  somehomelessguy [ Fri May 03, 2013 5:28 pm ]
Post subject:  Re: MCCFRM: Forcing action frequencies in learning process?

not sure how this can be done, but a simple approach which could work reasonably well would be to simply alter the ev of the actions. if our equilibrium strategy 3bets 10% and we want to force 20%, we increase the ev of that action by some percentage.

if we find the percentages which generate an equilibrium with the right frequencies (not sure how hard these percentages are to find), then that equilibrium would probably be close to the optimal one with the forced frequencies.

Author:  spears [ Sat May 04, 2013 9:39 am ]
Post subject:  Re: MCCFRM: Forcing action frequencies in learning process?

somehomelessguy wrote:
not sure how this can be done, but a simple approach which could work reasonably well would be to simply alter the ev of the actions. if our equilibrium strategy 3bets 10% and we want to force 20%, we increase the ev of that action by some percentage.

if we find the percentages which generate an equilibrium with the right frequencies (not sure how hard these percentages are to find), then that equilibrium would probably be close to the optimal one with the forced frequencies.


How do we choose the strength bucket in the action whose frequency we want to modify?

Author:  somehomelessguy [ Sat May 04, 2013 5:04 pm ]
Post subject:  Re: MCCFRM: Forcing action frequencies in learning process?

spears wrote:
somehomelessguy wrote:
not sure how this can be done, but a simple approach which could work reasonably well would be to simply alter the ev of the actions. if our equilibrium strategy 3bets 10% and we want to force 20%, we increase the ev of that action by some percentage.

if we find the percentages which generate an equilibrium with the right frequencies (not sure how hard these percentages are to find), then that equilibrium would probably be close to the optimal one with the forced frequencies.


How do we choose the strength bucket in the action whose frequency we want to modify?


we increase the payoff for the action we want to increase the frequency of (regardless of strength bucket). then, automagically, the optimal strength buckets should switch action. it's just a matter of finding the payoff increase which corresponds to the correct frequency increase.

i was thinking something like this:

1. run for n iterations
2. alter payoff for misrepresented actions
3. goto 1. maybe reset all regrets, or at least make the regrets from the previous iterations less influential.
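A toy sketch of this loop, with two substitutions that are assumptions and not part of the post: the inner "run n iterations" is replaced by each bucket best-responding to fixed, made-up action EVs (so there are no regrets to reset), and "alter payoff" is done by bisection on an additive bonus rather than a fixed update rule. It does show the "automagic" effect: the bucket closest to indifference between calling and 3betting switches first. The function names and EV numbers are hypothetical.

```python
def action_frequency(weights, evs, action, bonus):
    """Share of hands that pick `action` once `bonus` is added to its EV.

    Toy stand-in for "run n iterations": every bucket best-responds to fixed,
    hypothetical per-action EVs instead of re-running CFR.
    """
    chosen = 0.0
    for w, bucket_evs in zip(weights, evs):
        adjusted = [ev + (bonus if a == action else 0.0) for a, ev in enumerate(bucket_evs)]
        best = max(range(len(adjusted)), key=adjusted.__getitem__)  # ties -> lowest index
        if best == action:
            chosen += w
    return chosen / sum(weights)


def find_bonus(weights, evs, action, target, lo=0.0, hi=100.0, iters=60):
    """Bisection on the payoff bonus: smallest bonus reaching the target frequency.

    Assumes action_frequency(lo) < target <= action_frequency(hi); in this toy
    model the frequency can only grow with the bonus, so bisection is valid.
    """
    for _ in range(iters):
        mid = (lo + hi) / 2
        if action_frequency(weights, evs, action, mid) < target:
            lo = mid
        else:
            hi = mid
    return hi


# Five equally likely buckets, EVs as [fold, call, 3bet]; only the last one 3bets at bonus 0.
weights = [1, 1, 1, 1, 1]
evs = [[0, -1.0, -2.0], [0, 0.5, -0.5], [0, 1.0, 0.7], [0, 2.0, 1.4], [0, 1.0, 3.0]]
print(action_frequency(weights, evs, 2, 0.0))  # 0.2
bonus = find_bonus(weights, evs, 2, 0.4)
print(round(bonus, 6), action_frequency(weights, evs, 2, bonus))  # 0.3 0.4
```

In a real game the opponent's strategy would also change after each payoff tweak, so the frequency would no longer be a simple step function of the bonus.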

Author:  proud2bBot [ Sat May 04, 2013 6:46 pm ]
Post subject:  Re: MCCFRM: Forcing action frequencies in learning process?

Short update: I tried it out and it works, but it adds a huge overhead to the overall process. First, we need to calculate SB's and BB's strategies separately, which doubles the time. Second, we need to find the fixed strategy every xM iterations (we need to do this frequently, as it has a big impact on the EV), which also slows down learning significantly. Still thinking about how to make it faster...

Author:  spears [ Sat May 04, 2013 9:13 pm ]
Post subject:  Re: MCCFRM: Forcing action frequencies in learning process?

OK, I think I get it. You want to change the payoff at the leaves, not the ev in the branches. So increasing the payoff at a leaf increases the frequency for the branches leading to that leaf.

I wonder if there is a simple relationship between delta payoff and delta frequency.

Author:  somehomelessguy [ Sun May 05, 2013 12:22 am ]
Post subject:  Re: MCCFRM: Forcing action frequencies in learning process?

spears wrote:
OK, I think I get it. You want to change the payoff at the leaves, not the ev in the branches.


does it make a difference?

Author:  spears [ Sun May 05, 2013 7:27 am ]
Post subject:  Re: MCCFRM: Forcing action frequencies in learning process?

I don't have payoffs at internal nodes. I have evs and they are a function of strength.

Author:  alex [ Thu May 16, 2013 10:11 am ]
Post subject:  Re: MCCFRM: Forcing action frequencies in learning process?

Has anyone tried to strictly define the problem?

If you try to pose the problem as "find the nash equilibrium" then there is an issue, I think.

Let's say we are fixing player2's action probabilities. As soon as player1 changes his strategy, we find that the constraints on player2's probabilities are broken (because the probabilities of reaching this node with the different information sets change). So the set of strategies available for player2 to choose from depends on player1's strategy. The definition of NE hinges on whether one player can profitably deviate from his current strategy, but this setting adds the question of whether a deviation is even valid... :|
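This point can be made concrete with Kuhn poker (a three-card toy game, swapped in here only as an illustration). Player2's per-card calling strategy is held fixed, yet its aggregate calling frequency facing a bet changes when player1 adds bluffs, because card removal changes which player2 cards reach the node. So a frequency constraint satisfied under one player1 strategy can be broken by a player1 deviation. The function name is hypothetical.

```python
from fractions import Fraction
from itertools import permutations

CARDS = "JQK"


def p2_call_frequency(p1_bet_prob, p2_call_prob):
    """Aggregate call frequency of player2 facing a bet in Kuhn poker.

    p1_bet_prob[c]:  probability player1 bets when holding card c.
    p2_call_prob[c]: probability player2 calls holding card c (held fixed).
    """
    reach = {c: Fraction(0) for c in CARDS}
    for p1_card, p2_card in permutations(CARDS, 2):  # 6 equally likely deals
        reach[p2_card] += Fraction(1, 6) * p1_bet_prob[p1_card]
    total = sum(reach.values())
    return sum(reach[c] * p2_call_prob[c] for c in CARDS) / total


p2 = {"J": 0, "Q": 1, "K": 1}            # call with Q or K, fold J
value_only = {"J": 0, "Q": 0, "K": 1}    # player1 bets only K
with_bluffs = {"J": 1, "Q": 0, "K": 1}   # player1 also bluffs J
print(p2_call_frequency(value_only, p2))   # 1/2
print(p2_call_frequency(with_bluffs, p2))  # 3/4
```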

Author:  cantina [ Wed May 22, 2013 4:28 am ]
Post subject:  Re: MCCFRM: Forcing action frequencies in learning process?

p2bb mysteriously disappeared after he said he got this working. I wonder if the government came to silence him? :shock:

Author:  cantina [ Wed May 22, 2013 4:38 am ]
Post subject:  Re: MCCFRM: Forcing action frequencies in learning process?

spears wrote:
I wonder if there is a simple relationship between delta payoff and delta frequency

By delta, do you mean the rate of change? Or the probability alone?

Author:  spears [ Thu May 23, 2013 9:50 pm ]
Post subject:  Re: MCCFRM: Forcing action frequencies in learning process?

This is just a vague idea. At a decision node in an equilibrium strategy, the evs of all actions with probabilities other than 0 or 1 are equal. This means the action probability and the ev of the successor node are inversely proportional. So it might be that, close to equilibrium, if you increase the ev of the successor node by some small amount, the action frequency might decrease proportionally.

Author:  Isildur11 [ Sun Jun 30, 2013 9:35 pm ]
Post subject:  Re: MCCFRM: Forcing action frequencies in learning process?

somehomelessguy wrote:
not sure how this can be done, but a simple approach which could work reasonably well would be to simply alter the ev of the actions. if our equilibrium strategy 3bets 10% and we want to force 20%, we increase the ev of that action by some percentage.

if we find the percentages which generate an equilibrium with the right frequencies (not sure how hard these percentages are to find), then that equilibrium would probably be close to the optimal one with the forced frequencies.


Does this algorithm produce the least dominated strategy with a 20% 3bet?

All times are UTC