Poker-AI.org Poker AI and Botting Discussion Forum 2013-06-30T21:35:45+00:00 http://poker-ai.org/phpbb/feed.php?f=24&t=2460 2013-06-30T21:35:45+00:00 2013-06-30T21:35:45+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2460&p=4368#p4368 <![CDATA[Re: MCCFRM: Forcing action frequencies in learning process?]]> somehomelessguy wrote:

not sure how this can be done, but a simple approach which could work reasonably well would be to simply alter the ev of the actions. if our equilibrium strategy 3bets 10% and we want to force 20%, we increase the ev of that action by some percentage.

if we find the percentages which generate an equilibrium with the right frequencies (not sure how hard these percentages are to find), then that equilibrium would probably be close to the optimal one with the forced frequencies.


Does this algorithm produce the least dominated strategy with a 20% 3bet frequency?

Statistics: Posted by Isildur11 — Sun Jun 30, 2013 9:35 pm


]]>
2013-05-23T21:50:02+00:00 2013-05-23T21:50:02+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2460&p=4226#p4226 <![CDATA[Re: MCCFRM: Forcing action frequencies in learning process?]]> Statistics: Posted by spears — Thu May 23, 2013 9:50 pm


]]>
2013-05-22T04:38:38+00:00 2013-05-22T04:38:38+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2460&p=4223#p4223 <![CDATA[Re: MCCFRM: Forcing action frequencies in learning process?]]> spears wrote:

I wonder if there is a simple relationship between delta payoff and delta frequency

By delta, you mean the rate of change? Or, probability alone?

Statistics: Posted by cantina — Wed May 22, 2013 4:38 am


]]>
2013-05-22T04:28:50+00:00 2013-05-22T04:28:50+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2460&p=4222#p4222 <![CDATA[Re: MCCFRM: Forcing action frequencies in learning process?]]>

Statistics: Posted by cantina — Wed May 22, 2013 4:28 am


]]>
2013-05-16T10:11:18+00:00 2013-05-16T10:11:18+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2460&p=4179#p4179 <![CDATA[Re: MCCFRM: Forcing action frequencies in learning process?]]>
If you try to pose the problem as "find the nash equilibrium" then there is an issue, I think.

Let's say we are fixing player2's action probabilities. As soon as player1 changes his strategy we find out that constraints on player2's probabilities are broken (because probabilities of getting to this node for different information sets will change). So the set of strategies available for player2 to choose from depends on player1's strategy. The definition of NE hinges on one player's deviation from his current strategy being profitable. But this setting adds the issue of a deviation being valid... :|
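To make the point above concrete, here is a minimal sketch (not anything from an actual solver) of how the aggregate action frequency at a node is a reach-weighted average over player2's information sets. The function name `aggregate_frequency`, the two information sets, and all the numbers are hypothetical; the reach weights stand in for whatever player1's strategy induces.

```python
def aggregate_frequency(reach, strategy):
    """Reach-weighted average of per-information-set action probabilities.

    reach:    dict info_set -> probability of arriving at the node with it
              (assumed here to depend on player1's strategy).
    strategy: dict info_set -> [fold, call, raise] probabilities for player2.
    """
    total = sum(reach.values())
    n_actions = len(next(iter(strategy.values())))
    return [
        sum(reach[i] * strategy[i][a] for i in reach) / total
        for a in range(n_actions)
    ]


# Player2's strategy is held fixed: always raise strong, always fold weak.
p2_strategy = {"strong": [0.0, 0.0, 1.0], "weak": [1.0, 0.0, 0.0]}

# Two hypothetical player1 strategies that change how often each
# information set shows up at this node.
reach_x = {"strong": 0.5, "weak": 0.5}
reach_y = {"strong": 0.25, "weak": 0.75}

freq_x = aggregate_frequency(reach_x, p2_strategy)
freq_y = aggregate_frequency(reach_y, p2_strategy)
```

With the same player2 strategy, the aggregate fold/call/raise frequencies differ between the two reach distributions, so a constraint that held under one can be broken under the other.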

Statistics: Posted by alex — Thu May 16, 2013 10:11 am


]]>
2013-05-05T07:27:15+00:00 2013-05-05T07:27:15+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2460&p=4106#p4106 <![CDATA[Re: MCCFRM: Forcing action frequencies in learning process?]]> Statistics: Posted by spears — Sun May 05, 2013 7:27 am


]]>
2013-05-05T00:22:28+00:00 2013-05-05T00:22:28+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2460&p=4104#p4104 <![CDATA[Re: MCCFRM: Forcing action frequencies in learning process?]]> spears wrote:

OK, I think I get it. You want to change the payoff at the leaves, not the ev in the branches.


does it make a difference?

Statistics: Posted by somehomelessguy — Sun May 05, 2013 12:22 am


]]>
2013-05-04T21:13:29+00:00 2013-05-04T21:13:29+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2460&p=4103#p4103 <![CDATA[Re: MCCFRM: Forcing action frequencies in learning process?]]>
I wonder if there is a simple relationship between delta payoff and delta frequency.

Statistics: Posted by spears — Sat May 04, 2013 9:13 pm


]]>
2013-05-04T18:46:43+00:00 2013-05-04T18:46:43+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2460&p=4101#p4101 <![CDATA[Re: MCCFRM: Forcing action frequencies in learning process?]]> Statistics: Posted by proud2bBot — Sat May 04, 2013 6:46 pm


]]>
2013-05-04T17:04:51+00:00 2013-05-04T17:04:51+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2460&p=4099#p4099 <![CDATA[Re: MCCFRM: Forcing action frequencies in learning process?]]> spears wrote:

somehomelessguy wrote:
not sure how this can be done, but a simple approach which could work reasonably well would be to simply alter the ev of the actions. if our equilibrium strategy 3bets 10% and we want to force 20%, we increase the ev of that action by some percentage.

if we find the percentages which generate an equilibrium with the right frequencies (not sure how hard these percentages are to find), then that equilibrium would probably be close to the optimal one with the forced frequencies.


How do we choose the strength bucket in the action whose frequency we want to modify?


we increase the payoff for the action we want to increase the frequency of (regardless of strength bucket). then, automagically, the optimal strength buckets should switch action. it's just a matter of finding the payoff increase which corresponds to the correct frequency increase.

i was thinking something like this:

1. run for n iterations
2. alter payoff for misrepresented actions
3. goto 1. maybe reset all regrets or at least make the regrets from the previous iterations less influential.
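A rough sketch of the "find the payoff increase" part, under heavy assumptions: instead of rerunning CFR, the "solve" step here is a plain best response against fixed, made-up EVs per strength bucket, and the payoff change is an additive bonus on the raise action found by bisection. The names `raise_frequency` and `find_bonus` and all numbers are hypothetical; in a real run each evaluation would be n iterations of the solver with the altered payoff.

```python
def raise_frequency(hand_evs, bonus):
    """Fraction of buckets whose best action is raise after adding `bonus`
    to the raise EV. Stand-in for 'run n iterations and measure'."""
    count = 0
    for fold_ev, call_ev, raise_ev in hand_evs:
        if raise_ev + bonus > max(fold_ev, call_ev):
            count += 1
    return count / len(hand_evs)


def find_bonus(hand_evs, target, lo=0.0, hi=10.0, steps=50):
    """Bisect for (approximately) the smallest bonus whose raise frequency
    reaches `target`. Relies on the frequency being non-decreasing in the
    bonus and on `hi` being large enough to reach the target."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if raise_frequency(hand_evs, mid) >= target:
            hi = mid
        else:
            lo = mid
    return hi


# Ten made-up buckets as (fold EV, call EV, raise EV). Only the first one
# prefers raising without a bonus, so the unforced frequency is 10%.
hand_evs = [(0.0, 1.0, 2.0)] + [(0.0, 1.0 + 0.5 * k, 1.0) for k in range(1, 10)]

base = raise_frequency(hand_evs, 0.0)
bonus = find_bonus(hand_evs, target=0.2)
forced = raise_frequency(hand_evs, bonus)
```

In this toy the bucket closest to indifference switches to raise first, which is the "automagic" bucket selection described above. Whether the same monotone relationship holds in a full equilibrium computation is not something this sketch shows.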

Statistics: Posted by somehomelessguy — Sat May 04, 2013 5:04 pm


]]>
2013-05-04T09:39:19+00:00 2013-05-04T09:39:19+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2460&p=4098#p4098 <![CDATA[Re: MCCFRM: Forcing action frequencies in learning process?]]> somehomelessguy wrote:

not sure how this can be done, but a simple approach which could work reasonably well would be to simply alter the ev of the actions. if our equilibrium strategy 3bets 10% and we want to force 20%, we increase the ev of that action by some percentage.

if we find the percentages which generate an equilibrium with the right frequencies (not sure how hard these percentages are to find), then that equilibrium would probably be close to the optimal one with the forced frequencies.


How do we choose the strength bucket in the action whose frequency we want to modify?

Statistics: Posted by spears — Sat May 04, 2013 9:39 am


]]>
2013-05-03T17:28:22+00:00 2013-05-03T17:28:22+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2460&p=4083#p4083 <![CDATA[Re: MCCFRM: Forcing action frequencies in learning process?]]>
if we find the percentages which generate an equilibrium with the right frequencies (not sure how hard these percentages are to find), then that equilibrium would probably be close to the optimal one with the forced frequencies.

Statistics: Posted by somehomelessguy — Fri May 03, 2013 5:28 pm


]]>
2013-04-23T05:28:14+00:00 2013-04-23T05:28:14+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2460&p=3910#p3910 <![CDATA[Re: MCCFRM: Forcing action frequencies in learning process?]]> - and maybe you could force the showdowns to be accurate too.

Statistics: Posted by spears — Tue Apr 23, 2013 5:28 am


]]>
2013-04-22T23:48:48+00:00 2013-04-22T23:48:48+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2460&p=3906#p3906 <![CDATA[Re: MCCFRM: Forcing action frequencies in learning process?]]>
This is just speculating now but you'll probably want to have some kind of order of hands by their regret values and then move the ones that are closer to the bucket that you need to redistribute to over.

For example:
You have three different hands that have the following action probabilities based on accumulated regret:
Code:
A: 0/0/100
B: 0/50/50
C: 50/50/0
==========
17/33/50


But you want to achieve:
Code:
50/0/50


Intuitively I would shift all of C's call actions and all of B's call actions to fold to achieve this. This example is probably too easy to solve, but essentially we need more folds and fewer calls, so at first we move percentage points from the action triplet that's closest to 100/0/0.

If I had to put this in an algorithm then it would look something like this:
Code:
1) Calculate action frequency for each action over all hands (in integer percentages for simplicity)
2) Decide which action we need to move from and which we need to move to (constraint: only neighboring moves are allowed, not fold->raise or raise->fold)
3) Calculate which hand's frequency is closest to (but different from) the desired 100% action that we want to move to (or any other metric)
4) Move one percentage point over (or all or a fraction)
5) Recalculate overall action frequency
6) If we still need to adjust, repeat from step 2, otherwise stop.
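The steps above could be sketched roughly as follows. This is one possible reading, not a tested method: the function name `redistribute` is made up, probabilities are integer percentage points, the source action is the one with the largest surplus, the move goes one neighbor toward the action with the largest deficit, and the hand chosen is the one already closest to 100% in the destination action. It only handles the three actions fold/call/raise.

```python
def redistribute(hands, target):
    """Shift percentage points between neighboring actions until the
    overall frequencies match `target`.

    hands:  dict name -> [fold, call, raise] integer percentages (sum 100).
    target: [fold, call, raise] integer percentages (sum 100).
    """
    if sum(target) != 100:
        raise ValueError("target must sum to 100")
    hands = {name: list(p) for name, p in hands.items()}  # work on a copy
    goal = [t * len(hands) for t in target]  # target in summed points
    while True:
        # Steps 1 and 5: overall frequency, kept as sums to stay in integers.
        totals = [sum(p[a] for p in hands.values()) for a in range(3)]
        diff = [totals[a] - goal[a] for a in range(3)]
        if not any(diff):
            return hands  # Step 6: nothing left to adjust.
        # Step 2: move from the biggest surplus toward the biggest deficit,
        # but only one neighbor at a time.
        src = max(range(3), key=lambda a: diff[a])
        dst = min(range(3), key=lambda a: diff[a])
        step = src + (1 if dst > src else -1)
        # Step 3: among hands that still use `src`, take the one that is
        # already closest to 100% in the action we move to.
        candidates = [p for p in hands.values() if p[src] > 0]
        best = max(candidates, key=lambda p: p[step])
        # Step 4: move one percentage point over.
        best[src] -= 1
        best[step] += 1


hands = {"A": [0, 0, 100], "B": [0, 50, 50], "C": [50, 50, 0]}
result = redistribute(hands, [50, 0, 50])
```

On the example from this post it moves C's calls to fold first and then B's, ending at A 0/0/100, B 50/0/50, C 100/0/0. Note that it ignores regret entirely, so it says nothing about whether the resulting assignment is any good strategically.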


There probably is a way to do this "online" while doing your regular CFRM iterations, possibly by filtering the regret into the action frequency that you want (aka dropping the regret for actions that are beyond the desired threshold) but I currently can't think of a way that wouldn't lead to uniform action frequencies across all hands that way.

Statistics: Posted by Coffee4tw — Mon Apr 22, 2013 11:48 pm


]]>
2013-04-22T04:49:04+00:00 2013-04-22T04:49:04+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2460&p=3899#p3899 <![CDATA[MCCFRM: Forcing action frequencies in learning process?]]> However, one unbiased piece of information we can get is the action frequencies, i.e., how often he folds/calls/raises. Assuming this information is correct and a good predictor of future actions, I wonder if anyone has an idea how to force the MCCFRM algorithm to obey these restrictions?
For instance, we know that we should fold/call/raise 10%/50%/40% of hands in a certain spot according to GTO, but our opponent only plays a raise/fold game with frequencies like 40%/0%/60%. The basic idea is to force the algorithm to make the frequencies match our opponent model while leaving it free to choose which hands to play with which action (so it would find the perfect mix given the frequencies).
Has anybody worked on this, or does anyone have an idea how to implement the restriction?

Statistics: Posted by proud2bBot — Mon Apr 22, 2013 4:49 am


]]>