Poker-AI.org

Poker AI and Botting Discussion Forum
PostPosted: Tue Apr 23, 2013 11:24 pm 
Junior Member

Joined: Thu Apr 11, 2013 10:13 pm
Posts: 22
I just read the part of Michael Johanson's MS thesis where he gives a detailed example of CFR, and now I am doubting my baseline understanding of the algorithm.

The way I have currently implemented vanilla CFRM is to track the cumulative positive CFR for every information set, then at each iteration update the strategy to be proportional to that cumulative sum. By contrast, from the thesis description, it seems like what I should do is track the total CFR (positive or negative) and then update the strategy to be proportional to only the positive part. However, when I implement it that way, it does not converge at all.
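
To make the two variants concrete, here is a minimal sketch of what I mean (illustrative names, not the actual code from my repo):
Code:
def strategy_from_regret(cum_regret):
    """Regret matching: play proportionally to the positive part of regret."""
    pos = [max(r, 0.0) for r in cum_regret]
    total = sum(pos)
    if total > 0:
        return [p / total for p in pos]
    return [1.0 / len(cum_regret)] * len(cum_regret)  # uniform fallback

# Variant A (what I currently do): clip before accumulating.
#   cum_regret[a] += max(immediate_regret[a], 0.0)
# Variant B (my reading of the thesis): accumulate the signed regret and
# clip only inside strategy_from_regret.
#   cum_regret[a] += immediate_regret[a]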

There is some mention in the thesis about the "average strategy" converging. Is the idea that you do something like fictitious play, where you need to average together all the policies up to time T in order to get the true optimal strategy? In that case, you really need to keep both your current strategy and your average strategy, which of course you can update incrementally. That's the only way I can see this approach working.

Can someone give me some clarity here?


PostPosted: Wed Apr 24, 2013 12:36 am 
Senior Member

Joined: Mon Mar 11, 2013 10:24 pm
Posts: 216
The current strategy can be derived from the regret values, but you store the cumulative strategy in each node too - that is where the end result comes from.


PostPosted: Wed Apr 24, 2013 6:05 pm 
Junior Member

Joined: Thu Apr 11, 2013 10:13 pm
Posts: 22
Okay, then I am really confused. I just implemented it that way and it is not nearly as effective or stable. I must be doing something wrong.

Here is my full python repo: https://github.com/tansey/pycfr

If you go to the main directory and run
Code:
python tests/test_cfr.py
it should run the canonical implementation on first half-street Kuhn and then Leduc. HSK works fine, since it's such a simple problem. However, when I run on Leduc, it converges slowly and then actually starts to bounce back up once it gets around a 0-exploitability payoff for player 2 (when we know the value of Leduc is around -0.08 for player 1).

If you change the object created in test_cfr.py from type
Code:
ProperCounterfactualRegretMinimizer
to just
Code:
CounterfactualRegretMinimizer
(you can keep all other code exactly the same), it uses the approach I originally implemented, based on the proportion of cumulative positive regret. It converges much faster and doesn't bounce around at all.

Did I do something wrong here? If anybody has implemented vanilla CFR for Leduc, I'd be interested in seeing a log of their agents' exploitability every 10 iterations for the first 2000 iterations, since that's about the timeframe I'm looking at right now.


PostPosted: Wed Apr 24, 2013 6:43 pm 
Veteran Member

Joined: Wed Mar 20, 2013 1:43 am
Posts: 267
The strategy for the next iteration is computed from the regret values. You have to keep track of both positive and negative values, but you only use the positive ones in the actual strategy computation.

This strategy will change quite often; hands with mixed strategies oscillate.

The average strategy is the sum of all strategies normalized by the number of iterations. This average strategy doesn't change much, especially after lots of iterations. Intuitively this should make sense: if you have a million iterations, another iteration can't change the average that much.
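
As a minimal sketch of that bookkeeping (illustrative names; how the strategy sum should be weighted is discussed further down the thread):
Code:
class InfosetNode:
    def __init__(self, n_actions):
        self.cum_regret = [0.0] * n_actions    # signed, may go negative
        self.cum_strategy = [0.0] * n_actions  # running sum of strategies

    def current_strategy(self):
        # Regret matching: only positive regret enters the computation.
        pos = [max(r, 0.0) for r in self.cum_regret]
        total = sum(pos)
        if total > 0:
            return [p / total for p in pos]
        return [1.0 / len(pos)] * len(pos)  # uniform when no positive regret

    def average_strategy(self):
        # This, not the current strategy, converges to equilibrium.
        total = sum(self.cum_strategy)
        if total > 0:
            return [c / total for c in self.cum_strategy]
        return [1.0 / len(self.cum_strategy)] * len(self.cum_strategy)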


PostPosted: Wed Apr 24, 2013 6:46 pm 
Site Admin

Joined: Sun Feb 24, 2013 9:39 pm
Posts: 642
Amax posted some nice code on the old forum at http://poker-ai.org/archive/www.pokerai ... 335#p40335


PostPosted: Wed Apr 24, 2013 7:00 pm 
Junior Member

Joined: Mon Apr 22, 2013 11:46 am
Posts: 34
HontoNiBaka wrote:
The strategy for the next iteration is computed from the regret values. You have to keep track of both positive and negative values, but you only use the positive ones in the actual strategy computation.

This strategy will change quite often; hands with mixed strategies oscillate.

The average strategy is the sum of all strategies normalized by the number of iterations. This average strategy doesn't change much, especially after lots of iterations. Intuitively this should make sense: if you have a million iterations, another iteration can't change the average that much.


This. But you normalize with the accumulated sum of reach probabilities. If you have strategy 0/0/1 in one iteration where you had a 2% probability of reaching the information set, and strategy 0.5/0.5/0 in another iteration where you had a 4% probability, the average strategy is 0.333/0.333/0.333, for example.
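
A quick check of that arithmetic in Python:
Code:
w1, s1 = 0.02, [0.0, 0.0, 1.0]  # 2% reach, strategy 0/0/1
w2, s2 = 0.04, [0.5, 0.5, 0.0]  # 4% reach, strategy 0.5/0.5/0
avg = [(w1 * a + w2 * b) / (w1 + w2) for a, b in zip(s1, s2)]
print(avg)  # [0.333..., 0.333..., 0.333...]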

Also, hijacking for a follow-up question which I can't seem to prove:

does the average EV of the CFRM iterations converge to the EV of the average strategies played against each other?


PostPosted: Wed Apr 24, 2013 7:05 pm 
Junior Member

Joined: Thu Apr 11, 2013 10:13 pm
Posts: 22
HontoNiBaka wrote:
The strategy for the next iteration is computed from the regret values. You have to keep track of both positive and negative values, but you only use the positive ones in the actual strategy computation.

This strategy will change quite often; hands with mixed strategies oscillate.

The average strategy is the sum of all strategies normalized by the number of iterations. This average strategy doesn't change much, especially after lots of iterations. Intuitively this should make sense: if you have a million iterations, another iteration can't change the average that much.


Right, I am pretty sure that's what I'm doing. I track counterfactual regret and maintain a "current strategy profile" and an "average strategy profile", where the latter is the average of the current profiles over all previous iterations.

I agree that it shouldn't change much after millions of iterations, but right now I am working on the order of thousands and printing out the exploitability every 10 iterations. At that scale the results can still change quite a bit.


PostPosted: Wed Apr 24, 2013 7:08 pm 
Junior Member

Joined: Thu Apr 11, 2013 10:13 pm
Posts: 22
spears wrote:
Amax posted some nice code on the old forum at http://poker-ai.org/archive/www.pokerai ... 335#p40335


Thanks Spears. I've seen and downloaded that code before. It only covers the various MC-CFR variants though, not vanilla CFR, right? I suppose I could intuit most of the algorithm by looking closely at external sampling, but it would still not be the exact algorithm.


PostPosted: Wed Apr 24, 2013 7:10 pm 
Veteran Member

Joined: Wed Mar 20, 2013 1:43 am
Posts: 267
Public chance sampling is effectively vanilla CFR for Kuhn and the river solver.


PostPosted: Wed Apr 24, 2013 7:12 pm 
Junior Member

Joined: Mon Apr 22, 2013 11:46 am
Posts: 34
Glanced at your code and found this:

Code:
1.0 / (self.iterations + 1) * (self.iterations * prev_cfr + immediate_cfr)

This is usually a very bad idea due to rounding errors. Just add them all up and divide by the number of iterations when you need the value.
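
Something like this instead, as a sketch (placeholder values):
Code:
# Error-prone: rescaling the running average every iteration compounds
# floating-point error:
#   avg = 1.0 / (t + 1) * (t * avg + value)

# Better: keep the raw sum and divide only when the value is needed.
values = [0.3, -0.1, 0.7]  # placeholder per-iteration regret values
total = 0.0
for value in values:
    total += value
mean = total / len(values)
print(mean)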


PostPosted: Wed Apr 24, 2013 7:13 pm 
Junior Member

Joined: Thu Apr 11, 2013 10:13 pm
Posts: 22
Quote:
you normalize with the accumulated sum of reach probabilities. If you have strategy 0/0/1 in one iteration where you had a 2% probability of reaching the information set, and strategy 0.5/0.5/0 in another iteration where you had a 4% probability, the average strategy is 0.333/0.333/0.333, for example.


Wouldn't that be taken into account by the CFR weighting? I thought CFR was basically:
Code:
reachprob * (action_ev - strategy_ev)


So wouldn't weighting the averages be double-counting things?


PostPosted: Wed Apr 24, 2013 7:15 pm 
Junior Member

Joined: Thu Apr 11, 2013 10:13 pm
Posts: 22
HontoNiBaka wrote:
Public chance sampling is effectively vanilla CFR for Kuhn and the river solver.


Right. Both my original implementation and what I think is the proper CFRM implementation work well on Kuhn. It's on Leduc that I'm seeing big differences in convergence rates.


PostPosted: Wed Apr 24, 2013 7:18 pm 
Junior Member

Joined: Mon Apr 22, 2013 11:46 am
Posts: 34
longshot wrote:
Quote:
you normalize with the accumulated sum of reach probabilities. If you have strategy 0/0/1 in one iteration where you had a 2% probability of reaching the information set, and strategy 0.5/0.5/0 in another iteration where you had a 4% probability, the average strategy is 0.333/0.333/0.333, for example.


Wouldn't that be taken into account by the CFR weighting? I thought CFR was basically:
Code:
reachprob * (action_ev - strategy_ev)


So wouldn't weighting the averages be double-counting things?


No. There you weight by the opponent's probability of reaching the information set.
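
In other words, a sketch of one vanilla CFR node update (illustrative names; p_opp is the opponent's reach, which should also fold in chance's reach):
Code:
def update_node(node, p_self, p_opp, strategy, action_evs, strategy_ev):
    """One CFR update at an infoset, from the updating player's view."""
    for a in range(len(strategy)):
        # Regret update: weighted by the OPPONENT's (counterfactual) reach.
        node.cum_regret[a] += p_opp * (action_evs[a] - strategy_ev)
        # Average-strategy update: weighted by your OWN reach.
        node.cum_strategy[a] += p_self * strategy[a]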


PostPosted: Wed Apr 24, 2013 7:31 pm 
Junior Member

Joined: Thu Apr 11, 2013 10:13 pm
Posts: 22
somehomelessguy wrote:
longshot wrote:
Quote:
you normalize with the accumulated sum of reach probabilities. If you have strategy 0/0/1 in one iteration where you had a 2% probability of reaching the information set, and strategy 0.5/0.5/0 in another iteration where you had a 4% probability, the average strategy is 0.333/0.333/0.333, for example.


Wouldn't that be taken into account by the CFR weighting? I thought CFR was basically:
Code:
reachprob * (action_ev - strategy_ev)


So wouldn't weighting the averages be double-counting things?


No. There you weight by the opponent's probability of reaching the information set.


So you're saying you need to store the cumulative reach probabilities at each information set? Then the update equation is:

Code:
total_reach = prev_reach + cur_reach
strategy = 1.0 / total_reach * (prev_reach * prev_strategy + cur_reach * cur_strategy)


That doesn't seem to appear anywhere in the original NIPS paper.


PostPosted: Wed Apr 24, 2013 7:40 pm 
Junior Member

Joined: Thu Apr 11, 2013 10:13 pm
Posts: 22
somehomelessguy wrote:
Glanced at your code and found this:

Code:
1.0 / (self.iterations + 1) * (self.iterations * prev_cfr + immediate_cfr)

This is usually a very bad idea due to rounding errors. Just add them all up and divide by the number of iterations when you need the value.


Strangely enough, if I change it to:

Code:
self.counterfactual_regret[root.player][infoset][i] += immediate_cfr


It seems to have a substantially detrimental effect, and I'm not sure why right now. That may be an indication that there is a bug somewhere in my code. Though looking at everything, it doesn't really seem like it should make a difference: we only care about the relative CFR, and dividing by the iteration count scales every action's regret at an infoset by the same factor, so whether I store the average or the sum shouldn't affect the result of the algorithm.
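
For instance, the 1/T scaling cancels inside regret matching:
Code:
def match(reg):
    pos = [max(r, 0.0) for r in reg]
    total = sum(pos)
    return [p / total for p in pos]

regret_sum = [3.0, -1.0, 6.0]
regret_avg = [r / 10 for r in regret_sum]  # same regrets averaged over T=10
print(match(regret_sum))  # [0.333..., 0.0, 0.666...]
print(match(regret_avg))  # identical, so only rounding should differ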


Last edited by longshot on Wed Apr 24, 2013 7:42 pm, edited 1 time in total.

PostPosted: Wed Apr 24, 2013 7:40 pm 
Junior Member

Joined: Mon Apr 22, 2013 11:46 am
Posts: 34
longshot wrote:
So you're saying you need to store the cumulative reach probabilities at each information set? Then the update equation is:

Code:
total_reach = prev_reach + cur_reach
strategy = 1.0 / total_reach * (prev_reach * prev_strategy + cur_reach * cur_strategy)


That doesn't seem to appear anywhere in the original NIPS paper.


Stop that. You'll get rounding errors.

Keep track of the sum of reach*prob for each action.
The average strategy is the action's sum / the sum of reaches.
Hint: you don't need to keep track of the sum of reaches separately, as you can add up the actions' reach*prob sums to get it.
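
As a sketch (illustrative names):
Code:
def accumulate_average(cum_strategy, reach, strategy):
    """Per iteration: add this player's reach * action probability."""
    for a, p in enumerate(strategy):
        cum_strategy[a] += reach * p

def average_strategy(cum_strategy):
    # The normalizer is the sum of reaches: each iteration adds
    # reach * (probabilities summing to 1), so the entries already sum to it.
    total = sum(cum_strategy)
    if total > 0:
        return [c / total for c in cum_strategy]
    return [1.0 / len(cum_strategy)] * len(cum_strategy)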


PostPosted: Wed Apr 24, 2013 8:44 pm 
Junior Member

Joined: Thu Apr 11, 2013 10:13 pm
Posts: 22
somehomelessguy wrote:
Stop that. You'll get rounding errors.

Keep track of the sum of reach*prob for each action.
The average strategy is the action's sum / the sum of reaches.
Hint: you don't need to keep track of the sum of reaches separately, as you can add up the actions' reach*prob sums to get it.


Ah, I found the average strategy formula in the NIPS 2009 paper and now what you're saying makes much more sense! Initial results look like it's fixed.

Thanks SHG!


P.S. I still haven't changed/fixed all the potential rounding errors. Right now I'm assuming they're not a huge issue until I get really close to the NE.

