Poker-AI.org

Poker AI and Botting Discussion Forum
 Post subject: Cleaning up CFRM results
PostPosted: Fri Mar 15, 2013 3:06 pm 
Senior Member

Joined: Mon Mar 11, 2013 10:24 pm
Posts: 216
When running CFRM (I'm using ASS, where I guess the problem is bigger than in vanilla CFRM, but it exists there too), we obtain the e-Nash solution by looking at the average strategy (derived from the cumulative strategy stored in the nodes). In practice, I find nodes where, after 1B iterations, the decision is clear, yet the strategy still plays e.g. 99.1% Action X and 0.9% Action Y, because early in the learning process Y looked good, and only as convergence progressed did X turn out to be better. I wonder if we can somehow clean up these results, either during the run or after it. The latter case is easy: given a situation like that (e.g., AA in my 25bb game is raised as the first action 99.2% and called 0.8%), we check that the dominant action is above a certain threshold (e.g. 99%) and that all other actions have large negative cumulative regret, and then purge the rest. This makes our strategy a bit more "readable" and probably also a bit better, but we miss a lot of spots just below the threshold (e.g. 98.9%, even though the regrets of all other actions are hugely negative).
My idea is that it would be better to weight regrets/strategies added in more recent iterations higher (obviously not just a single one). In our example there are 1B iterations, and almost all the calls come from the earliest ones, where we had no clue how we would play later decisions. If we could find a function that continuously weights more recent results slightly higher, our strategy should converge faster; however, we cannot overdo this, or we would probably make mistakes when the algorithm switches back and forth between two equilibrium points.
The question I have is: has anyone implemented such an approach, and which function/heuristic are you using? Or does anyone have thoughts on the general approach, i.e., do you think it will or won't work? It's not only the 0.x% difference that makes me interested in this optimization, but also speeding up the algorithm itself: in the AA situation described above, the Call node is always evaluated too, as long as its probability is above 0.1%, so finding out faster that calling is worse than raising would let us cut the complete call branch from further inspection.
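A minimal sketch of the weighting idea described above, assuming a simple linear weight w(t) = t on iteration t's contribution to the cumulative strategy (the function names and the specific weighting scheme are illustrative, not something established in this thread):

```python
def update_cumulative_strategy(cum_strategy, current_strategy, t, reach_prob):
    """Accumulate the current strategy profile with a recency weight.

    w(t) = t weights later iterations linearly higher; w(t) = 1 recovers
    the vanilla uniform average. The reach probability weighting is the
    usual CFR-style contribution factor.
    """
    w = float(t)
    for a, p in enumerate(current_strategy):
        cum_strategy[a] += w * reach_prob * p
    return cum_strategy

def average_strategy(cum_strategy):
    """Normalize the cumulative strategy into a probability distribution."""
    total = sum(cum_strategy)
    n = len(cum_strategy)
    if total <= 0.0:
        return [1.0 / n] * n  # no data yet: fall back to uniform
    return [s / total for s in cum_strategy]
```

With this weighting, 1B early "call" iterations fade from the average much faster than under uniform averaging, which is exactly the cleanup effect asked about; the risk, as noted, is oscillation between equilibria if the weighting is too aggressive.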


PostPosted: Fri Mar 15, 2013 4:06 pm 
New Member

Joined: Sat Mar 09, 2013 4:56 pm
Posts: 5
I've used this simple post-processing algorithm:

Code:
if (regret[action] < 0 && strategy[action] < threshold)
    strategy[action] = 0;
// repeat for every action, then renormalize so the strategy sums to 1


It usually increases theoretical exploitability but performs better in practice.


PostPosted: Fri Mar 15, 2013 4:16 pm 
Senior Member

Joined: Mon Mar 11, 2013 10:24 pm
Posts: 216
Yes, I'm using something similar (but with an additional threshold on the regret, to cope with mixed strategies where the regret fluctuates around 0), but I am still looking for a way to do it on the fly while learning.


PostPosted: Fri Mar 15, 2013 7:47 pm 
Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 437
I think it was Carnegie Mellon that showed that in some abstractions a certain degree of purification decreased exploitability in the real game. Conversely, for the U of A's CFRM-BR strategy, which already had a very low real-game exploitability, purification increased it.

I've tried some purification during training with chance sampling, which didn't seem to work. However, purifying the cumulative regret when doing ASS might work: not actually storing the purified values, just using them when deciding which node to sample. Seems reasonable, no?

As far as weighted updates in training go, Slumbot 2012 did that, and it seemed to work OK for him. I've tried it, but not thoroughly enough to show any kind of empirically meaningful results. Which raises the question: why use cumulative regret at all?


PostPosted: Fri Mar 15, 2013 7:59 pm 
Senior Member

Joined: Mon Mar 11, 2013 10:24 pm
Posts: 216
Nasher, just to be clear: I'm not using the cumulative regret directly, but Average Strategy Sampling, which samples based on the relative cumulative strategy for an action. Your idea of not changing the regrets/cumulative strategies but changing the sampling sounds nice; I will try that.


PostPosted: Fri Mar 15, 2013 8:05 pm 
Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 437
Wait, you don't have cum in your ASS? :D <-- This is a proud day for poker AI, because it's completely legitimate.

I just mean in ASS, when summing up the cumulative regret and normalizing it, you could perform some purification (i.e. only sample the node if the normalized CR is above some threshold).
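A sketch of this sample-time purification, under the assumption that each action is sampled independently with a probability floored at some epsilon (roughly in the spirit of Average Strategy Sampling; the parameter values and function name here are illustrative):

```python
import random

def sample_actions(cum_values, epsilon=0.05, threshold=0.02, rng=random.random):
    """Decide which actions to traverse this iteration.

    Normalize the accumulated values, treat any action whose normalized
    share falls below `threshold` as zero (the purification step, applied
    only at sampling time and never stored), then sample each action
    independently with probability at least `epsilon` so that purified
    branches are still occasionally explored.
    """
    total = sum(cum_values)
    sampled = []
    for a, s in enumerate(cum_values):
        p = s / total if total > 0 else 1.0
        if p < threshold:
            p = 0.0  # purify: tiny residual mass no longer forces a traversal
        if rng() < max(epsilon, p):
            sampled.append(a)
    return sampled
```

The point of doing it here rather than in the stored averages is that the cumulative strategy stays untouched, so the theoretical average is unchanged; only the traversal frequency of near-dead branches drops.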


PostPosted: Fri Mar 15, 2013 8:39 pm 
Senior Member

Joined: Mon Mar 11, 2013 10:24 pm
Posts: 216
Nasher wrote:
Wait, you don't have cum in your ASS? :D <-- This is a proud day for poker AI, because it's completely legitimate.


hahaha, pure gold. I have to admit: like the fossilman, I do! And yes, that's what I was thinking after your reply. I'll try it out. Thanks for the idea and for making me laugh :)


PostPosted: Fri Mar 15, 2013 9:48 pm 
Junior Member

Joined: Sun Mar 10, 2013 11:38 pm
Posts: 34
What info do you feed into ASS,HOLE cards? I'm sorry, but I can't take this thread seriously. :lol:

_________________
www.bespokebots.com


PostPosted: Fri Mar 15, 2013 11:39 pm 
Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 437
proud2bBot wrote:
hahaha, pure gold. I have to admit: like the fossilman, I do! And yes, that's what I was thinking after your reply. I'll try it out. Thanks for the idea and for making me laugh :)


You can purify the CS before passing it to the e-greedy formula. The formula will still select nodes below your threshold epsilon% of the time, but by temporarily purifying and re-accumulating the CS, the probability of taking the OTHER nodes increases, because your cs_sum is smaller.

This makes me wonder whether anybody has experimented with other purification schemes, like squaring and re-normalizing, or something along those lines.
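For comparison, here is a sketch of the two schemes mentioned in this post: hard threshold purification (zero out small entries, renormalize) versus squaring-and-renormalizing (a smooth push of probability mass toward dominant actions). Function names and parameter values are mine, purely illustrative:

```python
def purify_threshold(cs, threshold=0.01):
    """Zero entries whose normalized share is below `threshold`, renormalize."""
    total = sum(cs)
    if total <= 0.0:
        return list(cs)
    probs = [s / total for s in cs]
    kept = [p if p >= threshold else 0.0 for p in probs]
    ktotal = sum(kept)
    if ktotal == 0.0:
        return probs  # everything fell below threshold: keep the original
    return [p / ktotal for p in kept]

def purify_square(cs):
    """Square each entry and renormalize: sharpens the distribution smoothly,
    with no hard cutoff, so near-threshold actions lose mass gradually."""
    sq = [s * s for s in cs]
    total = sum(sq)
    if total <= 0.0:
        return list(cs)
    return [s / total for s in sq]
```

Squaring avoids the cliff at the threshold (the 98.9% vs 99.1% problem from the opening post), at the cost of never fully zeroing an action, so it cannot by itself prune a branch from traversal.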

