Poker-AI.org Poker AI and Botting Discussion Forum 2013-03-15T23:39:23+00:00 http://poker-ai.org/phpbb/feed.php?f=24&t=2402 2013-03-15T23:39:23+00:00 2013-03-15T23:39:23+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2402&p=3268#p3268 <![CDATA[Re: Cleaning up CFRM results]]> proud2bBot wrote:

hahaha, pure gold. I have to admit: like the Fossilman I do! And yes, that's what I was thinking after your reply. I'll try it out. Thanks for the idea and making me laugh :)


You can purify the CS before passing it to the e-greedy formula. The formula will still select nodes below your threshold epsilon% of the time, but by temporarily purifying and re-accumulating the CS, the probability of taking the OTHER nodes increases because your cs_sum will be smaller.

This makes me wonder: has anybody experimented with other purification schemes? Squaring and re-normalizing, maybe, or something like that.
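Roughly what I have in mind, as a sketch (the names cs, threshold and epsilon are made up, not anyone's actual code):

```c
/* Rough sketch: temporarily purify the cumulative strategy (CS), then
 * sample e-greedily from the purified distribution.  r1 and r2 are
 * uniform randoms in [0,1): r1 decides explore vs. exploit, r2 picks
 * within the distribution. */
int sample_purified(const double *cs, int n, double threshold,
                    double epsilon, double r1, double r2)
{
    if (r1 < epsilon)
        return (int)(r2 * n);          /* explore: uniform over all actions */

    double total = 0.0;
    for (int i = 0; i < n; i++)
        total += cs[i];
    if (total <= 0.0)
        return (int)(r2 * n);          /* empty CS: fall back to uniform */

    /* temporarily drop entries whose normalized probability is below the
     * threshold, and re-accumulate the sum over the survivors */
    double purified_sum = 0.0;
    for (int i = 0; i < n; i++)
        if (cs[i] / total >= threshold)
            purified_sum += cs[i];
    if (purified_sum <= 0.0)
        return (int)(r2 * n);          /* everything purified away: fall back */

    /* sampling against the smaller purified sum is what boosts the
     * probability of the surviving nodes */
    double target = r2 * purified_sum, acc = 0.0;
    for (int i = 0; i < n; i++) {
        if (cs[i] / total < threshold)
            continue;
        acc += cs[i];
        if (target < acc)
            return i;
    }
    return n - 1;
}
```

Squaring and re-normalizing would be the same shape: accumulate cs[i]*cs[i] over all entries instead of thresholding.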

Statistics: Posted by cantina — Fri Mar 15, 2013 11:39 pm


]]>
2013-03-15T21:48:52+00:00 2013-03-15T21:48:52+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2402&p=3261#p3261 <![CDATA[Re: Cleaning up CFRM results]]>

Statistics: Posted by birchy — Fri Mar 15, 2013 9:48 pm


]]>
2013-03-15T20:39:38+00:00 2013-03-15T20:39:38+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2402&p=3256#p3256 <![CDATA[Re: Cleaning up CFRM results]]> Nasher wrote:

Wait, you don't have cum in your ASS? :D <-- This is a proud day for poker AI, because it's completely legitimate.


hahaha, pure gold. I have to admit: like the Fossilman I do! And yes, that's what I was thinking after your reply. I'll try it out. Thanks for the idea and making me laugh :)

Statistics: Posted by proud2bBot — Fri Mar 15, 2013 8:39 pm


]]>
2013-03-15T20:05:34+00:00 2013-03-15T20:05:34+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2402&p=3254#p3254 <![CDATA[Re: Cleaning up CFRM results]]> Wait, you don't have cum in your ASS? :D <-- This is a proud day for poker AI, because it's completely legitimate.

I just mean that in ASS, when summing up the cumulative regret and normalizing it, you could perform some purification (i.e., only sample the node if the normalized CR is above some threshold).
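Something like this, as a rough sketch (placeholder names, untested):

```c
/* Sketch of the idea: in ASS, only sample an action whose share of the
 * positive cumulative regret clears a purification threshold.  Nothing
 * is stored; this only gates which nodes the tree walk visits. */
int should_sample(const double *cum_regret, int n, int action, double threshold)
{
    double sum = 0.0;
    for (int i = 0; i < n; i++)
        if (cum_regret[i] > 0.0)
            sum += cum_regret[i];      /* regret matching uses the positive part */
    if (sum <= 0.0)
        return 1;                      /* no positive regret anywhere: don't gate */

    double r = cum_regret[action] > 0.0 ? cum_regret[action] : 0.0;
    return (r / sum) >= threshold;     /* below threshold: skip this node */
}
```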

Statistics: Posted by cantina — Fri Mar 15, 2013 8:05 pm


]]>
2013-03-15T19:59:38+00:00 2013-03-15T19:59:38+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2402&p=3253#p3253 <![CDATA[Re: Cleaning up CFRM results]]> Statistics: Posted by proud2bBot — Fri Mar 15, 2013 7:59 pm


]]>
2013-03-15T19:47:44+00:00 2013-03-15T19:47:44+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2402&p=3251#p3251 <![CDATA[Re: Cleaning up CFRM results]]>
I've tried some purification in training with chance sampling, which didn't seem to work. However, purifying the cumulative regret when doing ASS might work: not actually storing the purification, just applying it when deciding which node to sample. Seems reasonable, yes?

As far as weighted updates in training go, Slumbot 2012 did that; it seemed to work OK for him. I've tried it, but not thoroughly enough to show any kind of empirically meaningful result. Such a thing begs the question: why use cumulative regret at all?

Statistics: Posted by cantina — Fri Mar 15, 2013 7:47 pm


]]>
2013-03-15T16:16:33+00:00 2013-03-15T16:16:33+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2402&p=3247#p3247 <![CDATA[Re: Cleaning up CFRM results]]> Statistics: Posted by proud2bBot — Fri Mar 15, 2013 4:16 pm


]]>
2013-03-15T16:06:51+00:00 2013-03-15T16:06:51+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2402&p=3246#p3246 <![CDATA[Re: Cleaning up CFRM results]]>
Code:
// drop actions that both have negative regret and carry little
// weight in the average strategy
if (regret[action] < 0 && strategy[action] < threshold)
    strategy[action] = 0;


It usually increases theoretical exploitability but performs better in practice.
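To keep it a probability distribution, presumably you also renormalize after zeroing. A sketch of how the whole thing might look (assuming strategy[] holds the normalized average strategy):

```c
/* Sketch: apply the cutoff above to a whole strategy vector, then
 * renormalize so the surviving actions sum to 1 again. */
void purify(double *strategy, const double *regret, int n, double threshold)
{
    double sum = 0.0;
    for (int a = 0; a < n; a++) {
        if (regret[a] < 0 && strategy[a] < threshold)
            strategy[a] = 0.0;         /* negative regret and little weight: drop it */
        sum += strategy[a];
    }
    if (sum > 0.0)
        for (int a = 0; a < n; a++)
            strategy[a] /= sum;        /* redistribute the removed mass */
}
```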

Statistics: Posted by amax — Fri Mar 15, 2013 4:06 pm


]]>
2013-03-15T15:06:30+00:00 2013-03-15T15:06:30+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2402&p=3243#p3243 <![CDATA[Cleaning up CFRM results]]> My idea is that it would be better to value regrets/strategies higher, that are added in a more recent iteration (obv. not a single one). For instance, in our example, there are 1B iterations and almost all calls come from the first ones where we didn't have a clue how we play on later decisions. If we could find a function that values more recent results slightly higher (continuously), our strategy should converge faster; however we cannot overdo this as otherwise we would probably make a mistake when switching back and forth between two equilibrium points.
The question I have is: has anyone implemented such an approach, and which function/heuristic are you using? Or does anyone have thoughts on the general approach, i.e., do you think it will or won't work? It's not only the 0.x% difference that makes me interested in this optimization, but also speeding up the algorithm itself: in the AA situation described above, the Call node is always evaluated too, as long as its % is larger than 0.1%, so finding out faster that calling is worse than raising would let us cut the complete call branch from further inspection.
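A sketch of one possible weighting (linear in the iteration number; this is just my guess at an implementation, not something I've validated): multiply iteration t's contribution by t before accumulating, so recent iterations count for more while old ones are never fully forgotten.

```c
/* Sketch of linearly weighted updates (placeholder names): scale
 * iteration t's instantaneous regret and reach-weighted strategy by t
 * before adding them to the cumulative totals. */
void update_weighted(double *cum_regret, double *cum_strategy,
                     const double *inst_regret, const double *reach_strategy,
                     int n, long t)
{
    double w = (double)t;              /* linear; t*t would favor recency harder */
    for (int a = 0; a < n; a++) {
        cum_regret[a]   += w * inst_regret[a];
        cum_strategy[a] += w * reach_strategy[a];
    }
}
```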

Statistics: Posted by proud2bBot — Fri Mar 15, 2013 3:06 pm


]]>