Poker-AI.org
http://poker-ai.org/phpbb/

CFRM with exponential smoothing?
http://poker-ai.org/phpbb/viewtopic.php?f=24&t=2481

Author:  proud2bBot [ Sat May 04, 2013 6:44 pm ]
Post subject:  CFRM with exponential smoothing?

When learning with a CFRM algorithm, we just add up cumulative regrets and strategies. One problem with this is that convergence is not a linear process. For instance, at the beginning, when every player plays a "random" strategy, open-raising a hand like 53s might not be profitable, leading to negative regrets and the strategy going towards 0 for this bucket. After x million iterations, however, we will see that villain folds enough, so the bucket may become profitable. But the larger x is, the longer it takes to converge, as we have already added up negative regrets for x million iterations.

I wonder if we could increase the convergence rate with exponential smoothing: recent values (regrets/strategies) get a higher weight than very old ones. Did anyone try this in a CFRM context? I know some people use it for opponent modeling to increase the weight of recent observations and reduce that of very old ones.

One possible problem is always running into local minima, but given that we have an exploration factor which ensures that every action is sampled with x% probability even if it has negative regret, we should overcome this. So in theory I'd guess the only problem might be that if we oscillate between two strategies, we are more inclined to use the recent one. But this should only be a very small concern in terms of EV.
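
To make it concrete, here is a minimal sketch of the kind of update I have in mind (identifiers like cumulativeRegret and instantRegret are purely illustrative): the old cumulative regret is discounted by a factor lambda < 1 each iteration before the new regret is added, so an observation that is n iterations old only carries weight lambda^n.

Code:
// Sketch only: exponentially smoothed regret update with decay lambda.
// cumulativeRegret, instantRegret and totalActions are illustrative names.
const double lambda = 0.999999; // per-iteration decay of old regrets
for (int i = 0; i < totalActions; i++)
    cumulativeRegret[i] = lambda * cumulativeRegret[i] + instantRegret[i];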

Author:  cantina [ Tue May 07, 2013 7:18 am ]
Post subject:  Re: CFRM with exponential smoothing?

Slumbot 2012 did this. I mentioned I tried this before with HUL, but didn't conduct a conclusive analysis. I'd imagine it will get you closer to equilibrium quicker, but may skew things later on. I'd be interested to see a formulaic decay of the weight as the strategy converges: initially, more recent regrets have a larger impact, but eventually they reduce to normal -- maybe as an inverse of the node exploration probability used in average sampling?
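
Something along these lines, maybe (purely illustrative sketch, not from Slumbot; node.VisitCount and node.CumulativeRegret are made-up names): the per-node decay starts strong and approaches 1 as the node is visited more, so early noise washes out while a well-explored node keeps its accumulated regrets.

Code:
// Sketch only: per-node decay that weakens as the node is visited more.
double decay = 1.0 - 1.0 / (1.0 + node.VisitCount); // -> 1 as visits grow
for (int i = 0; i < totalActions; i++)
    node.CumulativeRegret[i] *= decay;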

Author:  Coffee4tw [ Sat May 11, 2013 7:20 pm ]
Post subject:  Re: CFRM with exponential smoothing?

I like the idea, but I don't think I'm versed enough in it to come up with a proper formula that both achieves quicker convergence and minimizes overfitting/getting stuck in local optima.

Author:  Nose [ Mon May 27, 2013 10:33 am ]
Post subject:  Re: CFRM with exponential smoothing?

Naively, I would say

Code:
// every 10^6 iterations, decay all cumulative regrets by 5%
if (iterations % 1000000 == 0)
    for (int i = 0; i < regrets.Length; i++)
        regrets[i] *= 0.95;


where the initial weights will have been reduced to roughly 0.6% after 100M iterations (0.95^100 ≈ 0.006). After a certain number of iterations one can simply stop smoothing the regrets, like

Code:
// stop decaying once 10^8 iterations are reached
if (iterations % 1000000 == 0)
    if (iterations < 100000000)
        for (int i = 0; i < regrets.Length; i++)
            regrets[i] *= 0.95;


or alternatively one could use something similar to simulated annealing, like

Code:
// the decay factor approaches 1 as iterations grow, so smoothing fades out
double decay = 1.0 - Math.Exp(-iterations);
for (int i = 0; i < regrets.Length; i++)
    regrets[i] *= decay;


The algorithm is still guaranteed to converge in case you stop smoothing at a certain point, since (intuitively) from then on it is just vanilla CFRM starting from a bounded perturbation of the accumulated regrets.

[EDIT] Where 10^6 means Math.Pow(10,6) or 1,000,000

Author:  Nose [ Sat Dec 14, 2013 4:01 am ]
Post subject:  Re: CFRM with exponential smoothing?

Did anyone put further work into this?

I introduced a factor f := 1.000001 (initial value) and changed my implementation as follows:

1) Regret update
Code:
// weight this iteration's sampled regrets by the current factor f
for (Int32 i = 0; i < totalActions; i++)
    _cProfile[bucket, i] += f * (regStrat[i] / (qTrain * qOpp));


2) Every 1,000,000 iterations
Code:
// square the factor every million iterations
if (Iterations % 1000000 == 0) f *= f;


It works well in the beginning, but at some point the growth of the factor becomes too large.

Author:  ConvexPolytope [ Sat Dec 14, 2013 8:48 am ]
Post subject:  Re: CFRM with exponential smoothing?

Quote:
2) Every 1,000,000 iterations
Code:
if (Iterations % 1000000 == 0) f *= f;


It works well in the beginning, but at some point the growth of the factor becomes too large.


The current value of f is squared every 1M iterations, so with f(0) = 1.000001, after i million iterations f will be

f(i) = f(0)^(2^i)

In other words, it's not f that is growing exponentially in the number of iterations, but the exponent of f, so f will get prohibitively large very fast; note that f(i) = (1 + 10^-6)^(2^i) ≈ exp(2^i * 10^-6):

f(22) = 66.3
f(23) = 4,387
f(24) = 19,331,000
f(26) = 1.39 * 10^29 > 2^64 (at this point f exceeds the range of a 64-bit integer)

The problem with exponential growth in general is that for sufficiently many iterations, your value will always become too large.

Maybe something like this is better suited:

f(i) = (1 + 0.001 * i) ^ k

where k is some positive integer.
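
Plugged into the regret update from the post above, that would look roughly like this (just a sketch, not tested; i stands for iterations elapsed in millions):

Code:
// Sketch only: polynomial iteration weight instead of repeated squaring.
int k = 2;                               // some small positive integer
double f = Math.Pow(1.0 + 0.001 * i, k); // grows only polynomially in i
for (int a = 0; a < totalActions; a++)
    _cProfile[bucket, a] += f * (regStrat[a] / (qTrain * qOpp));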

Author:  Nose [ Fri Mar 07, 2014 10:33 am ]
Post subject:  Re: CFRM with exponential smoothing?

ConvexPolytope wrote:
Quote:
2) Every 1,000,000 iterations
Code:
if (Iterations % 1000000 == 0) f *= f;


It works well in the beginning, but at some point the growth of the factor is too large


The current value of f is squared every 1M iterations, so with f(0) = 1.000001, after i million iterations f will be

f(i) = f(0)^(2^i)

In other words, it's not f that is growing exponentially in the number of iterations, but the exponent of f, so f will get prohibitively large very fast; note that f(i) = (1 + 10^-6)^(2^i) ≈ exp(2^i * 10^-6):

f(22) = 66.3
f(23) = 4,387
f(24) = 19,331,000
f(26) = 1.39 * 10^29 > 2^64 (at this point f exceeds the range of a 64-bit integer)

The problem with exponential growth in general is that for sufficiently many iterations, your value will always become too large.

Maybe something like this is better suited:

f(i) = (1 + 0.001 * i) ^ k

where k is some positive integer.


Isn't it more like f(i) = (f(0)^2)^i = f(0)^(2*i)?

So the table would look like

Code:
i   2*i   f(i)
0   0   1.000001
1   2   1.000002
2   4   1.000004
3   6   1.000006
4   8   1.000008
5   10   1.00001
6   12   1.000012
7   14   1.000014
8   16   1.000016
9   18   1.000018
10   20   1.00002
11   22   1.000022
12   24   1.000024
13   26   1.000026
14   28   1.000028
15   30   1.00003
16   32   1.000032
17   34   1.000034001
18   36   1.000036001
19   38   1.000038001
20   40   1.000040001
21   42   1.000042001
22   44   1.000044001
23   46   1.000046001
24   48   1.000048001
25   50   1.000050001
26   52   1.000052001
27   54   1.000054001
28   56   1.000056002
29   58   1.000058002
30   60   1.000060002
31   62   1.000062002
32   64   1.000064002
33   66   1.000066002
34   68   1.000068002
35   70   1.000070002
36   72   1.000072003
37   74   1.000074003
38   76   1.000076003
39   78   1.000078003
40   80   1.000080003
41   82   1.000082003
42   84   1.000084003
43   86   1.000086004
44   88   1.000088004
45   90   1.000090004
46   92   1.000092004
47   94   1.000094004
48   96   1.000096005
49   98   1.000098005
50   100   1.000100005
51   102   1.000102005
52   104   1.000104005
53   106   1.000106006
54   108   1.000108006
55   110   1.000110006
56   112   1.000112006
57   114   1.000114006
58   116   1.000116007
59   118   1.000118007
60   120   1.000120007
61   122   1.000122007
62   124   1.000124008
63   126   1.000126008
64   128   1.000128008
65   130   1.000130008
66   132   1.000132009
67   134   1.000134009
68   136   1.000136009
69   138   1.000138009
70   140   1.00014001
71   142   1.00014201
72   144   1.00014401
73   146   1.000146011
74   148   1.000148011
75   150   1.000150011
76   152   1.000152011
77   154   1.000154012
78   156   1.000156012
79   158   1.000158012
80   160   1.000160013
81   162   1.000162013
82   164   1.000164013
83   166   1.000166014
84   168   1.000168014
85   170   1.000170014
86   172   1.000172015
87   174   1.000174015
88   176   1.000176015
89   178   1.000178016
90   180   1.000180016
91   182   1.000182016
92   184   1.000184017
93   186   1.000186017
94   188   1.000188018
95   190   1.000190018
96   192   1.000192018
97   194   1.000194019
98   196   1.000196019
99   198   1.00019802
100   200   1.00020002


I guess the problem is rather that the scale factor does not grow fast enough.

EDIT: Must be something else that grew too fast.

Author:  Nose [ Fri Mar 07, 2014 11:04 am ]
Post subject:  Re: CFRM with exponential smoothing?

I changed my approach: I implemented a command that allows me to upscale/downscale cumulative regrets and average strategies to a chosen percentage (90%, 75%, 50%, 150%, 175%, 200%).

So I guess for the next few hours I will find myself rescaling regrets and average strategies.
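
For reference, the command itself is essentially trivial (sketch only; the array names are illustrative):

Code:
// Sketch only: scale all cumulative regrets and average-strategy sums
// by a chosen factor, e.g. 0.9 for 90% or 1.5 for 150%.
void Rescale(double factor)
{
    for (int b = 0; b < totalBuckets; b++)
        for (int a = 0; a < totalActions; a++)
        {
            cumulativeRegret[b, a] *= factor;
            averageStrategySum[b, a] *= factor;
        }
}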

Author:  Nose [ Fri Mar 07, 2014 9:15 pm ]
Post subject:  Re: CFRM with exponential smoothing?

This brings up an idea to address the issue p2bb brought up initially.

Let's say we solved some HUNL game with a card and bet-sizing abstraction. Now we want to solve a game which (by some means) is similar to the game we solved.

What one could do is:
(1) Initialize the new game with the frequencies we obtained for the first game
(2) Give these frequencies some weight (e.g. 250,000,000 iterations)
(3) Run CFRM on this pre-initialised game tree

The idea is just to skip the phase where CFRM is totally clueless by providing some (roughly accurate) frequencies, along the lines of the sketch below.
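
Roughly like this (sketch only; oldFrequencies stands for the first game's average strategy, and all names are made up):

Code:
// Sketch only: warm-start the new game's average-strategy sums as if
// CFRM had already run for pseudoIterations iterations.
const double pseudoIterations = 250000000.0;
for (int b = 0; b < totalBuckets; b++)
    for (int a = 0; a < totalActions; a++)
        averageStrategySum[b, a] = oldFrequencies[b, a] * pseudoIterations;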

Thoughts about that?

Author:  cantina [ Mon Mar 10, 2014 7:52 am ]
Post subject:  Re: CFRM with exponential smoothing?

Are they adjusted by global count or by how often the node is touched?

Author:  Nose [ Mon Mar 10, 2014 9:48 pm ]
Post subject:  Re: CFRM with exponential smoothing?

Nasher wrote:
They're adjusted by global count or by how often the node is touched?


For the factor approach I used total iterations. For the SB's open strategy it seemed to improve convergence, but I did not check other nodes.

For the pre-initialisation approach I would abstain from any adjustments: just make CFRM believe it has already completed 250,000,000 iterations and let the algorithm do its work. I haven't implemented that yet; I am just curious about possible drawbacks this approach might have.
