Poker-AI.org http://poker-ai.org/phpbb/ |
CFRM with exponential smoothing? http://poker-ai.org/phpbb/viewtopic.php?f=24&t=2481
Author: | proud2bBot [ Sat May 04, 2013 6:44 pm ] |
Post subject: | CFRM with exponential smoothing?
When learning with a CFRM algorithm, we just add up cumulative regrets and strategies. One problem with this is that convergence is not a linear process. For instance, at the beginning, where every player plays a "random" strategy, open-raising a hand like 53s might not be profitable, leading to negative regrets and the strategy going towards 0 for this bucket. After xM iterations, however, we will see that villain folds enough, so the bucket may become profitable. But the larger x is, the longer it takes to converge, because we have already added up negative regrets for xM iterations.

I wonder if we could increase the convergence rate with exponential smoothing: recent values (regrets/strategies) get a higher weight than very old ones. Did anyone try this in a CFRM context? I know some people use it for opponent modeling to increase the weight of recent observations and reduce that of very old ones.

One possible problem is always running into local minima, but given that we have an exploration factor which ensures that every action is sampled with x% probability even if it has negative regret, we should overcome this. So in theory I'd guess the only problem might be that if we oscillate between two strategies, we are more inclined to use the recent one. But this should only be a very small concern in terms of EV.
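A minimal sketch of what such exponentially smoothed accumulation could look like at a single information set, assuming plain regret matching over the cumulative regrets; the class, the field names and the Gamma constant are illustrative, not anything from the thread:

Code:
using System;

class SmoothedRegretNode
{
    const int NumActions = 3;
    const double Gamma = 0.999999;   // discount applied to old sums on every visit

    double[] cumRegret = new double[NumActions];
    double[] cumStrategy = new double[NumActions];

    // Regret matching over the (smoothed) cumulative regrets.
    public double[] CurrentStrategy()
    {
        var strat = new double[NumActions];
        double positiveSum = 0;
        for (int a = 0; a < NumActions; a++)
            positiveSum += Math.Max(cumRegret[a], 0);
        for (int a = 0; a < NumActions; a++)
            strat[a] = positiveSum > 0
                ? Math.Max(cumRegret[a], 0) / positiveSum
                : 1.0 / NumActions;          // uniform if no positive regret yet
        return strat;
    }

    // instantRegret[a] = this iteration's counterfactual regret for action a.
    public void Update(double[] instantRegret, double reachWeight)
    {
        var strat = CurrentStrategy();
        for (int a = 0; a < NumActions; a++)
        {
            cumRegret[a] = Gamma * cumRegret[a] + instantRegret[a];        // old regret decays
            cumStrategy[a] = Gamma * cumStrategy[a] + reachWeight * strat[a]; // old strategy mass decays
        }
    }
}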
Author: | cantina [ Tue May 07, 2013 7:18 am ] |
Post subject: | Re: CFRM with exponential smoothing?
Slumbot 2012 did this. I mentioned before that I tried it with HUL, but didn't conduct a conclusive analysis. I'd imagine it gets you closer to equilibrium quicker, but may skew things later on. I'd be interested to see a formulaic decay of the weight as the strategy converges: initially, more recent regrets have a larger impact, but eventually the weighting reduces to normal, maybe as an inverse of the node exploration probability used in average sampling?
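One possible reading of this suggestion, as a sketch: a per-node discount that is strong early and relaxes toward 1 as the node's visit count grows. The schedule, the halfLife parameter and all names are assumptions, not anything Slumbot or the posters actually used:

Code:
using System;

static class DecaySchedule
{
    // visits   = how often this node has been touched so far
    // halfLife = visit count at which the discount is halfway to 1.0
    public static double Discount(long visits, double halfLife = 1e6)
    {
        double anneal = visits / (visits + halfLife);   // goes 0 -> 1 as visits grow
        return 0.95 + 0.05 * anneal;                    // 0.95 early, approaches 1.0 late
    }

    static void Main()
    {
        foreach (long v in new long[] { 0, 100_000, 1_000_000, 10_000_000, 100_000_000 })
            Console.WriteLine($"visits={v,11}: discount={Discount(v):F6}");
    }
}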
Author: | Coffee4tw [ Sat May 11, 2013 7:20 pm ] |
Post subject: | Re: CFRM with exponential smoothing?
I like the idea, but I don't think I'm versed in it enough to come up with a proper formula that both achieves quicker convergence and minimizes overfitting or getting stuck in local optima.
Author: | Nose [ Mon May 27, 2013 10:33 am ] |
Post subject: | Re: CFRM with exponential smoothing?
Naively I would say

Code:
if (0 == (#Iterations % 10^6))
    foreach (double r in Regrets) r *= 0.95;

where the initial weights will have been reduced to roughly 0.6% of their original size after 100M iterations (0.95^100). After a certain number of iterations one can simply stop smoothing the regrets, like

Code:
if (0 == (#Iterations % 10^6))
    if (#Iterations < 10^8)
        foreach (double r in Regrets) r *= 0.95;

or alternatively one could use something similar to simulated annealing, like

Code:
foreach (double r in Regrets) r *= (1 - exp(-#Iterations));

The algorithm is still guaranteed to converge in case you stop smoothing at a certain point.

[EDIT] Where 10^6 means Math.Pow(10, 6), i.e. 1,000,000.
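For reference, a compilable version of the first two snippets, assuming the cumulative regrets live in a flat double array; #Iterations becomes an ordinary counter and the 10^6 / 10^8 placeholders become constants. The class and field names are placeholders:

Code:
class PeriodicRegretDiscount
{
    const long DiscountEvery = 1_000_000;      // "10^6" in the post
    const long StopDiscountAt = 100_000_000;   // "10^8" in the post
    const double DiscountFactor = 0.95;

    double[] regrets;                          // flat cumulative-regret table
    long iterations;

    public PeriodicRegretDiscount(int size) { regrets = new double[size]; }

    // Call once per CFR iteration, after the normal regret update.
    public void EndIteration()
    {
        iterations++;
        bool stillSmoothing = iterations < StopDiscountAt;
        if (stillSmoothing && iterations % DiscountEvery == 0)
        {
            // foreach cannot write back to array elements, so index explicitly.
            for (int i = 0; i < regrets.Length; i++)
                regrets[i] *= DiscountFactor;
        }
    }
}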
Author: | Nose [ Sat Dec 14, 2013 4:01 am ] |
Post subject: | Re: CFRM with exponential smoothing?
Did anyone put further work into this? I introduced a factor f := 1.000001 (initial value) and changed my implementation as follows:

1) Regret update

Code:
for (Int32 i = 0; i < totalActions; i++)
    _cProfile[bucket, i] += f * (regStrat[i] / (qTrain * qOpp));

2) Every 1,000,000 iterations

Code:
if (0 == (Iterations % 1000000))
    f *= f;

It works well in the beginning, but at some point the growth of the factor is too large.
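For reference, the two snippets combined into one self-contained (hypothetical) class; the names _cProfile, regStrat, qTrain and qOpp are taken from the post, the rest is illustrative. The squaring step is kept exactly as described; the next post explains why it eventually explodes:

Code:
class GrowingWeightUpdate
{
    double f = 1.000001;                     // initial weight factor
    long iterations;
    double[,] _cProfile;                     // cumulative profile [bucket, action]

    public GrowingWeightUpdate(int buckets, int actions)
    {
        _cProfile = new double[buckets, actions];
    }

    // 1) Regret update: recent regrets are multiplied by the growing factor f.
    public void AccumulateRegret(int bucket, double[] regStrat, double qTrain, double qOpp)
    {
        int totalActions = _cProfile.GetLength(1);
        for (int i = 0; i < totalActions; i++)
            _cProfile[bucket, i] += f * (regStrat[i] / (qTrain * qOpp));
    }

    // 2) Every 1,000,000 iterations the factor is squared.
    public void EndIteration()
    {
        iterations++;
        if (iterations % 1_000_000 == 0)
            f *= f;
    }
}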
Author: | ConvexPolytope [ Sat Dec 14, 2013 8:48 am ] |
Post subject: | Re: CFRM with exponential smoothing?
Quote:
2) Every 1,000,000 iterations
Code:
if( 0 == (Iterations % 1000000) ) f *= f;
It works well in the beginning, but at some point the growth of the factor is too large.

The current value of f is squared every 1M iterations, so with f(0) = 1.000001, after i million iterations f will be

f(i) = f(0)^(2^i)

In other words, it is not f that grows exponentially in the number of iterations, but the exponent of f, so f gets prohibitively large very fast:

f(22) = 66.3
f(23) = 4,387
f(24) = 19,331,000
f(26) = 1.39 * 10^29 > 2^64 (at this point f will overflow)

The problem with exponential growth in general is that for sufficiently many iterations your value will always become too large. Maybe something like this is better suited:

f(i) = (1 + 0.001 * i)^k

where k is some positive integer.
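A few lines to reproduce these numbers, assuming f is squared once per million iterations; the proposed polynomial weight is printed alongside, with k = 2 chosen arbitrarily:

Code:
using System;

class WeightGrowth
{
    static void Main()
    {
        double f = 1.000001;
        for (int i = 1; i <= 26; i++)
        {
            f *= f;                                     // one squaring per million iterations
            double poly = Math.Pow(1 + 0.001 * i, 2);   // polynomial alternative, k = 2
            if (i >= 22)
                Console.WriteLine($"i={i}: squared f = {f:E3}, polynomial = {poly:F6}");
        }
    }
}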
Author: | Nose [ Fri Mar 07, 2014 10:33 am ] |
Post subject: | Re: CFRM with exponential smoothing?
ConvexPolytope wrote:
The current value of f is squared every 1M iterations, so with f(0) = 1.000001, after i million iterations f will be f(i) = f(0)^(2^i) [...]

Isn't it more like f(i) = (f(0)^2)^i = f(0)^(2*i)? So the table would look like

Code:
  i   2*i   f(i)
  0     0   1.000001
  1     2   1.000002
  2     4   1.000004
  3     6   1.000006
  4     8   1.000008
  5    10   1.00001
 ...   ...  (each further row adds roughly 0.000002)
 25    50   1.000050001
 50   100   1.000100005
 75   150   1.000150011
100   200   1.00020002

I guess the problem is rather that the scale_factor does not grow fast enough.

EDIT: Must be something else that grew too fast.
Author: | Nose [ Fri Mar 07, 2014 11:04 am ] |
Post subject: | Re: CFRM with exponential smoothing?
I changed my approach: I implemented a command that lets me upscale or downscale cumulative regrets and average strategies to a chosen percentage (90%, 75%, 50%, 150%, 175%, 200%). So I guess for the next few hours I will find myself rescaling regrets and average strategies.
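A sketch of what such a rescaling command might look like; the method and array names are placeholders, not from the post:

Code:
static class Rescaler
{
    // percent = 90, 75, 50, 150, ... as in the post
    public static void Rescale(double[] cumRegrets, double[] avgStrategySums, double percent)
    {
        double factor = percent / 100.0;
        for (int i = 0; i < cumRegrets.Length; i++)
            cumRegrets[i] *= factor;
        for (int i = 0; i < avgStrategySums.Length; i++)
            avgStrategySums[i] *= factor;
    }
}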
Author: | Nose [ Fri Mar 07, 2014 9:15 pm ] |
Post subject: | Re: CFRM with exponential smoothing?
This brings up an idea to address the issue p2bb brought up initially.

Let's say we solved some HUNL game with a card and bet-sizing abstraction. Now we want to solve a game which (by some means) is similar to the game we solved. What one could do is:

(1) Initialize the new game with the frequencies we obtained for the first game.
(2) Give these frequencies some weight (e.g. 250,000,000 iterations).
(3) Run CFRM on this pre-initialised game tree.

The idea is simply to skip the phase where CFRM is totally clueless by providing some (roughly accurate) frequencies. Thoughts about that?
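A sketch of what steps (1) and (2) could look like, assuming the solved game's frequencies can be mapped bucket-for-bucket onto the new tree; the names, the jagged-array layout and the choice to seed only the average-strategy sums are assumptions:

Code:
static class WarmStart
{
    // oldFrequencies[bucket][action] : strategy from the solved game, each row sums to 1
    // avgStrategySum[bucket][action] : cumulative strategy table of the new game
    public static void Seed(double[][] oldFrequencies,
                            double[][] avgStrategySum,
                            double virtualIterations = 250_000_000)
    {
        for (int b = 0; b < avgStrategySum.Length; b++)
            for (int a = 0; a < avgStrategySum[b].Length; a++)
                // Pretend each bucket has already been visited `virtualIterations`
                // times while playing the old frequencies.
                avgStrategySum[b][a] = virtualIterations * oldFrequencies[b][a];
    }
}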
Author: | cantina [ Mon Mar 10, 2014 7:52 am ] |
Post subject: | Re: CFRM with exponential smoothing?
Are they adjusted by the global iteration count, or by how often the node is touched?
Author: | Nose [ Mon Mar 10, 2014 9:48 pm ] |
Post subject: | Re: CFRM with exponential smoothing?
Nasher wrote: Are they adjusted by the global iteration count, or by how often the node is touched?

For the factor approach I used total iterations. For the SB's open strategy it seemed to improve convergence, but I did not check other nodes.

For the pre-initialisation approach I would abstain from any adjustments: just make CFRM believe it has already completed 250,000,000 iterations and let the algorithm do its work. I haven't implemented that yet; I am just curious about possible drawbacks this approach might have.