Poker-AI.org

Poker AI and Botting Discussion Forum
PostPosted: Sat May 04, 2013 6:44 pm 
Senior Member

Joined: Mon Mar 11, 2013 10:24 pm
Posts: 216
When learning with a CFRM algorithm, we just add up cumulative regrets and strategies. One problem with this is that convergence is not a linear process. For instance, at the beginning, where every player plays a "random" strategy, open raising a hand like 53s might not be profitable, leading to negative regrets and the strategy for this bucket going towards 0. After xM iterations, however, we will see that villain folds often enough for the bucket to become profitable. But the larger x is, the longer it takes to converge, because we have already accumulated negative regrets for xM iterations...

I wonder if we could increase the convergence rate with exponential smoothing: recent values (regrets/strategies) get a higher weight than very old ones. Did anyone try this in a CFRM context? I know some people use it for opponent modeling, to increase the weight of recent observations and reduce that of very old ones.

One possible problem is always running into local minima, but given that we have an exploration factor which ensures every action is sampled x% of the time even if it has negative regret, we should overcome this. So in theory I'd guess the only problem might be that if we oscillate between two strategies, we are more inclined to use the recent one. But this should only be a very small concern in terms of EV.
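To make this concrete, here is a minimal sketch of what such a discounted regret update could look like (the names UpdateRegretsDiscounted, cumRegret, instantRegret and gamma are purely illustrative, not from any particular implementation):

Code:
// Illustrative sketch only. Plain CFRM adds instantaneous regrets straight onto
// the cumulative sum; here the old sum is discounted first, so recent iterations
// carry more weight than very old ones. gamma < 1 is a tuning knob.
static void UpdateRegretsDiscounted(double[] cumRegret, double[] instantRegret, double gamma)
{
    for (int a = 0; a < cumRegret.Length; a++)
        cumRegret[a] = gamma * cumRegret[a] + instantRegret[a];
}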


PostPosted: Tue May 07, 2013 7:18 am 
Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 437
Slumbot 2012 did this. I mentioned I tried this before with HUL, but didn't conduct a conclusive analysis. I'd imagine it will get you closer to EQ quicker, but may skew things later on. I'd be interested to see a formulaic decay of the weight as the strategy converges. So, initially, more recent regrets have a larger impact, but eventually they reduce to normal -- maybe as an inverse to the node exploration probability used in average sampling?
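One possible shape for such a decaying recency weight, purely as a sketch (c and T are arbitrary tuning constants, not taken from Slumbot or any published implementation):

Code:
// Early on (t small) the weight is close to 1 + c, so recent regrets dominate;
// as t grows the weight tends to 1 and the update falls back to plain CFRM.
static double RecencyWeight(long t, double c, double T)
{
    return 1.0 + c / (1.0 + t / T);
}

// usage (hypothetical): cumRegret[a] += RecencyWeight(iterations, c, T) * instantRegret[a];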


PostPosted: Sat May 11, 2013 7:20 pm 
Site Admin

Joined: Thu Feb 28, 2013 5:24 pm
Posts: 230
I like the idea, but I don't think I am well-versed enough in it to come up with a proper formula that both achieves quicker convergence and minimizes overfitting/getting stuck in local optima.

_________________
Cheers.


PostPosted: Mon May 27, 2013 10:33 am 
Regular Member

Joined: Sat May 25, 2013 7:36 am
Posts: 73
Naively, I would say:

Code:
// every 10^6 (= 1,000,000) iterations, decay all cumulative regrets by 5%
if (Iterations % 1000000 == 0)
    for (int i = 0; i < Regrets.Length; i++)
        Regrets[i] *= 0.95;


where the initial weights will have been reduced to about 0.6% after 100M iterations (0.95^100 ≈ 0.006). After a certain number of iterations one can simply stop smoothing the regrets, like

Code:
// stop decaying once 10^8 (= 100,000,000) iterations have been reached
if (Iterations % 1000000 == 0 && Iterations < 100000000)
    for (int i = 0; i < Regrets.Length; i++)
        Regrets[i] *= 0.95;


or, alternatively, one could use something similar to simulated annealing, like

Code:
// the damping factor (1 - e^(-Iterations)) approaches 1 as Iterations grows,
// so the smoothing fades out on its own
for (int i = 0; i < Regrets.Length; i++)
    Regrets[i] *= 1.0 - Math.Exp(-Iterations);


The algorithm is still guaranteed to converge as long as you stop smoothing at some point.

[EDIT] Where 10^6 means Math.Pow(10,6) or 1,000,000


PostPosted: Sat Dec 14, 2013 4:01 am 
Regular Member

Joined: Sat May 25, 2013 7:36 am
Posts: 73
Did anyone put further work into this?

I introduced a factor f := 1.000001 (=initial value) and changed my implementation as follows:

1) Regret-Update
Code:
for (Int32 i = 0; i < totalActions; i++) _cProfile[bucket, i] += f * (regStrat[i] / (qTrain * qOpp));


2) Every 1,000,000 iterations
Code:
if( 0 == (Iterations % 1000000) ) f *= f;


It works well in the beginning, but at some point the growth of the factor is too large


PostPosted: Sat Dec 14, 2013 8:48 am 
Junior Member

Joined: Sun Dec 08, 2013 4:22 pm
Posts: 15
Quote:
2) Every 1,000,000 iterations
Code:
if( 0 == (Iterations % 1000000) ) f *= f;


It works well in the beginning, but at some point the growth of the factor is too large


The current value of f is squared every 1M iterations, so with f(0) = 1.000001, after i million iterations f will be

f(i) = f(0)^(2^i)

In other words, it's not f that is growing exponentially in the number of iterations, but the exponent of f, so f will get prohibitively large very fast:

f(22) ≈ 66.3
f(23) ≈ 4,397
f(24) ≈ 19,331,000
f(26) ≈ 1.39 * 10^29, already larger than 2^64 (and a few more squarings overflow even a double)

The problem with exponential growth in general is that for sufficiently many iterations, your value will always become too large.

Maybe something like this is better suited:

f(i) = (1 + 0.001 * i) ^ k

where k is some positive integer.
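For what it's worth, translated directly into code that weight would look something like this (k is just a tuning parameter; nothing here is taken from an existing implementation):

Code:
// i = iterations in millions; the weight grows polynomially instead of
// doubly exponentially, so it stays manageable for any realistic run length
static double Weight(double iMillions, int k)
{
    return Math.Pow(1.0 + 0.001 * iMillions, k);
}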


PostPosted: Fri Mar 07, 2014 10:33 am 
Regular Member

Joined: Sat May 25, 2013 7:36 am
Posts: 73
ConvexPolytope wrote:
Quote:
2) Every 1,000,000 iterations
Code:
if( 0 == (Iterations % 1000000) ) f *= f;


It works well in the beginning, but at some point the growth of the factor is too large


The current value of f is squared every 1M iterations, so with f(0) = 1.000001, after i million iterations f will be

f(i) = f(0)^(2^i)

In other words, it's not f that is growing exponentially in the number of iterations, but the exponent of f, so f will get prohibitively large very fast:

f(22) ≈ 66.3
f(23) ≈ 4,397
f(24) ≈ 19,331,000
f(26) ≈ 1.39 * 10^29, already larger than 2^64 (and a few more squarings overflow even a double)

The problem with exponential growth in general is that for sufficiently many iterations, your value will always become too large.

Maybe something like this is better suited:

f(i) = (1 + 0.001 * i) ^ k

where k is some positive integer.


Isn't it more like f(i) = (f(0)^2)^i = f(0)^(2*i)?

So the table would look like

Code:
i   2*i   f(i)
0   0   1.000001
1   2   1.000002
2   4   1.000004
3   6   1.000006
4   8   1.000008
5   10   1.00001
6   12   1.000012
7   14   1.000014
8   16   1.000016
9   18   1.000018
10   20   1.00002
11   22   1.000022
12   24   1.000024
13   26   1.000026
14   28   1.000028
15   30   1.00003
16   32   1.000032
17   34   1.000034001
18   36   1.000036001
19   38   1.000038001
20   40   1.000040001
21   42   1.000042001
22   44   1.000044001
23   46   1.000046001
24   48   1.000048001
25   50   1.000050001
26   52   1.000052001
27   54   1.000054001
28   56   1.000056002
29   58   1.000058002
30   60   1.000060002
31   62   1.000062002
32   64   1.000064002
33   66   1.000066002
34   68   1.000068002
35   70   1.000070002
36   72   1.000072003
37   74   1.000074003
38   76   1.000076003
39   78   1.000078003
40   80   1.000080003
41   82   1.000082003
42   84   1.000084003
43   86   1.000086004
44   88   1.000088004
45   90   1.000090004
46   92   1.000092004
47   94   1.000094004
48   96   1.000096005
49   98   1.000098005
50   100   1.000100005
51   102   1.000102005
52   104   1.000104005
53   106   1.000106006
54   108   1.000108006
55   110   1.000110006
56   112   1.000112006
57   114   1.000114006
58   116   1.000116007
59   118   1.000118007
60   120   1.000120007
61   122   1.000122007
62   124   1.000124008
63   126   1.000126008
64   128   1.000128008
65   130   1.000130008
66   132   1.000132009
67   134   1.000134009
68   136   1.000136009
69   138   1.000138009
70   140   1.00014001
71   142   1.00014201
72   144   1.00014401
73   146   1.000146011
74   148   1.000148011
75   150   1.000150011
76   152   1.000152011
77   154   1.000154012
78   156   1.000156012
79   158   1.000158012
80   160   1.000160013
81   162   1.000162013
82   164   1.000164013
83   166   1.000166014
84   168   1.000168014
85   170   1.000170014
86   172   1.000172015
87   174   1.000174015
88   176   1.000176015
89   178   1.000178016
90   180   1.000180016
91   182   1.000182016
92   184   1.000184017
93   186   1.000186017
94   188   1.000188018
95   190   1.000190018
96   192   1.000192018
97   194   1.000194019
98   196   1.000196019
99   198   1.00019802
100   200   1.00020002


I guess the problem is rather that the scale_factor does not grow fast enough

EDIT: Must be something else that grew too fast


Last edited by Nose on Fri Mar 07, 2014 1:22 pm, edited 1 time in total.

PostPosted: Fri Mar 07, 2014 11:04 am 
Regular Member

Joined: Sat May 25, 2013 7:36 am
Posts: 73
I changed my approach: I implemented a command that allows me to scale cumulative regrets and average strategies up or down to a chosen percentage

(90%, 75%, 50%, 150%, 175%, 200%).

So I guess for the next few hours I will find myself rescaling regrets and average strategies.


PostPosted: Fri Mar 07, 2014 9:15 pm 
Regular Member

Joined: Sat May 25, 2013 7:36 am
Posts: 73
This brings up an idea to address the issue p2bb brought up initially.

Let's say we solved some HUNL with a card and bet sizing abstraction. Now we want to solve a game which (by some means) is similar to the game we solved.

What one could do is:
(1) Initialize the new game with the frequencies we obtained for the first game
(2) Give these frequencies some weight (i.e. 250,000,000 iterations)
(3) Run CFRM on this pre-initialised game tree

The idea is just to skip the phase where CFRM is totally clueless by providing some (roughly accurate) frequencies.
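Roughly, the initialisation might look something like this (names such as newGameTree, oldSolution and GetFrequencies are only placeholders, not from a real implementation):

Code:
// Warm-start sketch: copy the frequencies of the already-solved game into the
// new tree and pretend they are the average strategy accumulated over
// 250,000,000 iterations, then run CFRM as usual from there.
const long PretendIterations = 250000000;

foreach (var node in newGameTree.Nodes)
{
    double[] freq = oldSolution.GetFrequencies(node);   // mapped from the old abstraction
    for (int a = 0; a < freq.Length; a++)
        node.CumulativeStrategy[a] = freq[a] * PretendIterations;
    // cumulative regrets could be left at zero or seeded in a similar fashion
}

Iterations = PretendIterations;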

Thoughts about that?


PostPosted: Mon Mar 10, 2014 7:52 am 
Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 437
They're adjusted by global count or by how often the node is touched?


PostPosted: Mon Mar 10, 2014 9:48 pm 
Regular Member

Joined: Sat May 25, 2013 7:36 am
Posts: 73
Nasher wrote:
They're adjusted by global count or by how often the node is touched?


For the factor approach I used total Iterations. For the SB's open strategy it seemed to improve convergence, but I did not check other nodes

For the approach of pre-initialisation I would abstain from any adjustments. Just make CFRM believe it already completed 250'000'000 iterations and let the algorithm do its work. I haven't implemented that yet. I am just curious about possible drawbacks this approach might have

