Poker-AI.org

Poker AI and Botting Discussion Forum
PostPosted: Sat Mar 16, 2013 10:46 pm 
Offline
Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 437
I'm curious about the relationship between cumulative regret (CR) and the average strategy (AS) in CFRM. I've heard of people using weighted CR to update the AS, so as to allow more recent updates to have a bigger impact on the final strategy (Slumbot 2012). If the premise that more recent updates matter more is true, then what, mathematically speaking, is the meaning/relationship/purpose of the average strategy in the first place? Why not use some weighted form of CR as your strategy on its own?
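
For reference, here's a minimal sketch of the two accumulators in question, in Python; the names (cum_regret, cum_strategy, reach) are mine, and the weight parameter is where a Slumbot-style recency weighting would plug in. Recall that in CFR only the time-averaged profile carries the convergence guarantee; the current regret-matching strategy can cycle.
Code:
import numpy as np

def current_strategy(cum_regret):
    """Regret matching: mix actions in proportion to positive cumulative regret."""
    pos = np.maximum(cum_regret, 0.0)
    total = pos.sum()
    if total > 0.0:
        return pos / total
    return np.full(len(cum_regret), 1.0 / len(cum_regret))  # uniform fallback

def cfr_update(cum_regret, cum_strategy, action_utils, reach, weight=1.0):
    """One update at an infoset.

    action_utils: counterfactual utility of each action this iteration.
    reach: our own reach probability to this infoset.
    weight: 1.0 gives vanilla CFR; an increasing weight (e.g. the
    iteration number) makes recent iterations count more in the AS.
    """
    sigma = current_strategy(cum_regret)
    node_util = float(sigma @ action_utils)
    cum_regret += action_utils - node_util   # CR accumulator
    cum_strategy += weight * reach * sigma   # AS accumulator
    return sigma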


PostPosted: Sat Mar 16, 2013 11:00 pm 
Offline
Senior Member

Joined: Mon Mar 11, 2013 10:24 pm
Posts: 216
I think the problem is that you need a continuous weight function to store the data efficiently. Given that we can't spend much more memory than two arrays of doubles, we need to store all the information aggregated in the regrets/strategies, but at the same time make sure we consider all the data (or at least the last x million iterations), weighting the more recent data a little higher. But I agree that AvgSS (I don't write ass anymore) is just one way of not being super greedy (i.e., of not using only the current regret).


PostPosted: Sat Mar 16, 2013 11:13 pm 
Offline
Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 437
I understand that. But the questions I'm asking are: can we get away with a single, suitably formulated accumulator that reflects both CR and AS? And would the larger game size that the saved memory allows outweigh the resulting inaccuracies?


PostPosted: Sun Mar 17, 2013 3:19 pm 
Offline
Senior Member

Joined: Mon Mar 11, 2013 10:24 pm
Posts: 216
Let me just puke out my thoughts on it, hopefully getting a better picture:
1. We definitely need the regrets, so if we had to choose between CS and CR, it should be the latter.
2. Currently we keep both CS and CR because CR changes more often. For instance, oscillating between two Nash equilibrium points, we can have something like: after 10M iterations the regrets are {-1, 1}, after 20M {1, -1}, after 30M {-1, 1}, and so on. If we stop at a certain point, the regret only shows a snapshot of that point, not the development/fluctuation that is captured in the CS.
3. If we assume the algorithm runs long enough, dominated actions will steadily decrease their regret and dominating actions will increase theirs, but the fluctuating ones will stay within a small band. The question is whether we can exploit this without introducing data structures that are as big as or bigger than the CS...


PostPosted: Sun Mar 17, 2013 3:38 pm 
Offline
Site Admin
User avatar

Joined: Thu Feb 28, 2013 5:24 pm
Posts: 230
I'm not all too familiar with this but here is a random idea:
Can you keep track of variance somehow? That way you'll find the points that change a lot over iterations and the ones that are pretty steady. That would only require one additional data point to store, right?

_________________
Cheers.


PostPosted: Mon Mar 18, 2013 6:52 pm 
Offline
Senior Member

Joined: Mon Mar 11, 2013 10:24 pm
Posts: 216
No, to calculate the variance, you'd need to have all the data available, which is not an option.


PostPosted: Tue Mar 19, 2013 8:56 am 
Offline
Junior Member

Joined: Wed Mar 06, 2013 8:44 am
Posts: 37
Coffee4tw wrote:
I'm not all too familiar with this but here is a random idea:
Can you keep track of variance somehow? That way you'll find the points that change a lot over iterations and the ones that are pretty steady. That would only require one additional data point to store, right?



What exactly do you mean by "keep track" in this case?
You already keep track of it in your favorite tracking tool; in HEM it's the value EV bb/100.


PostPosted: Tue Mar 19, 2013 9:23 am 
Offline
Junior Member

Joined: Tue Mar 05, 2013 1:27 pm
Posts: 16
proud2bBot wrote:
No, to calculate the variance, you'd need to have all the data available, which is not an option.

In fact, to calculate the variance you only need three extra variables: the variance itself, the mean, and the size of the dataset used to compute them. Each time you want to update the variance with an additional data point, apply the following formulas (in this order, since the last line uses the updated mean):
Code:
d = x - m
m = m + d / (n + 1)
v = 1/(n+1) * (x - m) * d + n/(n+1) * v

where v is your (new/old) variance, m your (new/old) mean, n the size of your dataset before adding the new data, and x the data point you want to add. d is a temporary variable that you don't need to save.
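
For anyone who wants to drop this into a trainer, here is the same update as runnable Python (a sketch; the class and method names are mine):
Code:
class OnlineVariance:
    """Running mean and (population) variance, one value at a time.

    Stores only three numbers per tracked quantity, matching the
    formulas above (Welford-style online update)."""

    def __init__(self):
        self.n = 0      # number of data points seen so far
        self.m = 0.0    # running mean
        self.v = 0.0    # running population variance

    def add(self, x):
        d = x - self.m                      # deviation from the old mean
        self.m += d / (self.n + 1)          # updated mean
        # blend the old variance with the new point's contribution
        self.v = ((x - self.m) * d + self.n * self.v) / (self.n + 1)
        self.n += 1

ov = OnlineVariance()
for x in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    ov.add(x)
print(ov.m, ov.v)   # 5.0 4.0 (mean and population variance)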

Source : The link posted below by spears

Edit: Fix the formulas and add the source


Last edited by Romesnil on Wed Mar 20, 2013 9:51 am, edited 2 times in total.

PostPosted: Tue Mar 19, 2013 3:39 pm 
Offline
Senior Member

Joined: Mon Mar 11, 2013 10:24 pm
Posts: 216
Thanks Romesnil! Given the formula, we could use variance, but that would still lead to the same memory consumption as the traditional approach (per infoset/bucket we need one double for the CS and one for the variance). However, it might still give us more valuable information: using mean and variance we could estimate, e.g., how likely it is (say, at 99% certainty) that the regret for a bucket is negative, and sample actions based on that.
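
A rough sketch of what that sampling test could look like in Python, assuming a normal approximation for the regret samples (the function names and the placement of the 99% threshold are mine):
Code:
from math import erf, sqrt

def prob_regret_negative(mean, variance):
    """P(regret < 0) under a normal approximation N(mean, variance)."""
    if variance <= 0.0:
        return 1.0 if mean < 0.0 else 0.0
    z = (0.0 - mean) / sqrt(variance)
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))   # standard normal CDF at z

def should_skip_action(mean, variance, certainty=0.99):
    """Skip sampling an action whose regret is negative with high certainty."""
    return prob_regret_negative(mean, variance) >= certainty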


PostPosted: Tue Mar 19, 2013 4:04 pm 
Offline
Site Admin
User avatar

Joined: Sun Feb 24, 2013 9:39 pm
Posts: 642
http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Online_algorithm


PostPosted: Wed Mar 20, 2013 3:38 am 
Offline
Senior Member

Joined: Mon Mar 11, 2013 10:24 pm
Posts: 216
I hope not to derail the discussion from the variance idea, but I just thought of a different approach and didn't want to open a new thread.

First, let me make clear what we expect from the CFRM:
  • We need information about the most profitable action for a bucket
  • We need to know how often each action should be performed
  • We want clearly bad decisions (folding the nuts on the flop in NLH, for example) to be excluded as early as possible, to reduce the subtrees to investigate and thus speed up the algorithm
  • We want to be able to investigate non-max-EV decisions, as they might improve as the strategies change
  • We want to be able to short-cut branches of the tree, e.g. using sampling/probing/...

Current algorithms store this information as cumulative regret and cumulative strategy (bullets 1+2) and use heuristics for the other bullet points. Now, what if we keep the cumulative strategy but, instead of storing the regrets for each action/bucket, we store the cumulative EVs? Wouldn't the result be the same, i.e., the highest EV corresponds to the highest regret, etc.? If so, having the EV instead of the regret would help us, as it's more valuable imo:

1. It can be used to calculate an approximation of the game value (within the abstraction).
2. We can better distinguish clearly bad decisions from non-max-EV but still +EV decisions. For instance, consider AA in a very short-stacked HU game where we can a) limp, b) call, or c) min-raise. We should be able to see quickly that both b and c are +EV and should be investigated further, while a is out of the question.
3. [not 100% sure of this] Instead of evaluating a subtree fully, we might be able to use its EV up to a certain % as a shortcut.
4. [more a development aspect] It's easier to debug, as wrong EV values should be easier to spot than wrong regrets.

Has anyone thought about this before, and especially: is my assumption that EVs and regrets are basically interchangeable correct?
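
To make the assumption concrete, here is the bookkeeping it implies (a sketch, my notation): the cumulative regret of an action is its cumulative counterfactual EV minus the cumulative EV of the strategies actually played, and the subtracted term is the same for every action at an infoset, so action rankings are preserved and exact regrets stay recoverable at the cost of one extra double per infoset:
Code:
import numpy as np

class EVNode:
    """Bookkeeping for one infoset when storing cumulative EVs
    instead of cumulative regrets (one extra double per infoset)."""

    def __init__(self, num_actions):
        self.cum_ev = np.zeros(num_actions)  # cumulative counterfactual EV per action
        self.cum_sigma_ev = 0.0              # cumulative EV of the strategies played

    def accumulate(self, action_utils, sigma):
        self.cum_ev += action_utils
        self.cum_sigma_ev += float(sigma @ action_utils)

    def cum_regret(self):
        # R(a) = sum_t [v_t(a) - v_t(sigma_t)]: the subtracted term is
        # action-independent, so rankings match and exact regrets are
        # recoverable for regret matching.
        return self.cum_ev - self.cum_sigma_ev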


PostPosted: Wed Mar 20, 2013 7:14 am 
Offline
Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 437
By EV you mean the EV of a particular action (child node) at a given decision point, a.k.a. utility?

I think I understand what you're saying, but how would using EV instead of regret reduce the memory requirements?

Something I was thinking about today: would it be feasible to update the average strategy in a secondary cycle? I.e., just load the regrets into our tree, do X iterations updating regrets as usual, but instead of loading/updating the AS, keep a list of the updates, encoded like typical training inputs/outputs. After X iterations, save the regrets, then load the AS and apply the updates.

For me, each update would take 1 UInt16 (for the hand bucket) and 15 doubles (for the action-node updates), for a total of 122 bytes (not counting the list-structure overhead). That's 8,595 node updates per megabyte of memory and about 8,801,162 per gigabyte. It could be reduced to 32 bytes per update if you encoded the updates into, say, UInt16s, trading away floating-point accuracy, giving you about 32k/33M updates per megabyte/gigabyte.

Has anybody tried using just Singles for their AS, and increasing their regret buckets ~1.3x?
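
A sketch of that two-phase idea in Python, with a hypothetical 16-bit fixed-point encoding to make the byte counts concrete (SCALE and all names are mine):
Code:
import numpy as np

SCALE = 1000.0  # assumed fixed-point scale for the lossy 16-bit encoding

pending = []    # buffered AS updates: (bucket, quantized per-action deltas)

def record_update(bucket, deltas):
    """Phase 1: buffer an AS update instead of touching the AS arrays.
    bucket fits a uint16; deltas are quantized to int16 (lossy)."""
    q = np.clip(np.round(deltas * SCALE), -32768, 32767).astype(np.int16)
    pending.append((np.uint16(bucket), q))

def apply_updates(avg_strategy):
    """Phase 2: after X iterations, load the AS (here a 2-D float array
    indexed by bucket) and replay the buffered updates."""
    global pending
    for bucket, q in pending:
        avg_strategy[bucket] += q.astype(np.float32) / SCALE
    pending = []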


PostPosted: Wed Mar 20, 2013 9:53 am 
Offline
Junior Member

Joined: Tue Mar 05, 2013 1:27 pm
Posts: 16
spears wrote:
http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Online_algorithm


Reading your link, it turns out I had forgotten the mean term in my formula. I've updated my post for future readers.


PostPosted: Wed Mar 20, 2013 4:12 pm 
Offline
Senior Member

Joined: Mon Mar 11, 2013 10:24 pm
Posts: 216
@Nasher: Yes, EV = utility of an action. It wouldn't reduce memory (consumption would be equal to the current approach), but it might have the other benefits I mentioned. And if we can exploit the information to cut subtrees more efficiently, we speed up the algorithm and thus get closer to the equilibrium; for the same budget we could run more iterations instead of using a more fine-grained abstraction.

Regarding the infrequent updates of the cumulative strategy: it might work if we assume that the vast majority of strategies are pure, i.e., select a single action 100% of the time (otherwise we would need more frequent updates).

