Poker-AI.org

Re: Positive Normalized Regret (Average Strategy)

2013-03-20T16:12:42+00:00

@Nasher: Yes, EV = utility of an action. It wouldn't reduce memory (memory consumption would be equal to current ones), but might have the other benefits I mentioned. And if we can exploit the information to cut subtrees more efficient, we would speed up the algorithm, thus getting closer to our equilibrium. Instead of using a more fine-grained abstraction, we could use more runs this way.

Regarding the infrequent updates of cumulative Strategy: it might work if we assume that the huge amount of strategies are pure, i.e. 100% selecting a single action (otherwise we would need more frequent updates).

Statistics: Posted by proud2bBot — Wed Mar 20, 2013 4:12 pm

Re: Positive Normalized Regret (Average Strategy)

2013-03-20T09:53:14+00:00

spears wrote:

http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Online_algorithm

Reading your link, it turns out that I forgot the term with the mean in my formula. I updated my post for the next readers.

Statistics: Posted by Romesnil — Wed Mar 20, 2013 9:53 am

Re: Positive Normalized Regret (Average Strategy)

2013-03-20T07:14:38+00:00

By EV you mean for a particular action (child node) at a given decision point? a.k.a. Utility?

I think I understand what you're saying, but how would using EV instead of regret reduce the memory requirements?

Something I was thinking about today: would it be feasible to update the average strategy in a secondary cycle? i.e. Just load the regret into our tree, do X amount of iterations, updating regrets as usual, but instead of loading/updating the AS, keep a list of the updates encoded as typical training inputs/outputs. After X iterations, save the regrets, then load AS and apply the updates.

For each update it would take (for me anyways) 1 Uint16 (for the hand bucket), and 15 doubles (for the action node updates), for a total of 122 bytes (not counting the list structure overhead). So, that's 8595 node updates for every megabyte of memory and about 8,801,162 updates for every gigabyte. That could be reduced to 32 bytes, if you encoded the updates into, say, Uint16's, reducing the floating point accuracy, giving you about 32k/33m updates per megabyte/gigabyte.

Has anybody tried using just Singles for their AS, and increasing their regret buckets ~1.3x?

Statistics: Posted by cantina — Wed Mar 20, 2013 7:14 am

Re: Positive Normalized Regret (Average Strategy)

2013-03-20T03:38:24+00:00

I hope not to derail the discussion from the variance idea, but I just thought about a different approach and didn't want to open a new thread.

First, let me make clear what we expect from the CFRM:

We need information about the most profitable action for a bucket
We need to know how often each action should be performed
We want clearly bad decisions (folding the nuts in NLH on the flop for example) to be excluded as early to reduce the subtrees to investigate and thus speed up the algorithm
We want to be able to investigate non-max-EV decisions as they might improve the more the strategies change
We want to be able to shortcut brances of the tree - e.g. using sampling/probing/...

Current algorithms store the information in terms of cumulative regret and cumulative strategy (bullet 1+2) and use heuristics for the other bullet points. Now what about we still keep the cumulative strategy, but instead of having the regrets for each action/bucket, we store the cumulative EVs. Wouldn't the result beeing the same, i.e., the highest EV corresponds to the highest regret etc.? If so, having the EV instead of the regret would help us as its more valuable imo:

1. It can be used to calculate an approximation of the game value (within the abstraction)
2. We can better identify/distinguish clearly bad decisions from non-max-EV but +EV decisions. For instance, consider AA in a very short-stacked HU game where we can a) limp, b) call or c) minRaise. We should be able to see quickly, that both b&c are +EV and should be investigated further, while a is out of question
3. [not 100% sure of this] instead of evaluation a subtree 100%, we might be able to use its EV to a certain % as a shortcut.
4. [more a development aspect] its easier to debug as we should be able to see wrong EV values more easily than wrong regrets.

Does anyone thought about this before and especially is my assumption that EV and regrets are basically interchangeable correct?

Statistics: Posted by proud2bBot — Wed Mar 20, 2013 3:38 am

Re: Positive Normalized Regret (Average Strategy)

2013-03-19T16:04:56+00:00

http://en.wikipedia.org/wiki/Algorithms ... _algorithm

Statistics: Posted by spears — Tue Mar 19, 2013 4:04 pm

Re: Positive Normalized Regret (Average Strategy)

2013-03-19T15:39:45+00:00

Thanks Ramesnil! Given the formula, we could use variance, but that would still lead to the same memory consumption as with traditional approaches (we need per infoset/bucket one double for CS and one for variance). However, it might still give us more valuable information, as we could estimate e.g. how likely it is in a 99% certainty range that a regret for a bucket is negative using its mean and variance and sample actions based on this.

Statistics: Posted by proud2bBot — Tue Mar 19, 2013 3:39 pm

Re: Positive Normalized Regret (Average Strategy)

2013-03-20T09:51:40+00:00

proud2bBot wrote:

No, to calculate the variance, you'd need to have all the data available, which is not an option.

In fact to calculate the variance, you only need three extra variables. One is the variance itself, another one is the mean and the last one is the size of your dataset use to compute the variance and the mean. Each time, you want to compute your variance with an additional data, you have to apply the following formulas:

Code:

d = x - m
m = m + (d / (n +1))
v = 1/(n+1) * (x - m) * d + n/(n+1) * v

where v is your (new/old) variance, m you (new/old) mean, n the size of your dataset (before adding your new data) and x the data you want to add. d is temporary variable that you don't need to save.

Source : The link posted below by spears

Edit: Fix the formulas and add the source

Statistics: Posted by Romesnil — Tue Mar 19, 2013 9:23 am

Re: Positive Normalized Regret (Average Strategy)

2013-03-19T08:56:33+00:00

Coffee4tw wrote:

I'm not all too familiar with this but here is a random idea:
Can you keep track of variance somehow? That way you'll find the points that change a lot over iterations and the ones that are pretty steady. That would only require one additional data point to store, right?

what exactly do you mean with "keep track" in this case?
You keep track of it already in your favorite tracking tool. In HEM its the value EV bb/100.

Statistics: Posted by winnie — Tue Mar 19, 2013 8:56 am

Re: Positive Normalized Regret (Average Strategy)

2013-03-18T18:52:16+00:00

No, to calculate the variance, you'd need to have all the data available, which is not an option.

Statistics: Posted by proud2bBot — Mon Mar 18, 2013 6:52 pm

Re: Positive Normalized Regret (Average Strategy)

2013-03-17T15:38:10+00:00

Statistics: Posted by Coffee4tw — Sun Mar 17, 2013 3:38 pm

Re: Positive Normalized Regret (Average Strategy)

2013-03-17T15:19:52+00:00

Let me just puke out my thoughts on it, hopefully getting a better picture:
1. We definitely need the regrets, so if we chose between CS and CR, it should be the latter
2. Currently we have both, CS and CR for the reason that CR is changing more often. For instance, between 2 Nash Equilibrium points, we can have something like after 10M iterations regrets are {-1, 1}, after 20M {1,-1}, after 30M {-1,1} and so on. If we stop at a certain point, we only see the snapshot of this point in the regret, but not the development/fluctuation as indicated in the CS
3. If we assume that our algorithm runs long enough, we will see that dominated actions will steadily decrease their regret, dominating actions will increase, but the fluctuating ones will stay within a small boundary. The question is whether we can exploit this without having to introduce data structures that are equal or bigger than CS...

Statistics: Posted by proud2bBot — Sun Mar 17, 2013 3:19 pm

Re: Positive Normalized Regret (Average Strategy)

2013-03-16T23:13:27+00:00

I understand that. But, the questions I'm asking are: can we get away with one, uniquely formulated accumulation, such that it's reflective of both CR and AS? And, would the increase in game size outweigh the inaccuracies?

Statistics: Posted by cantina — Sat Mar 16, 2013 11:13 pm

Re: Positive Normalized Regret (Average Strategy)

2013-03-16T23:00:08+00:00

I think the problem is that you need a continuous weight function for efficiently storing the data. Given we cant spend much more memory than 2 arrays of doubles, we need to store the whole info aggregated in the regrets/strategies, but at the same time, make sure that we consider all data (or at least the last xM iterations) with the more recent data a little bit higher. But I agree that AvgSS (I dont write ass anymore) is just one way of being not super greedy (i.e. only using current regret).

Statistics: Posted by proud2bBot — Sat Mar 16, 2013 11:00 pm

Positive Normalized Regret (Average Strategy)

2013-03-16T22:46:42+00:00

I'm curious about the relationship between cumulative regret and average strategy in CFRM. I've heard of people using weighted CR to update AS, as to allow more recent updates to have a bigger impact on the final strategy (slumbot 2012). If the lemma that more recent updates are more important is true, what then, mathematically speaking, is the meaning/relationship/purpose of the average strategy to begin with? Why not use some weighted form of CR as your strategy alone?

Statistics: Posted by cantina — Sat Mar 16, 2013 10:46 pm