Poker-AI.org Poker AI and Botting Discussion Forum 2013-03-20T16:12:42+00:00 http://poker-ai.org/phpbb/feed.php?f=24&t=2410 2013-03-20T16:12:42+00:00 2013-03-20T16:12:42+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2410&p=3431#p3431 <![CDATA[Re: Positive Normalized Regret (Average Strategy)]]>
Regarding the infrequent updates of the cumulative strategy: it might work if we assume that the vast majority of strategies are pure, i.e. they select a single action 100% of the time (otherwise we would need more frequent updates).

Statistics: Posted by proud2bBot — Wed Mar 20, 2013 4:12 pm


]]>
2013-03-20T09:53:14+00:00 2013-03-20T09:53:14+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2410&p=3429#p3429 <![CDATA[Re: Positive Normalized Regret (Average Strategy)]]> spears wrote:

http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Online_algorithm


Reading your link, it turns out that I forgot the term with the mean in my formula. I have updated my post for future readers.

Statistics: Posted by Romesnil — Wed Mar 20, 2013 9:53 am


]]>
2013-03-20T07:14:38+00:00 2013-03-20T07:14:38+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2410&p=3426#p3426 <![CDATA[Re: Positive Normalized Regret (Average Strategy)]]>
I think I understand what you're saying, but how would using EV instead of regret reduce the memory requirements?

Something I was thinking about today: would it be feasible to update the average strategy (AS) in a secondary cycle? I.e., just load the regrets into our tree and do X iterations, updating regrets as usual, but instead of loading/updating the AS, keep a list of the updates encoded as typical training inputs/outputs. After X iterations, save the regrets, then load the AS and apply the updates.
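A minimal sketch of that secondary cycle, in case it helps the discussion. Everything here is an assumption for illustration: the class name `DeferredAverageStrategy`, keying updates by hand bucket only, and using a plain dict as a stand-in for the loaded AS. A real tree would also need some way to identify the node each update belongs to.

```python
class DeferredAverageStrategy:
    """Buffers average-strategy updates so the AS can stay unloaded during regret iterations."""

    def __init__(self):
        self.pending = []  # (bucket, per-action increments), in arrival order

    def record(self, bucket, increments):
        # Called during the regret iterations instead of touching the AS.
        self.pending.append((bucket, tuple(increments)))

    def flush(self, average_strategy):
        # Called after X iterations, once the AS has been loaded.
        for bucket, increments in self.pending:
            row = average_strategy[bucket]
            for action, inc in enumerate(increments):
                row[action] += inc
        applied = len(self.pending)
        self.pending.clear()
        return applied


buffer = DeferredAverageStrategy()
buffer.record(bucket=0, increments=[0.25, 0.75])
buffer.record(bucket=1, increments=[1.0, 0.0])
buffer.record(bucket=0, increments=[0.5, 0.5])

average_strategy = {0: [0.0, 0.0], 1: [0.0, 0.0]}  # hypothetical stand-in for the loaded AS
applied = buffer.flush(average_strategy)
```

The buffer only grows between flushes, so X is effectively bounded by how many pending updates fit in the memory freed by not holding the AS.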

For each update it would take (for me anyway) one UInt16 (for the hand bucket) and 15 doubles (for the action node updates), for a total of 122 bytes (not counting the list structure overhead). So that's about 8,595 node updates for every megabyte of memory and about 8,801,162 updates for every gigabyte. That could be reduced to 32 bytes if you encoded the updates as, say, UInt16s, at the cost of floating-point accuracy, giving you about 32k/33M updates per megabyte/gigabyte.
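The arithmetic can be checked with a short script. This is only a sketch: the record layouts are my reading of the post (one uint16 bucket plus 15 values, no padding), and the `quantize`/`dequantize` helpers assume the update values lie in [0, 1], which the post does not state. Integer division gives 8,594 whole records per megabyte at 122 bytes, so the 8,595 figure is the rounded value.

```python
import struct

# Record layouts, little-endian with no padding (assumed from the post).
FULL = struct.Struct("<H15d")     # uint16 bucket + 15 float64
COMPACT = struct.Struct("<16H")   # uint16 bucket + 15 uint16

MIB = 1 << 20
GIB = 1 << 30


def updates_per(capacity_bytes, record_size):
    """Number of whole update records that fit in the given capacity."""
    return capacity_bytes // record_size


def quantize(p):
    """Map a value assumed to lie in [0, 1] onto a uint16."""
    return round(min(max(p, 0.0), 1.0) * 65535)


def dequantize(q):
    """Inverse of quantize, up to rounding error."""
    return q / 65535


full_per_mib = updates_per(MIB, FULL.size)
full_per_gib = updates_per(GIB, FULL.size)
compact_per_mib = updates_per(MIB, COMPACT.size)
compact_per_gib = updates_per(GIB, COMPACT.size)
```

With this encoding the round-trip error per value is at most half a quantization step, i.e. about 7.6e-6.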

Has anybody tried using just Singles for their AS, and increasing their regret buckets ~1.3x?

Statistics: Posted by cantina — Wed Mar 20, 2013 7:14 am


]]>
2013-03-20T03:38:24+00:00 2013-03-20T03:38:24+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2410&p=3423#p3423 <![CDATA[Re: Positive Normalized Regret (Average Strategy)]]>
First, let me make clear what we expect from the CFRM:
  • We need information about the most profitable action for a bucket
  • We need to know how often each action should be performed
  • We want clearly bad decisions (folding the nuts in NLH on the flop, for example) to be excluded as early as possible, to reduce the subtrees to investigate and thus speed up the algorithm
  • We want to be able to investigate non-max-EV decisions, as they might improve as the strategies change
  • We want to be able to shortcut branches of the tree, e.g. using sampling/probing/...

Current algorithms store this information as cumulative regret and cumulative strategy (bullets 1 and 2) and use heuristics for the other bullet points. What if we keep the cumulative strategy but, instead of storing the regrets for each action/bucket, store the cumulative EVs? Wouldn't the result be the same, i.e., the highest EV corresponds to the highest regret, etc.? If so, having the EV instead of the regret would help us, as it's more valuable imo:

1. It can be used to calculate an approximation of the game value (within the abstraction)
2. We can better distinguish clearly bad decisions from decisions that are non-max-EV but still +EV. For instance, consider AA in a very short-stacked HU game where we can a) limp, b) call or c) minRaise. We should be able to see quickly that both b and c are +EV and should be investigated further, while a is out of the question
3. [not 100% sure of this] instead of evaluating a subtree 100%, we might be able to use its EV to a certain % as a shortcut.
4. [more a development aspect] it's easier to debug, as we should be able to spot wrong EV values more easily than wrong regrets.

Has anyone thought about this before? And in particular, is my assumption that EV and regrets are basically interchangeable correct?
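One way to probe the interchangeability question is a toy experiment at a single node. The sketch below assumes plain, unclipped cumulative regret, where each iteration adds the action's value minus the node's value under the current strategy. The per-iteration values are random stand-ins for a real tree walk, and all names are hypothetical. Under that assumption the cumulative regret of an action equals its cumulative EV minus one shared scalar (the cumulative node value), so the ordering of actions is the same, but the sign of each regret, which regret matching needs, can only be recovered if that scalar is stored too.

```python
import random


def regret_matching(cum_regret):
    """Play in proportion to positive cumulative regret; uniform if none is positive."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    n = len(cum_regret)
    return [p / total for p in positive] if total > 0 else [1.0 / n] * n


def run(iterations=1000, n_actions=3, seed=0):
    rng = random.Random(seed)
    cum_regret = [0.0] * n_actions
    cum_ev = [0.0] * n_actions
    cum_node_value = 0.0
    for _ in range(iterations):
        strategy = regret_matching(cum_regret)
        # Hypothetical per-action values for this iteration (stand-in for a tree walk).
        values = [rng.uniform(-1.0, 1.0) for _ in range(n_actions)]
        node_value = sum(p * v for p, v in zip(strategy, values))
        for a in range(n_actions):
            cum_regret[a] += values[a] - node_value
            cum_ev[a] += values[a]
        cum_node_value += node_value
    return cum_regret, cum_ev, cum_node_value


cum_regret, cum_ev, cum_node_value = run()
# Regret reconstructed from the cumulative EVs plus the one shared scalar.
reconstructed = [ev - cum_node_value for ev in cum_ev]
```

So storing EVs costs one extra number per node/bucket compared with storing regrets, at least in this simplified setting.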

Statistics: Posted by proud2bBot — Wed Mar 20, 2013 3:38 am


]]>
2013-03-19T16:04:56+00:00 2013-03-19T16:04:56+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2410&p=3413#p3413 <![CDATA[Re: Positive Normalized Regret (Average Strategy)]]> http://en.wikipedia.org/wiki/Algorithms ... _algorithm

Statistics: Posted by spears — Tue Mar 19, 2013 4:04 pm


]]>
2013-03-19T15:39:45+00:00 2013-03-19T15:39:45+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2410&p=3411#p3411 <![CDATA[Re: Positive Normalized Regret (Average Strategy)]]> Statistics: Posted by proud2bBot — Tue Mar 19, 2013 3:39 pm


]]>
2013-03-20T09:51:40+00:00 2013-03-19T09:23:34+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2410&p=3410#p3410 <![CDATA[Re: Positive Normalized Regret (Average Strategy)]]> proud2bBot wrote:

No, to calculate the variance, you'd need to have all the data available, which is not an option.

In fact, to calculate the variance you only need three extra variables: the variance itself, the mean, and the size of the dataset used to compute them. Each time you want to update the variance with an additional data point, apply the following formulas:
Code:
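d = x - m                                  # deviation of x from the old mean
m = m + (d / (n + 1))                      # new mean
v = 1/(n+1) * (x - m) * d + n/(n+1) * v    # new variance (uses the new mean)
n = n + 1                                  # dataset size now includes x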

where v is your (old/new) variance, m your (old/new) mean, n the size of your dataset (before adding the new data point) and x the data point you want to add. d is a temporary variable that you don't need to save.

Source : The link posted below by spears

Edit: Fix the formulas and add the source
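For anyone who wants to try it, here is a runnable sketch of the update above. The class name `RunningVariance` and the sample data are made up for illustration. It tracks the population variance (dividing by the number of points), which is what the formulas above compute.

```python
import statistics


class RunningVariance:
    """Online mean/variance using three stored numbers: n, mean, var."""

    def __init__(self):
        self.n = 0       # number of points seen so far
        self.mean = 0.0
        self.var = 0.0   # population variance of the points seen so far

    def update(self, x):
        d = x - self.mean                 # deviation from the old mean
        self.mean += d / (self.n + 1)     # new mean
        # New variance: blends the old variance with the new point's contribution.
        self.var = ((x - self.mean) * d + self.n * self.var) / (self.n + 1)
        self.n += 1


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
rv = RunningVariance()
for x in data:
    rv.update(x)

# Cross-check against the batch computation from the standard library.
batch_var = statistics.pvariance(data)
```

Applied per action/bucket, this would cost three extra numbers per tracked value, or two if the iteration count is shared.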

Statistics: Posted by Romesnil — Tue Mar 19, 2013 9:23 am


]]>
2013-03-19T08:56:33+00:00 2013-03-19T08:56:33+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2410&p=3409#p3409 <![CDATA[Re: Positive Normalized Regret (Average Strategy)]]> Coffee4tw wrote:

I'm not all too familiar with this but here is a random idea:
Can you keep track of variance somehow? That way you'll find the points that change a lot over iterations and the ones that are pretty steady. That would only require one additional data point to store, right?



What exactly do you mean by "keep track" in this case?
You already keep track of it in your favorite tracking tool. In HEM it's the value EV bb/100.

Statistics: Posted by winnie — Tue Mar 19, 2013 8:56 am


]]>
2013-03-18T18:52:16+00:00 2013-03-18T18:52:16+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2410&p=3381#p3381 <![CDATA[Re: Positive Normalized Regret (Average Strategy)]]> Statistics: Posted by proud2bBot — Mon Mar 18, 2013 6:52 pm


]]>
2013-03-17T15:38:10+00:00 2013-03-17T15:38:10+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2410&p=3363#p3363 <![CDATA[Re: Positive Normalized Regret (Average Strategy)]]> Can you keep track of variance somehow? That way you'll find the points that change a lot over iterations and the ones that are pretty steady. That would only require one additional data point to store, right?

Statistics: Posted by Coffee4tw — Sun Mar 17, 2013 3:38 pm


]]>
2013-03-17T15:19:52+00:00 2013-03-17T15:19:52+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2410&p=3360#p3360 <![CDATA[Re: Positive Normalized Regret (Average Strategy)]]> 1. We definitely need the regrets, so if we have to choose between CS (cumulative strategy) and CR (cumulative regret), it should be the latter
2. Currently we keep both CS and CR because CR changes more often. For instance, between two Nash equilibrium points we can have something like this: after 10M iterations the regrets are {-1, 1}, after 20M {1, -1}, after 30M {-1, 1}, and so on. If we stop at a certain point, the regret only shows us a snapshot of that point, not the development/fluctuation that the CS captures
3. If we assume that our algorithm runs long enough, we will see that the regret of dominated actions steadily decreases and that of dominating actions steadily increases, while the regret of the fluctuating ones stays within a small boundary. The question is whether we can exploit this without having to introduce data structures that are as big as or bigger than the CS...
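To make point 2 concrete, here is a small sketch that uses the alternating regret snapshots from the example above. It assumes the current strategy is obtained by regret matching (play in proportion to positive regret), and the variable names are only illustrative.

```python
def regret_matching(regrets):
    """Play in proportion to positive regret; uniform if no regret is positive."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    n = len(regrets)
    return [p / total for p in positive] if total > 0 else [1.0 / n] * n


# Regret snapshots as in the example: they flip between the two actions.
snapshots = [[-1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [1.0, -1.0]]

cum_strategy = [0.0, 0.0]
current = []
for regrets in snapshots:
    strategy = regret_matching(regrets)
    current.append(strategy)  # what we would see if we stopped here
    cum_strategy = [c + s for c, s in zip(cum_strategy, strategy)]

total = sum(cum_strategy)
average = [c / total for c in cum_strategy]
```

Every entry in `current` is a pure strategy, while `average` is the 50/50 mix, which is the information that is lost if only the regrets are kept.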

Statistics: Posted by proud2bBot — Sun Mar 17, 2013 3:19 pm


]]>
2013-03-16T23:13:27+00:00 2013-03-16T23:13:27+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2410&p=3321#p3321 <![CDATA[Re: Positive Normalized Regret (Average Strategy)]]> Statistics: Posted by cantina — Sat Mar 16, 2013 11:13 pm


]]>
2013-03-16T23:00:08+00:00 2013-03-16T23:00:08+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2410&p=3320#p3320 <![CDATA[Re: Positive Normalized Regret (Average Strategy)]]> Statistics: Posted by proud2bBot — Sat Mar 16, 2013 11:00 pm


]]>
2013-03-16T22:46:42+00:00 2013-03-16T22:46:42+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2410&p=3318#p3318 <![CDATA[Positive Normalized Regret (Average Strategy)]]> Statistics: Posted by cantina — Sat Mar 16, 2013 10:46 pm


]]>