Haha, well, I played the above strategy (~130m iterations) against my other strategy (EHS2-based; also ~130m iterations), and it lost about 0.20 bb/hand over about 300k games.
So, maybe it needs a while to converge, or maybe there's a bug, or maybe it just sucks. Something I did notice is that the upper/lower bounds on the average EVs seem to be shrinking, which would shift the buckets into a narrower range. I'm going to let it run a day or two longer and see if it improves.
Two things:
- The upper and lower EV bounds are constantly changing with the strategy, which causes hands to change buckets. Those bounds are also set by the outlier hands (i.e. the hands that are encountered infrequently). The EVs of hands shrink as the strategy converges to equilibrium, but while the common hands get updated immediately, the outliers get updated last. So you end up with a non-linear, non-uniform distribution, where the common hands are grouped exponentially closer together until the strategy, by luck, updates a bound or finds a new one.
- On top of the strategy itself changing (which is what defines a hand's EV), variance makes a hand's EV fickle at first, i.e. its true value at showdown takes many trials to determine. Metrics like EHS2 do this work beforehand, so variance is taken out of the picture when defining a hand's bucket.
Ultimately, you have hands that initially jump around in value a lot, or are grouped in a non-uniform way, making updating regrets a noisy business.
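To make the first point concrete, here is a toy sketch of how min/max normalization squeezes the common hands together when a couple of rare outliers set the bounds. The EV values, the `BucketOf` helper and the bucket count of 10 are all made up for illustration; none of them come from the actual run.

```csharp
using System;
using System.Linq;

static class BoundsDemo
{
    // Map an EV to a bucket index in [0, maxBuckets] by min/max normalization.
    // Hypothetical helper, not the real bucketing code.
    public static int BucketOf(double ev, double lo, double hi, int maxBuckets)
    {
        double norm = (ev - lo) / (hi - lo);
        return (int)Math.Round(maxBuckets * norm);
    }

    public static void Main()
    {
        // Made-up EVs: five common hands near zero plus two rare outliers.
        double[] evs = { -0.4, -0.2, 0.0, 0.2, 0.4, -10.0, 10.0 };
        double lo = evs.Min(), hi = evs.Max();

        foreach (double ev in evs)
            Console.WriteLine($"{ev,6}: bucket {BucketOf(ev, lo, hi, 10)}");
    }
}
```

With these numbers all five common hands land in bucket 5, and only the two outliers sit in buckets 0 and 10, so most of the buckets are wasted until the bounds move.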
Improvement attempts:
Instead of using the absolute min/max EV bounds to normalize a hand's EV (to determine its appropriate regret bucket), I'm now using the min/max bucket EV, which is the (weighted) average EV of the hands placed into the upper and lower regret buckets. If these bounds aren't defined yet or the hand has no assigned EV, which is the case at the beginning, EHS2 is used to decide the regret bucket.
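One thing to watch with bucket-average bounds: a hand's EV can sit outside them, so the normalized value can leave [0, 1]. Below is a minimal sketch of one way to guard against that. The clamp, the midpoint fallback for degenerate bounds and the name `NormalizeEv` are all assumptions for illustration, not something the method above is known to do.

```csharp
using System;

static class BucketBounds
{
    // Normalize ev against the average EVs of the lowest and highest buckets.
    // Values beyond those averages are clamped to [0, 1] (an assumption).
    public static double NormalizeEv(double ev, double lowerBucketEv, double upperBucketEv)
    {
        double range = upperBucketEv - lowerBucketEv;

        // Degenerate bounds: fall back to the middle of the range (an assumption).
        if (range <= 0) return 0.5;

        double norm = (ev - lowerBucketEv) / range;
        return Math.Max(0.0, Math.Min(1.0, norm));
    }
}
```

For example, `BucketBounds.NormalizeEv(5.0, -1.0, 1.0)` returns 1.0 rather than 3.0, so the hand goes into the top bucket instead of an index past the end.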
I'm also updating everything in a weighted fashion now, so newer updates take precedence over the older values.
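The exact weighting isn't spelled out here, so as a sketch only: one common way to let newer updates take precedence is an exponential moving average. The `Update` helper and the `alpha` parameter are hypothetical names, and the real scheme may well differ.

```csharp
using System;

static class WeightedUpdate
{
    // Exponential moving average: alpha in (0, 1] is the weight on the new sample,
    // so larger alpha forgets old values faster.
    public static double Update(double oldValue, double sample, double alpha)
    {
        return (1.0 - alpha) * oldValue + alpha * sample;
    }
}
```

With `alpha = 1.0` the old value is discarded entirely, and with a small `alpha` the estimate moves slowly, which trades responsiveness against the noise described above.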
I shall call this technique: Dynamically Updating Multiple Buckets with Average Strategy Sampling (DUMBASS).
My DUMBASS bucket selection method now looks like so:
Code:
int n = ev_observations(hand);
int bucket;
if (n == 0 || lower_bucket_ev_count == 0 || upper_bucket_ev_count == 0) {
    // No EV data yet for the hand or the outer buckets: fall back to EHS2.
    bucket = (int)Math.Round(max_buckets * ehs2(hand));
} else {
    // Weight left on EHS2: shrinks as 1/n, floored at 0.00001 past 100k observations.
    // 1.0 / n forces floating-point division (1 / n is 0 for any int n > 1).
    double w = (n > 100000) ? 0.00001 : 1.0 / n;
    double norm_ev = (ev(hand) - lower_bucket_ev) / (upper_bucket_ev - lower_bucket_ev);
    // Blend: (1 - w) applies to the whole normalized EV, not just to 1/n.
    bucket = (int)Math.Round(max_buckets * ((1 - w) * norm_ev + w * ehs2(hand)));
}