I hope not to derail the discussion from the variance idea, but I just thought about a different approach and didn't want to open a new thread.
First, let me make clear what we expect from the CFRM:
- We need information about the most profitable action for a bucket
- We need to know how often each action should be performed
- We want clearly bad decisions (folding the nuts in NLH on the flop, for example) to be excluded as early as possible, to reduce the number of subtrees to investigate and thus speed up the algorithm
- We want to be able to investigate non-max-EV decisions, as they might improve as the strategies change
- We want to be able to shortcut branches of the tree, e.g. using sampling/probing/...
Current algorithms store this information as cumulative regret and cumulative strategy (bullets 1 and 2) and use heuristics for the other bullet points. Now what if we keep the cumulative strategy but, instead of storing the regret for each action/bucket, store the cumulative EVs? Wouldn't the result be the same, i.e., the highest EV corresponds to the highest regret, etc.? If so, having the EV instead of the regret would help us, as it's more valuable imo:
1. It can be used to calculate an approximation of the game value (within the abstraction)
2. We can better distinguish clearly bad decisions from non-max-EV but +EV decisions. For instance, consider AA in a very short-stacked HU game where we can a) fold, b) call or c) minRaise. We should be able to see quickly that both b and c are +EV and should be investigated further, while a is out of the question.
3. [not 100% sure about this] Instead of evaluating a subtree 100%, we might be able to use its EV to a certain % as a shortcut.
4. [more of a development aspect] It's easier to debug, as we should be able to spot wrong EV values more easily than wrong regrets.
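To illustrate points 1 and 2, here is a minimal Python sketch of how stored EVs might be used. Everything in it is an assumption for illustration: the function names, the idea of dividing the cumulative EV by a visit count to get an average, the `ev_floor` threshold, and the numbers in the usage lines are all made up and not taken from any existing CFR implementation.

```python
def average_evs(cum_ev, visits):
    """Per-action average EV from cumulative EVs (hypothetical storage layout)."""
    if visits <= 0:
        raise ValueError("visits must be positive")
    return [ev / visits for ev in cum_ev]


def node_value(avg_ev, avg_strategy):
    """Point 1: estimate of the node's value under the average strategy."""
    return sum(p * ev for p, ev in zip(avg_strategy, avg_ev))


def classify_actions(avg_ev, ev_floor=0.0):
    """Point 2: separate the best action, other +EV actions, and prune candidates.

    ev_floor is an assumed threshold: actions at or below it are only
    *candidates* for pruning, not proven to be bad.
    """
    best_index = max(range(len(avg_ev)), key=lambda a: avg_ev[a])
    labels = []
    for a, ev in enumerate(avg_ev):
        if a == best_index:
            labels.append("best")
        elif ev > ev_floor:
            labels.append("explore")
        else:
            labels.append("prune_candidate")
    return labels


# Made-up numbers for the AA example: actions are (fold, call, minRaise).
evs = average_evs([0.0, 210.0, 240.0], visits=100)
print(classify_actions(evs))  # ['prune_candidate', 'explore', 'best']
print(node_value(evs, [0.0, 0.3, 0.7]))
```

With raw regrets, the call and the fold would both just show up as "not the best action"; with EVs, the sign and size of the value are directly visible, which is the distinction point 2 is after.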
Has anyone thought about this before, and in particular, is my assumption that EVs and regrets are basically interchangeable correct?
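To make the question concrete, here is a small Python sketch comparing the two bookkeeping schemes at a single information set. It assumes plain regret matching with unfloored, undiscounted cumulative regrets (so not CFR+), and it takes the per-action values as given instead of computing counterfactual values from a real game tree. The class names `RegretNode` and `EvNode` are hypothetical. The instantaneous regret of action a is v(a) minus the node value under the current strategy, so the cumulative regret equals the cumulative EV of a minus the cumulative node value. That baseline is the same for every action, which means the ordering of actions is identical in both schemes, but recovering the regret-matching strategy from EVs seems to need one extra scalar per node.

```python
def regret_matching(cum_regret):
    """Strategy proportional to positive cumulative regret (uniform if none)."""
    positive = [max(r, 0.0) for r in cum_regret]
    total = sum(positive)
    if total <= 0.0:
        return [1.0 / len(cum_regret)] * len(cum_regret)
    return [p / total for p in positive]


class RegretNode:
    """Standard bookkeeping: one cumulative regret per action."""

    def __init__(self, num_actions):
        self.cum_regret = [0.0] * num_actions

    def update(self, action_values):
        sigma = regret_matching(self.cum_regret)
        value = sum(p * v for p, v in zip(sigma, action_values))
        for a, v in enumerate(action_values):
            self.cum_regret[a] += v - value
        return sigma


class EvNode:
    """Proposed bookkeeping: cumulative EV per action plus one baseline scalar."""

    def __init__(self, num_actions):
        self.cum_ev = [0.0] * num_actions
        self.cum_node_value = 0.0  # extra scalar needed to recover regrets

    def regrets(self):
        return [ev - self.cum_node_value for ev in self.cum_ev]

    def update(self, action_values):
        sigma = regret_matching(self.regrets())
        value = sum(p * v for p, v in zip(sigma, action_values))
        for a, v in enumerate(action_values):
            self.cum_ev[a] += v
        self.cum_node_value += value
        return sigma


# Made-up action values for three iterations, three actions each.
r_node, e_node = RegretNode(3), EvNode(3)
for values in [[0.0, 2.0, 3.0], [0.0, 1.0, 4.0], [0.0, 3.0, 2.0]]:
    r_node.update(values)
    e_node.update(values)
print(r_node.cum_regret)
print(e_node.regrets())
```

Up to floating-point rounding, both printed lists should agree. Under these assumptions the two representations carry the same information as long as the baseline is stored as well; variants that floor or discount the regrets would not map over this directly.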