Nasher wrote:
I thought I would mention this to you guys because in other areas I've stated that I just let my threads collide when doing updates with CFRM.
I found, at least for my setup, this can have disastrous consequences. I don't know how exactly, but some of the more frequently accessed cumulative regrets were becoming (maximally) negative as a result of collision (which would be impossible otherwise because you're only adding positive numbers). As soon as I implemented sync locking for the updates, it didn't happen again.
First of all, thank you for sharing this. I will add an occasional check for negative cumulative strategies in addition to my checks for NaN and +/- Inf.
How did you come across this? Did you check for it because ... you know ... it should be impossible? Or did you observe some degeneration somewhere in the preflop strategy? Good work!
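A minimal sketch of the kind of occasional check I have in mind, in Python. The function name and the flat list layout are just assumptions for illustration, not anyone's actual solver code:

```python
import math

def find_bad_entries(cumulative):
    """Return (index, reason) pairs for entries that are NaN, +/-Inf, or negative.

    `cumulative` is assumed to be a flat sequence of floats (hypothetical layout).
    """
    bad = []
    for i, value in enumerate(cumulative):
        if math.isnan(value):
            bad.append((i, "nan"))
        elif math.isinf(value):
            bad.append((i, "inf"))       # covers both +Inf and -Inf
        elif value < 0.0:
            bad.append((i, "negative"))  # finite but below zero
    return bad

# Example: scan a small table every so often and report anything odd.
report = find_bad_entries([1.0, float("nan"), float("-inf"), -2.5, 0.0])
```

Running this every N iterations rather than on every update keeps the overhead negligible.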
I am curious about more details:
(1) You stated this happens to nodes that are frequently visited. Do you know whether it happens to actions that:
- are intuitively a strategically reasonable choice,
- are intuitively a strategically bad choice, or
- is it completely random?
(2) At what point of your simulation did you observe this phenomenon? After a few iterations or after a lot of iterations? If known: would you mind sharing the total number of iterations completed?
(3) Just to clarify: you are using doubles for the cumulative regret and the average strategy?
EDIT: (4) What do you mean by maximally negative? Double.NegativeInfinity?
EDIT: Just a quick thought on numerical fuckups:
Assume a very long path with sampling probabilities p_opp and p_train. In the showdown nodes, utilities are scaled by the opponent player's sampling probability (return u/p_opp). Now we have 3 possible cases:
Case A: p_opp is 'big' enough to not cause problems - a reasonable value is returned
Case B: p_opp is 0.0 - impossible, since this node would not be sampled then
Case C: p_opp is slightly (really, really slightly) larger than 0.0. Now a very large (but still legitimate) value is returned.
In a node's ev computation we have something like
ev = (sum over sampled ev_i) / sampledActions
Since the ev_i are very big now, you might get Infinity in the node's estimated ev and then, during the regret update (ev_i - Infinity), subtract Infinity, which would lead to -Infinity in the cumulative profile.
Even if the value of ev is not Infinity yet: since the buckets are accessed relatively frequently, they might sum up to huge negative values (ev_i - ev, i.e. the estimated ev of an action minus the sampled ev of that node, is the sampled regret, and it can be negative).
But all that does not explain why you are not facing the problem when using syncs ... Maybe you just got lucky this time.
My advice would be to check the return value in showdown nodes and the sampled ev in decision nodes for reasonable values.
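To make the Infinity path concrete, here is a small sketch in Python (whose floats are IEEE 754 doubles). The numbers are made up purely to force an overflow, and `sampled_regret` is a hypothetical helper, not part of any real solver:

```python
import math

def sampled_regret(ev_action, ev_node):
    # Sampled regret as described above: action ev minus node ev.
    return ev_action - ev_node

big = 1.0e308                 # close to the largest finite double (~1.8e308)
overflowed_sum = big + big    # the sum no longer fits in a double -> inf
ev_node = overflowed_sum / 2  # dividing by the number of sampled actions keeps it inf

r_finite = sampled_regret(1.0, ev_node)           # finite minus inf -> -inf
r_inf = sampled_regret(overflowed_sum, ev_node)   # inf minus inf -> nan

cumulative = 0.0
cumulative += r_finite        # one such update is enough to pin the entry at -inf
```

Note that if an ev_i itself has already overflowed, the update yields NaN rather than -Infinity, so both are worth checking for.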
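Something along these lines is what I mean, as a rough Python sketch. The bound `MAX_ABS_UTILITY` and the function names are assumptions you would have to adapt to your game's stack sizes and your own code:

```python
import math

# Hypothetical bound: anything beyond this is treated as suspicious.
MAX_ABS_UTILITY = 1.0e9

def checked(value, label, limit=MAX_ABS_UTILITY):
    """Return `value` unchanged, or raise if it is NaN, infinite, or implausibly large."""
    if not math.isfinite(value) or abs(value) > limit:
        raise ValueError(f"{label}: suspicious value {value!r}")
    return value

def showdown_value(u, p_opp):
    # Utility scaled by the opponent's sampling probability (return u/p_opp).
    return checked(u / p_opp, "showdown")

def node_ev(sampled_evs):
    # ev = (sum over sampled ev_i) / sampledActions
    return checked(sum(sampled_evs) / len(sampled_evs), "decision node")
```

With this bound, `showdown_value(10.0, 0.5)` passes through, while `showdown_value(10.0, 1e-12)` raises, which tells you exactly where the huge values enter.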