When running CFRM (I'm using ASS, where the problem is bigger than in regular CFRM, I guess, but it still exists there), we obtain the ε-Nash solution by looking at the average strategy (derived from the cumulative strategy stored in the nodes). In practice, I find nodes where, after 1B iterations, the decision is clear, but the strategy plays, for example, 99.1% action X and 0.9% action Y, because at the beginning of the learning process Y seemed to be good, and only as convergence progressed did it turn out that X is better.

I wonder if we can somehow clean up these results, either during the process or after it. The latter case is easy: given a situation like that (e.g., AA in my 25bb game is raised as first action 99.2% and called 0.8%), we check that the dominant action is over a certain threshold (e.g. 99%) and that all other actions have a large negative cumulative regret, and if so we play the dominant action 100% of the time. This makes our strategy a bit more "readable" and probably also a bit better, but we miss a lot of spots just below the threshold (e.g. 98.9%, while the regrets for all other actions are hugely negative).

My idea is that it would be better to give more weight to regrets/strategies that were added in more recent iterations (obviously not just a single one). For instance, in our example there are 1B iterations, and almost all calls come from the first ones, where we didn't have a clue how we would play at later decisions. If we could find a function that values more recent results slightly higher (continuously), our strategy should converge faster; however, we cannot overdo this, as otherwise we would probably make a mistake when switching back and forth between two equilibrium points.

My question is: has anyone implemented such an approach, and which function/heuristic are you using? Or does anyone have thoughts on the general approach, i.e., do you think it will or won't work?
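To make the recency-weighting idea concrete, here is a minimal sketch of what I have in mind. It is not taken from any existing implementation: the function name `average_strategy`, the exponent `gamma`, and the toy history are all assumptions made for illustration, and reach-probability weighting is ignored entirely. Iteration t contributes to the cumulative strategy with weight t**gamma, so gamma = 0 gives the usual uniform average and gamma = 1 gives linear weighting.

```python
def average_strategy(strategies, gamma=0.0):
    """Average per-iteration strategies, weighting iteration t by t**gamma.

    gamma = 0 reproduces the plain (uniform) average; gamma = 1 weights
    iteration t linearly. `gamma` is an illustrative knob, not a tuned value.
    """
    n_actions = len(strategies[0])
    cumulative = [0.0] * n_actions
    for t, sigma in enumerate(strategies, start=1):
        weight = t ** gamma
        for a in range(n_actions):
            cumulative[a] += weight * sigma[a]
    total = sum(cumulative)
    return [c / total for c in cumulative]


# Toy history (made up): action Y (index 1) is played for the first
# 10 iterations, then action X (index 0) for the remaining 990.
history = [[0.0, 1.0]] * 10 + [[1.0, 0.0]] * 990

uniform = average_strategy(history, gamma=0.0)  # Y keeps 10/1000 = 1%
linear = average_strategy(history, gamma=1.0)   # Y keeps 55/500500, about 0.011%
```

In this toy case the early mass on Y shrinks from 1% to roughly 0.011% under linear weighting. Whether a larger exponent is still safe when the strategy oscillates between two equilibrium points is exactly the part I'm unsure about.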
It's not only the 0.x% difference that makes me interested in this optimization, but also speeding up the algorithm itself: in the AA situation described above, the call node is always evaluated too, as long as its probability is larger than 0.1%, so finding out faster that calling is worse than raising would let us cut the complete call branch from further inspection.
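The cut-off I mean can be sketched roughly as follows. This is a simplified illustration rather than the actual sampling rule of ASS: `actions_to_traverse` and `min_prob` are hypothetical names, and the rule simply skips any action whose average-strategy probability is not above the floor, while always keeping the most probable action.

```python
def actions_to_traverse(avg_strategy, min_prob=0.001):
    """Return indices of actions worth evaluating at a node.

    An action is kept if its average-strategy probability is larger than
    `min_prob` (0.1% by default). The most probable action is always kept
    so that a node is never left without any action.
    """
    best = max(range(len(avg_strategy)), key=lambda a: avg_strategy[a])
    return [a for a, p in enumerate(avg_strategy) if p > min_prob or a == best]


# AA example: raise 99.2%, call 0.8% -> the call branch is still evaluated.
print(actions_to_traverse([0.992, 0.008]))    # [0, 1]

# If the call probability had dropped to 0.01%, the branch would be skipped.
print(actions_to_traverse([0.9999, 0.0001]))  # [0]
```

The faster the call probability drops below the floor, the sooner the whole subtree behind it stops costing time.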