Hi!
I'm well aware of the performance and techniques of Cepheus, Libratus and DeepStack, so I'm not asking about those: I'm not interested in distributed computation for now.
Instead, I'm wondering what the state-of-the-art CFR algorithm, bucketing techniques and exploitability estimation would be for smaller abstractions, like the ones PioSolver and similar tools use to compute strategies on laptops.
CFRM Algorithm
I currently have a CS-CFRM implementation because it was easy to write for my toy games, but it is no longer enough. It seems that the latest implementations (CFR+, Libratus, DeepStack) are designed for heavy distributed computing, so they won't fit. Pure CFR is only for heads-up (or at least its good performance relies on two-player zero-sum games, if I remember correctly). I read the papers on every other CFR variant, but that was some time ago and I can't remember which was which, how each performed, or who published it.
If you're working on non-distributed abstractions, what would be your choice?
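To make the single-machine setting concrete, here is a minimal sketch of regret matching+ (the per-information-set update that CFR+ uses) in self-play on a small zero-sum matrix game rather than on a poker tree. It is only an illustration under assumptions: the payoff matrix, the function names and the simultaneous update order are hypothetical choices, not taken from any of the tools mentioned above.

```python
import numpy as np

def current_strategy(regrets):
    """Play proportionally to positive regret, uniformly if there is none."""
    total = regrets.sum()
    if total > 0:
        return regrets / total
    return np.full(len(regrets), 1.0 / len(regrets))

def regret_matching_plus(payoff, iterations=10000):
    """Self-play on a zero-sum matrix game; payoff[i, j] is the row player's payoff."""
    n_rows, n_cols = payoff.shape
    row_regrets, col_regrets = np.zeros(n_rows), np.zeros(n_cols)
    row_sum, col_sum = np.zeros(n_rows), np.zeros(n_cols)
    for t in range(1, iterations + 1):
        row = current_strategy(row_regrets)
        col = current_strategy(col_regrets)
        row_values = payoff @ col        # value of each row action
        col_values = -(row @ payoff)     # value of each column action
        # The "+" part: cumulative regrets are clipped at zero after every update.
        row_regrets = np.maximum(row_regrets + row_values - row @ row_values, 0.0)
        col_regrets = np.maximum(col_regrets + col_values - col @ col_values, 0.0)
        # Linear averaging: iteration t contributes with weight t.
        row_sum += t * row
        col_sum += t * col
    return row_sum / row_sum.sum(), col_sum / col_sum.sum()

def matrix_exploitability(payoff, row, col):
    """Sum of what each player could gain by best-responding to the other."""
    return float((payoff @ col).max() - (row @ payoff).min())

# Hypothetical biased rock-paper-scissors payoffs, used only for illustration.
PAYOFF = np.array([[0.0, -1.0, 2.0],
                   [1.0, 0.0, -1.0],
                   [-2.0, 1.0, 0.0]])
row_avg, col_avg = regret_matching_plus(PAYOFF)
print(row_avg, col_avg, matrix_exploitability(PAYOFF, row_avg, col_avg))
```

Nothing in this update needs more than one machine. In a real CFR+ implementation the same update would run at every information set, with counterfactual values in place of the matrix products.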
Bucketing
I have standard EHS/EHS^2 and OCHS implementations. I planned to implement Hand Strength Distribution, but I think I read that it performs worse than OCHS (right?).
But I also remember a bucketing technique based on "recursive bucket transition probability vectors" compared with L2 distance that was more efficient. Is that right, and where was it described? Are there better (non-distributed) techniques?
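For readers less familiar with these features, here is a minimal sketch of how EHS, EHS^2 and a histogram-style distribution feature can be derived from the same rollout data, and how two hands can be compared with L2 distance. The input format (an array of final hand-strength values, one per sampled runout), the bin count and the function names are assumptions made for illustration. This is not the recursive transition-vector technique asked about above.

```python
import numpy as np

def hand_features(final_strengths, bins=10):
    """Summarise a hand from its final hand strength on each sampled runout.

    final_strengths: values in [0, 1], one per runout (hypothetical input format).
    Returns (EHS, EHS^2, normalised histogram of the distribution).
    """
    values = np.asarray(final_strengths, dtype=float)
    ehs = values.mean()                 # expected hand strength
    ehs2 = (values ** 2).mean()         # expected squared hand strength
    hist, _ = np.histogram(values, bins=bins, range=(0.0, 1.0))
    return ehs, ehs2, hist / hist.sum()

def l2_distance(u, v):
    """Euclidean distance between two feature vectors."""
    return float(np.linalg.norm(np.asarray(u) - np.asarray(v)))

# A draw (either misses or gets there) versus a mediocre made hand:
draw = hand_features([0.0, 1.0, 0.0, 1.0], bins=2)
made = hand_features([0.5, 0.5, 0.5, 0.5], bins=2)
print(draw[0], made[0])               # same EHS
print(draw[1], made[1])               # different EHS^2
print(l2_distance(draw[2], made[2]))  # histograms are far apart
```

The two hands have the same EHS, so an EHS-only bucketing would merge them, while EHS^2 and the histogram both separate them. Clustering such histograms (for example with k-means) is one way to form buckets.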
Exploitability
Here I'm not talking about the exploitability of an abstracted-game strategy IN the full game. I'm interested in estimating the exploitability of my converging strategies within the abstracted game itself.
I know I can fix one player's strategy and let the other players converge to the best response, but since I want the exploitability in order to measure convergence, I don't want to also have to monitor the convergence of the exploitability computation itself (it's endless).
So of course I had a look at "Accelerating Best Response Calculation in Large Extensive Games" and
http://poker-ai.org/archive/www.pokerai ... =64&t=4265 . But it's a bit old, and it seems aimed more at computing the exploitability of an abstracted-game strategy in the full game (I guess).
How do you do this? Am I doomed to compute a full best response the recursive way (still propagating the relevant vectors) if I don't want to sample?
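To pin down what "a full best response the recursive way" means here, below is a minimal sketch on Kuhn poker (three cards, one bet) instead of a real abstraction. It walks the betting tree once per best-responding player and private card, carrying the opponent's reach probabilities as a vector over the opponent's possible cards. The strategy format (a dict from card plus history to the probability of betting) and all names are assumptions made for illustration.

```python
CARDS = (0, 1, 2)                      # Jack, Queen, King
TERMINALS = {"pp", "bp", "bb", "pbp", "pbb"}

def payoff(history, cards, player):
    """Payoff to `player` at a terminal history; cards[i] is player i's card."""
    if history == "bp":                # player 1 folds to a bet
        p0 = 1
    elif history == "pbp":             # player 0 folds to a bet
        p0 = -1
    else:                              # showdown for 1 (check-check) or 2 (bet-call)
        stake = 1 if history == "pp" else 2
        p0 = stake if cards[0] > cards[1] else -stake
    return p0 if player == 0 else -p0

def br_value(strategy, br_player, my_card, history, opp_reach):
    """Best-response value for one private card, weighted by opponent reach."""
    if history in TERMINALS:
        total = 0.0
        for opp_card, reach in opp_reach.items():
            cards = (my_card, opp_card) if br_player == 0 else (opp_card, my_card)
            total += reach * payoff(history, cards, br_player)
        return total
    if len(history) % 2 == br_player:
        # One information set: pick the action that is best against the whole vector.
        return max(br_value(strategy, br_player, my_card, history + a, opp_reach)
                   for a in "pb")
    total = 0.0
    for action in "pb":
        next_reach = {}
        for opp_card, reach in opp_reach.items():
            p_bet = strategy[str(opp_card) + history]
            next_reach[opp_card] = reach * (p_bet if action == "b" else 1.0 - p_bet)
        total += br_value(strategy, br_player, my_card, history + action, next_reach)
    return total

def exploitability(strategy):
    """Average of both players' best-response values against `strategy`."""
    total = 0.0
    for br_player in (0, 1):
        for my_card in CARDS:
            # Each of the six deals has chance probability 1/6.
            opp_reach = {c: 1.0 / 6.0 for c in CARDS if c != my_card}
            total += br_value(strategy, br_player, my_card, "", opp_reach)
    return total / 2.0

always_pass = {str(c) + h: 0.0 for c in CARDS for h in ("", "p", "b", "pb")}
print(exploitability(always_pass))     # about 1.0: betting always wins the ante
```

This is a single exact pass with no convergence to monitor, and its cost is the size of the betting tree times the work per reach vector. In a bucketed abstraction the vectors would be indexed by bucket rather than by card, and the chance transitions between rounds would have to be applied to them. That part is not shown here.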
I know I could find some answers by re-reading all the papers I've read over the last few years (and re-reading the whole forum again), but if you have 30s to offer me some guidance, I'll owe you one.
PS: for the few who used my repository, it's offline for now but it'll be back in the coming weeks.