spears wrote:
Fossana wrote:
I figured out one of the issues. In my best response calculations, when I dealt a river card, I was dividing the EV of each hand by 48 (52 - 4 cards on board), but you need to divide it by 44 for some reason (52 - 4 cards on board - 4 cards in the players' hands). It sort of makes sense, but at the same time it doesn't.
Does that solve the (lack of) convergence problem?
It converges to a lower level of exploitability now, but it still gets stuck pretty quickly. I’ve compared my best response calculations with PioSolver and I seem to be getting the same EVs, but dividing by 44 instead of 48 when dealing the river can’t possibly be right. Dividing by 44 must be making up for some other mistake in my best response code.
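For what it's worth, here is a quick counting check on the 44, as a rough sketch only. The board and hero hand are made up, and `live_cards` is a hypothetical helper, not anything from my solver. It counts how many river cards are left once both players' hole cards are fixed, and checks that summing the surviving opponent combos over all rivers gives exactly 44 times the number of opponent combos. If that holds, 1/44 would be the weight that makes the per-river sums add back up to the opponent's full range.

```python
from itertools import combinations

# Standard 52-card deck as rank+suit strings, e.g. "As", "7h".
DECK = [r + s for r in "23456789TJQKA" for s in "cdhs"]

def live_cards(*dead_groups):
    """Cards not used by any of the given groups (board, hands, river)."""
    dead = {c for group in dead_groups for c in group}
    return [c for c in DECK if c not in dead]

board = ["As", "Kd", "7h", "2c"]   # hypothetical 4-card (turn) board
hero = ["Qs", "Qh"]                # hypothetical hero hand

# Every opponent hand that does not clash with the board or hero.
villain_combos = list(combinations(live_cards(board, hero), 2))

# Number of possible rivers for each fixed (hero, villain) pair.
rivers_per_pair = {len(live_cards(board, hero, v)) for v in villain_combos}

# Sum over rivers of the opponent combos that survive card removal.
total_pairs = sum(
    len(list(combinations(live_cards(board, hero, [r]), 2)))
    for r in live_cards(board, hero)
)

print(rivers_per_pair)                     # rivers available per hand pair
print(total_pairs / len(villain_combos))   # rivers each opponent combo is counted in
```

Each opponent combo is counted once per river it does not clash with, which is why the ratio comes out as the number of rivers per hand pair and not 48.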
I may implement vanilla CFR just to test my best response when the opponent’s strategy isn’t uniformly mixed, as it is before any CFR iterations.
When calculating best response, do you need to weight the probability of cards being dealt by the opponent’s strategy? Like if you get to a node where the opponent gets to that node with the majority of their Ax combos, then having an equal chance of dealing an A as other cards doesn’t make much sense.
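To make the question concrete, here is a toy sketch of how I think the weighting falls out if the deal itself stays uniform and the opponent's range is carried as per-hand reach weights with card removal applied. This is an assumption about the usual vector-form setup, not something I've confirmed. The board, the hero hand, and the 1.0 / 0.1 reach weights are all invented for illustration.

```python
from itertools import combinations

DECK = [r + s for r in "23456789TJQKA" for s in "cdhs"]

board = ["Kd", "7h", "2c", "9s"]   # hypothetical turn board, no ace
hero = ("Qs", "Qh")                # hypothetical hero hand

dead = set(board) | set(hero)
live = [c for c in DECK if c not in dead]

# Hypothetical opponent reach: mostly Ax combos arrive at this node.
reach = {
    v: (1.0 if "A" in (v[0][0], v[1][0]) else 0.1)
    for v in combinations(live, 2)
}

def river_weight(r):
    """Total opponent reach that is still possible if r is the river."""
    return sum(w for v, w in reach.items() if r not in v)

total = sum(river_weight(r) for r in live)

# Implied probability of each river given hero's hand and the opponent's range.
posterior = {r: river_weight(r) / total for r in live}

print(posterior["As"], posterior["3c"])
```

In this sketch an ace river comes out less likely than a blank like the 3c, without touching the deal probabilities at all: the opponent's Ax combos simply drop out of the sum whenever an ace is dealt. The normalizer also equals 44 times the opponent's total reach.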