I don't think I fully understand how CFRM is supposed to work. I wasn't able to get the whole procedure from the Alberta papers, but maybe I missed a crucial one; please feel free to link it if there is something that describes it well.
I will describe my understanding of the algorithm (first CFRM and then MCFRM) and ask questions throughout. Feel free to comment on or criticize anything, and especially to answer my questions.
Let's say I want to solve the following simple game:
- 2 players
- action starts on the turn; let's assume the board is Kh 8s 2c 2d
- starting ranges are small, so one bucket = one combo (to make it all easier to think about)
- first player can bet (the pot) or check on the turn and river
- second player can only check back, call or fold
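To pin the toy game down, here is a minimal Python sketch of the setup as I described it. All names (`BOARD`, `P1_ACTIONS`, `river_cards`, and so on) are my own and purely illustrative, and it ignores card removal from the players' hole cards.

```python
from itertools import product

RANKS = "23456789TJQKA"
SUITS = "cdhs"
DECK = [r + s for r, s in product(RANKS, SUITS)]  # all 52 cards

BOARD = ["Kh", "8s", "2c", "2d"]  # turn board from the example

# Player 1 may bet (pot-sized) or check; player 2 only responds.
P1_ACTIONS = ("bet", "check")
P2_ACTIONS_VS_BET = ("call", "fold")
P2_ACTIONS_VS_CHECK = ("check",)


def river_cards(board):
    """Cards that can still come on the river (hole cards are ignored)."""
    return [card for card in DECK if card not in board]


print(len(river_cards(BOARD)))  # 48
```

This is where the 48 river nodes mentioned in Q3 come from: 52 cards minus the 4 on the board.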
Now, before the first iteration I need to set some arbitrary strategies to start from.
Let's say the 1st player bets the top 40% of his range in every node and checks the rest, and the 2nd player calls with 40% of his range.
Q1) Is this starting strategy correct, or does every hand need some probability on every path? (So, say, KsKd bets with probability 50% and checks with probability 50% on the turn?)
Now I want to do a step for the 1st player. My understanding is that I need to pick a bucket (is that correct, or do I do this for all buckets in one step?); let's say it's 6h5h. The current strategy is to check it on both streets. How do I go about calculating CFR? My understanding is that I try all possible paths (so bet and check), and the difference in value between the current strategy and the most exploitive one (vs the 2nd player's current strategy) is the CFR.
Q2) Is that understanding correct?
Q3) If so, what do I do on the river nodes? (There are 48 of them.) Do I choose the best course of action for the given hand, or use the current strategy in those nodes?
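To make Q2 concrete, this is the computation I have in mind for one bucket at one decision node, sketched in Python. The EV numbers are invented for illustration and `action_regrets` is a hypothetical helper of mine; whether this matches what the papers mean by counterfactual regret is exactly what I'm asking.

```python
def action_regrets(action_values, strategy):
    """Regret of each action relative to the current (possibly mixed) strategy.

    action_values: EV of each action for this hand against the opponent's
                   current strategy (made-up numbers in the example below).
    strategy:      current probability of taking each action with this hand.
    """
    # EV of the current strategy is the probability-weighted EV of its actions.
    current_ev = sum(strategy[a] * action_values[a] for a in strategy)
    return {a: action_values[a] - current_ev for a in action_values}


# 6h5h currently always checks; EVs are in chips and purely illustrative.
values = {"bet": 3.0, "check": 1.0}
strategy = {"bet": 0.0, "check": 1.0}
regrets = action_regrets(values, strategy)
print(regrets)  # {'bet': 2.0, 'check': 0.0}
```

Here betting would have gained 2 chips over what the current strategy earns, so "bet" has positive regret and "check" has none.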
So now I have calculated the CFR (I surely made many mistakes in my description, but at this point assume I have it right so that I can ask more questions). Suppose it turned out that it's profitable to bet 6h5h on the turn and on all rivers. My understanding is that I now take this new (max exploitive) strategy and calculate a weighted average of it and the current strategy, with weight 1/n for the nth step?
Q4) Is that the correct way?
Q5) Do I change all the nodes (so the turn and all rivers), or just the turn?
Now, after the first step the strategy for 6h5h should be bet 50% and check 50% on the turn and all rivers (because of the averaging), and if I were to repeat that step it would be 66% bet and 33% check.
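Here is the averaging arithmetic as I understand it, as a small Python sketch. `blend` is a name I made up, and I'm assuming the initial strategy counts as the first sample, so the strategy found at step k gets weight 1/(k+1); that is the reading that reproduces the 50% and 66% figures above.

```python
from fractions import Fraction


def blend(current_bet_freq, target_bet_freq, step):
    """Running average of bet frequencies.

    The initial strategy counts as the first sample, so the strategy
    found at `step` (1-based) is mixed in with weight 1 / (step + 1).
    """
    weight = Fraction(1, step + 1)
    return (1 - weight) * current_bet_freq + weight * target_bet_freq


bet = Fraction(0)  # 6h5h starts as a pure check
for step in (1, 2):
    bet = blend(bet, Fraction(1), step)  # max exploitive strategy: always bet
    print(step, bet)  # prints "1 1/2", then "2 2/3"
```

Exact fractions are used only to keep the numbers readable; floats would work just as well.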
This procedure seems kind of expensive for big trees (calculating the max exploitive path for a bucket is not fast), but in one Alberta paper the author claimed he got 700 iterations/second on some ancient hardware. This makes me think I don't fully understand what he is doing.
This turned out longer than expected, so I will leave my questions about MCFRM for now, as I would like to make sure my understanding of basic CFRM is sufficient first.
Any advice/comments much appreciated!