Poker-AI.org

Re: Best Response Sampling

2015-03-01T10:40:40+00:00

Just saw this thread again. The problem with OP's approach is that you will indeed get a best response, but it is the abstract game's best response, not the unabstracted game's. In other words, you get the best response that can be represented with the buckets you used in your CFRM. Its EV is also only valid within your abstraction; it might be much worse in the real game.

You will have to compute the unabstracted best response in order to know your real-game exploitability.
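
Roughly, in symbols (the notation here is an assumption for illustration, not taken from the thread): let \sigma_2 be the fixed strategy being tested, \Sigma_1 the responder's strategies in the real game, and \Sigma_1^{abs} the subset that can be expressed with the buckets, with both sides evaluated in the real game. Assuming every bucketed strategy is also a legal real-game strategy:

```latex
\max_{\sigma_1 \in \Sigma_1^{\mathrm{abs}}} u_1(\sigma_1, \sigma_2)
\;\le\;
\max_{\sigma_1 \in \Sigma_1} u_1(\sigma_1, \sigma_2),
\qquad \text{since } \Sigma_1^{\mathrm{abs}} \subseteq \Sigma_1 .
```

So the abstract best response can only underestimate how exploitable \sigma_2 is, never overestimate it.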

Statistics: Posted by HontoNiBaka — Sun Mar 01, 2015 10:40 am

Re: Best Response Sampling

2014-10-07T01:34:21+00:00

Why is implementing a best response ugly? You will walk the real game tree in BR anyway, so the abstraction you used in CFRM doesn't matter much.

Statistics: Posted by HontoNiBaka — Tue Oct 07, 2014 1:34 am

Re: Best Response Sampling

2014-05-12T15:59:38+00:00

Yes, you'll get a best response. That's basically how CFRM works: if you train both players, each keeps readjusting toward a best response against the other, which leads to a Nash equilibrium.

Statistics: Posted by proud2bBot — Mon May 12, 2014 3:59 pm

Re: Best Response Sampling

2014-05-12T08:24:26+00:00

Thanks for your answer, spears.
I tested it with Kuhn poker and it seems to work there at least. I'm asking because I really want to avoid implementing a best response for hold'em with imperfect-recall bucketing, because that's ugly.
IMO it totally makes sense that CFRM should converge to a best response if we train only one player, but I hope someone can confirm it. Or am I missing an obvious, very easy way to test it without calculating the real best response within the abstraction?
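
A Kuhn poker check of this kind can be sketched roughly as follows. This is only a minimal sketch under assumptions: it uses vanilla CFR with all six deals enumerated instead of Monte Carlo sampling (to keep it deterministic), the fixed opponent is a made-up uniform-random player, and all function names are hypothetical. It trains player 0 only, then compares the value of the average strategy with an exact best response.

```python
import itertools

CARDS = (0, 1, 2)
ACTIONS = "pb"                                   # p = pass/fold, b = bet/call
TERMINALS = {"pp", "bp", "bb", "pbp", "pbb"}
DEALS = list(itertools.permutations(CARDS, 2))   # (c0, c1), each with prob 1/6

def payoff(h, c0, c1):
    """Payoff to player 0 at terminal history h."""
    if h == "bp":
        return 1                                 # opponent folded
    if h == "pbp":
        return -1                                # player 0 folded
    stake = 1 if h == "pp" else 2                # showdown
    return stake if c0 > c1 else -stake

def opp_strategy(card, h):
    """Hypothetical fixed opponent: uniform random at every decision."""
    return (0.5, 0.5)

def best_response_value():
    """Exact value of player 0's best response to opp_strategy."""
    def walk(h, c0, w):                          # w[c1] = chance * opponent reach
        if h in TERMINALS:
            return sum(p * payoff(h, c0, c1) for c1, p in w.items())
        if len(h) % 2 == 1:                      # opponent node: split the weights
            return sum(
                walk(h + a, c0, {c1: p * opp_strategy(c1, h)[i] for c1, p in w.items()})
                for i, a in enumerate(ACTIONS))
        return max(walk(h + a, c0, w) for a in ACTIONS)   # one choice per infoset
    return sum(walk("", c0, {c1: 1 / 6 for c1 in CARDS if c1 != c0}) for c0 in CARDS)

def train_one_player(iterations):
    """Vanilla CFR updating player 0 only; returns player 0's average strategy."""
    keys = [(c, h) for c in CARDS for h in ("", "pb")]
    regret = {k: [0.0, 0.0] for k in keys}
    strat_sum = {k: [0.0, 0.0] for k in keys}

    def walk(h, c0, c1, reach0, reach1, strat):
        if h in TERMINALS:
            return payoff(h, c0, c1)
        if len(h) % 2 == 1:                      # fixed opponent, never updated
            probs = opp_strategy(c1, h)
            return sum(p * walk(h + a, c0, c1, reach0, reach1 * p, strat)
                       for a, p in zip(ACTIONS, probs))
        key = (c0, h)
        probs = strat[key]
        utils = [walk(h + a, c0, c1, reach0 * p, reach1, strat)
                 for a, p in zip(ACTIONS, probs)]
        node = sum(p * u for p, u in zip(probs, utils))
        for i in range(2):
            regret[key][i] += reach1 * (utils[i] - node)   # counterfactual regret
            strat_sum[key][i] += reach0 * probs[i]
        return node

    for _ in range(iterations):
        strat = {}
        for k in keys:                           # regret matching, once per iteration
            pos = [max(r, 0.0) for r in regret[k]]
            strat[k] = [x / sum(pos) for x in pos] if sum(pos) > 0 else [0.5, 0.5]
        for c0, c1 in DEALS:
            walk("", c0, c1, 1.0, 1.0, strat)
    return {k: [x / sum(s) for x in s] if sum(s) > 0 else [0.5, 0.5]
            for k, s in strat_sum.items()}

def strategy_value(strat0):
    """Expected payoff of player 0's strategy strat0 against opp_strategy."""
    def walk(h, c0, c1):
        if h in TERMINALS:
            return payoff(h, c0, c1)
        probs = opp_strategy(c1, h) if len(h) % 2 == 1 else strat0[(c0, h)]
        return sum(p * walk(h + a, c0, c1) for a, p in zip(ACTIONS, probs))
    return sum(walk("", c0, c1) for c0, c1 in DEALS) / len(DEALS)

if __name__ == "__main__":
    avg = train_one_player(2000)
    print("trained value:", strategy_value(avg))
    print("best response:", best_response_value())
```

If the two printed numbers end up close, that supports the idea for this toy game. It says nothing about the abstracted hold'em case, where the comparison would need a best response computed in whichever game you care about.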

Statistics: Posted by flopnflush — Mon May 12, 2014 8:24 am

Re: Best Response Sampling

2014-05-01T09:27:30+00:00

Yes, I think I did that a long time ago. Very easy to test though...

Statistics: Posted by spears — Thu May 01, 2014 9:27 am

Best Response Sampling

2014-04-30T12:12:37+00:00

Hey,
if we use Monte Carlo CFR and train only one player while holding the opponent's strategy constant, will it converge to a best response?

Statistics: Posted by flopnflush — Wed Apr 30, 2014 12:12 pm