Poker-AI.org Poker AI and Botting Discussion Forum 2019-02-06T12:45:06+00:00 http://poker-ai.org/phpbb/feed.php?f=24&t=3102 2019-02-06T12:45:06+00:00 2019-02-06T12:45:06+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=3102&p=7860#p7860 <![CDATA[Re: Why does the regret strategy not converge to nash direct]]> spears wrote:

Quote:
If we use CFR to calculate our regrets for each decision and then recompute strategy based on those regrets. After billions of iterations, why is the final strategy that we used not one that converges to the nash equilibrium? It's said instead that the average of the strategies do


I assume you are looking for an intuitive explanation rather than a mathematical proof. I find it easier to think about fictitious play FP than CFR.
- Player 1 finds best response (BR) against player 2. Player 1 strategy becomes average of BR and old strategy
- Player 2 finds BR against player 1. Player 2 strategy becomes average of BR and old strategy
- Repeat

The nash equilibrium cannot be a best response because BR is pure and exploitable. The averaging process ensures that given enough iterations the right mix of pure BR is built up to provide the best possible defence. On each successive iteration the change to the average strategy gets smaller. Work through rock, paper scissors on paper to illustrate it to yourself. Hopefully you could apply the intuition found in FP to CFR.


This is intuitive and I used to imagine it like that, but I wonder why in CFR+ for example the regrets also approach a nash equilibrium most of the time.

Statistics: Posted by HontoNiBaka — Wed Feb 06, 2019 12:45 pm


]]>
2017-12-29T16:07:33+00:00 2017-12-29T16:07:33+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=3102&p=7498#p7498 <![CDATA[Re: Why does the regret strategy not converge to nash direct]]> HontoNiBaka wrote:

There are well known abstractions, but you can still use your own. I usually use a modified EM abstraction.
You will probably need to use an abstraction, except if you only want to solve the river or turn. If you want to solve the whole game, you will need card abstraction.

The reason why CFR+ isn't well suited for abstractions is, that most of the state of the art ones are imperfect recall. So cards that are in different buckets on the flop, might be in the same bucket on the turn again. I think it can probably be done with CFR+, but you would somehow have to map the probabilities for each bucket back and forth on every new street, which is slow and hard to code. And even that I am not even 100% sure it would work.


I was hoping I could skip abstraction work if I only stuck to one or two bet sizes from the flop onwards. But I'll have to look into it it seems.

One thing I want to mention is that Libratus used MCCFR with no abstraction and 10+ sizes for preflop and flop rounds and then used their subgame solver taking into account the abstraction (although with fewer bet sizes) for the CFR+ strategy to compare with. So I think then it is possible to use CFR+ in conjunction with a pre-computed abstraction at least with their unique nested solver method that they outlined.

Statistics: Posted by mediacalc — Fri Dec 29, 2017 4:07 pm


]]>
2017-12-27T07:35:55+00:00 2017-12-27T07:35:55+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=3102&p=7497#p7497 <![CDATA[Re: Why does the regret strategy not converge to nash direct]]> You will probably need to use an abstraction, except if you only want to solve the river or turn. If you want to solve the whole game, you will need card abstraction.

The reason why CFR+ isn't well suited for abstractions is, that most of the state of the art ones are imperfect recall. So cards that are in different buckets on the flop, might be in the same bucket on the turn again. I think it can probably be done with CFR+, but you would somehow have to map the probabilities for each bucket back and forth on every new street, which is slow and hard to code. And even that I am not even 100% sure it would work.

Statistics: Posted by HontoNiBaka — Wed Dec 27, 2017 7:35 am


]]>
2017-12-27T01:05:39+00:00 2017-12-27T01:05:39+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=3102&p=7496#p7496 <![CDATA[Re: Why does the regret strategy not converge to nash direct]]> HontoNiBaka wrote:

It's not necesasrily the best algorithm, for instance it's not easy to use if you want to use abstraction techniques, in that case Monte Carlo methods are probably better.

If you don't want to use card abstraction, CFR+ is probably the best algorithm. It can use averaging though, the average still converges faster than the regrets. The reason why averaging is often (not always) ommited, is because you basically only need half the RAM if you don't have to keep track of the average strategies. That is important in cases like solving the full FL game, but if you have smaller games and you have enough RAM anyway, I would still use the averages.


These abstraction techniques you talk about are bucketing by some form of EHS or some EMD calculations right? As in they're well defined and standard and they aren't "unique" per se for each person making an AI? What are the reasons for one using abstraction vs not using it in a game like 6max?

Statistics: Posted by mediacalc — Wed Dec 27, 2017 1:05 am


]]>
2017-12-25T19:15:35+00:00 2017-12-25T19:15:35+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=3102&p=7495#p7495 <![CDATA[Re: Why does the regret strategy not converge to nash direct]]> mediacalc wrote:

HontoNiBaka wrote:
I thought CFR+ is currently the best algorithm to use for solving poker, and that it never uses averaging. Am I incorrect here?


It's not necesasrily the best algorithm, for instance it's not easy to use if you want to use abstraction techniques, in that case Monte Carlo methods are probably better.

If you don't want to use card abstraction, CFR+ is probably the best algorithm. It can use averaging though, the average still converges faster than the regrets. The reason why averaging is often (not always) ommited, is because you basically only need half the RAM if you don't have to keep track of the average strategies. That is important in cases like solving the full FL game, but if you have smaller games and you have enough RAM anyway, I would still use the averages.

Statistics: Posted by HontoNiBaka — Mon Dec 25, 2017 7:15 pm


]]>
2017-12-21T23:57:28+00:00 2017-12-21T23:57:28+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=3102&p=7487#p7487 <![CDATA[Re: Why does the regret strategy not converge to nash direct]]> spears wrote:

Quote:
If we use CFR to calculate our regrets for each decision and then recompute strategy based on those regrets. After billions of iterations, why is the final strategy that we used not one that converges to the nash equilibrium? It's said instead that the average of the strategies do


I assume you are looking for an intuitive explanation rather than a mathematical proof. I find it easier to think about fictitious play FP than CFR.
- Player 1 finds best response (BR) against player 2. Player 1 strategy becomes average of BR and old strategy
- Player 2 finds BR against player 1. Player 2 strategy becomes average of BR and old strategy
- Repeat

The nash equilibrium cannot be a best response because BR is pure and exploitable. The averaging process ensures that given enough iterations the right mix of pure BR is built up to provide the best possible defence. On each successive iteration the change to the average strategy gets smaller. Work through rock, paper scissors on paper to illustrate it to yourself. Hopefully you could apply the intuition found in FP to CFR.


Great! That explanation was intuitive indeed, it makes sense to me now

Statistics: Posted by mediacalc — Thu Dec 21, 2017 11:57 pm


]]>
2017-12-21T08:28:28+00:00 2017-12-21T08:28:28+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=3102&p=7486#p7486 <![CDATA[Re: Why does the regret strategy not converge to nash direct]]> Quote:

If we use CFR to calculate our regrets for each decision and then recompute strategy based on those regrets. After billions of iterations, why is the final strategy that we used not one that converges to the nash equilibrium? It's said instead that the average of the strategies do


I assume you are looking for an intuitive explanation rather than a mathematical proof. I find it easier to think about fictitious play FP than CFR.
- Player 1 finds best response (BR) against player 2. Player 1 strategy becomes average of BR and old strategy
- Player 2 finds BR against player 1. Player 2 strategy becomes average of BR and old strategy
- Repeat

The nash equilibrium cannot be a best response because BR is pure and exploitable. The averaging process ensures that given enough iterations the right mix of pure BR is built up to provide the best possible defence. On each successive iteration the change to the average strategy gets smaller. Work through rock, paper scissors on paper to illustrate it to yourself. Hopefully you could apply the intuition found in FP to CFR.

Statistics: Posted by spears — Thu Dec 21, 2017 8:28 am


]]>
2017-12-21T00:55:32+00:00 2017-12-21T00:55:32+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=3102&p=7485#p7485 <![CDATA[Re: Why does the regret strategy not converge to nash direct]]> HontoNiBaka wrote:

The reason why the average strategy converges and regrets don't is basically just math. There is a proof in either the Polaris or Hyperborean paper by the University of Alberta.
CFR+ regrets often converge to a Nash Equilibrium, because they use a different form of regret matching. I think noone really knows why it's so much better, they just found out by testing it.


I feared as much. I have no statistical experience so I'm unable to understand those proofs. :?

You say CFR+ often converges, which is something that the researcher said also. I thought CFR+ is currently the best algorithm to use for solving poker, and that it never uses averaging. Am I incorrect here?

Statistics: Posted by mediacalc — Thu Dec 21, 2017 12:55 am


]]>
2017-12-21T00:04:40+00:00 2017-12-21T00:04:40+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=3102&p=7484#p7484 <![CDATA[Re: Why does the regret strategy not converge to nash direct]]> CFR+ regrets often converge to a Nash Equilibrium, because they use a different form of regret matching. I think noone really knows why it's so much better, they just found out by testing it.

Statistics: Posted by HontoNiBaka — Thu Dec 21, 2017 12:04 am


]]>
2017-12-20T22:53:11+00:00 2017-12-20T22:53:11+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=3102&p=7483#p7483 <![CDATA[Why does the regret strategy not converge to nash directly?]]>
The question is based on the comment by Michael Johanson, a researcher from UoA with published papers on this:
"...It then recomputes its strategy so that it takes actions with probabilities proportional to their positive regret... It repeats this process for billions of games. So you have this long sequence of strategies that it was using on each game. Counter-intuitively, that sequence of strategies does not necessarily converge to anything useful (although it sometimes does so in practice, now, with the new CFR+ algorithm we describe in the Science paper)."

And what about the CFR+ algorithm allows the final strategy to be used?

Cheers!

Statistics: Posted by mediacalc — Wed Dec 20, 2017 10:53 pm


]]>