Poker-AI.org

Poker AI and Botting Discussion Forum
It is currently Mon Nov 13, 2023 4:45 pm

All times are UTC




Post new topic Reply to topic  [ 10 posts ] 
Author Message
PostPosted: Wed Dec 20, 2017 10:53 pm 
Offline
Junior Member

Joined: Sat May 27, 2017 2:35 pm
Posts: 33
If we use CFR to calculate our regrets for each decision and then recompute strategy based on those regrets. After billions of iterations, why is the final strategy that we used not one that converges to the nash equilibrium? It's said instead that the average of the strategies do

The question is based on the comment by Michael Johanson, a researcher from UoA with published papers on this:
"...It then recomputes its strategy so that it takes actions with probabilities proportional to their positive regret... It repeats this process for billions of games. So you have this long sequence of strategies that it was using on each game. Counter-intuitively, that sequence of strategies does not necessarily converge to anything useful (although it sometimes does so in practice, now, with the new CFR+ algorithm we describe in the Science paper)."

And what about the CFR+ algorithm allows the final strategy to be used?

Cheers!


Top
 Profile  
 
PostPosted: Thu Dec 21, 2017 12:04 am 
Offline
Veteran Member

Joined: Wed Mar 20, 2013 1:43 am
Posts: 267
The reason why the average strategy converges and regrets don't is basically just math. There is a proof in either the Polaris or Hyperborean paper by the University of Alberta.
CFR+ regrets often converge to a Nash Equilibrium, because they use a different form of regret matching. I think noone really knows why it's so much better, they just found out by testing it.


Top
 Profile  
 
PostPosted: Thu Dec 21, 2017 12:55 am 
Offline
Junior Member

Joined: Sat May 27, 2017 2:35 pm
Posts: 33
HontoNiBaka wrote:
The reason why the average strategy converges and regrets don't is basically just math. There is a proof in either the Polaris or Hyperborean paper by the University of Alberta.
CFR+ regrets often converge to a Nash Equilibrium, because they use a different form of regret matching. I think noone really knows why it's so much better, they just found out by testing it.


I feared as much. I have no statistical experience so I'm unable to understand those proofs. :?

You say CFR+ often converges, which is something that the researcher said also. I thought CFR+ is currently the best algorithm to use for solving poker, and that it never uses averaging. Am I incorrect here?


Top
 Profile  
 
PostPosted: Thu Dec 21, 2017 8:28 am 
Offline
Site Admin
User avatar

Joined: Sun Feb 24, 2013 9:39 pm
Posts: 642
Quote:
If we use CFR to calculate our regrets for each decision and then recompute strategy based on those regrets. After billions of iterations, why is the final strategy that we used not one that converges to the nash equilibrium? It's said instead that the average of the strategies do


I assume you are looking for an intuitive explanation rather than a mathematical proof. I find it easier to think about fictitious play FP than CFR.
- Player 1 finds best response (BR) against player 2. Player 1 strategy becomes average of BR and old strategy
- Player 2 finds BR against player 1. Player 2 strategy becomes average of BR and old strategy
- Repeat

The nash equilibrium cannot be a best response because BR is pure and exploitable. The averaging process ensures that given enough iterations the right mix of pure BR is built up to provide the best possible defence. On each successive iteration the change to the average strategy gets smaller. Work through rock, paper scissors on paper to illustrate it to yourself. Hopefully you could apply the intuition found in FP to CFR.


Top
 Profile  
 
PostPosted: Thu Dec 21, 2017 11:57 pm 
Offline
Junior Member

Joined: Sat May 27, 2017 2:35 pm
Posts: 33
spears wrote:
Quote:
If we use CFR to calculate our regrets for each decision and then recompute strategy based on those regrets. After billions of iterations, why is the final strategy that we used not one that converges to the nash equilibrium? It's said instead that the average of the strategies do


I assume you are looking for an intuitive explanation rather than a mathematical proof. I find it easier to think about fictitious play FP than CFR.
- Player 1 finds best response (BR) against player 2. Player 1 strategy becomes average of BR and old strategy
- Player 2 finds BR against player 1. Player 2 strategy becomes average of BR and old strategy
- Repeat

The nash equilibrium cannot be a best response because BR is pure and exploitable. The averaging process ensures that given enough iterations the right mix of pure BR is built up to provide the best possible defence. On each successive iteration the change to the average strategy gets smaller. Work through rock, paper scissors on paper to illustrate it to yourself. Hopefully you could apply the intuition found in FP to CFR.


Great! That explanation was intuitive indeed, it makes sense to me now


Top
 Profile  
 
PostPosted: Mon Dec 25, 2017 7:15 pm 
Offline
Veteran Member

Joined: Wed Mar 20, 2013 1:43 am
Posts: 267
mediacalc wrote:
HontoNiBaka wrote:
I thought CFR+ is currently the best algorithm to use for solving poker, and that it never uses averaging. Am I incorrect here?


It's not necesasrily the best algorithm, for instance it's not easy to use if you want to use abstraction techniques, in that case Monte Carlo methods are probably better.

If you don't want to use card abstraction, CFR+ is probably the best algorithm. It can use averaging though, the average still converges faster than the regrets. The reason why averaging is often (not always) ommited, is because you basically only need half the RAM if you don't have to keep track of the average strategies. That is important in cases like solving the full FL game, but if you have smaller games and you have enough RAM anyway, I would still use the averages.


Top
 Profile  
 
PostPosted: Wed Dec 27, 2017 1:05 am 
Offline
Junior Member

Joined: Sat May 27, 2017 2:35 pm
Posts: 33
HontoNiBaka wrote:

It's not necesasrily the best algorithm, for instance it's not easy to use if you want to use abstraction techniques, in that case Monte Carlo methods are probably better.

If you don't want to use card abstraction, CFR+ is probably the best algorithm. It can use averaging though, the average still converges faster than the regrets. The reason why averaging is often (not always) ommited, is because you basically only need half the RAM if you don't have to keep track of the average strategies. That is important in cases like solving the full FL game, but if you have smaller games and you have enough RAM anyway, I would still use the averages.


These abstraction techniques you talk about are bucketing by some form of EHS or some EMD calculations right? As in they're well defined and standard and they aren't "unique" per se for each person making an AI? What are the reasons for one using abstraction vs not using it in a game like 6max?


Top
 Profile  
 
PostPosted: Wed Dec 27, 2017 7:35 am 
Offline
Veteran Member

Joined: Wed Mar 20, 2013 1:43 am
Posts: 267
There are well known abstractions, but you can still use your own. I usually use a modified EM abstraction.
You will probably need to use an abstraction, except if you only want to solve the river or turn. If you want to solve the whole game, you will need card abstraction.

The reason why CFR+ isn't well suited for abstractions is, that most of the state of the art ones are imperfect recall. So cards that are in different buckets on the flop, might be in the same bucket on the turn again. I think it can probably be done with CFR+, but you would somehow have to map the probabilities for each bucket back and forth on every new street, which is slow and hard to code. And even that I am not even 100% sure it would work.


Top
 Profile  
 
PostPosted: Fri Dec 29, 2017 4:07 pm 
Offline
Junior Member

Joined: Sat May 27, 2017 2:35 pm
Posts: 33
HontoNiBaka wrote:
There are well known abstractions, but you can still use your own. I usually use a modified EM abstraction.
You will probably need to use an abstraction, except if you only want to solve the river or turn. If you want to solve the whole game, you will need card abstraction.

The reason why CFR+ isn't well suited for abstractions is, that most of the state of the art ones are imperfect recall. So cards that are in different buckets on the flop, might be in the same bucket on the turn again. I think it can probably be done with CFR+, but you would somehow have to map the probabilities for each bucket back and forth on every new street, which is slow and hard to code. And even that I am not even 100% sure it would work.


I was hoping I could skip abstraction work if I only stuck to one or two bet sizes from the flop onwards. But I'll have to look into it it seems.

One thing I want to mention is that Libratus used MCCFR with no abstraction and 10+ sizes for preflop and flop rounds and then used their subgame solver taking into account the abstraction (although with fewer bet sizes) for the CFR+ strategy to compare with. So I think then it is possible to use CFR+ in conjunction with a pre-computed abstraction at least with their unique nested solver method that they outlined.


Top
 Profile  
 
PostPosted: Wed Feb 06, 2019 12:45 pm 
Offline
Veteran Member

Joined: Wed Mar 20, 2013 1:43 am
Posts: 267
spears wrote:
Quote:
If we use CFR to calculate our regrets for each decision and then recompute strategy based on those regrets. After billions of iterations, why is the final strategy that we used not one that converges to the nash equilibrium? It's said instead that the average of the strategies do


I assume you are looking for an intuitive explanation rather than a mathematical proof. I find it easier to think about fictitious play FP than CFR.
- Player 1 finds best response (BR) against player 2. Player 1 strategy becomes average of BR and old strategy
- Player 2 finds BR against player 1. Player 2 strategy becomes average of BR and old strategy
- Repeat

The nash equilibrium cannot be a best response because BR is pure and exploitable. The averaging process ensures that given enough iterations the right mix of pure BR is built up to provide the best possible defence. On each successive iteration the change to the average strategy gets smaller. Work through rock, paper scissors on paper to illustrate it to yourself. Hopefully you could apply the intuition found in FP to CFR.


This is intuitive and I used to imagine it like that, but I wonder why in CFR+ for example the regrets also approach a nash equilibrium most of the time.


Top
 Profile  
 
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 10 posts ] 

All times are UTC


Who is online

Users browsing this forum: No registered users and 1 guest


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
cron
Powered by phpBB® Forum Software © phpBB Group