Poker-AI.org • View topic - Why does the regret strategy not converge to nash directly?

View unanswered posts | View active topics

Board index » Public Forums » AI Research

All times are UTC

Why does the regret strategy not converge to nash directly?

Page 1 of 1

[ 10 posts ]

Print view

Previous topic | Next topic

Author

Message

mediacalc

Post subject: Why does the regret strategy not converge to nash directly?

Posted: Wed Dec 20, 2017 10:53 pm

Junior Member

Joined: Sat May 27, 2017 2:35 pm
Posts: 33

If we use CFR to calculate our regrets for each decision and then recompute strategy based on those regrets. After billions of iterations, why is the final strategy that we used not one that converges to the nash equilibrium? It's said instead that the average of the strategies do

The question is based on the comment by Michael Johanson, a researcher from UoA with published papers on this:
"...It then recomputes its strategy so that it takes actions with probabilities proportional to their positive regret... It repeats this process for billions of games. So you have this long sequence of strategies that it was using on each game. Counter-intuitively, that sequence of strategies does not necessarily converge to anything useful (although it sometimes does so in practice, now, with the new CFR+ algorithm we describe in the Science paper)."

And what about the CFR+ algorithm allows the final strategy to be used?

Cheers!

Top

HontoNiBaka

Post subject: Re: Why does the regret strategy not converge to nash direct

Posted: Thu Dec 21, 2017 12:04 am

Veteran Member

Joined: Wed Mar 20, 2013 1:43 am
Posts: 267

The reason why the average strategy converges and regrets don't is basically just math. There is a proof in either the Polaris or Hyperborean paper by the University of Alberta.
CFR+ regrets often converge to a Nash Equilibrium, because they use a different form of regret matching. I think noone really knows why it's so much better, they just found out by testing it.

Top

mediacalc

Post subject: Re: Why does the regret strategy not converge to nash direct

Posted: Thu Dec 21, 2017 12:55 am

Junior Member

Joined: Sat May 27, 2017 2:35 pm
Posts: 33

HontoNiBaka wrote:

I feared as much. I have no statistical experience so I'm unable to understand those proofs.

You say CFR+ often converges, which is something that the researcher said also. I thought CFR+ is currently the best algorithm to use for solving poker, and that it never uses averaging. Am I incorrect here?

Top

spears

Post subject: Re: Why does the regret strategy not converge to nash direct

Posted: Thu Dec 21, 2017 8:28 am

Site Admin

Joined: Sun Feb 24, 2013 9:39 pm
Posts: 642

Quote:

I assume you are looking for an intuitive explanation rather than a mathematical proof. I find it easier to think about fictitious play FP than CFR.
- Player 1 finds best response (BR) against player 2. Player 1 strategy becomes average of BR and old strategy
- Player 2 finds BR against player 1. Player 2 strategy becomes average of BR and old strategy
- Repeat

The nash equilibrium cannot be a best response because BR is pure and exploitable. The averaging process ensures that given enough iterations the right mix of pure BR is built up to provide the best possible defence. On each successive iteration the change to the average strategy gets smaller. Work through rock, paper scissors on paper to illustrate it to yourself. Hopefully you could apply the intuition found in FP to CFR.

Top

mediacalc

Post subject: Re: Why does the regret strategy not converge to nash direct

Posted: Thu Dec 21, 2017 11:57 pm

Junior Member

Joined: Sat May 27, 2017 2:35 pm
Posts: 33

spears wrote:

Quote:

Great! That explanation was intuitive indeed, it makes sense to me now

Top

HontoNiBaka

Post subject: Re: Why does the regret strategy not converge to nash direct

Posted: Mon Dec 25, 2017 7:15 pm

Veteran Member

Joined: Wed Mar 20, 2013 1:43 am
Posts: 267

mediacalc wrote:

HontoNiBaka wrote:

I thought CFR+ is currently the best algorithm to use for solving poker, and that it never uses averaging. Am I incorrect here?

It's not necesasrily the best algorithm, for instance it's not easy to use if you want to use abstraction techniques, in that case Monte Carlo methods are probably better.

If you don't want to use card abstraction, CFR+ is probably the best algorithm. It can use averaging though, the average still converges faster than the regrets. The reason why averaging is often (not always) ommited, is because you basically only need half the RAM if you don't have to keep track of the average strategies. That is important in cases like solving the full FL game, but if you have smaller games and you have enough RAM anyway, I would still use the averages.

Top

mediacalc

Post subject: Re: Why does the regret strategy not converge to nash direct

Posted: Wed Dec 27, 2017 1:05 am

Junior Member

Joined: Sat May 27, 2017 2:35 pm
Posts: 33

HontoNiBaka wrote:

These abstraction techniques you talk about are bucketing by some form of EHS or some EMD calculations right? As in they're well defined and standard and they aren't "unique" per se for each person making an AI? What are the reasons for one using abstraction vs not using it in a game like 6max?

Top

HontoNiBaka

Post subject: Re: Why does the regret strategy not converge to nash direct

Posted: Wed Dec 27, 2017 7:35 am

Veteran Member

Joined: Wed Mar 20, 2013 1:43 am
Posts: 267

There are well known abstractions, but you can still use your own. I usually use a modified EM abstraction.
You will probably need to use an abstraction, except if you only want to solve the river or turn. If you want to solve the whole game, you will need card abstraction.

The reason why CFR+ isn't well suited for abstractions is, that most of the state of the art ones are imperfect recall. So cards that are in different buckets on the flop, might be in the same bucket on the turn again. I think it can probably be done with CFR+, but you would somehow have to map the probabilities for each bucket back and forth on every new street, which is slow and hard to code. And even that I am not even 100% sure it would work.

Top

mediacalc

Post subject: Re: Why does the regret strategy not converge to nash direct

Posted: Fri Dec 29, 2017 4:07 pm

Junior Member

Joined: Sat May 27, 2017 2:35 pm
Posts: 33

HontoNiBaka wrote:

I was hoping I could skip abstraction work if I only stuck to one or two bet sizes from the flop onwards. But I'll have to look into it it seems.

One thing I want to mention is that Libratus used MCCFR with no abstraction and 10+ sizes for preflop and flop rounds and then used their subgame solver taking into account the abstraction (although with fewer bet sizes) for the CFR+ strategy to compare with. So I think then it is possible to use CFR+ in conjunction with a pre-computed abstraction at least with their unique nested solver method that they outlined.

Top

HontoNiBaka

Post subject: Re: Why does the regret strategy not converge to nash direct

Posted: Wed Feb 06, 2019 12:45 pm

Veteran Member

Joined: Wed Mar 20, 2013 1:43 am
Posts: 267

spears wrote:

Quote:

This is intuitive and I used to imagine it like that, but I wonder why in CFR+ for example the regrets also approach a nash equilibrium most of the time.

Top

Page 1 of 1

[ 10 posts ]

Board index » Public Forums » AI Research

All times are UTC

Who is online

Users browsing this forum: No registered users and 1 guest

You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum