Poker-AI.org

Poker AI and Botting Discussion Forum

All times are UTC




PostPosted: Mon Sep 04, 2017 7:56 pm 
Junior Member

Joined: Wed Dec 04, 2013 12:40 am
Posts: 49
Hi !

I'm well aware of the performance and techniques of Cepheus, Libratus and DeepStack, so I'm not asking about those: I'm not interested in distributed computation for now.

Instead, I'm wondering what the state-of-the-art CFR algorithm / bucketing techniques / exploitability estimation would be for smaller abstractions, like the ones PioSolver and similar tools use to compute strategies on laptops.

CFRM Algorithm

I currently have a CS-CFRM implementation because it was easy for my toy games, but now it's not enough. It seems that the latest implementations (CFR+, Libratus, DeepStack) are designed for heavy distributed computing, so they won't fit. Pure CFR is only for heads-up (or at least its good performance relies on two-player zero-sum games, if I remember well). I read the papers on every other CFR variant, but that was some time ago and I can't remember which was which, how each performed, and who published it :P
If you're working on non-distributed abstractions, what would be your choice?


Bucketing

I have standard EHS/EHS^2 and OCHS implementations. I planned to implement Hand Strength Distribution, but I think I read it performs worse than OCHS (right?).
But I also remember a bucketing technique based on "recursive bucket transition probability vectors" with L2 distance that was more efficient. Is that right (and where was it published)? Are there better (non-distributed) techniques?
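For readers following along, here is a minimal sketch of what clustering hands by L2 distance between transition vectors could look like. Everything in it is an assumption for illustration: the function names (`l2_squared`, `kmeans_l2`), the farthest-point initialisation, and the made-up 3-bin histograms (probability of landing in a weak / medium / strong bucket on the next street). It is plain k-means, not the method of any particular paper.

```python
def l2_squared(u, v):
    """Squared Euclidean (L2) distance between two histograms."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans_l2(points, k, iterations=20):
    """Plain k-means with L2 distance and a deterministic farthest-point init.

    points: list of equal-length histograms (one per hand).
    Returns (assignment, centers), where assignment[i] is the bucket of points[i].
    """
    # Farthest-point initialisation: start from the first point, then keep
    # adding the point that is farthest from all centers chosen so far.
    centers = [list(points[0])]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(l2_squared(p, c) for c in centers))
        centers.append(list(farthest))

    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each hand goes to its nearest center.
        assignment = [
            min(range(k), key=lambda j: l2_squared(p, centers[j])) for p in points
        ]
        # Update step: each center becomes the mean histogram of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centers


# Hypothetical transition histograms for six hands (weak, medium, strong).
histograms = [
    [0.80, 0.20, 0.00],  # mostly stays weak
    [0.70, 0.30, 0.00],
    [0.10, 0.20, 0.70],  # mostly becomes strong
    [0.00, 0.20, 0.80],
    [0.40, 0.20, 0.40],  # draw-like: either weak or strong
    [0.45, 0.10, 0.45],
]
buckets, centers = kmeans_l2(histograms, k=3)
```

On this toy input the draw-like hands end up in their own bucket instead of being merged with medium-strength hands, which is the point of clustering on the whole distribution rather than on a single expected value.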

Exploitability

Here I'm not talking about the exploitability of an abstracted-game strategy IN the full game. I'm interested in estimating the exploitability of my converging strategies within the abstracted game itself.
I know I can fix one player's strategy and let the other players converge to a best response, but since I want the exploitability in order to estimate convergence, I don't want to have to monitor the convergence of the exploitability computation as well (it's endless).
So of course I had a look at "Accelerating Best Response Calculation in Large Extensive Games" and http://poker-ai.org/archive/www.pokerai ... =64&t=4265 . But it's a bit old and it seems aimed more at computing the exploitability of an abstracted-game strategy in the full game (I guess).
How do you do this? Am I doomed to compute a full best response the recursive way (still propagating the relevant vectors ;) ) if I don't want to sample?


I know I could find some answers by re-reading all the papers I've read over the last few years (and re-reading the whole forum again), but if you have 30s to offer me guidance, I'll owe you one ;)

PS: for the few who used my repository, it's offline now but it'll be back in the coming weeks ;)


 
PostPosted: Wed Sep 06, 2017 10:20 pm 
Veteran Member

Joined: Wed Mar 20, 2013 1:43 am
Posts: 267
When it comes to abstraction, the potential-aware abstraction from the University of Alberta is the best that I know of for the flop and turn. You should be able to find the paper on Alberta's site.

You cannot compute the real-game exploitability of your NL strategy: the game is way too big. In principle, the technique from the Alberta paper can compute real-game exploitability. It was used for the real game of FL and could also be applied to NL, but there is no computer powerful enough to do it.


 
PostPosted: Thu Sep 07, 2017 8:32 am 
Junior Member

Joined: Wed Dec 04, 2013 12:40 am
Posts: 49
Yes! That was the abstraction I was looking for and described so wrongly :roll: Thx!

About exploitability: indeed the real-game one is out of scope, so how do you estimate the exploitability / convergence progress for the abstracted game only?

For CFR, what would be your choice? After all, maybe vanilla CFR+ updating players one by one is the way to go..? What about the old PCS..?
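To make "CFR+ updating players one by one" concrete, here is a rough sketch on a one-shot zero-sum matrix game, where CFR+ reduces to regret matching+ with alternating updates and linearly weighted averaging. The function names (`regret_matching_plus`, `cfr_plus_matrix_game`, `matrix_exploitability`) and the biased rock-paper-scissors payoffs are assumptions made up for this example. A real poker implementation would run the same update at every information set of the tree.

```python
def regret_matching_plus(cum_regret):
    """Play proportionally to cumulative regrets (already clipped at zero)."""
    total = sum(cum_regret)
    n = len(cum_regret)
    return [r / total for r in cum_regret] if total > 0 else [1.0 / n] * n


def cfr_plus_matrix_game(payoff, iterations=10000):
    """payoff[a][b] is the row player's utility; the column player gets -payoff[a][b].

    Returns the linearly weighted average strategies [row, column].
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    regret = [[0.0] * n_rows, [0.0] * n_cols]
    average = [[0.0] * n_rows, [0.0] * n_cols]
    for t in range(1, iterations + 1):
        for player in (0, 1):  # alternating updates: one player at a time
            row = regret_matching_plus(regret[0])
            col = regret_matching_plus(regret[1])
            if player == 0:
                own = row
                values = [sum(payoff[a][b] * col[b] for b in range(n_cols))
                          for a in range(n_rows)]
            else:
                own = col
                values = [-sum(payoff[a][b] * row[a] for a in range(n_rows))
                          for b in range(n_cols)]
            expected = sum(p * v for p, v in zip(own, values))
            # Regret matching+: cumulative regrets are never allowed below zero.
            regret[player] = [max(0.0, r + v - expected)
                              for r, v in zip(regret[player], values)]
            # Linear averaging: iteration t contributes with weight t.
            average[player] = [s + t * p for s, p in zip(average[player], own)]
    return [[s / sum(acc) for s in acc] for acc in average]


def matrix_exploitability(payoff, row, col):
    """Sum of both best-response values; zero exactly at a Nash equilibrium."""
    n_rows, n_cols = len(payoff), len(payoff[0])
    best_row = max(sum(payoff[a][b] * col[b] for b in range(n_cols))
                   for a in range(n_rows))
    best_col = max(-sum(payoff[a][b] * row[a] for a in range(n_rows))
                   for b in range(n_cols))
    return best_row + best_col


# Hypothetical biased rock-paper-scissors: winning with rock pays 2.
BIASED_RPS = [[0, -1, 2], [1, 0, -1], [-2, 1, 0]]
row_avg, col_avg = cfr_plus_matrix_game(BIASED_RPS)
gap = matrix_exploitability(BIASED_RPS, row_avg, col_avg)
```

Tracking `gap` every so often is the matrix-game analogue of monitoring convergence in the abstract game: it needs one best-response pass per player, not a second converging process.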


 
PostPosted: Fri Sep 08, 2017 9:27 pm 
Veteran Member

Joined: Wed Mar 20, 2013 1:43 am
Posts: 267
Personally I do not measure the exploitability. I just use roughly the same number of iterations per MB of my abstraction as some ACPC bots did. Better convergence in the abstract game can sometimes mean worse performance in the real game, and it's very hard to use Alberta's best response with imperfect recall buckets. But if you use perfect recall buckets, you can use the same technique as they used for FL. In the end, what they describe works on any game tree, so it doesn't matter whether it's FL or NL.

I implemented the approach where you let one strategy converge to a BR with the help of CFR, but it takes a long time until it converges.

As for CFR, I use Pure CFR. Afaik CFR+ does not work well with imperfect recall buckets either. Are you planning on creating a multiway bot with CFR?


 
PostPosted: Sat Sep 09, 2017 8:07 am 
Junior Member

Joined: Wed Dec 04, 2013 12:40 am
Posts: 49
I'm not into botting now. I gave it a shot some years ago and it's just too much work for a very irregular side project (I've been into poker AIs for 4 years and haven't achieved that much :P ).
So I decided to leave aside the table scraping / automation / CFR translation to the real game / opponent modeling parts and develop study tools for short-stack HU / spin&go, to have a reachable goal. Also, NNs seem to be everywhere in modern bots, and even though the technique highly interests me (and I do understand it on paper), it requires a lot of additional work.

I already have results, which I'll discuss in another thread, from my current CS implementation and an abstraction with a customizable bet tree preflop and check / push / fold on the imperfect recall bucketed flop.

I was surprised to see that PioSolver offers a very quick estimation of exploitability, so I guessed there's some technique for it that I'm not aware of. Still wondering :P

I'll dig into Pure CFR. I ignored it because it was meant for two players, if I remember well, and required another walk for three, but that's not a blocker after all. BTW do you use it for more than 2 players?

Thank you for your advice, much appreciated :)


 
PostPosted: Sun Sep 10, 2017 7:25 pm 
Veteran Member

Joined: Wed Mar 20, 2013 1:43 am
Posts: 267
I only use it for HU situations. If I were to use any CFR for multiway situations, I would try to somehow split the game into HU situations like BU vs BB, BU vs SB etc. and use simple rules for multiway spots. I think the trees just get way too big for more than 2 players; 3 players would probably be the maximum, and even there the strategy would already suffer a lot.

Pio does not use card abstraction, so I assume they use the technique from the Alberta paper. It should be relatively fast, because NL with betting abstractions starting from the flop is much smaller than the full FL game. Even with compression, FL has a strategy of a few terabytes, while normal Pio sims will be around 16GB, so walking that tree should not take too long.
If you use perfect recall buckets or unabstracted cards, I think that is the best way to do it.

And what Pio shows is also not the real game exploitability, just the one in the abstract game.
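As a toy illustration of the "walk the tree once, propagating vectors" best response discussed in this thread, here is a sketch on Kuhn poker (3 cards, one bet). It is only in the spirit of that approach, not the paper's or Pio's implementation. All names (`best_response_values`, `exploitability`, the `(card, history) -> bet probability` strategy format) are assumptions for this example. It relies on perfect recall, as noted above.

```python
CARDS = (0, 1, 2)  # Jack, Queen, King
TERMINALS = {"pp", "bp", "bb", "pbp", "pbb"}  # p = pass/check/fold, b = bet/call


def utility_p1(history, card1, card2):
    """Payoff to player 1 at a terminal history (player 2 gets the negative)."""
    if history == "bp":   # player 2 folds to a bet
        return 1
    if history == "pbp":  # player 1 folds to a bet
        return -1
    stake = 1 if history == "pp" else 2  # showdown, with or without a called bet
    return stake if card1 > card2 else -stake


def best_response_values(history, br, strategy, opp_reach):
    """One value per best-responder card, given the opponent's reach vector.

    strategy maps (card, history) to the probability of betting; missing keys
    mean "always pass". opp_reach[o] is the probability that the opponent
    plays to this history when holding card o.
    """
    if history in TERMINALS:
        values = []
        for own in CARDS:
            total = 0.0
            for opp in CARDS:
                if opp == own:  # card removal: both players cannot hold the same card
                    continue
                c1, c2 = (own, opp) if br == 0 else (opp, own)
                u = utility_p1(history, c1, c2)
                total += opp_reach[opp] * (u if br == 0 else -u)
            values.append(total)
        return values
    if len(history) % 2 == br:  # best responder acts: pick the best action per card
        passed = best_response_values(history + "p", br, strategy, opp_reach)
        bet = best_response_values(history + "b", br, strategy, opp_reach)
        return [max(x, y) for x, y in zip(passed, bet)]
    values = [0.0] * len(CARDS)  # opponent acts: split the reach vector per action
    for action in "pb":
        reach = []
        for opp in CARDS:
            p_bet = strategy.get((opp, history), 0.0)
            reach.append(opp_reach[opp] * (p_bet if action == "b" else 1.0 - p_bet))
        child = best_response_values(history + action, br, strategy, reach)
        values = [v + c for v, c in zip(values, child)]
    return values


def exploitability(strategy):
    """Sum of both players' best-response values against the given profile."""
    total = 0.0
    for br in (0, 1):
        values = best_response_values("", br, strategy, [1.0] * len(CARDS))
        total += sum(values) / 6.0  # each of the 6 deals has probability 1/6
    return total


# One known Kuhn poker equilibrium (player 1 never bets first in this one).
EQUILIBRIUM = {
    (0, ""): 0.0, (1, ""): 0.0, (2, ""): 0.0,
    (0, "pb"): 0.0, (1, "pb"): 1.0 / 3.0, (2, "pb"): 1.0,
    (0, "p"): 1.0 / 3.0, (1, "p"): 0.0, (2, "p"): 1.0,
    (0, "b"): 0.0, (1, "b"): 1.0 / 3.0, (2, "b"): 1.0,
}
```

Calling `exploitability(EQUILIBRIUM)` gives zero up to rounding, while `exploitability({})` (both players always pass) gives 2, one unit per player. The cost is a single tree walk per player, which is why this stays cheap when the tree fits in memory.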

