Poker-AI.org

Poker AI and Botting Discussion Forum
All times are UTC

Posted: Tue Jan 10, 2017 4:34 pm
Junior Member

Joined: Tue Dec 13, 2016 4:11 am
Posts: 13
DeepStack: Expert-Level Artificial Intelligence in No-Limit Poker
by Matej Moravčík, Martin Schmid, Neil Burch, Viliam Lisý, Dustin Morrill, Nolan Bard, Trevor Davis, Kevin Waugh, Michael Johanson, Michael Bowling

Artificial intelligence has seen a number of breakthroughs in recent years, with games often serving as significant milestones. A common feature of games with these successes is that they involve information symmetry among the players, where all players have identical information. This property of perfect information, though, is far more common in games than in real-world problems. Poker is the quintessential game of imperfect information, and it has been a longstanding challenge problem in artificial intelligence. In this paper we introduce DeepStack, a new algorithm for imperfect information settings such as poker. It combines recursive reasoning to handle information asymmetry, decomposition to focus computation on the relevant decision, and a form of intuition about arbitrary poker situations that is automatically learned from self-play games using deep learning. In a study involving dozens of participants and 44,000 hands of poker, DeepStack becomes the first computer program to beat professional poker players in heads-up no-limit Texas hold’em. Furthermore, we show this approach dramatically reduces worst-case exploitability compared to the abstraction paradigm that has been favored for over a decade.

Link: https://arxiv.org/pdf/1701.01724.pdf


Posted: Thu Jan 12, 2017 6:47 pm
Junior Member

Joined: Tue Mar 05, 2013 2:24 pm
Posts: 11
Excited to finally see them going for the much more scalable approach of online solving combined with trained models for look-ahead, instead of sticking to precomputed strategies.

One thing confuses me though, they say that they ignore the opponent's actual action when doing the recalc. Does that mean they ignore the opponent's bet size as well, and then just map it to one of the "2 or 3 bet/raise actions" post-recalc? Why not consider the actual size as an optional path?

Also for their own bets, I don't see any mention of bet sizing, which leads me to believe they used the same ½P, P, 2P, All-in sizings they used for training the networks(?) Again it sounds like they're leaving an unnecessary amount of chips on the table.

Either way, impressive results!


Posted: Thu Jan 12, 2017 10:51 pm
Site Admin

Joined: Sun Feb 24, 2013 9:39 pm
Posts: 642
Quote:
One thing confuses me though, they say that they ignore the opponent's actual action when doing the recalc. Does that mean they ignore the opponent's bet size as well, and then just map it to one of the "2 or 3 bet/raise actions" post-recalc? Why not consider the actual size as an optional path?


Good question. As usual there is more than one thing that confuses me in UoA papers. The overall idea of doing a simulation over the next few actions and then using an estimate of expected values to represent the remainder of the game has been used in game playing programs since the beginning of time. Clever to use it in poker.
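
The classic version of that idea, for a perfect-information game, can be sketched as a depth-limited search that falls back on a heuristic estimate at the horizon. This is only an illustration of the general technique, not DeepStack's algorithm: the toy tree, the `ESTIMATE` table and the function name are made up for the example, and poker additionally needs ranges and counterfactual values rather than a single number per state.

```python
# Hypothetical toy game tree: each state maps to its child states.
TREE = {
    "root": ["a", "b"],
    "a": ["a1", "a2"],
    "b": ["b1", "b2"],
}

# Hypothetical heuristic estimates of the value of each state
# (standing in for "an estimate of expected values").
ESTIMATE = {"root": 0, "a": 1, "b": 2, "a1": 3, "a2": 5, "b1": -4, "b2": 9}


def depth_limited_value(state, depth, maximizing=True):
    """Minimax that searches `depth` plies, then trusts the estimate."""
    children = TREE.get(state, [])
    if depth == 0 or not children:
        # Horizon reached (or terminal state): use the estimate
        # to represent the remainder of the game.
        return ESTIMATE[state]
    values = [
        depth_limited_value(child, depth - 1, not maximizing)
        for child in children
    ]
    return max(values) if maximizing else min(values)


print(depth_limited_value("root", 1))  # 2: trusts the estimates of a and b
print(depth_limited_value("root", 2))  # 3: looks one ply deeper first
```

The two calls give different answers because the shallow estimate for `b` hides the bad reply `b1`; a deeper search or a better estimator closes that gap.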


Posted: Fri Jan 20, 2017 10:44 am
Junior Member

Joined: Mon Aug 08, 2016 9:37 pm
Posts: 13
Reposting my post from another thread:

The test was not run in the best way: only the top 3 of 30 players finished in the money, so the humans were encouraged to gamble rather than play their real A-game as they would at a cash table. Anyway, 45bb is truly amazing.

I would like to try to reproduce the DeepStack algorithm. The bulk of the cost is reproducing the 10M-sample training set of random situations solved by CFR, which is used to train the network with 3500 hidden units. They ran 6144 CPUs for 11 days. I've estimated that this would cost 50k euro. At best I can do 1M samples with a 5k investment, so I was thinking of starting with a shallower poker game, like HUSNG, where I could produce 3-5M samples, or of doing some data expansion by taking multiple examples from the same solved game, but that is inappropriate for DeepStack's re-solving mechanism.
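
The back-of-the-envelope numbers in this estimate work out as follows. The figures (6144 CPUs, 11 days, 50k euro, 10M samples) are the ones quoted in this post; linear scaling of cost with sample count is an assumption of this sketch, and the variable names are made up for illustration.

```python
# Figures quoted in the post above (not independently verified).
CPUS = 6144
DAYS = 11
FULL_COST_EUR = 50_000
FULL_SAMPLES = 10_000_000

# Total compute implied by "6144 CPUs for 11 days".
core_hours = CPUS * DAYS * 24

# Price per core-hour implied by the 50k euro estimate.
eur_per_core_hour = FULL_COST_EUR / core_hours


def cost_for_samples(n_samples):
    """Cost in euro, assuming cost scales linearly with sample count."""
    return FULL_COST_EUR * n_samples / FULL_SAMPLES


print(core_hours)                    # 1622016
print(round(eur_per_core_hour, 4))   # 0.0308
print(cost_for_samples(1_000_000))   # 5000.0
```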

pulser wrote:
Excited to finally see them going for the much more scalable approach of online solving combined with trained models for look-ahead, instead of sticking to precomputed strategies.

One thing confuses me though, they say that they ignore the opponent's actual action when doing the recalc. Does that mean they ignore the opponent's bet size as well, and then just map it to one of the "2 or 3 bet/raise actions" post-recalc? Why not consider the actual size as an optional path?

Also for their own bets, I don't see any mention of bet sizing, which leads me to believe they used the same ½P, P, 2P, All-in sizings they used for training the networks(?) Again it sounds like they're leaving an unnecessary amount of chips on the table.

Either way, impressive results!


The value function (the previously trained neural network) returns the approximate counterfactual utility of every possible opponent hand, taking as input only the pot size and the DeepStack range. So during the simulation the algorithm doesn't consider the preceding action or its size, only the pot size.
The abstraction is implicit and continuous in the network that produces the value function; they don't map anything explicitly.
In other words, the value function is a way of assigning a value to being in a certain position during the game. The exploitability of DeepStack goes to zero (so its strategy converges to a Nash equilibrium) if the approximation error of the network goes to zero. This is not possible, but judging from the test the error is small enough that humans can't exploit it.


Posted: Sun Jan 22, 2017 12:51 pm
Junior Member

Joined: Tue Mar 05, 2013 2:24 pm
Posts: 11
AlephZero wrote:
The value function (the previously trained neural network) returns the approximate counterfactual utility of every possible opponent hand, taking as input only the pot size and the DeepStack range. So during the simulation the algorithm doesn't consider the preceding action or its size, only the pot size.
The abstraction is implicit and continuous in the network that produces the value function; they don't map anything explicitly.
With regards to the look-ahead network computed from the next public state, I totally understand. However, my comment was concerning the live resolve of the current street. I don't see how the opponent's action can be ignored there.

Let's say the opponent bets 2/3 pot on the turn. DeepStack can't possibly ignore that action and just consider the pot, stack size and resulting sub-game after the bet, or can it?

In this case, my understanding was that they solved the entire street (the turn) with CFRM, using the look-ahead network to get regrets or values for the river, with the inputs to the simulation being the opponent's regrets, DeepStack's range, the pot and the stack size. Then after the solve, they select an action based on whatever the opponent actually did.


Posted: Sun Jan 22, 2017 1:06 pm
Junior Member

Joined: Mon Aug 08, 2016 9:37 pm
Posts: 13
pulser wrote:
With regards to the look-ahead network computed from the next public state, I totally understand. However, my comment was concerning the live resolve of the current street. I don't see how the opponent's action can be ignored there.

Let's say the opponent bets 2/3 pot on the turn. DeepStack can't possibly ignore that action and just consider the pot, stack size and resulting sub-game after the bet, or can it?

In this case, my understanding was that they solved the entire street (the turn) with CFRM, using the look-ahead network to get regrets or values for the river, with the inputs to the simulation being the opponent's regrets, DeepStack's range, the pot and the stack size. Then after the solve, they select an action based on whatever the opponent actually did.


I think, though it's just my opinion at the moment and their paper is not clear, that DeepStack uses the counterfactual value for the pot after the call in the case of a call, the pot plus the raise in the case of a raise, or the current pot for a fold. In other words, a call is evaluated in the situation described by pot + bet + call amount, and a raise in pot + bet + raise amount. In your example the starting pot is 1, so the pot considered for a call is 1 + 2/3 + 2/3 = 7/3, and 1 + 2/3 + raise amount for a raise. I think the utility is easily reconstructed from the sub-game's counterfactual utility for the player, less the player's contribution to the starting pot. This is just my supposition; I don't completely understand the DeepStack algorithm yet.
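
The pot arithmetic in this example can be checked with exact fractions. This is only a sketch of the supposition in this post, not of what DeepStack is known to do; the function names are hypothetical, and `raise_amount` is assumed to mean the total chips the raiser puts in.

```python
from fractions import Fraction


def pot_after_call(pot, bet_fraction):
    """Pot once a bet of `bet_fraction` of the pot has been called."""
    bet = pot * bet_fraction
    return pot + bet + bet  # starting pot + bet + matching call


def pot_after_raise(pot, bet_fraction, raise_amount):
    """Pot once the bet has been raised; `raise_amount` is the raiser's chips."""
    bet = pot * bet_fraction
    return pot + bet + raise_amount


# Starting pot of 1, opponent bets 2/3 pot on the turn.
print(pot_after_call(Fraction(1), Fraction(2, 3)))               # 7/3
print(pot_after_raise(Fraction(1), Fraction(2, 3), Fraction(2)))  # 11/3
```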


Posted: Sat Jan 28, 2017 10:47 am
Junior Member

Joined: Mon Jan 19, 2015 4:58 pm
Posts: 15
pulser wrote:
With regards to the look-ahead network computed from the next public state, I totally understand. However, my comment was concerning the live resolve of the current street. I don't see how the opponent's action can be ignored there.

Let's say the opponent bets 2/3 pot on the turn. DeepStack can't possibly ignore that action and just consider the pot, stack size and resulting sub-game after the bet, or can it?


I do not think they ignore the action; they're just not explicitly using it. Instead, after the opponent has acted, they update the opponent's range according to the "Opponent Ranges in Re-Solving" section in the Appendix. Then they use that range (and their own) to solve what remains of the street. At least that's the way I see it.
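
For reference, a generic Bayes-rule range update after an observed action looks roughly like this. It is a minimal sketch of the general idea only: the paper's appendix is the authority on what DeepStack actually tracks for the opponent, and the hands, probabilities and function name below are invented for illustration.

```python
from fractions import Fraction


def update_range(range_probs, action_probs):
    """Bayes' rule: P(hand | action) is proportional to P(hand) * P(action | hand).

    range_probs:  prior probability of each hand.
    action_probs: probability that each hand takes the observed action
                  (hands missing from the dict are treated as never taking it).
    """
    weighted = {
        hand: prior * action_probs.get(hand, 0)
        for hand, prior in range_probs.items()
    }
    total = sum(weighted.values())
    if total == 0:
        raise ValueError("observed action has zero probability under this range")
    return {hand: weight / total for hand, weight in weighted.items()}


# Hypothetical three-hand range, uniform prior.
prior = {"AA": Fraction(1, 3), "KK": Fraction(1, 3), "72o": Fraction(1, 3)}
# Hypothetical probability that each hand makes the observed bet.
bet_probs = {"AA": Fraction(1), "KK": Fraction(1, 2), "72o": Fraction(1, 2)}

posterior = update_range(prior, bet_probs)
print(posterior["AA"], posterior["KK"], posterior["72o"])  # 1/2 1/4 1/4
```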

What I am bothered about is that they imply that they implemented the CFR-D on GPU which sounds like quite a daunting task tbh...

_________________
Let's drop conventional languages and talk C++ finally.


Posted: Tue Mar 21, 2017 3:34 pm
New Member

Joined: Tue Mar 21, 2017 3:25 pm
Posts: 1
I guess Artificial Intelligence is beyond question nowadays; I suppose almost everybody believes in it and considers it a miracle!

_________________
http://bigpaperwriter.com/blog/meet-artificial-intelligence-and-what-is-behind-the-curtains


Posted: Sun Oct 15, 2017 8:32 am
New Member

Joined: Mon Oct 09, 2017 10:22 am
Posts: 2
For those who do not want to read the paper, here is the DeepStack website:
https://www.deepstack.ai
It includes a nice overview talk by Michael Bowling and some videos of games that DeepStack played against humans.


Posted: Sat Oct 21, 2017 6:24 pm
Junior Member

Joined: Mon Aug 08, 2016 9:37 pm
Posts: 13
Here is a Lua Torch implementation for Leduc poker: https://github.com/lifrordi/DeepStack-Leduc


Posted: Sat Mar 17, 2018 5:37 pm
Veteran Member

Joined: Wed Mar 20, 2013 1:43 am
Posts: 267
It seems that they do ignore the opponent's action. In the video on the DeepStack site, M. Bowling said that the counterfactual values from the last re-solve are an upper bound on the counterfactual values after the opponent's action, so they can be used.

One thing that I don't understand, though, is how they initialize the counterfactual values at the root. They said that they initialize them to the value of being dealt the hand, but what does that mean? I kind of assume that they use counterfactual values computed from a full CFR solution from one of their earlier bots or something, but on the other hand that would be weird.


Posted: Tue Mar 20, 2018 5:35 pm
New Member

Joined: Tue Mar 20, 2018 1:54 pm
Posts: 4
HontoNiBaka wrote:
It seems that they do ignore the opponent's action. In the video on the DeepStack site, M. Bowling said that the counterfactual values from the last re-solve are an upper bound on the counterfactual values after the opponent's action, so they can be used.

One thing that I don't understand, though, is how they initialize the counterfactual values at the root. They said that they initialize them to the value of being dealt the hand, but what does that mean? I kind of assume that they use counterfactual values computed from a full CFR solution from one of their earlier bots or something, but on the other hand that would be weird.


These are just values at the root of the game computed with the same algorithm, so for example for Kuhn poker these would be [-1/3, -1/9, 7/18].
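
Those three numbers can be reproduced by hand. The sketch below does not run CFR; it plugs in the standard one-parameter family of Kuhn poker equilibria (first-player parameter `alpha` between 0 and 1/3) and sums the first player's payoff over the opponent's possible cards, weighting each deal by its probability of 1/6. That weighting is an assumption about how the values above are normalized, and all names in the code are made up for the example.

```python
from fractions import Fraction as F
from itertools import permutations

CARDS = ("J", "Q", "K")
RANK = {"J": 0, "Q": 1, "K": 2}


def p1_payoff(c1, c2, alpha=F(0)):
    """Expected payoff to player 1 for one deal (antes of 1, bets of 1)."""
    win = 1 if RANK[c1] > RANK[c2] else -1
    # Standard equilibrium strategies, indexed by the acting player's card.
    p1_bet = {"J": alpha, "Q": F(0), "K": 3 * alpha}[c1]       # opening bet
    p2_call = {"J": F(0), "Q": F(1, 3), "K": F(1)}[c2]         # facing a bet
    p2_bet = {"J": F(1, 3), "Q": F(0), "K": F(1)}[c2]          # after a check
    p1_call = {"J": F(0), "Q": alpha + F(1, 3), "K": F(1)}[c1]  # check, then bet

    # Player 1 bets: a call leads to a showdown for 2, a fold wins 1.
    after_bet = p2_call * (2 * win) + (1 - p2_call) * 1
    # Player 1 checks: showdown for 1, or player 2 bets and player 1
    # either calls (showdown for 2) or folds (loses 1).
    facing_bet = p1_call * (2 * win) + (1 - p1_call) * (-1)
    after_check = (1 - p2_bet) * win + p2_bet * facing_bet
    return p1_bet * after_bet + (1 - p1_bet) * after_check


def root_cfvs(alpha=F(0)):
    """Player 1's value per card at the root, weighted by deal probability."""
    deal_prob = F(1, 6)  # 3 * 2 equally likely ordered deals
    values = {card: F(0) for card in CARDS}
    for c1, c2 in permutations(CARDS, 2):
        values[c1] += deal_prob * p1_payoff(c1, c2, alpha)
    return [values[card] for card in CARDS]


print([str(v) for v in root_cfvs()])  # ['-1/3', '-1/9', '7/18']
print(sum(root_cfvs()))               # -1/18
```

The sum, -1/18, is the known game value of Kuhn poker for the first player, and the per-card values come out the same for any `alpha` in the equilibrium family.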


Posted: Sat Mar 24, 2018 4:10 pm
Veteran Member

Joined: Wed Mar 20, 2013 1:43 am
Posts: 267
Yea makes sense, thx.

