Poker-AI.org :: View topic - Efficient MCCFR in Games with Many Player Actions (2012)

Poker-AI.org http://poker-ai.org/phpbb/

Efficient MCCFR in Games with Many Player Actions (2012) http://poker-ai.org/phpbb/viewtopic.php?f=25&t=2398	Page 1 of 1

Author:	Joshua [ Fri Mar 15, 2013 12:57 pm ]
Post subject:	Efficient MCCFR in Games with Many Player Actions (2012)
PROPERTIES type = "article" title = "Efficient Monte Carlo Counterfactual Regret Minimization in Games with Many Player Actions" author = "GIBSON, Richard and BURCH, Neil and LANCTOT, Marc and SZAFRON, Duane" year = "2012" organization = "University of Alberta - Department of Computing Science" url = "http://poker.cs.ualberta.ca/publications/NIPS12.pdf" language = "en" size = "9" SUBJECT abstract = "Counterfactual Regret Minimization (CFR) is a popular, iterative algorithm for computing strategies in extensive-form games. The Monte Carlo CFR (MCCFR) variants reduce the per iteration time cost of CFR by traversing a smaller, sampled portion of the tree. The previous most effective instances of MCCFR can still be very slow in games with many player actions since they sample every action for a given player. In this paper, we present a new MCCFR algorithm, Average Strategy Sampling (AS), that samples a subset of the player’s actions according to the player’s average strategy. Our new algorithm is inspired by a new, tighter bound on the number of iterations required by CFR to converge to a given solution quality. In addition, we prove a similar, tighter bound for AS and other popular MCCFR variants. Finally, we validate our work by demonstrating that AS converges faster than previous MCCFR algorithms in both no-limit poker and Bluff."

Author:

Nose [ Sat Jun 08, 2013 1:53 pm ]

Post subject:

Re: Efficient MCCFR in Games with Many Player Actions (2012)

Question about the given implementation (p5).

Line 9: Shouldn't q be multiplied with the frequency of performing action a given infoset I?

(ie.

Code:

s(I,a) <- s(I,a) + (SIGMA(I,a) * q

instead of

Code:

s(I,a) <- s(I,a) + (SIGMA(I,a) / q

)

referring to cumulative profile-formula (attached as picture)

btw @ admins: a (La)TeX-Plugin ([TEX]-BB-Code respectively) would be nice to create this kind of formula

also when looking at line 5 of the algorithm - does anyone know the rationale of dividing the utility by the probability of reaching terminal node z instead of multiplying with it?

say, why

Code:

if(h \in Z) then return u_i(h)/q end if

EDIT: Attached algorithm for your convenience

Attachments:

File comment: MCCFR (AS)

as-mccfr.jpg [ 64.14 KiB | Viewed 28797 times ]

File comment: Cumulative profile

cum_regret.jpg [ 5 KiB | Viewed 28809 times ]

Author:	cantina [ Sun Jun 09, 2013 12:58 am ]
Post subject:	Re: Efficient MCCFR in Games with Many Player Actions (2012)
You can multiply your regret and cumulative updates by Q, or divide just the regret update and the leaf node utility.

Author:	Nose [ Mon Jun 10, 2013 4:50 am ]
Post subject:	Re: Efficient MCCFR in Games with Many Player Actions (2012)
Thanks I have another question: Assume the following scenario: Let Player i be the SB. Right after the hole cards where dealt ASSampling decides the sample two cases: SB checks and SB min-bets. Assume in both cases ASSampling decides to let BB check behind/ call. In each case a chance node (dealing flop cards) follows. According to the specification different community cards are dealt in each scenario whereas the outcome of all branches are summed up in order to obtain the regret for each action. Does anyone know about the impact to the convergence rate if you ensure that each chance node returns the same community cards each iteration? ASSampling will then be a little closer to CS than to pure, random MC

Author:	Isildur11 [ Sun Jun 30, 2013 9:22 pm ]
Post subject:	Re: Efficient MCCFR in Games with Many Player Actions (2012)
"for games with mores than 2 players, a second tree walk is required and we omit these details", it is not clear what that means ?

Author:	cantina [ Sun Jun 30, 2013 9:38 pm ]
Post subject:	Re: Efficient MCCFR in Games with Many Player Actions (2012)
Nose wrote: Does anyone know about the impact to the convergence rate if you ensure that each chance node returns the same community cards each iteration? Unless I'm misunderstanding your question, all action nodes are supposed to use the same community cards (relative to the round) when doing an iteration of CFRM.

Author:	cantina [ Sun Jun 30, 2013 9:48 pm ]
Post subject:	Re: Efficient MCCFR in Games with Many Player Actions (2012)
Isildur11 wrote: "for games with mores than 2 players, a second tree walk is required and we omit these details", it is not clear what that means ? Can you show more of the quote? It may mean they need to train players separately.

Author:	Isildur11 [ Sun Jun 30, 2013 10:02 pm ]
Post subject:	Re: Efficient MCCFR in Games with Many Player Actions (2012)
Quote: At opponent nodes, we also update the cumulative proﬁle (line 9) for reasons that we describe in a previous paper [2, Algorithm 1]. For games with more than two players, a second tree walk is required and we omit these details. and Quote: For future work, we would like to apply AS to games with many player actions and with more than two players weird

Page 1 of 1	All times are UTC
Powered by phpBB® Forum Software © phpBB Group http://www.phpbb.com/