Poker-AI.org :: View topic - Restricted Nash Response

Author:	HontoNiBaka [ Fri Aug 02, 2013 2:09 pm ]
Post subject:	Restricted Nash Response
Has anyone implemented the Restricted Nash Response described in the Polaris paper? I get the general idea, but I must be doing something wrong, because the results are strange.

Author:	cantina [ Fri Aug 02, 2013 6:42 pm ]
Post subject:	Re: Restricted Nash Response
I've used DBR, and something I call DBC (data biased clone).

Author:	Hipp [ Thu Aug 29, 2013 10:37 pm ]
Post subject:	Re: Restricted Nash Response
Guys, do you think it is possible to compute RNR or DBR for both players simultaneously? I don't mean a parallel computation, but a single recursive function for both players.

Author:	HontoNiBaka [ Fri Aug 30, 2013 5:59 pm ]
Post subject:	Re: Restricted Nash Response
What is DBR? I implemented RNR a few weeks ago, btw. I don't see how you could do it for 2 players simultaneously though.

Author:	Hipp [ Fri Aug 30, 2013 6:52 pm ]
Post subject:	Re: Restricted Nash Response
DBR: http://poker.cs.ualberta.ca/publications/AISTATS09.pdf Why do you think it is impossible? Why can standard CFM do it this way, but not DBR or RNR? I haven't implemented it yet, but as I understand it now, the only change is getting the action probabilities for an information set from the opponent model sometimes (with probability p) and from the regrets otherwise (with probability 1-p). Has any of you achieved good results with an opponent model based on <20k hands? I mean better results than standard eNash?
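The per-information-set choice described in the post above can be sketched roughly as follows. This is only an illustration of that reading, not code from either paper: the names `regret_matching`, `restricted_strategy`, `model_probs` and `p_conf` are made up for the example, and the opponent model is assumed to be a plain list of action probabilities.

```python
import random
from typing import List


def regret_matching(regrets: List[float]) -> List[float]:
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total <= 0.0:
        # No action has positive regret yet: fall back to uniform play.
        return [1.0 / len(regrets)] * len(regrets)
    return [r / total for r in positive]


def restricted_strategy(
    regrets: List[float],
    model_probs: List[float],
    p_conf: float,
    rng: random.Random,
) -> List[float]:
    """Pick the strategy the restricted player uses at one information set.

    With probability p_conf the action probabilities come from the opponent
    model (hypothetical `model_probs`); otherwise they come from regret
    matching on the stored regrets.
    """
    if rng.random() < p_conf:
        return list(model_probs)
    return regret_matching(regrets)
```

For example, `restricted_strategy([1.0, 3.0], [0.9, 0.1], 0.5, random.Random(0))` returns either the model's `[0.9, 0.1]` or the regret-matching strategy `[0.25, 0.75]`, depending on the draw. How `p_conf` is set (one fixed value, or a value that grows with the number of observed hands at that information set) is left open here.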

Poker-AI.org http://poker-ai.org/phpbb/

Restricted Nash Response http://poker-ai.org/phpbb/viewtopic.php?f=24&t=2546

Author:	DreamInBinary [ Sat Feb 07, 2015 3:29 pm ]
Post subject:	Re: Restricted Nash Response
Hi guys, I've been working on RNR/DBR for a while now, but some results don't make complete sense, hence I would like to ask for your advice. The discussion is about HUNL Hold'em. I have an EQ strategy and a skewed strategy SKEW. Using my DBR I obtained SKEW_BB_DBR by taking the BB strategy of SKEW and optimizing against it with external sampling. SKEW_BB_DBR beats SKEW by more than EQ does, which makes sense. I did a similar test by taking the BB strategy of EQ and optimizing against it to get EQ_BB_DBR. My idea was that since EQ is an equilibrium, the BR would also be an equilibrium, so EQ vs EQ_BB_DBR should be break-even. However, I have failed to get this result and, weirdly, EQ beats EQ_BB_DBR by a small but significant margin. Taking all this into account, I started suspecting that I might be off because of wrong regret / average strategy updates. The DBR solving happens along the lines of the following pseudo-code:

Code:
WalkTree( position p, history h ):
    # Handling non-player nodes
    if player( h ) == chance:
        sample action a according to \sigma_{chance}(h)
        return WalkTree( p, h + a )
    elif h == terminal:
        return utility

    # Computing the CFR strategy
    strategy s( h ) = regretMatching( h )

    # Handle the player which is optimized against
    if player( h ) == player_with_data:
        rho = data_precision( h )
        if U(0,1) < rho:
            s( h ) = data( h )

    # "Normal" ES CFR
    if player( h ) == p:
        average_value = 0;
        for action a in possible_actions( h ):
            values( a ) = WalkTree( p, h+a )
            average_value += s( h )( a ) * values( a )
            regret( h )( a ) += values( a ) - average_value

    elif player( h ) != p:
        for action a in possible_actions( h ):
            average_strategy( h )( a ) += s( h )( a )

        sample action a according to s( h )
        return WalkTree( p, h+a )

Could you please have a look at it and let me know if you spot anything wrong? Cheers,
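One thing that may be worth checking in the traverser branch, if the regret line in the pseudo-code is read as sitting inside the action loop: the regret of an action is normally measured against the node's full expected value, which is only known once every action has been evaluated. The sketch below contrasts the two orderings on plain lists. The function names are hypothetical, and `action_values` stands in for the values that the recursive `WalkTree` calls would return.

```python
from typing import List


def traverser_update(
    strategy: List[float],
    action_values: List[float],
    cumulative_regret: List[float],
) -> float:
    """Two-pass update: finish the node value first, then add the regrets."""
    # Pass 1: expected value of the node under the current strategy.
    node_value = sum(p * v for p, v in zip(strategy, action_values))
    # Pass 2: regret of each action relative to the completed node value.
    for a, v in enumerate(action_values):
        cumulative_regret[a] += v - node_value
    # The traverser hands this expected value back to its parent.
    return node_value


def interleaved_update(
    strategy: List[float],
    action_values: List[float],
    cumulative_regret: List[float],
) -> float:
    """Single-pass variant: regrets are taken against a partial running sum."""
    running_value = 0.0
    for a, v in enumerate(action_values):
        running_value += strategy[a] * v
        # Only actions 0..a have been counted in running_value at this point.
        cumulative_regret[a] += v - running_value
    return running_value
```

With a uniform strategy over two actions worth 1.0 and 3.0, the node value is 2.0, so the two-pass version adds regrets of -1.0 and +1.0, while the interleaved version adds +0.5 and +1.0. Whether this explains the EQ vs EQ_BB_DBR result is not something the sketch can show; it only illustrates how the two orderings differ. The traverser branch would also be expected to return the node's expected value to its parent rather than the value of a single sampled action.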

Author:	DreamInBinary [ Thu Mar 05, 2015 10:35 am ]
Post subject:	Re: Restricted Nash Response
Hello everyone, I'm still struggling to get my DBR working and have now reduced my goal to purely replicating the paper's results. In the process I noticed that http://mlanctot.info/files/papers/mcrnr.pdf uses outcome sampling, which is known to be inferior to ES. Is there any reason behind this choice? Did anyone succeed in running DBR-ES? Cheers,

All times are UTC