Poker-AI.org
http://poker-ai.org/phpbb/

Restricted Nash Response
http://poker-ai.org/phpbb/viewtopic.php?f=24&t=2546
Page 1 of 1

Author:  HontoNiBaka [ Fri Aug 02, 2013 2:09 pm ]
Post subject:  Restricted Nash Response

Has anyone implemented the Restricted Nash Response described in the Polaris paper? I get the general idea, but I must be doing something wrong, because the results are strange.

Author:  cantina [ Fri Aug 02, 2013 6:42 pm ]
Post subject:  Re: Restricted Nash Response

I've used DBR, and something I call DBC (data biased clone).

Author:  Hipp [ Thu Aug 29, 2013 10:37 pm ]
Post subject:  Re: Restricted Nash Response

Guys, do you think it is possible to compute RNR or DBR for both players simultaneously?
I don't mean a parallel computation, but a single recursive function for both players.

Author:  HontoNiBaka [ Fri Aug 30, 2013 5:59 pm ]
Post subject:  Re: Restricted Nash Response

What is DBR?

I implemented RNR a few weeks ago, by the way. I don't see how you could do it for 2 players simultaneously, though.

Author:  Hipp [ Fri Aug 30, 2013 6:52 pm ]
Post subject:  Re: Restricted Nash Response

DBR : http://poker.cs.ualberta.ca/publications/AISTATS09.pdf

Why do you think it is impossible? Why can standard CFR do it this way, but not DBR or RNR?
I haven't implemented it yet, but as I understand it, the only change is that the action probabilities at an information set sometimes come from the opponent model (with probability p) and otherwise from the regrets (with probability 1-p).
Have any of you achieved good results with an opponent model based on fewer than 20k hands? I mean better results than a standard ε-Nash strategy?
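A minimal Python sketch of the change described above, assuming the opponent model supplies a list of action probabilities for each information set and a single probability p is used everywhere. `regret_matching`, `restricted_strategy` and `expected_strategy` are hypothetical helper names, not from any library. DBR replaces the single p with a per-information-set value, like `data_precision( h )` in the pseudo-code further down this thread.

```python
import random

def regret_matching(regrets):
    """Current CFR strategy: positive regrets normalized, uniform if none are positive."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [x / total for x in positive]
    return [1.0 / len(regrets)] * len(regrets)

def restricted_strategy(regrets, model_probs, p, rng=random):
    """With probability p play the (hypothetical) opponent model at this
    information set, otherwise the player's own regret-matching strategy."""
    if rng.random() < p:
        return list(model_probs)
    return regret_matching(regrets)

def expected_strategy(regrets, model_probs, p):
    """The mixture that restricted_strategy samples from, in expectation."""
    rm = regret_matching(regrets)
    return [p * m + (1 - p) * r for m, r in zip(model_probs, rm)]
```

For example, `regret_matching([2.0, -1.0, 6.0])` gives `[0.25, 0.0, 0.75]`, and mixing that with a model that always takes the first action at p = 0.5 gives `[0.625, 0.0, 0.375]` in expectation. Sampling once per visit (as in `restricted_strategy`) rather than mixing is what a sampling CFR variant would typically do, since it keeps the estimate unbiased.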

Author:  DreamInBinary [ Sat Feb 07, 2015 3:29 pm ]
Post subject:  Re: Restricted Nash Response

Hi guys,

I've been working on RNR/DBR for a while now, but some results don't make complete sense, so I would like to ask for your advice. The discussion concerns HUNL Hold'em.

I have an equilibrium strategy EQ and a skewed strategy SKEW. Using my DBR, I obtained SKEW_BB_DBR by taking the BB strategy of SKEW and optimizing against it with external sampling. SKEW_BB_DBR beats SKEW by more than EQ does, which makes sense.

I did a similar test by taking the BB strategy of EQ and optimizing against it to get EQ_BB_DBR. My reasoning was that since EQ is an equilibrium, the best response to it will also be an equilibrium, so EQ vs EQ_BB_DBR should be break-even. However, I have failed to get this result: oddly, EQ beats EQ_BB_DBR by a small but significant margin.
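The break-even expectation can be sanity-checked on a toy game. In a two-player zero-sum game, any best response to an equilibrium strategy earns exactly the game value, so the best-responding side should indeed break even against EQ; the best response itself, however, need not be an equilibrium. A hypothetical rock-paper-scissors check (not poker, and not from the post):

```python
from fractions import Fraction

# Row player's payoffs in rock-paper-scissors: zero-sum, game value 0.
PAYOFF = [
    [0, -1, 1],   # rock     vs rock, paper, scissors
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]

def expected_value(row, col):
    """Expected payoff to the row player for mixed strategies row and col."""
    return sum(row[i] * col[j] * PAYOFF[i][j] for i in range(3) for j in range(3))

def pure(i):
    """Pure strategy that always plays action i."""
    return [1 if k == i else 0 for k in range(3)]

EQ = [Fraction(1, 3)] * 3  # the unique equilibrium of RPS

# Every pure strategy is a best response to EQ and earns exactly the game value 0.
br_values = [expected_value(pure(i), EQ) for i in range(3)]

# "Always rock" is such a best response, yet it is not an equilibrium:
# paper exploits it.
rock_vs_paper = expected_value(pure(0), pure(1))
```

Here `br_values` is all zeros while `rock_vs_paper` is -1, which is the distinction between "earns the game value against EQ" and "is itself an equilibrium".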

Taking all this into account, I started to suspect that my regret or average-strategy updates are wrong. The DBR solve works roughly like the following pseudo-code:

Code:
WalkTree( position p, history h ):
    # Handle non-player nodes
    if player( h ) == chance:
        sample action a according to \sigma_{chance}(h)
        return WalkTree( p, h + a )
    elif h == terminal:
        return utility( p, h )    # payoff to the traverser p

    # Current CFR strategy from regret matching
    strategy s( h ) = regretMatching( h )

    # Player being optimized against: follow the data with probability rho
    if player( h ) == player_with_data:
        rho = data_precision( h )
        if U(0,1) < rho:
            s( h ) = data( h )

    # "Normal" external-sampling CFR
    if player( h ) == p:
        # Traverser: walk every action, then update the regret of every action
        average_value = 0
        for action a in possible_actions( h ):
            values( a ) = WalkTree( p, h + a )
            average_value += s( h )( a ) * values( a )
        for action a in possible_actions( h ):
            regret( h )( a ) += values( a ) - average_value
        return average_value
    else:
        # Opponent: accumulate the average strategy, then sample one action
        for action a in possible_actions( h ):
            average_strategy( h )( a ) += s( h )( a )
        sample action a according to s( h )
        return WalkTree( p, h + a )


Could you please have a look at it and let me know if you spot anything wrong.
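For anyone who wants to experiment, here is a small runnable Python sketch of this kind of external-sampling DBR walk on Kuhn poker (3 cards, one bet) rather than HUNL. Assumptions, none of which come from the post: the opponent model is a hypothetical dict `model` mapping information-set keys to fixed action probabilities, a single constant `conf` stands in for `data_precision( h )`, and chance is handled by dealing the cards at the root. One further design choice of this sketch, flagged in the code: when the data-biased player is the traverser and the model was sampled at a node, its regrets are not updated there, so only its own regret-matching part learns. The counter-strategy is the other player's average strategy.

```python
import random
from collections import defaultdict

ACTIONS = "pb"  # p = pass/check/fold, b = bet/call
TERMINALS = {"pp", "bp", "bb", "pbp", "pbb"}

def utility0(cards, h):
    """Payoff to player 0 at a terminal Kuhn poker history."""
    if h == "bp":          # player 1 folded to a bet
        return 1
    if h == "pbp":         # player 0 folded to a bet
        return -1
    stake = 1 if h == "pp" else 2
    return stake if cards[0] > cards[1] else -stake

def regret_matching(r):
    pos = [max(x, 0.0) for x in r]
    total = sum(pos)
    return [x / total for x in pos] if total > 0 else [0.5, 0.5]

class DBRSolver:
    def __init__(self, model, conf, data_player, seed=0):
        self.model = model              # hypothetical: info set -> fixed action probabilities
        self.conf = conf                # hypothetical: constant stand-in for data_precision(h)
        self.data_player = data_player
        self.rng = random.Random(seed)
        self.regrets = defaultdict(lambda: [0.0, 0.0])
        self.avg = defaultdict(lambda: [0.0, 0.0])

    def walk(self, p, cards, h):
        if h in TERMINALS:
            u = utility0(cards, h)
            return u if p == 0 else -u
        player = len(h) % 2
        info = str(cards[player]) + h   # own card + public betting history
        s = regret_matching(self.regrets[info])
        used_data = False
        if (player == self.data_player and info in self.model
                and self.rng.random() < self.conf):
            s, used_data = self.model[info], True
        if player == p:
            # Traverser: explore every action and update regrets
            values = [self.walk(p, cards, h + a) for a in ACTIONS]
            node = sum(si * v for si, v in zip(s, values))
            if not used_data:  # assumption: skip regret updates when the model was used
                for i in range(2):
                    self.regrets[info][i] += values[i] - node
            return node
        # Opponent: accumulate average strategy, then sample one action
        for i in range(2):
            self.avg[info][i] += s[i]
        i = 0 if self.rng.random() < s[0] else 1
        return self.walk(p, cards, h + ACTIONS[i])

    def train(self, iterations):
        for _ in range(iterations):
            for p in (0, 1):
                cards = self.rng.sample(range(3), 2)  # chance node: deal two cards
                self.walk(p, cards, "")

    def average_strategy(self, info):
        total = sum(self.avg[info])
        return [x / total for x in self.avg[info]] if total > 0 else [0.5, 0.5]
```

Usage: with `model = {f"{c}b": [0.0, 1.0] for c in range(3)}` (player 1 always calls a bet), `conf=1.0` and `data_player=1`, training for 10000 iterations should leave player 0 betting the King (`average_strategy("2")`) and checking the Jack (`average_strategy("0")`), which is the exploitative answer to an opponent who never folds.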

Cheers,

Author:  DreamInBinary [ Thu Mar 05, 2015 10:35 am ]
Post subject:  Re: Restricted Nash Response

Hello everyone,

I'm still struggling to get my DBR working, and have now reduced the problem to simply replicating published results. In the process I noticed that http://mlanctot.info/files/papers/mcrnr.pdf uses outcome sampling, which is known to be inferior to external sampling (ES). Is there a reason behind this choice? Has anyone succeeded in running DBR with ES?

Cheers,

All times are UTC