Poker-AI.org

Re: Restricted Nash Response

2015-03-05T10:35:32+00:00

Hello everyone,

I'm still struggling to get my DBR working and have now fallen back to purely replicating the paper's results. In the process I noticed that http://mlanctot.info/files/papers/mcrnr.pdf uses outcome sampling, which is known to be inferior to external sampling (ES). Is there any reason behind this choice? Has anyone succeeded in running DBR with ES?

Cheers,

Statistics: Posted by DreamInBinary — Thu Mar 05, 2015 10:35 am

Re: Restricted Nash Response

2015-02-07T15:29:41+00:00

Hi guys,

I've been working on RNR/DBR for a while now, but some results don't make complete sense, so I would like to ask for your advice. The discussion is about HUNL Hold'em.

I have an equilibrium strategy EQ and a skewed strategy SKEW. Using my DBR I obtained SKEW_BB_DBR by taking the BB strategy of SKEW and optimizing against it with external sampling. SKEW_BB_DBR beats SKEW by more than EQ does, which makes sense.

I did a similar test by taking the BB strategy of EQ and optimizing against it to get EQ_BB_DBR. My idea was that since EQ is an equilibrium, the best response (BR) will also be an equilibrium, so EQ vs EQ_BB_DBR should break even. However, I have failed to get this result and, weirdly, EQ beats EQ_BB_DBR by a small but significant margin.
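
A small aside on the break-even expectation, using rock-paper-scissors as a stand-in (the game and every name below are illustrative assumptions, not part of the HUNL setup). In a zero-sum game a best response to an exact equilibrium earns exactly the game value, but the best response itself need not be an equilibrium. The sketch checks both points:

```python
from fractions import Fraction

# Row player's payoff in rock-paper-scissors (0 = rock, 1 = paper, 2 = scissors).
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

UNIFORM = [Fraction(1, 3)] * 3   # the equilibrium strategy
ROCK = [1, 0, 0]                 # one of many best responses to UNIFORM


def expected_value(row, col):
    """Expected payoff to the row strategy against the column strategy."""
    return sum(row[i] * col[j] * PAYOFF[i][j]
               for i in range(3) for j in range(3))


def best_response_value(col):
    """Best payoff any pure row strategy achieves against col."""
    return max(sum(col[j] * PAYOFF[i][j] for j in range(3))
               for i in range(3))


print(expected_value(ROCK, UNIFORM))   # rock against the equilibrium
print(best_response_value(UNIFORM))    # the most anyone can win against it
print(best_response_value(ROCK))       # how much rock itself can be exploited
```

Against a fixed strategy, a correctly computed best response should also never earn less than any other strategy does in the same seat, which may be a useful sanity check for the EQ_BB_DBR result.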

Taking all this into account, I started suspecting that I might be off because of wrong regret or average-strategy updates. The DBR solving runs along the lines of the following pseudo-code:

Code:

WalkTree( position p, history h ):
    # Handling non-player nodes (terminal first: it has no acting player)
    if h == terminal:
        return utility of h for player p
    elif player( h ) == chance:
        sample action a according to \sigma_{chance}(h)
        return WalkTree( p, h + a )

    # Computing the CFR strategy
    strategy s( h ) = regretMatching( h )

    # Handle the player which is optimized against
    if player( h ) == player_with_data:
        rho = data_precision( h )
        if U(0,1) < rho:
            s( h ) = data( h )

    # "Normal" ES CFR: the traversing player explores every action
    if player( h ) == p:
        average_value = 0
        for action a in possible_actions( h ):
            values( a ) = WalkTree( p, h + a )
            average_value += s( h )( a ) * values( a )
        # Regrets need the finished average_value, so update in a second loop
        for action a in possible_actions( h ):
            regret( h )( a ) += values( a ) - average_value
        return average_value

    # Opponent node: accumulate the average strategy, then sample one action
    else:
        for action a in possible_actions( h ):
            average_strategy( h )( a ) += s( h )( a )

        sample action a according to s( h )
        return WalkTree( p, h + a )

Could you please have a look at it and let me know if you spot anything wrong?
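
For anyone who wants something to run, here is a minimal Python sketch of the same walk on Kuhn poker instead of HUNL. The game, the `ALWAYS_BET` opponent data, and all function names are assumptions made for illustration. A single confidence value `rho` stands in for `data_precision( h )`. Note that the regret update runs over every action once the node value is complete, and the traverser branch returns that node value.

```python
import random
from collections import defaultdict

ACTIONS = ("p", "b")  # p = pass (check or fold), b = bet (or call)

# Hypothetical opponent data: player 1 always bets or calls.
ALWAYS_BET = {str(c) + h: [0.0, 1.0] for c in range(3) for h in ("p", "b")}


def utility(cards, h, p):
    """Payoff to player p at history h, or None if h is not terminal."""
    if h in ("pp", "bb", "pbb"):            # showdown, higher card wins
        pot = 1 if h == "pp" else 2
        winner = 0 if cards[0] > cards[1] else 1
    elif h in ("bp", "pbp"):                # a bet was folded to
        pot, winner = 1, (0 if h == "bp" else 1)
    else:
        return None
    return pot if winner == p else -pot


def regret_matching(regrets):
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [x / total for x in positive]
    return [1.0 / len(regrets)] * len(regrets)


def train(iterations, data, rho, data_player=1, seed=0):
    rng = random.Random(seed)
    regret = defaultdict(lambda: [0.0, 0.0])
    average = defaultdict(lambda: [0.0, 0.0])

    def walk(p, cards, h):
        u = utility(cards, h, p)
        if u is not None:
            return u
        actor = len(h) % 2
        key = str(cards[actor]) + h         # information set: own card + betting
        s = regret_matching(regret[key])
        # With probability rho the modelled player must follow the data.
        if actor == data_player and rng.random() < rho:
            s = data[key]
        if actor == p:                      # traverser: explore every action
            values = [walk(p, cards, h + a) for a in ACTIONS]
            node_value = sum(si * v for si, v in zip(s, values))
            for i, v in enumerate(values):  # update regrets for all actions
                regret[key][i] += v - node_value
            return node_value
        for i, si in enumerate(s):          # opponent: accumulate average strategy
            average[key][i] += si
        a = rng.choices(ACTIONS, weights=s)[0]
        return walk(p, cards, h + a)

    for _ in range(iterations):
        for p in (0, 1):                    # alternate the traversing player
            cards = rng.sample(range(3), 2) # chance node: deal one card each
            walk(p, cards, "")
    return {k: [x / sum(v) for x in v] for k, v in average.items()}
```

Calling `train(2000, ALWAYS_BET, rho=1.0)` forces player 1 to follow the data at every node, so player 0's average strategy should approach a plain best response to it. Lower values of `rho` leave player 1 free to deviate more often.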

Cheers,

Statistics: Posted by DreamInBinary — Sat Feb 07, 2015 3:29 pm

Re: Restricted Nash Response

2013-08-30T18:52:11+00:00

DBR : http://poker.cs.ualberta.ca/publications/AISTATS09.pdf

Why do you think that it is impossible? Why can standard CFR do it this way, but not DBR or RNR?
I haven't implemented it yet, but as I understand it now, the only change is that the action probabilities for an information set sometimes come from the opponent model (with probability p) and otherwise from the regrets (with probability 1-p).
Has any of you achieved good results with an opponent model based on fewer than 20k hands? I mean better results than a standard eNash strategy.
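
A minimal Python sketch of that one change, as I understand it. The function names and the single confidence value `p_conf` are assumptions for illustration, and this is not taken from the paper's code:

```python
import random


def regret_matching(regrets):
    """Standard regret matching: normalise the positive regrets."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [x / total for x in positive]
    return [1.0 / len(regrets)] * len(regrets)


def restricted_action_probs(regrets, model_probs, p_conf, rng=random):
    """Action probabilities for the restricted player at one information set.

    With probability p_conf the opponent model is used, otherwise the
    probabilities come from the accumulated regrets.
    """
    if rng.random() < p_conf:
        return list(model_probs)
    return regret_matching(regrets)


# Example: the model says "always take action 1", the regrets prefer action 0.
rng = random.Random(0)
print(restricted_action_probs([3.0, 1.0], [0.0, 1.0], p_conf=0.5, rng=rng))
```

Everything else in the tree walk would stay as in ordinary CFR. Only the source of the probabilities at the restricted player's information sets changes.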

Statistics: Posted by Hipp — Fri Aug 30, 2013 6:52 pm

Re: Restricted Nash Response

2013-08-30T17:59:06+00:00

What is DBR?

I implemented RNR a few weeks ago, by the way. I don't see how you could do it for 2 players simultaneously, though.

Statistics: Posted by HontoNiBaka — Fri Aug 30, 2013 5:59 pm

Re: Restricted Nash Response

2013-08-29T22:37:50+00:00

Guys, do you think it is possible to compute RNR or DBR for both players simultaneously?
I don't mean a parallel computation, but a single recursive function for both players.

Statistics: Posted by Hipp — Thu Aug 29, 2013 10:37 pm

Re: Restricted Nash Response

2013-08-02T18:42:27+00:00

I've used DBR, and something I call DBC (data biased clone).

Statistics: Posted by cantina — Fri Aug 02, 2013 6:42 pm

Restricted Nash Response

2013-08-02T14:09:35+00:00

Has anyone implemented the restricted Nash response described in the Polaris paper? I get the general idea, but I must be doing something wrong, because the results are strange.

Statistics: Posted by HontoNiBaka — Fri Aug 02, 2013 2:09 pm