Hi guys,
I've been working on RNR/DBR (Restricted Nash Response / Data Biased Response) for a while now, but some of my results don't make complete sense, so I would like to ask for your advice. The discussion is about heads-up no-limit (HUNL) Hold'em.
I have an equilibrium strategy EQ and a skewed strategy SKEW. Using my DBR implementation I obtained SKEW_BB_DBR by taking the BB strategy of SKEW and optimizing against it with external sampling. SKEW_BB_DBR beats SKEW by more than EQ does, which makes sense.
I did a similar test by taking the BB strategy of EQ and optimizing against it to get EQ_BB_DBR. My idea was that since EQ is an equilibrium, a best response to it is also an equilibrium strategy, so EQ vs EQ_BB_DBR should be break-even. However, I have failed to get this result: strangely, EQ beats EQ_BB_DBR by a small but significant margin.
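To spell out that reasoning (my notation, and assuming EQ is an exact equilibrium of the zero-sum game): u_{SB}( BR(\sigma^{EQ}_{BB}), \sigma^{EQ}_{BB} ) = \max_{\sigma_{SB}} u_{SB}( \sigma_{SB}, \sigma^{EQ}_{BB} ) = v^* = u_{SB}( \sigma^{EQ}_{SB}, \sigma^{EQ}_{BB} ), i.e. a best response to EQ's BB strategy should earn exactly as much against it as EQ's own SB strategy does, so the match should come out break-even up to sampling noise and up to how exact my equilibrium really is.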
Taking all this into account, I started to suspect that my regret and average-strategy updates might be wrong. The DBR solving runs along the lines of the following pseudo-code:
Code:
WalkTree( position p, history h ):
    # Handle non-player nodes
    if player( h ) == chance:
        sample action a according to \sigma_{chance}( h )
        return WalkTree( p, h + a )
    elif h == terminal:
        return utility

    # Compute the current CFR strategy from the regrets
    strategy s( h ) = regretMatching( h )

    # Handle the player which is optimized against (DBR biasing)
    if player( h ) == player_with_data:
        rho = data_precision( h )
        if U(0,1) < rho:
            s( h ) = data( h )

    # "Normal" ES CFR
    if player( h ) == p:
        average_value = 0
        for action a in possible_actions( h ):
            values( a ) = WalkTree( p, h + a )
            average_value += s( h )( a ) * values( a )
            regret( h )( a ) += values( a ) - average_value
        return average_value
    elif player( h ) != p:
        for action a in possible_actions( h ):
            average_strategy( h )( a ) += s( h )( a )
        sample action a according to s( h )
        return WalkTree( p, h + a )
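For comparison, here is how I would write the plain external-sampling traversal without the DBR biasing, as a Python-style sketch. The game interface and all identifiers are just placeholders for my own data structures, not code from any particular library:
Code:
# Reference external-sampling traversal (no DBR biasing), Python-style sketch.
# Assumed (placeholder) game interface:
#   game.player(h)        -> CHANCE, TERMINAL, 0 or 1
#   game.sample_chance(h) -> a chance action sampled from sigma_chance(h)
#   game.utility(h, p)    -> terminal utility for traverser p
#   game.actions(h)       -> list of legal actions at h
#   game.infoset(h, p)    -> information-set key for the acting player
import random
from collections import defaultdict

CHANCE, TERMINAL = "chance", "terminal"
regret = defaultdict(lambda: defaultdict(float))        # infoset -> action -> cumulative regret
avg_strategy = defaultdict(lambda: defaultdict(float))  # infoset -> action -> cumulative probability

def regret_matching(I, actions):
    # Current strategy proportional to positive regrets, uniform if none are positive.
    positive = {a: max(regret[I][a], 0.0) for a in actions}
    total = sum(positive.values())
    if total > 0:
        return {a: positive[a] / total for a in actions}
    return {a: 1.0 / len(actions) for a in actions}

def walk_tree(game, p, h):
    """Return the sampled value of history h for traversing player p."""
    player = game.player(h)
    if player == TERMINAL:
        return game.utility(h, p)
    if player == CHANCE:
        return walk_tree(game, p, h + [game.sample_chance(h)])

    actions = game.actions(h)
    I = game.infoset(h, player)
    s = regret_matching(I, actions)

    if player == p:
        # Traverser: explore every action, then update regrets with the full node value.
        values = {a: walk_tree(game, p, h + [a]) for a in actions}
        node_value = sum(s[a] * values[a] for a in actions)
        for a in actions:
            regret[I][a] += values[a] - node_value
        return node_value

    # Opponent: accumulate the average strategy and sample a single action.
    for a in actions:
        avg_strategy[I][a] += s[a]
    a = random.choices(actions, weights=[s[a] for a in actions])[0]
    return walk_tree(game, p, h + [a])
The one place where I am not sure this matches my DBR walker above is the regret update: here the regrets are only touched after the full node value has been accumulated.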
Could you please have a look at it and let me know if you spot anything wrong?
Cheers,