Poker-AI.org

Poker AI and Botting Discussion Forum

All times are UTC




 Post subject: Restricted Nash Response
PostPosted: Fri Aug 02, 2013 2:09 pm 
Veteran Member

Joined: Wed Mar 20, 2013 1:43 am
Posts: 267
Has anyone implemented the restricted Nash response (RNR) described in the Polaris paper? I get the general idea, but I must be doing something wrong, because the results are strange.


PostPosted: Fri Aug 02, 2013 6:42 pm 
Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 437
I've used DBR, and something I call DBC (data biased clone).


PostPosted: Thu Aug 29, 2013 10:37 pm 
Junior Member

Joined: Fri Jul 05, 2013 9:57 pm
Posts: 15
Guys, do you think it is possible to compute RNR or DBR for both players simultaneously?
I don't mean a parallel computation, but a single recursive function that handles both players.


PostPosted: Fri Aug 30, 2013 5:59 pm 
Veteran Member

Joined: Wed Mar 20, 2013 1:43 am
Posts: 267
What is DBR?

I implemented RNR a few weeks ago, by the way. I don't see how you could do it for both players simultaneously, though.


PostPosted: Fri Aug 30, 2013 6:52 pm 
Junior Member

Joined: Fri Jul 05, 2013 9:57 pm
Posts: 15
DBR (data biased response): http://poker.cs.ualberta.ca/publications/AISTATS09.pdf

Why do you think it is impossible? Why can standard CFR do it this way, but not DBR or RNR?
I haven't implemented it yet, but as I understand it now, the only change is that the action probabilities at the modeled player's information sets are sometimes taken from the opponent model (with probability p) and otherwise from the regrets (with probability 1-p).
Have any of you achieved good results with an opponent model built from fewer than 20k hands? I mean better results than a standard ε-Nash strategy?
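The per-information-set rule described above (model with probability p, regret matching with probability 1-p) can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: all function names are hypothetical, and `data_confidence`, which turns an observation count into p (linear up to `n_full` hands, capped at `p_max`), is only an assumed example of such a mapping, not the one used in the DBR paper. Sampling the source once per visit gives, on average, the mixed strategy returned by `expected_strategy`.

```python
import random

def regret_matching(regrets):
    """Current CFR strategy: positive regrets normalised, uniform if none is positive."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [x / total for x in positive]
    return [1.0 / len(regrets)] * len(regrets)

def data_confidence(n_observations, n_full=100, p_max=0.8):
    """Hypothetical mapping from observation count to p: linear up to n_full, capped at p_max."""
    return p_max * min(1.0, n_observations / n_full)

def restricted_strategy(model, regrets, p, rng):
    """One visit: the model's action probabilities with probability p, else regret matching."""
    return model if rng.random() < p else regret_matching(regrets)

def expected_strategy(model, regrets, p):
    """What restricted_strategy averages to: p * model + (1 - p) * regret_matching."""
    rm = regret_matching(regrets)
    return [p * m + (1 - p) * r for m, r in zip(model, rm)]

# Usage: an information set of the modeled player seen in 50 hands of data.
rng = random.Random(1)
model = [0.9, 0.1]                 # e.g. fold/call frequencies estimated from hand histories
regrets = [-2.0, 3.0]              # regret matching gives [0.0, 1.0]
p = data_confidence(50)            # 0.8 * 50/100 = 0.4
print(restricted_strategy(model, regrets, p, rng))  # one sampled source
print(expected_strategy(model, regrets, p))         # 0.4 * model + 0.6 * [0.0, 1.0]
```

The per-visit coin flip is what a sampling solver such as ES-CFR would do; a full-tree solver could use the expected mixture directly.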


PostPosted: Sat Feb 07, 2015 3:29 pm 
Junior Member

Joined: Mon Jan 19, 2015 4:58 pm
Posts: 15
Hi guys,


I've been working on RNR/DBR for a while now, but some results don't quite make sense, so I would like to ask for your advice. The discussion is about heads-up no-limit (HUNL) Hold'em.

I have an equilibrium strategy EQ and a skewed strategy SKEW. Using my DBR implementation, I obtained SKEW_BB_DBR by taking the BB strategy of SKEW and optimizing against it with external sampling (ES). SKEW_BB_DBR beats SKEW by more than EQ does, which makes sense.

I ran a similar test by taking the BB strategy of EQ and optimizing against it to get EQ_BB_DBR. My reasoning was that since EQ is an equilibrium, the best response to it will also be an equilibrium, so EQ vs EQ_BB_DBR should be break-even. However, I have not been able to get this result: oddly, EQ beats EQ_BB_DBR by a small but significant margin.
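A toy check of the break-even expectation, using rock-paper-scissors rather than HUNL (a minimal sketch; the function names are made up for illustration). In a two-player zero-sum game, a best response to an equilibrium strategy earns exactly the game value against it, so it breaks even in that matchup; it need not itself be an equilibrium strategy, though, because it can be exploitable.

```python
from fractions import Fraction

# Payoff to the row player; action order is rock, paper, scissors.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

def value(row, col):
    """Expected payoff to the row player when row and col are mixed strategies."""
    return sum(row[i] * col[j] * PAYOFF[i][j] for i in range(3) for j in range(3))

def best_response(col):
    """A pure best response for the row player (the first maximiser on ties)."""
    pure = [[1 if k == i else 0 for k in range(3)] for i in range(3)]
    return max(pure, key=lambda s: value(s, col))

third = Fraction(1, 3)
eq = [third, third, third]      # the unique equilibrium of rock-paper-scissors
br = best_response(eq)          # every action ties against eq, so this returns rock
print(br, value(br, eq))        # [1, 0, 0] 0  -> break-even against the equilibrium
print(value([0, 1, 0], br))     # 1            -> but paper exploits it
```

Here rock is a best response to the uniform equilibrium and breaks even against it, yet paper beats it by 1 per game.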

Taking all this into account, I started to suspect that my regret or average-strategy updates are wrong. The DBR solve works along the lines of the following pseudocode:

Code:
WalkTree( position p, history h ):
    # Handling non-player nodes
    if h == terminal:
        return utility( p, h )
    elif player( h ) == chance:
        sample action a according to sigma_chance( h )
        return WalkTree( p, h + a )

    # Computing the CFR strategy
    strategy s( h ) = regretMatching( h )

    # Handle the player which is optimized against
    if player( h ) == player_with_data:
        rho = data_precision( h )
        if U(0,1) < rho:
            s( h ) = data( h )

    # "Normal" ES CFR: the traverser p explores every action
    if player( h ) == p:
        average_value = 0
        for action a in possible_actions( h ):
            values( a ) = WalkTree( p, h + a )
            average_value += s( h )( a ) * values( a )
        # regrets are updated for every action, once average_value is complete
        for action a in possible_actions( h ):
            regret( h )( a ) += values( a ) - average_value
        return average_value

    # Opponent node: accumulate the average strategy, then sample one action
    else:
        for action a in possible_actions( h ):
            average_strategy( h )( a ) += s( h )( a )
        sample action a according to s( h )
        return WalkTree( p, h + a )
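
For comparison, below is a runnable Python sketch of a DBR-style external-sampling traversal on Kuhn poker (a three-card toy game, not HUNL). It rests on one assumption about the data player that is worth checking against the DBR paper: when the coin picks the model, that node is treated like a chance node, i.e. the action is sampled from the model and no regret or average-strategy update is made there for either traverser; only the free, regret-matched part of the data player accumulates regrets and average strategy. `DBRSolver`, `model` and `rho` are hypothetical names, and `rho` plays the role of `data_precision( h )`.

```python
import random
from collections import defaultdict

ACTIONS = "pb"                                 # p = pass (check/fold), b = bet (bet/call)
TERMINALS = {"pp", "bp", "bb", "pbp", "pbb"}

def payoff_p0(cards, h):
    """Chips won by player 0 at terminal history h; each player antes 1."""
    if h == "bp":
        return 1                               # player 1 folded to a bet
    if h == "pbp":
        return -1                              # player 0 folded to a bet
    stake = 2 if h.endswith("bb") else 1       # called bet, or check-check showdown
    return stake if cards[0] > cards[1] else -stake

class DBRSolver:
    """External-sampling CFR where one player is partly forced to follow a model."""

    def __init__(self, data_player, model, rho, seed=0):
        self.data_player = data_player         # the player we optimise against
        self.model = model                     # infoset key -> [P(pass), P(bet)]
        self.rho = rho                         # infoset key -> probability of using the model
        self.regret = defaultdict(lambda: [0.0, 0.0])
        self.avg = defaultdict(lambda: [0.0, 0.0])
        self.rng = random.Random(seed)

    def strategy(self, key):
        """Regret matching on the cumulative regrets of infoset key."""
        pos = [max(r, 0.0) for r in self.regret[key]]
        total = sum(pos)
        return [x / total for x in pos] if total > 0 else [0.5, 0.5]

    def sample(self, probs):
        return ACTIONS[0] if self.rng.random() < probs[0] else ACTIONS[1]

    def walk(self, p, cards, h):
        """Sampled counterfactual value of history h for traverser p."""
        if h in TERMINALS:
            u = payoff_p0(cards, h)
            return u if p == 0 else -u
        actor = len(h) % 2
        key = str(cards[actor]) + h
        # Forced (model) node: acts like a chance node, so nothing is updated here.
        if actor == self.data_player and self.rng.random() < self.rho(key):
            return self.walk(p, cards, h + self.sample(self.model(key)))
        s = self.strategy(key)
        if actor == p:                         # traverser: explore every action
            values = [self.walk(p, cards, h + a) for a in ACTIONS]
            node_value = sum(si * v for si, v in zip(s, values))
            for i in range(2):
                self.regret[key][i] += values[i] - node_value
            return node_value
        for i in range(2):                     # opponent: average strategy, then sample
            self.avg[key][i] += s[i]
        return self.walk(p, cards, h + self.sample(s))

    def train(self, iterations):
        for _ in range(iterations):
            for p in (0, 1):                   # alternate traversers
                cards = self.rng.sample(range(3), 2)  # chance: deal one card to each
                self.walk(p, cards, "")

    def average_strategy(self, key):
        total = sum(self.avg[key])
        return [x / total for x in self.avg[key]] if total > 0 else [0.5, 0.5]

def always_pass(key):
    """Hypothetical model for player 1: check when checked to, fold when bet into."""
    return [1.0, 0.0]

solver = DBRSolver(data_player=1, model=always_pass, rho=lambda key: 1.0)
solver.train(2000)
print(solver.average_strategy("0"))            # Jack, first to act: bet probability close to 1
```

With `rho` returning 1.0 everywhere, player 1 is fully fixed, so player 0's average strategy approaches a best response to the model (with a Jack, betting wins the ante while checking loses the showdown). Values of `rho` below 1 give the free part of player 1 room to punish that, which is the robustness trade-off DBR is after.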


Could you please have a look at it and let me know if you spot anything wrong.

Cheers,

_________________
Let's drop conventional languages and talk C++ finally.


PostPosted: Thu Mar 05, 2015 10:35 am 
Junior Member

Joined: Mon Jan 19, 2015 4:58 pm
Posts: 15
Hello everyone,

I'm still struggling to get my DBR working and have now reduced myself to purely replicating published results. In the process I noticed that http://mlanctot.info/files/papers/mcrnr.pdf uses outcome sampling (OS), which is generally reported to converge more slowly than ES. Is there any reason behind this choice? Has anyone succeeded in running DBR with ES?
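On the ES versus OS question, the difference can be seen at a single traverser node. This is a minimal Python sketch with hypothetical names; in full OS the sampled value is divided by the sampling probability of the whole sampled trajectory, not of one node. ES evaluates every child exactly, while OS follows one sampled child and importance-weights it: the estimate is unbiased but noisier, which is one reason OS tends to need more iterations. The sketch says nothing about why the MCRNR paper chose OS.

```python
import random

def es_value(strategy, child_values):
    """External sampling at a traverser node: every child value is computed exactly."""
    return sum(s * v for s, v in zip(strategy, child_values))

def os_estimate(strategy, child_values, sampling_probs, rng):
    """Outcome sampling at the same node: follow one sampled child and importance-weight it.

    The child estimates are v/q for the sampled child and 0 for the others, so both they
    and the node estimate are unbiased as long as every sampling probability q is > 0.
    """
    n = len(strategy)
    chosen = rng.choices(range(n), weights=sampling_probs)[0]
    child_est = [0.0] * n
    child_est[chosen] = child_values[chosen] / sampling_probs[chosen]
    return child_est, sum(s * v for s, v in zip(strategy, child_est))

# Usage: compare the exact ES value with the mean of many OS estimates.
rng = random.Random(0)
strategy, values = [0.25, 0.75], [4.0, -1.0]
q = [0.5, 0.5]                        # sampling policy, e.g. exploratory/uniform
exact = es_value(strategy, values)    # 0.25 * 4.0 + 0.75 * -1.0 = 0.25
n = 100_000
mean = sum(os_estimate(strategy, values, q, rng)[1] for _ in range(n)) / n
print(exact, mean)                    # mean approaches 0.25, but single estimates are 2.0 or -1.5
```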

Cheers,


