Poker-AI.org Poker AI and Botting Discussion Forum 2015-03-05T10:35:32+00:00 http://poker-ai.org/phpbb/feed.php?f=24&t=2546 2015-03-05T10:35:32+00:00 2015-03-05T10:35:32+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2546&p=6616#p6616 <![CDATA[Re: Restricted Nash Response]]>
I'm still struggling to get my DBR working and have now reduced myself to purely replicating the paper's results. In the process I noticed that http://mlanctot.info/files/papers/mcrnr.pdf uses outcome sampling, which is generally considered inferior to external sampling (ES). Is there any reason behind this choice? Has anyone succeeded in running DBR-ES?

Cheers,

Statistics: Posted by DreamInBinary — Thu Mar 05, 2015 10:35 am


]]>
2015-02-07T15:29:41+00:00 2015-02-07T15:29:41+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2546&p=6530#p6530 <![CDATA[Re: Restricted Nash Response]]>

I've been working on RNR/DBR for a while now, but some results don't make complete sense, so I would like to ask for your advice. The discussion is about HUNL Hold'em.

I have an EQ strategy and a skewed strategy SKEW. Using my DBR I obtained SKEW_BB_DBR by taking the BB strategy of SKEW and optimizing against it with external sampling. The SKEW_BB_DBR beats SKEW by more than EQ does, which makes sense.

I did a similar test by taking the BB strategy of EQ and optimizing against it to get EQ_BB_DBR. My idea was that, since EQ is an equilibrium, the BR will also be an equilibrium strategy, so EQ vs EQ_BB_DBR should break even. However, I have failed to get this result and, weirdly, EQ beats EQ_BB_DBR by a small but significant margin.
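
A minimal sketch of the sanity check behind this expectation, using rock-paper-scissors as a stand-in for the real game. Against a fixed strategy, a true best response earns at least as much as any other strategy, so a response that does worse than the equilibrium's own counterpart is not yet a best response. The payoff matrix and all helper names below are assumptions for illustration and have nothing to do with the HUNL setup itself.

```python
# Hypothetical toy game: rock-paper-scissors as a zero-sum matrix game.
# PAYOFF[i][j] is the row player's payoff when row plays i and column plays j.
PAYOFF = [
    [0, -1, 1],   # rock
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]

def action_values(column_strategy):
    """Expected payoff of each pure row action against a fixed column strategy."""
    return [sum(p * q for p, q in zip(row, column_strategy)) for row in PAYOFF]

def best_response_value(column_strategy):
    """Value of the best pure response (a best response can always be pure)."""
    return max(action_values(column_strategy))

def expected_value(row_strategy, column_strategy):
    """Expected payoff of a mixed row strategy against a fixed column strategy."""
    values = action_values(column_strategy)
    return sum(p * v for p, v in zip(row_strategy, values))

uniform = [1 / 3, 1 / 3, 1 / 3]   # equilibrium of this toy game
skewed = [0.5, 0.25, 0.25]        # plays rock too often

br_vs_eq = best_response_value(uniform)       # best response against the equilibrium
eq_vs_eq = expected_value(uniform, uniform)   # equilibrium against itself
br_vs_skew = best_response_value(skewed)      # best response against the skewed strategy
eq_vs_skew = expected_value(uniform, skewed)  # equilibrium against the skewed strategy
```

In this toy game the best response to the equilibrium only ties it, while the best response to the skewed strategy wins more than the equilibrium does. That is the same pattern the two HUNL experiments are expected to show.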

Taking all this into account, I started suspecting that I might be off because of wrong regret or average-strategy updates. The DBR solving happens along the lines of the following pseudo-code:

Code:
WalkTree( position p, history h ):
    # Handling non-player nodes
    if player( h ) == chance:
        sample action a according to \sigma_{chance}(h)
        return WalkTree( p, h + a )
    elif h == terminal:
        return utility

    # Computing the CFR strategy
    strategy s( h ) = regretMatching( h )

    # Handle the player which is optimized against
    if player( h ) == player_with_data:
        rho = data_precision( h )
        if U(0,1) < rho:
            s( h ) = data( h )

    # "Normal" ES CFR
    if player( h ) == p:
        average_value = 0
        for action a in possible_actions( h ):
            values( a ) = WalkTree( p, h + a )
            average_value += s( h )( a ) * values( a )
        # Regrets need the complete average_value, so update them in a second pass
        for action a in possible_actions( h ):
            regret( h )( a ) += values( a ) - average_value
        # Pass the node value back up to the caller
        return average_value

    elif player( h ) != p:
        for action a in possible_actions( h ):
            average_strategy( h )( a ) += s( h )( a )

        sample action a according to s( h )
        return WalkTree( p, h + a )


Could you please have a look at it and let me know if you spot anything wrong?
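
The pseudo-code calls regretMatching( h ) without spelling it out. A minimal Python sketch of standard regret matching, assuming the cumulative regrets of one information set are stored in a plain list (the function name and storage layout are assumptions for illustration, not the actual implementation):

```python
def regret_matching(cumulative_regrets):
    """Turn cumulative regrets into a strategy proportional to the positive regrets."""
    positive = [max(r, 0.0) for r in cumulative_regrets]
    total = sum(positive)
    if total > 0.0:
        return [r / total for r in positive]
    # No action has positive regret: fall back to the uniform strategy.
    n = len(cumulative_regrets)
    return [1.0 / n] * n
```

For example, regrets of 2, -1 and 2 give probabilities 0.5, 0 and 0.5, and all-negative regrets give the uniform strategy.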

Cheers,

Statistics: Posted by DreamInBinary — Sat Feb 07, 2015 3:29 pm


]]>
2013-08-30T18:52:11+00:00 2013-08-30T18:52:11+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2546&p=4761#p4761 <![CDATA[Re: Restricted Nash Response]]> http://poker.cs.ualberta.ca/publications/AISTATS09.pdf

Why do you think that it is impossible? Why can standard CFM do it this way, but DBR or RNR cannot?
I haven't implemented it yet, but as I understand it now, the only change is that the action probabilities for an information set are sometimes taken from the opponent model (with probability p) and otherwise from the regrets (with probability 1-p).
Has any of you achieved good results with an opponent model based on <20k hands? I mean a better result than with a standard eNash strategy.
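
The change described above can be sketched as follows, assuming the opponent model and the regret-matched strategy are both plain probability lists over the same actions. `choose_opponent_strategy`, `p_model` and the injectable `rng` are hypothetical names for illustration, not taken from the paper.

```python
import random

def choose_opponent_strategy(model_strategy, regret_strategy, p_model, rng=random):
    """With probability p_model use the opponent model, otherwise the regret-based strategy."""
    # rng.random() is uniform on [0.0, 1.0), so p_model = 1.0 always picks the model
    # and p_model = 0.0 never does.
    if rng.random() < p_model:
        return model_strategy
    return regret_strategy
```

The sketch leaves open when the coin is flipped (once per hand or at every information set) and how p is chosen. Those choices belong to the specific algorithm.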

Statistics: Posted by Hipp — Fri Aug 30, 2013 6:52 pm


]]>
2013-08-30T17:59:06+00:00 2013-08-30T17:59:06+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2546&p=4759#p4759 <![CDATA[Re: Restricted Nash Response]]>
I implemented RNR a few weeks ago, by the way. I don't see how you could do it for 2 players simultaneously, though.

Statistics: Posted by HontoNiBaka — Fri Aug 30, 2013 5:59 pm


]]>
2013-08-29T22:37:50+00:00 2013-08-29T22:37:50+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2546&p=4744#p4744 <![CDATA[Re: Restricted Nash Response]]> I don't mean a parallel computation solution, but a single recursive function for both players.

Statistics: Posted by Hipp — Thu Aug 29, 2013 10:37 pm


]]>
2013-08-02T18:42:27+00:00 2013-08-02T18:42:27+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2546&p=4598#p4598 <![CDATA[Re: Restricted Nash Response]]> Statistics: Posted by cantina — Fri Aug 02, 2013 6:42 pm


]]>
2013-08-02T14:09:35+00:00 2013-08-02T14:09:35+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2546&p=4596#p4596 <![CDATA[Restricted Nash Response]]> Statistics: Posted by HontoNiBaka — Fri Aug 02, 2013 2:09 pm


]]>