Poker-AI.org

Poker AI and Botting Discussion Forum
It is currently Mon Nov 13, 2023 1:25 pm

All times are UTC




Post new topic Reply to topic  [ 1 post ] 
Author Message
PostPosted: Fri Apr 12, 2013 12:21 am 
Offline
Junior Member

Joined: Thu Apr 11, 2013 10:13 pm
Posts: 22
I'm working on a python implementation of most of the published CFR algorithms (I know it won't scale, but it's intended for research on toy problems). I've implemented and tested CFR on both half-street Kuhn and Leduc poker, where I compared the resulting strategies vs. the open_cfr (1B iterations) results and they're sufficiently close (< 0.0001 per probability per infoset) that I'm pretty sure it's right.

Since I used a public tree to implement vanilla CFR, I decided to do public chance sampling next. It seems like it should be so simple: instead of having a global iterations count, you have a visits count for each node. You use that in the counterfactual regret update and then replace the board node enumeration with a simple random child node choice.

The code still works for HS Kuhn, which is a good sanity check since there are no board cards. However, when I run it on Leduc, it gets stuck around exploitability values of (0.35, 0.10) for players 1 and 2.

The MC papers talk about sampling the chance event proportionally to the likelihood, but since we're enumerating all holecards, it seems to me that uniform random is okay. I've tried doing a full Bayesian update of the probability of cards in the deck before sampling and I seem to be getting the same results.

Am I missing something here? Is there some other adjustment I need to make to the algorithm?

Code for my PCS implementation is below:

Code:
class PublicChanceSamplingCFR(CounterfactualRegretMinimizer):
    def __init__(self, rules):
        CounterfactualRegretMinimizer.__init__(self, rules)
        self.init_helper(self.tree.root)

    def init_helper(self, node):
        node.visits = 0
        try:
            for child in node.children:
                self.init_helper(child)
        except AttributeError:
            return

    def cfr_helper(self, root, reachprobs):
        root.visits += 1
        return CounterfactualRegretMinimizer.cfr_helper(self, root, reachprobs)

    def cfr_boardcard_node(self, root, reachprobs):
        bc = random.choice(root.children)
        return self.cfr_helper(bc, reachprobs)

    def cfr_regret_update(self, root, action_payoffs, ev):
        for action,subpayoff in enumerate(action_payoffs):
            if subpayoff is None:
                continue
            for hc,winnings in subpayoff[root.player].iteritems():
                immediate_regret = max(winnings - ev[hc], 0)
                infoset = self.rules.infoset_format(root.player, hc, root.board, root.bet_history)
                prev_regret = self.regret[root.player][infoset][action]
                self.regret[root.player][infoset][action] = 1.0 / (root.visits + 1) * (root.visits * prev_regret + immediate_regret)



Top
 Profile  
 
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 1 post ] 

All times are UTC


Who is online

Users browsing this forum: No registered users and 2 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Powered by phpBB® Forum Software © phpBB Group