Poker-AI.org • View topic - Public Chance Sampling -- am I missing something?

View unanswered posts | View active topics

Board index » Public Forums » AI Research

All times are UTC

Public Chance Sampling -- am I missing something?

Page 1 of 1

[ 1 post ]

Print view

Previous topic | Next topic

Author

Message

longshot

Post subject: Public Chance Sampling -- am I missing something?

Posted: Fri Apr 12, 2013 12:21 am

Junior Member

Joined: Thu Apr 11, 2013 10:13 pm
Posts: 22

I'm working on a python implementation of most of the published CFR algorithms (I know it won't scale, but it's intended for research on toy problems). I've implemented and tested CFR on both half-street Kuhn and Leduc poker, where I compared the resulting strategies vs. the open_cfr (1B iterations) results and they're sufficiently close (< 0.0001 per probability per infoset) that I'm pretty sure it's right.

Since I used a public tree to implement vanilla CFR, I decided to do public chance sampling next. It seems like it should be so simple: instead of having a global iterations count, you have a visits count for each node. You use that in the counterfactual regret update and then replace the board node enumeration with a simple random child node choice.

The code still works for HS Kuhn, which is a good sanity check since there are no board cards. However, when I run it on Leduc, it gets stuck around exploitability values of (0.35, 0.10) for players 1 and 2.

The MC papers talk about sampling the chance event proportionally to the likelihood, but since we're enumerating all holecards, it seems to me that uniform random is okay. I've tried doing a full Bayesian update of the probability of cards in the deck before sampling and I seem to be getting the same results.

Am I missing something here? Is there some other adjustment I need to make to the algorithm?

Code for my PCS implementation is below:

Code:

class PublicChanceSamplingCFR(CounterfactualRegretMinimizer):
    def __init__(self, rules):
        CounterfactualRegretMinimizer.__init__(self, rules)
        self.init_helper(self.tree.root)

    def init_helper(self, node):
        node.visits = 0
        try:
            for child in node.children:
                self.init_helper(child)
        except AttributeError:
            return

    def cfr_helper(self, root, reachprobs):
        root.visits += 1
        return CounterfactualRegretMinimizer.cfr_helper(self, root, reachprobs)

    def cfr_boardcard_node(self, root, reachprobs):
        bc = random.choice(root.children)
        return self.cfr_helper(bc, reachprobs)

    def cfr_regret_update(self, root, action_payoffs, ev):
        for action,subpayoff in enumerate(action_payoffs):
            if subpayoff is None:
                continue
            for hc,winnings in subpayoff[root.player].iteritems():
                immediate_regret = max(winnings - ev[hc], 0)
                infoset = self.rules.infoset_format(root.player, hc, root.board, root.bet_history)
                prev_regret = self.regret[root.player][infoset][action]
                self.regret[root.player][infoset][action] = 1.0 / (root.visits + 1) * (root.visits * prev_regret + immediate_regret)

Top

Page 1 of 1

[ 1 post ]

Board index » Public Forums » AI Research

All times are UTC

Who is online

Users browsing this forum: No registered users and 2 guests

You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum