Poker-AI.org
http://poker-ai.org/phpbb/

Public chance sampling in leduc
http://poker-ai.org/phpbb/viewtopic.php?f=24&t=2474

Author:  longshot [ Sat Apr 27, 2013 9:06 pm ]
Post subject:  Public chance sampling in leduc

I'm working on my public chance sampling (PCS) implementation, and it doesn't look like it's converging to the expected EV values. I suspect I'm either sampling incorrectly or updating my regret values incorrectly.

Here's my code: https://github.com/tansey/pycfr/blob/master/pokercfr.py (PCS is at the bottom of the file)

I'm using a public tree implementation and calculating both players' values simultaneously. It seems like you still need to enumerate the actions and holecards the same way, calculate the terminal payoffs the same way, and update the regrets the same way (i.e., just accumulating the counterfactual regret). So I just subclass my CounterfactualRegretMinimizer and change how it handles board nodes:

Code:
import random  # used for board sampling; choose() and CounterfactualRegretMinimizer are defined earlier in pokercfr.py

class PublicChanceSamplingCFR(CounterfactualRegretMinimizer):
    def __init__(self, rules):
        CounterfactualRegretMinimizer.__init__(self, rules)

    def cfr(self):
        # Sample all board cards to be used
        self.board = random.sample(self.rules.deck, sum([x.boardcards for x in self.rules.roundinfo]))
        # Set the top card of the deck
        self.top_card = 0
        # Call the standard CFR algorithm
        self.cfr_helper(self.tree.root, [{(): 1} for _ in range(self.rules.players)])

    def cfr_boardcard_node(self, root, reachprobs):
        # Number of community cards dealt this round
        num_dealt = len(root.children[0].board) - len(root.board)
        # Possible combinations of community cards (note: computed but never used below)
        prevlen = len(next(iter(reachprobs[0])))  # number of cards each player already holds
        possible_deals = float(choose(len(root.deck) - prevlen, root.todeal))
        # Find the child that matches the sampled board card(s)
        for bc in root.children:
            if self.boardmatch(num_dealt, bc):
                # Deal the card(s)
                self.top_card += num_dealt
                # Update the probabilities for each HC to be 1/N for N possible boardcard deals
                next_reachprobs = [{hc: reachprobs[player][hc] / len(root.children)
                                    for hc in bc.holecards[player]}
                                   for player in range(self.rules.players)]
                # Perform normal CFR
                results = self.cfr_helper(bc, next_reachprobs)
                # Put the cards back in the deck (so we use the same sampled cards for every trajectory)
                self.top_card -= num_dealt
                # Return the payoffs
                return results
        raise Exception('Sampled board card(s) do not match any child node')

    def boardmatch(self, num_dealt, node):
        # Checks if this node is a match for the sampled board card(s)
        for next_card in range(self.top_card, self.top_card + num_dealt):
            if self.board[next_card] not in node.board:
                return False
        return True

    def cfr_strategy_update(self, root, reachprobs):
        # Update the strategies and regrets for each infoset
        for hc in root.holecards[root.player]:
            infoset = self.rules.infoset_format(root.player, hc, root.board, root.bet_history)
            # Get the current CFR
            prev_cfr = self.counterfactual_regret[root.player][infoset]
            # Get the total positive CFR
            sumpos_cfr = float(sum([max(0,x) for x in prev_cfr]))
            if sumpos_cfr == 0:
                # Default strategy is equal probability
                probs = self.equal_probs(root)
            else:
                # Use the strategy that's proportional to accumulated positive CFR
                probs = [max(0,x) / sumpos_cfr for x in prev_cfr]
            # Use the updated strategy as our current strategy
            self.current_profile.strategies[root.player].policy[infoset] = probs
            # Update the weighted policy probabilities (used to recover the average strategy)
            for i in range(3):
                self.action_reachprobs[root.player][infoset][i] += reachprobs[root.player][hc] * probs[i]
            if sum(self.action_reachprobs[root.player][infoset]) == 0:
                # Default strategy is equal weight
                self.profile.strategies[root.player].policy[infoset] = self.equal_probs(root)
            else:
                # Recover the weighted average strategy
                total = sum(self.action_reachprobs[root.player][infoset])
                self.profile.strategies[root.player].policy[infoset] = [
                    self.action_reachprobs[root.player][infoset][i] / total
                    for i in range(3)]
        # Return and use the current CFR strategy
        return self.current_profile.strategies[root.player]
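
For reference, here is roughly how I'm driving it (a minimal sketch; leduc_rules() is a placeholder for however you construct the GameRules, not necessarily a real pycfr helper):

Code:
# Hypothetical driver loop -- leduc_rules() is a placeholder name,
# substitute your own GameRules constructor.
rules = leduc_rules()
cfr = PublicChanceSamplingCFR(rules)
for iteration in range(100000):
    # Each call samples one board and reuses it for the whole public-tree traversal
    cfr.cfr()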


Does anything stick out as obviously wrong to anyone?

It seems like I'm taking a very similar approach to what I see in the open_cfr code, but after 100k iterations mine is still stuck at Exploitability0 = -0.04, versus -0.06 for open_cfr. Given that PCS should have lower variance, I think I must be doing something wrong. Unfortunately, the C# code on the old forum isn't really helpful, since it doesn't cover flop games.

Edit: Updated the code with comments.

Author:  longshot [ Mon Apr 29, 2013 3:20 am ]
Post subject:  Re: Public chance sampling in leduc

I fixed it. I discovered that the sampling approach is really "sample, then calculate everything as if this were the entire game". Given that, there were two things I was not doing correctly:

1. You should not weight chance events by their probability of occurring.

2. You should only consider holecards that have not been sampled into the board. This may seem obvious, but in a public tree traversal it means you have to actively filter out holecards that collide with board cards that have been sampled but not yet dealt. I was filtering only after the card was dealt, which meant I was incorrectly propagating zero regret and EV back up to those holecards. Both fixes are sketched below.
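
To make that concrete, here is a minimal, self-contained sketch of both fixes on a toy Leduc deck (the names are mine, not pycfr's API):

Code:
import random

# Toy Leduc deck: J/Q/K in two suits, one community card per hand.
DECK = ['Js', 'Jh', 'Qs', 'Qh', 'Ks', 'Kh']

def sample_board(deck, num_board):
    # Fix 1: sample the public card(s) once per iteration, then traverse
    # as if this were the entire game -- the chance node contributes no
    # 1/N weighting to the reach probabilities.
    return random.sample(deck, num_board)

def live_holecards(holecards, board):
    # Fix 2: at every node of the public tree (including nodes before the
    # board card is actually dealt), keep only holecards that do not
    # collide with any sampled board card.
    return [hc for hc in holecards if hc not in board]

board = sample_board(DECK, 1)
print('board:', board)
print('live holecards:', live_holecards(DECK, board))

With the filtering applied from the first round onward, the impossible holecards never receive the spurious zero-regret and zero-EV updates described above.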
