Poker-AI.org http://poker-ai.org/phpbb/
Fictitious Play http://poker-ai.org/phpbb/viewtopic.php?f=24&t=2509
Author: cantina [ Thu May 30, 2013 7:11 pm ]
Post subject: Fictitious Play
I'm curious, has anyone used Fictitious Play to find a NE? It seems like the main benefit would be lower memory requirements, assuming you could do it via sampling (i.e. not needing regrets, and not having to store the entire best-response strategy at each iteration as in the reference below). One question I have is: how do you apply EV smoothing/conversion (i.e. converting negative EVs)? Using the pure best response (as Spears suggests in the reference) doesn't seem to be the right method for MC sampling.

Reference: http://www.poker-ai.org/archive/pokerai ... view=print
Author: spears [ Sun Jun 02, 2013 4:16 pm ]
Post subject: Re: Fictitious Play
As nobody else is answering, I'll give it a try. IIRC Dudziak, Fellows, Marv, Mcmahan & Gordon all used FP.

I used FP successfully on toy problems like Kuhn or 1-card poker. The Best Response player used sampling and chose the best-response action. The Equilibrium player held a probability distribution over all possible cards, which was updated on each turn according to the current EQ strategy. Clearly this method doesn't scale to full-size poker, because you don't know the distribution of buckets in advance.

So here is an idea: sample cards for both the BR and EQ players, but instead of adding 1 to the number of visits for each best-response action, add the EQ player's reach probability. This is very like the difference between regret and counterfactual regret in CFRM. I never tried the above idea, but I did a few experiments with techniques that used values other than 1 for the number of visits, and they showed a lot of promise.

I've seen smoothing in the literature, but I think it is a refinement. If you can't get FP to work without it, I think you're doing something wrong. Also, the minimum-regret strategy you calculate in CFRM is usually a pure strategy.

This method should reduce the memory requirement a bit compared with CFRM, but I think runtime might increase, because on each iteration you are only improving the BR player, not both.
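[Editor's note: to make the weighted-visit idea above concrete, here is a minimal sketch. The InfoSet, averageStrategy and updateVisits names are illustrative, not from this thread; it just shows replacing the usual +1 fictitious-play visit with the EQ player's reach probability, in the same spirit as weighting regret by opponent reach in CFRM.]

Code:
import java.util.Arrays;

// Hypothetical sketch of the weighted-visit update; names are illustrative only.
class WeightedVisitSketch {

    /** Visit counts at one BR decision point for one sampled card. */
    static class InfoSet {
        final double[] visits;                       // fictitious-play "times taken" per action
        InfoSet(int noActions) { visits = new double[noActions]; }

        /** EQ strategy = normalised visit counts (uniform until anything has been visited). */
        double[] averageStrategy() {
            double sum = Arrays.stream(visits).sum();
            double[] strategy = new double[visits.length];
            for (int a = 0; a < strategy.length; a++) {
                strategy[a] = sum > 0 ? visits[a] / sum : 1.0 / strategy.length;
            }
            return strategy;
        }
    }

    /**
     * Classic FP adds 1 to the chosen best-response action. The sampled variant
     * proposed above adds the EQ player's reach probability instead, so updates
     * from unlikely EQ lines count for less.
     */
    static void updateVisits(InfoSet infoSet, double[] actionEvs, double eqReachProb) {
        int best = 0;
        for (int a = 1; a < actionEvs.length; a++) {
            if (actionEvs[a] > actionEvs[best]) best = a;
        }
        infoSet.visits[best] += eqReachProb;         // "+= 1" would be the unsampled version
    }
}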
Author: cantina [ Mon Jun 17, 2013 8:26 am ]
Post subject: Re: Fictitious Play
Thanks for the response, Spears. I took a break from this to try some other ideas. What do you mean by "sample cards from BR and EQ players"? Are you saying that, when calculating EV, I should update the BR node with just Zed (the probability of reaching the leaf), not Zed * Utility? I'm doing sampling like I would with External Sampling CFRM; would what you suggest work with that as well?
Author: spears [ Mon Jun 17, 2013 3:53 pm ]
Post subject: Re: Fictitious Play
Nasher wrote: What do you mean by "sample cards from BR and EQ players"?

I mean sample cards for both the BR and EQ players - in contrast with my experiments to date, which sampled cards only for the BR player and used a distribution over all possible cards for the EQ player.

Nasher wrote: Are you saying when calculating EV update the BR node with just Zed (probability of reaching leaf), not Zed * Utility?

How do we calculate the EV at a node? The EV at a node is the sum of the EVs of the actions following that node, each weighted by the probability of the action. The BR player's EV = -(EQ player's EV).

Once we know the BR, what do we do? If the EQ player holds a distribution over all possible cards, as in my experiments, you just add 1 to the number of times that BR action has been taken. When the BR and EQ players change places, you calculate the probability of the EQ player taking an action by dividing the number of times that action has been taken by the total number of times all the actions at that node have been taken. But if the EQ player just has a sampled pair, not a distribution, I'm guessing you add the probability of him getting to that node to the visit count, not 1. This is analogous to using the counterfactual regret rather than the regret in CFRM.

- Adding up the utilities instead of the BR counts won't work, because you can't get a probability from that.
- If you hack amax's original CFRM code and post it, I will look at it. I will not work with VB.
- Yanks don't say Zed, and you can't be a Brit or Irish or you would know Gnasher was a dog, so you must be antipodean - right?

Nasher wrote: I'm doing sampling like I would with External Sampling CFRM; would what you suggest work with that as well?

Dunno. I suggest you get a basic variant working first - analogous to MBJ's original CFRM paper, where hands and boards are dealt out and strength buckets calculated.
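[Editor's note: as a quick illustration of the EV recursion spears describes (sum of child EVs weighted by action probabilities, with the sign flipping because the BR player's EV is minus the EQ player's EV), here is a minimal sketch. The Node type and the assumption of strictly alternating actions are illustrative, not spears's code.]

Code:
// Illustrative only: hypothetical Node type, players assumed to alternate strictly.
class EvRecursionSketch {

    static class Node {
        boolean terminal;
        double payoff;       // if terminal: payoff from the perspective of the player whose EV we return
        Node[] children;     // one child per action; the opponent acts at each child
        double[] strategy;   // acting player's probability of taking each action
    }

    /** EV for the player to act at this node; the opponent's EV is the negation (zero sum). */
    static double ev(Node node) {
        if (node.terminal) return node.payoff;
        double result = 0;
        for (int a = 0; a < node.children.length; a++) {
            // child EV is from the opponent's perspective, hence the sign flip
            result += node.strategy[a] * -ev(node.children[a]);
        }
        return result;
    }
}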
Author: cantina [ Thu Jun 20, 2013 8:32 pm ]
Post subject: Re: Fictitious Play
Wouldn't the action with the highest utility at any given node be the BR against the opponent? Your conclusion based on 'Zed' is that I'm an Aussie? Crikey, I'm just using the terminology the U of A guys used to describe the product of probabilities to a leaf node, phonetically pronounced.
Author: spears [ Fri Jun 21, 2013 7:20 am ]
Post subject: Re: Fictitious Play
Nasher wrote: Wouldn't the action with the highest utility at any given node be the BR against the opponent?

Of course.

Nasher wrote: Your conclusion based on 'Zed' is that I'm an Aussie? Crikey, I'm just using the terminology the U of A guys used to describe the product of probabilities to a leaf node, phonetically pronounced.

Blimey, struth, sorry mate.
Author: spears [ Fri Jun 21, 2013 12:14 pm ]
Post subject: Re: Fictitious Play
This is some code from my experiments for Kuhn/One Card Poker. It works.

Code:
import java.util.Random;

// Action and Matrix are helper classes from the same project; they are not shown in this post.
public class BestResponse {

    static Random random = new Random();

    /* best response ev ------------------------------------------------------------------------- */

    static double bestResponseEv(Action[] root, double[][] rootProbabilities) {
        double result = 0;
        int noCards = rootProbabilities.length;
        for (int brCard = 0; brCard < noCards; brCard++) {
            double[] eqProbabilities = rootProbabilities[brCard];
            double ev = bestResponseEv(root, brCard, eqProbabilities, noCards);
            result += ev;
        }
        return result;
    }

    // BR player picks the max-EV action for his card and records a visit against it.
    static double bestResponseEv(Action[] node, int brCard, double[] eqProbabilities, int noCards) {
        double maxEv = -Double.MAX_VALUE;
        Action maxAction = null;
        for (Action action : node) {
            double ev = bestResponseEv(action, brCard, eqProbabilities, noCards);
            if (ev > maxEv) {
                maxEv = ev;
                maxAction = action;
            }
        }
        maxAction.timesTaken[brCard]++;
        return maxEv;
    }

    private static double bestResponseEv(Action action, int brCard, double[] eqProbabilities, int noCards) {
        switch (action.type) {
            case NONTERMINAL: return -equilibriumEv(action.endNode, brCard, eqProbabilities, noCards);
            case FOLD:        return foldEv(action, eqProbabilities, noCards);
            case SHOWDOWN:    return showdownEv(action, brCard, eqProbabilities, noCards);
        }
        return unreachable();
    }

    /* equilibrium ev --------------------------------------------------------------------------- */

    static double equilibriumEv(Action[] root, double[][] rootProbabilities) {
        double result = 0;
        int noCards = rootProbabilities.length;
        for (int brCard = 0; brCard < noCards; brCard++) {
            double[] eqProbabilities = rootProbabilities[brCard];
            double ev = equilibriumEv(root, brCard, eqProbabilities, noCards);
            result += ev;
        }
        return result;
    }

    // EQ player mixes over actions according to the normalised visit counts.
    static double equilibriumEv(Action[] node, int brCard, double[] eqProbabilities, int noCards) {
        double result = 0;
        double[][] equilibriumStrategies = equilibriumStrategies(node, noCards);
        for (int a = 0; a < node.length; a++) {
            Action action = node[a];
            double[] strategy = equilibriumStrategies[a];
            double[] actionProbabilities = Matrix.product(strategy, eqProbabilities);
            double endEv = equilibriumEv(action, brCard, actionProbabilities, noCards);
            result += endEv;
        }
        return result;
    }

    private static double equilibriumEv(Action action, int brCard, double[] eqProbabilities, int noCards) {
        double ev = 0;
        switch (action.type) {
            case NONTERMINAL:
                ev = -bestResponseEv(action.endNode, brCard, eqProbabilities, noCards);
                return ev;
            case FOLD:
                ev = foldEv(action, eqProbabilities, noCards);
                return ev;
            case SHOWDOWN:
                ev = -showdownEv(action, brCard, eqProbabilities, noCards);
                return ev;
        }
        return unreachable();
    }

    /* ev --------------------------------------------------------------------------------------- */

    private static double showdownEv(Action action, int brCard, double[] eqProbabilities, int noCards) {
        double result = 0;
        for (int v = 0; v < noCards; v++) {
            result += Math.signum(brCard - v) * action.payoff * eqProbabilities[v];
        }
        return result;
    }

    private static double foldEv(Action action, double[] eqProbabilities, int noCards) {
        double result = 0;
        for (int i = 0; i < noCards; i++) {
            result += eqProbabilities[i] * action.payoff;
        }
        return result;
    }

    /* ------------------------------------------------------------------------------------------ */

    // Strategy per card = timesTaken for each action, normalised over the actions at this node.
    private static double[][] equilibriumStrategies(Action[] node, int noCards) {
        int noActions = node.length;
        double[][] result = new double[noActions][noCards];
        for (int c = 0; c < noCards; c++) {
            long sum = 0;
            for (int a = 0; a < noActions; a++) {
                sum += node[a].timesTaken[c];
            }
            for (int a = 0; a < noActions; a++) {
                result[a][c] = (double) node[a].timesTaken[c] / (double) sum;
            }
        }
        return result;
    }

    private static double unreachable() {
        throw new RuntimeException("Unreachable");
    }
}
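[Editor's note: for context, here is one way the two entry points above might be driven for a 3-card one-card-poker game. spears does not show his outer loop, how rootProbabilities is built, or how the game tree is constructed, so the joint-deal matrix, the alternation of the two calls, and the buildKuhnTree helper below are all assumptions, not his code.]

Code:
// Hypothetical driver for the BestResponse class above; everything here is an
// assumption about how it might be used, not code from the thread.
public class FictitiousPlayDriver {
    public static void main(String[] args) {
        int noCards = 3;

        // Joint probability of the deal: rootProbabilities[brCard][eqCard],
        // 0 on the diagonal (players can't hold the same card), 1/6 elsewhere.
        double[][] rootProbabilities = new double[noCards][noCards];
        for (int i = 0; i < noCards; i++) {
            for (int j = 0; j < noCards; j++) {
                rootProbabilities[i][j] = (i == j) ? 0 : 1.0 / (noCards * (noCards - 1));
            }
        }

        Action[] root = buildKuhnTree();   // game tree construction is not shown in the thread

        for (int it = 0; it < 1_000_000; it++) {
            // One call lets the root player best-respond (updating its timesTaken counts)
            // against the other player's average strategy; the other call does the
            // reverse, so alternating them improves both players.
            BestResponse.bestResponseEv(root, rootProbabilities);
            BestResponse.equilibriumEv(root, rootProbabilities);
        }
    }

    static Action[] buildKuhnTree() {
        throw new UnsupportedOperationException("not shown in the thread");
    }
}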