Poker-AI.org

Poker AI and Botting Discussion Forum
It is currently Mon Nov 13, 2023 2:22 pm

All times are UTC




Post new topic Reply to topic  [ 7 posts ] 
Author Message
 Post subject: Fictitious Play
PostPosted: Thu May 30, 2013 7:11 pm 
Offline
Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 437
I'm curious, has anyone used Fictitious Play to find NE? It seems like the benefits would be less memory requirements, assuming you could do it via sampling (i.e. not needing regrets, and not having to store the entire best-response strategy at each iteration as in the reference below). One question I have is: how do you apply EV smoothing/conversion (i.e. converting negative EVs)? Using the pure best response (as Spears suggests in the reference) doesn't seem to be the right method (for MC sampling).

Reference:
http://www.poker-ai.org/archive/pokerai ... view=print


Top
 Profile  
 
 Post subject: Re: Fictitious Play
PostPosted: Sun Jun 02, 2013 4:16 pm 
Offline
Site Admin
User avatar

Joined: Sun Feb 24, 2013 9:39 pm
Posts: 642
As nobody else is answering I'll give it a try.

IIRC Dudziak, Fellows, Marv, Mcmahan & Gordon all used FP.

I used FP successfully on toy problems like Kuhn or 1 card poker. The Best Response player used sampling and chose the best response action. The Equilibrium player held a probability distribution of all possible cards, which was updated on each turn according to the current EQ strategy. Clearly this method doesn't scale to full size poker, because you don't know the distribution of buckets in advance. So here is an idea: Sample cards for both BR and EQ players, but instead of adding 1 to the number of visits for each best response action, add the EQ player reach probability. This is very like the difference between regret and counterfactual regret in CFRM

I never tried the above idea. But I did a few experiments with techniques that used values other than 1 for the number of visits and they showed a lot of promise. I've seen smoothing in the literature, but I think it is a refinement. If you can't get FP to work without it, I think you're doing something wrong. Also the minimum regret strategy you calculate in CFRM is usually a pure strategy.

This method should reduce the memory requirement a bit from CFRM, but I think runtime might increase, because on each iteration you are only improving the BR player, not both.


Top
 Profile  
 
 Post subject: Re: Fictitious Play
PostPosted: Mon Jun 17, 2013 8:26 am 
Offline
Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 437
Thanks for the response, Spears. I took a break from this to try some other ideas. What do you mean by: "sample cards from BR and EQ players"? Are you saying when calculating EV update the BR node with just Zed (probability of reaching leaf), not Zed * Utility? I'm doing sampling like I would with External Sampling CFRM, would what you suggest work with that as well?


Top
 Profile  
 
 Post subject: Re: Fictitious Play
PostPosted: Mon Jun 17, 2013 3:53 pm 
Offline
Site Admin
User avatar

Joined: Sun Feb 24, 2013 9:39 pm
Posts: 642
Nasher wrote:
What do you mean by: "sample cards from BR and EQ players"?

I mean sample cards from both BR and EQ players - in contrast with my experiments to date that just sampled from the BR player and used a distribution over all possible cards for the EQ player.

Nasher wrote:
Are you saying when calculating EV update the BR node with just Zed (probability of reaching leaf), not Zed * Utility?

How do we calculate the EV at a node?
The ev at a node is the sum of evs of the actions following that node * probability of the action. BR players ev = - EQ players ev.

Once we know the BR what do we do?
If EQ player had a distribution over all possible cards like my experiments did, you just add 1 to the number of times that BR action has been taken. When BR player and EQ player change places you calculate the probability of EQ player taking that action by dividing the number of times that action has been taken by the total number of times all the actions have been taken.

But if EQ player just has a sampled pair, not a distribution, I'm guessing you add the probability of him getting to that node to the visit count, not 1. This is analagous to using the counterfactual regret rather than the regret in CFRM.

- Adding up the utilities instead of the BR counts won't work because you can't get a probability from that.
- If you hack amax's original cfrm code and post it I will look at it. I will not work with VB.
- Yanks don't say Zed, you can't be a Brit or Irish or you would know Gnasher was a dog, so you must be antipodean - right?

Nasher wrote:
I'm doing sampling like I would with External Sampling CFRM, would what you suggest work with that as well?
Dunno. Suggest you get a basic variant working first - analogous to MBJ's original CFRM paper where hands and boards are dealt out and strength buckets calculated.


Top
 Profile  
 
 Post subject: Re: Fictitious Play
PostPosted: Thu Jun 20, 2013 8:32 pm 
Offline
Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 437
Wouldn't the action with the highest utility at any given node be the BR against the opponent?

Your conclusion based on 'Zed' is that I'm an Ausie? :) Crikey, I'm just using the terminology the U of A guys used to describe the product of probabilities to a leaf node, phonetically pronounced. ;)


Top
 Profile  
 
 Post subject: Re: Fictitious Play
PostPosted: Fri Jun 21, 2013 7:20 am 
Offline
Site Admin
User avatar

Joined: Sun Feb 24, 2013 9:39 pm
Posts: 642
Nasher wrote:
Wouldn't the action with the highest utility at any given node be the BR against the opponent?

Of course.

Nasher wrote:
Your conclusion based on 'Zed' is that I'm an Ausie? :) Crikey, I'm just using the terminology the U of A guys used to describe the product of probabilities to a leaf node, phonetically pronounced. ;)

Blimey, struth, sorry mate.


Top
 Profile  
 
 Post subject: Re: Fictitious Play
PostPosted: Fri Jun 21, 2013 12:14 pm 
Offline
Site Admin
User avatar

Joined: Sun Feb 24, 2013 9:39 pm
Posts: 642
This is some code from my experiments for Kuhn/One Card Poker. It works.

Code:
public class BestResponse {
   static Random random = new Random();

   
   /* best response ev ------------------------------------------------------------------------- */
   static double bestResponseEv(Action[] root,  double[][] rootProbabilities) {
      double result = 0;
      int noCards = rootProbabilities.length;
      for (int brCard = 0; brCard < noCards; brCard++) {
         double[] eqProbabilities = rootProbabilities[brCard];
         double ev = bestResponseEv(root, brCard, eqProbabilities, noCards);
         result += ev;
      }
      return result;
   }
   
   static double bestResponseEv(Action[] node, int brCard, double[] eqProbabilities, int noCards) {
      double maxEv = -Double.MAX_VALUE;
      Action maxAction = null;
      
      for (Action action : node) {
         double ev = bestResponseEv(action, brCard, eqProbabilities, noCards);
   
         if(ev > maxEv) {
            maxEv = ev;
            maxAction = action;
         }
      }
      
      maxAction.timesTaken[brCard]++;
      
      return maxEv;
   }

   private static double bestResponseEv(Action action, int brCard, double[] eqProbabilities, int noCards) {
      switch (action.type) {
         case NONTERMINAL:
            return -equilibriumEv(action.endNode, brCard, eqProbabilities, noCards);
      
         case FOLD:
            return foldEv(action, eqProbabilities, noCards);
         
         case SHOWDOWN:
            return showdownEv(action, brCard, eqProbabilities, noCards);
      }
      return unreachable();
   }
   
   /* equilibrium ev --------------------------------------------------------------------------- */

   static double equilibriumEv(Action[] root, double[][] rootProbabilities) {
      double result = 0;
      int noCards = rootProbabilities.length;
      for (int brCard = 0; brCard < noCards; brCard++) {
         double[] eqProbabilities = rootProbabilities[brCard];
         double ev = equilibriumEv(root, brCard, eqProbabilities, noCards);
         result += ev;
      }
      return result;
   }
   
   static double equilibriumEv(Action[] node, int brCard, double[] eqProbabilities, int noCards) {
      double result = 0;
      double[][] equilibriumStrategies = equilibriumStrategies(node, noCards);
      for (int a = 0; a < node.length; a++) {
         Action action = node[a];
         double[] strategy = equilibriumStrategies[a];
         double[] actionProbabilities = Matrix.product(strategy, eqProbabilities);
         double endEv = equilibriumEv(action, brCard, actionProbabilities, noCards);
         result += endEv;
      }
      return result;
   }


   private static double equilibriumEv(Action action, int brCard, double[] eqProbabilities, int noCards) {
      double ev = 0;
      switch (action.type) {
         case NONTERMINAL:
            ev = -bestResponseEv(action.endNode, brCard, eqProbabilities, noCards);
            return ev;
         
         case FOLD:
            ev = foldEv(action, eqProbabilities, noCards);
            return ev;
      
         case SHOWDOWN:
            ev = -showdownEv(action, brCard, eqProbabilities, noCards);
            return ev;

      }
      return unreachable();
   }


   /* ev --------------------------------------------------------------------------------------- */
   
   private static double showdownEv(Action action, int brCard, double[] eqProbabilities, int noCards) {
      double result = 0;
      for (int v = 0; v < noCards; v++) {
         result += Math.signum(brCard - v) * action.payoff * eqProbabilities[v];
      }
      return result;
   }
   
   private static double foldEv(Action action, double[] eqProbabilities, int noCards) {
      double result = 0;
      for (int i = 0; i < noCards; i++) {
         result += eqProbabilities[i] * action.payoff;
      }
      return result;
   }
   
   
   /* ------------------------------------------------------------------------------------------ */
   

   
   private static double[][] equilibriumStrategies(Action[] node, int noCards) {
      int noActions = node.length;
      double[][] result = new double[noActions][noCards];
      for (int c = 0; c < noCards; c++) {
         long sum = 0;
         for (int a = 0; a < noActions; a++) {
            sum += node[a].timesTaken[c];
         }
         for (int a = 0; a < noActions; a++) {
            result[a][c] = (double)node[a].timesTaken[c] / (double)sum;
         }
         
      }
      return result;
   }
      
   private static double unreachable() {
      throw new RuntimeException("Unreachable");
   }
}





Top
 Profile  
 
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 7 posts ] 

All times are UTC


Who is online

Users browsing this forum: No registered users and 1 guest


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Powered by phpBB® Forum Software © phpBB Group