Poker-AI.org Poker AI and Botting Discussion Forum 2013-09-06T18:37:13+00:00 http://poker-ai.org/phpbb/feed.php?f=24&t=2576 2013-09-06T18:37:13+00:00 2013-09-06T18:37:13+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2576&p=4860#p4860 <![CDATA[Re: opencfr and recursive function]]>
When I pass an array to a function, I actually pass a pointer to it...

That's the point :D

Regards,

Statistics: Posted by MrNice — Fri Sep 06, 2013 6:37 pm


]]>
2013-09-06T18:22:54+00:00 2013-09-06T18:22:54+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2576&p=4858#p4858 <![CDATA[Re: opencfr and recursive function]]>
I have simplified the function to make it easier to understand what's going on:

Code:
#include <iostream>

void recursive_function(int a[2], int tracker)
{
   std::cout << "[+] Starting function\n";
   if (tracker == 3)
   {
      return;
   }
   if (tracker == 2)
   {
      a[0] = 5;
      a[1] = 6;
   }
   if (tracker < 2)
   {
      recursive_function(a, tracker + 1);
   }
   std::cout << "A at " << tracker << " : " << a[0] << " et " << a[1] << "\n";
   std::cout << "[-] End function\n";
}

int main(int argc, char* argv[])
{
   int a[2];

   for (int i = 0; i < 2; i++)
   {
      a[i] = 0;
   }
   recursive_function(a, 0);
   return 0;
}



And the results are:
Code:
[+] Starting function
[+] Starting function
[+] Starting function
A at 2 : 5 et 6
[-] End function
A at 1 : 5 et 6
[-] End function
A at 0 : 5 et 6
[-] End function


So if you have an explanation, I'm interested... I thought I would get 5 and 6 for the last step and 0 for the previous ones, but that's not the case :(

Statistics: Posted by MrNice — Fri Sep 06, 2013 6:22 pm


]]>
2013-09-06T13:32:44+00:00 2013-09-06T13:32:44+00:00 http://poker-ai.org/phpbb/viewtopic.php?t=2576&p=4852#p4852 <![CDATA[opencfr and recursive function]]>
In opencfr, there is a portion of the code that I don't understand: when the algorithm is at a terminal node and the EV has been calculated, how is the EV back-propagated to the previous node????

any idea ?

It seems that the EVs are back-propagated via the ev function parameter, but is that possible? Is it?

Here is the update_regret function:
Code:
static void update_regret(leduc::sequence u, int buckets[2][2], int hole[2], int board,
    int result, double reach[2], double chance, double ev[2], double cfr[2],
    regret_strategy strat[2]) {

  if (u.is_terminal()) {

    /* sequence is terminal */
    int amount = u.win_amount();

    if (u.is_fold()) {

      if (u.who_folded() == 0) {

        ev[0] = -amount*reach[1]*chance;
        ev[1] = amount*reach[0]*chance;
      } else {

        ev[0] = amount*reach[1]*chance;
        ev[1] = -amount*reach[0]*chance;
      }

    } else {

      /* sequence is a showdown */
      ev[0] = result*reach[1]*amount*chance;
      ev[1] = -result*reach[0]*amount*chance;
    }

  } else if (reach[0] < EPSILON && reach[1] < EPSILON) {

    /* cutoff, do nothing */
    ev[0] = ev[1] = 0;

  } else {

    /* some convenience variables */
    int player   = u.whose_turn();
    int opponent = leduc::opposite_player(player);
    int round    = u.get_round();

    /* player is using a regret-minimizing strategy */
    // get the accumulated average strategy and regrets for the 3 possible actions
    double * average_probability = strat[player].get_average_probability(u, buckets[player][round]);
    double * regret = strat[player].get_regret(u, buckets[player][round]);

    /* get the probability tuple for the player */
    double probability[3];
    strat[player].get_probability(u, buckets[player][round], probability);

    /* first average the strategy for the player */
    for(int i=0; i<3; ++i) {

      average_probability[i] += reach[player]*probability[i];
    }

    /* now compute the regret on each of our actions */
    double expected = 0, sum = 0;
    double old_reach = reach[player];
    double delta_regret[3];
    for(int i=0; i<3; ++i) {

      if (u.can_do_action(i)) {

        reach[player] = old_reach*probability[i];
        update_regret(u.do_action(i), buckets, hole, board, result, reach, chance, ev, cfr, strat);

        delta_regret[i] = ev[player];
        // compute the EV of the strategy = current expected value
        expected += ev[player]*probability[i];
        sum      += ev[opponent];
      }
    }

    /* restore reachability value */
    reach[player] = old_reach;

    /* subtract off expectation to get regret for each action */
    for(int i=0; i<3; ++i) {

      if (u.can_do_action(i)) {

        delta_regret[i] -= expected;
        // regret for each action
        regret[i]       += delta_regret[i];
        cfr[player]     += max(0., delta_regret[i]);
      }
    }

    /* set return value */
    ev[player]   = expected;
    ev[opponent] = sum;
  }
}

Statistics: Posted by MrNice — Fri Sep 06, 2013 1:32 pm


]]>