Poker-AI.org

Poker AI and Botting Discussion Forum
It is currently Wed Apr 24, 2019 3:56 am

All times are UTC




Post new topic Reply to topic  [ 23 posts ]  Go to page 1, 2  Next
Author Message
 Post subject: Public chance sampling
PostPosted: Mon May 06, 2013 1:04 am 
Offline
Senior Member

Joined: Mon Mar 11, 2013 10:24 pm
Posts: 216
I was thinking to implement PCS (http://webdocs.cs.ualberta.ca/~johanson/publications/poker/2012-aamas-pcs/2012-aamas-pcs.pdf). As far as I understand, the variables my_i and my_-i are vectors of size (48 choose 2) where each entry represents one of the possible hole cards we can have given the board and indicates how often we have this hand in our range given the history. The strategy sigma is similarly a matrix with (48 choose 2) lines and one column for each action. Is this so far correct?
No what I don't get is the following: how do we - given we have the two reachability vectors and the hand strength of each hand - perform the efficient terminal node evaluation?


Top
 Profile  
 
PostPosted: Tue May 07, 2013 6:36 am 
Offline
Site Admin
User avatar

Joined: Sun Feb 24, 2013 9:39 pm
Posts: 546
I haven't checked but there might be something in here http://poker-ai.org/archive/www.pokerai ... 22&p=42973


Top
 Profile  
 
PostPosted: Tue May 07, 2013 11:17 am 
Offline
New Member

Joined: Sat Mar 09, 2013 4:56 pm
Posts: 5
RiverSolver2.zip has the fast evaluators.


Top
 Profile  
 
PostPosted: Thu May 09, 2013 9:49 pm 
Offline
Senior Member

Joined: Mon Mar 11, 2013 10:24 pm
Posts: 216
Thanks spears and amax. I looked into the code and I think I figured it out how its calculated. However, if my thoughts are right there is no handling of ties in the code, which makes the EV incorrect. For instance, assume we have a 100% range and we have a hand which never wins, but ties in 50% (and loses the remaining 50%). If we just look at wins and loses, we'd assume that the EV is 0, but we have actually a positive EV due to the ties.
It looks however that the implementation is for holdem, so I wonder if this case is handled differently, if the EV is not accurate or if I just didnt understand the algorithm completely?


Top
 Profile  
 
PostPosted: Fri May 10, 2013 6:17 am 
Offline
Site Admin
User avatar

Joined: Sun Feb 24, 2013 9:39 pm
Posts: 546
I haven't read amax's code but this make sense to me:

http://webdocs.cs.ualberta.ca/~bowling/papers/11ijcai-rgbr.pdf wrote:
Example 3. Suppose we can sort each players’ information
sets by “rank”, and the utility only depends upon the
relative ordering of the players’ ranks. This is exactly the
situation that occurs in poker. For the moment, let us assume
the distribution of the players’ ranks are independent.
In this case, evaluating each of our information sets requires
only O(n) work. We know that our weakest information set
will be weaker than some of the opponent’s hands, equal to
some, and better than some. We keep indices into the opponent’s
ordered list of information sets to mark where these
changes occur. To evaluate our information set, we only need
to know the total probability of the opponent’s information
sets in these three sections. After we evaluate one of our
information sets and move to a stronger one, we just adjust
these two indices up one step in rank.


Top
 Profile  
 
PostPosted: Fri May 10, 2013 7:22 am 
Offline
Senior Member

Joined: Mon Mar 11, 2013 10:24 pm
Posts: 216
yes, that was my understanding too: you need to know which combos win, lose and tie. In the code however, only wins and loses are handled (including card removal):

Code:

      public override double[] TrainPublicChanceSampling(int trainplayer, PublicIteration iteration, double[] p, double[] op)
      {
         var pholes = iteration.GetHoles(trainplayer);
         var oholes = iteration.GetHoles(trainplayer ^ 1);

         var ev = new double[p.Length];

         var wincr = new double[52];

         double winsum = 0;
         int j = 0;

         for (int i = 0; i < p.Length; i++)
         {
            while (oholes[j].Rank < pholes[i].Rank)
            {
               winsum += op[j];
               wincr[oholes[j].Card1] += op[j];
               wincr[oholes[j].Card2] += op[j];
               j++;
            }

            ev[i] = (winsum - wincr[pholes[i].Card1] - wincr[pholes[i].Card2]) * value;
         }

         var losecr = new double[52];
         double losesum = 0;
         j = op.Length - 1;

         for (int i = p.Length - 1; i >= 0; i--)
         {
            while (oholes[j].Rank > pholes[i].Rank)
            {
               losesum += op[j];
               losecr[oholes[j].Card1] += op[j];
               losecr[oholes[j].Card2] += op[j];
               j--;
            }

            ev[i] -= (losesum - losecr[pholes[i].Card1] - losecr[pholes[i].Card2]) * value;
         }

         return ev;
      }


Top
 Profile  
 
PostPosted: Fri May 10, 2013 8:16 am 
Offline
Site Admin
User avatar

Joined: Sun Feb 24, 2013 9:39 pm
Posts: 546
I've not put much effort into this, but observe that if negative ev indicates a loss and positive ev indicates a win then that code would be right.


Top
 Profile  
 
PostPosted: Fri May 10, 2013 3:24 pm 
Offline
New Member

Joined: Sat Mar 09, 2013 4:56 pm
Posts: 5
There's code there that verifies it's correct. (ev of tie = 0)


Top
 Profile  
 
PostPosted: Sat Apr 15, 2017 10:52 am 
Offline
Junior Member

Joined: Mon Jan 19, 2015 4:58 pm
Posts: 13
I have been interested in vectorising CFR for a while now and finally got PCS implemented. However, I have troubles on implementing an efficient terminal node evaluator in vector form.

Currently I have just the naive form:
terminal_ev = ( non_conflict_matrix * payoffs ) dot reach_opp

Is vector form for the efficient version even possible?

_________________
Let's drop conventional languages and talk C++ finally.


Top
 Profile  
 
PostPosted: Tue Apr 18, 2017 8:12 am 
Offline
Site Admin
User avatar

Joined: Sun Feb 24, 2013 9:39 pm
Posts: 546
Not sure if this answers your question:
- Ahead of time I find the vector of integers that sorts ranks into ascending order.
- At runtime I use this sort vector to rearrange opponents probabilities.
- Then, in Scala

Code:
  val ranks = Array(1,2,3,4,5)
  val oppReach = Array(0.1,0.2,0.3,0.1,0.2)
  val payoff = 1.3
 
  val strengths = Array.ofDim[Double](ranks.length)
  strengths(0) = oppReach(0) - oppReach.sum
  for(i <- 1 until ranks.length) strengths(i) = strengths(i - 1) + oppReach(i - 1) + oppReach(i)
 
  val evs = strengths.map { s => s * payoff }


Last edited by spears on Tue Apr 18, 2017 4:25 pm, edited 1 time in total.
To fix sign errors in code


Top
 Profile  
 
PostPosted: Tue Apr 18, 2017 10:22 am 
Offline
Junior Member

Joined: Mon Jan 19, 2015 4:58 pm
Posts: 13
Thanks spears that does shed some light on the matter. Also your proposition I think could be vectorised further by replacing the for loop with something along the lines:
Code:
strengths = oppreach[0:-1].cumsum() - oppreach[1:]

assuming Scala allows for element-wise substraction and Python-like subindexing of arrays.

Correct me if I'm wrong, but your code seems to assume:
1. oppReach is indexed by ranks ie. own and opponent rank vectors are the same
2. Ranks do not repeat. Maybe oppReach summed over the same ranks?
3. The code does not allow for dependent ranks. Eg. suppose if own rank is 5 then the opp can't have 5 and 3

The 1,2 I see how to handle, but I am puzzled by the third one. I wonder is something like that would work:
Code:
  for(i <- 1 until ranks.length) strengths(i) = strengths(i - 1) + oppReach(i - 1) - oppReach(i) - sum_{i conflicts with j} oppReach(j)

_________________
Let's drop conventional languages and talk C++ finally.


Top
 Profile  
 
PostPosted: Tue Apr 18, 2017 12:34 pm 
Offline
Site Admin
User avatar

Joined: Sun Feb 24, 2013 9:39 pm
Posts: 546
DreamInBinary wrote:
... I think could be vectorised further by replacing the for loop with something along the lines:
Code:
strengths = oppreach[0:-1].cumsum() - oppreach[1:]

assuming Scala allows for element-wise substraction and Python-like subindexing of arrays.

Sorry, can't figure out Python fast enough to comment.

DreamInBinary wrote:

Correct me if I'm wrong, but your code seems to assume:
1. oppReach is indexed by ranks ie. own and opponent rank vectors are the same
2. Ranks do not repeat. Maybe oppReach summed over the same ranks?
3. The code does not allow for dependent ranks. Eg. suppose if own rank is 5 then the opp can't have 5 and 3

The 1,2 I see how to handle, but I am puzzled by the third one. I wonder is something like that would work:
Code:
  for(i <- 1 until ranks.length) strengths(i) = strengths(i - 1) + oppReach(i - 1) - oppReach(i) - sum_{i conflicts with j} oppReach(j)


You've got the idea. Ranks repeat a lot and potentially this allows us to shorten the loop though I didn't do that myself. I deal with conflicts approximately and only when oppReach is very uneven.


Top
 Profile  
 
PostPosted: Wed May 17, 2017 6:23 pm 
Offline
Junior Member

Joined: Mon Jan 19, 2015 4:58 pm
Posts: 13
A month has passed and I can conclude that its very tricky to make it parallel and efficient at the same time. However, I have dropped the parallelism and am using it as a serial evaluator. Plenty of things to parallelise on anyway.

Has anyone tried extending this to more than 2 players?

_________________
Let's drop conventional languages and talk C++ finally.


Top
 Profile  
 
PostPosted: Sun Apr 14, 2019 7:39 pm 
Offline
Junior Member

Joined: Tue Jun 28, 2016 7:12 pm
Posts: 11
Hi guys. Tried to find a topic for my question and looks like this is correct one))

So my question is: how to correctly use non zero starting pot when calculating CFR (i am using CS variant) from flop?

Lets take a look on a push/checkdown tree from flop, where first player can check or push, and second can check or push on 1st players check. If they both checked - they are checking turn and river till showdown.

So if both players have stacks 100 at the start of flop and zero pot (yeah, i know this is impossible, but for example) the utilities in terminal nodes are 100 or 0. 100 when there was All In and Call. 0 when they both checked, or one went All In and other folded.
And this type of tree after lot of iterations ends up with around zero best responses for both players. This looks ok.

But what if starting pot equals 40 for example? How i need to change utilities and what best response i must achieve after training?

If 1st player went All In and second folds - what is utility here? 40 for 1st and 0 for 2nd? But this is a zero sum game, so sum of utilities in each terminal node must be zero. Or not?
Also i thought that best response here (after training) is 20 for both, because with simmetrical tree and non zero starting bank - both players must print money here in long distance and logic (and commercial solvers) says that there EV (and best response) is a half of starting bank. But my experiments ends with smth like BR1=31 and BR2 = 29. This maybe a result of incorrect utilities - thats why i'm looking for help.


Top
 Profile  
 
PostPosted: Mon Apr 15, 2019 10:07 am 
Offline
Site Admin
User avatar

Joined: Sun Feb 24, 2013 9:39 pm
Posts: 546
Quote:
So my question is: how to correctly use non zero starting pot when calculating CFR (i am using CS variant) from flop?

Quote:
If 1st player went All In and second folds - what is utility here? 40 for 1st and 0 for 2nd?


You just need to figure out a convention for how the pot grows and how winnings are allocated. So here is one suggestion, from many possibilities

- Assume starting stacks are both 100
- Increment pot by 2 * amount to call when call or raise occurs.
- At start of flop pot is 2*40
- Player 1 goes all in, ie bets 60. Player 2 calls. Pot is now 200. Player 2 has best hand, gets half of pot, ie 100. Terminal utility is +100 and goes to last player to act, player 2
- or
- Player 1 goes all in, ie bets 60. Player 2 folds. Pot is still 80. Player 2 loses half of pot ie -40. Terminal utility is -40 and goes to last player to act, player 2


Top
 Profile  
 
PostPosted: Thu Apr 18, 2019 3:12 pm 
Offline
Junior Member

Joined: Tue Jun 28, 2016 7:12 pm
Posts: 11
Yeah, thanks!
I finally solved this problem and my CFR algo now provides close to famous solvers results.


Top
 Profile  
 
PostPosted: Sat Apr 20, 2019 8:51 am 
Offline
Junior Member

Joined: Tue Jun 28, 2016 7:12 pm
Posts: 11
Got a new question :D

Lets assume that we have HU game, we have a big (for example FULL) tree of this game and we calculated GTO for it. Perfect situation. So we have a strategy for every situation in this tree.
But in this tree in some opponent node there are few actions, and one of them (like bet with sizing x100500, or fold with nuts only) has 0% probability for opponent to choose during GTO play.
And what we need to do if our opponent during real game choosed this 0% prob action?
In our tree, if we used CS CFR algo we visited this node maybe one time, got very negative regret for opponent, set probability to zero and then didn't visit this node again, so we have no strategy for this node and all nodes after it.
But logic says that if opponent choosed so unliked action - we must exploit him so much in this situation if we play GTO (and we have this FULL tree with calculated GTO).
But how we can do this?


Top
 Profile  
 
PostPosted: Sun Apr 21, 2019 8:47 am 
Offline
Site Admin
User avatar

Joined: Sun Feb 24, 2013 9:39 pm
Posts: 546
FlashPlayer wrote:
Got a new question :D

Lets assume that we have HU game, we have a big (for example FULL) tree of this game and we calculated GTO for it. Perfect situation. So we have a strategy for every situation in this tree.
But in this tree in some opponent node there are few actions, and one of them (like bet with sizing x100500, or fold with nuts only) has 0% probability for opponent to choose during GTO play.
And what we need to do if our opponent during real game choosed this 0% prob action?
In our tree, if we used CS CFR algo we visited this node maybe one time, got very negative regret for opponent, set probability to zero and then didn't visit this node again, so we have no strategy for this node and all nodes after it.
But logic says that if opponent choosed so unliked action - we must exploit him so much in this situation if we play GTO (and we have this FULL tree with calculated GTO).
But how we can do this?


I don't really know, but maybe this might help.

There are two cases:
1. Villain takes an action that is legal with some hand but not the one he holds. This is non optimal so he will lose in the long term against a hero that continues to play according to the actions that villain takes
2. Villain takes an action that is illegal with any hand. Does this ever happen? If it does, I'd be inclined to proceed assuming he took the most infrequently taken action.

It might be useful to look at https://www.youtube.com/watch?v=qndXrHcV1sM

Libratus suggests you don't really have to exploit to win conclusively. Nevertheless, my opinion is that you have to:
1. Record your opponents action frequencies and showdown hands, then determine his strategy from that info. This takes too many hands to be useful, so you could accelerate it by seeing if the statistics you do collect correspond to previously identified patterns of poor play
2. Exploit his strategy, while not being too exploitable yourself. I've done some experiments on toy games that suggest you can do this by finding the pure strategy that exploits his strategy, and mixing it with your NE strategy. The results I got suggest that small deviations from NE pay off quite well. I guess you could monitor villain statistics for signs that he reacts to your exploitation too. University of Alberta did some work on this - Data biased robust counter strategies. From the work I did on toy games I don't completely agree with their conclusion that you have to recompute NE for every opponent - but I could be wrong.


Top
 Profile  
 
PostPosted: Sun Apr 21, 2019 11:06 am 
Offline
Junior Member

Joined: Tue Jun 28, 2016 7:12 pm
Posts: 11
spears wrote:
2. Villain takes an action that is illegal with any hand. Does this ever happen? If it does, I'd be inclined to proceed assuming he took the most infrequently taken action.


Thanks. Can we discuss this option closely?
Illegal action here means that every hand in GTO strategy has zero probability to choose this action. And if my opponent is too far from GTO play and choosed this zero probability action - i can't do anything with my GTO because i have no strategy for this case. Because zero probability of action means that zero range of opponent will choose this action and nothing can be calculated then here for my GTO strategy (utilities, cfr values etc).
Thats why don't understand how you want to proceed play here? If we have no strategy for this case.

Core problem here is that despite the fact that we have full tree and calculatate GTO strategy - there might be unreachable for GTO nodes. But in real play villain obviously can reach any node. Or i missed something?

This is very cofusing case for me because from one side i have full GTO strategy, but there are situations when i can't do anything with opponent's play.


Top
 Profile  
 
PostPosted: Sun Apr 21, 2019 12:14 pm 
Offline
Site Admin
User avatar

Joined: Sun Feb 24, 2013 9:39 pm
Posts: 546
FlashPlayer wrote:
spears wrote:
2. Villain takes an action that is illegal with any hand. Does this ever happen? If it does, I'd be inclined to proceed assuming he took the most infrequently taken action.


Thanks. Can we discuss this option closely?
...
Thats why don't understand how you want to proceed play here?
...

You must play. If you don't play casino will assume you fold and I doubt that is optimal.
I'm assuming that we are talking about the case where villain is making a mistake, rather than taking an action that you haven't modelled.
Do you know for sure that there are modelled actions (other than folds) for which there are no hands that should play them?
Assuming there are, my suggestions:
1. Play assuming the closest case, ie assume villain took the least probable non zero action
2. Take a look at the game tree. Depending on the algorithm you used there might be a strategy for both you and villain remaining from earlier in the solution process when villains chance of taking this action was greater than zero.
3. Call to the end of the hand


Top
 Profile  
 
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 23 posts ]  Go to page 1, 2  Next

All times are UTC


Who is online

Users browsing this forum: No registered users and 2 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Powered by phpBB® Forum Software © phpBB Group