Poker-AI.org http://poker-ai.org/phpbb/

Public chance sampling http://poker-ai.org/phpbb/viewtopic.php?f=24&t=2483
Page 1 of 2
Author: proud2bBot [ Mon May 06, 2013 1:04 am ]
Post subject: Public chance sampling
I was thinking about implementing PCS (http://webdocs.cs.ualberta.ca/~johanson/publications/poker/2012-aamas-pcs/2012-aamas-pcs.pdf). As far as I understand, the vectors pi_i and pi_-i have size (48 choose 2), where each entry represents one of the possible hole-card pairs we can have given the board and indicates how often we hold that hand in our range given the history. The strategy sigma is similarly a matrix with (48 choose 2) rows and one column for each action. Is this correct so far? Now, what I don't get is the following: given the two reachability vectors and the hand strength of each hand, how do we perform the efficient terminal node evaluation?
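To make the shapes concrete, here is a minimal Python sketch of that layout (the names and the uniform initial values are my own illustration, not from the paper):

```python
from math import comb

NUM_HANDS = comb(48, 2)    # one entry per possible hole-card pair
NUM_ACTIONS = 3            # e.g. fold / call / raise

# Reach vectors: how often each player holds each hand given the history.
reach_us = [1.0 / NUM_HANDS] * NUM_HANDS
reach_opp = [1.0 / NUM_HANDS] * NUM_HANDS

# Strategy sigma: one probability distribution over actions per hand.
sigma = [[1.0 / NUM_ACTIONS] * NUM_ACTIONS for _ in range(NUM_HANDS)]
```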
Author: spears [ Tue May 07, 2013 6:36 am ]
Post subject: Re: Public chance sampling
I haven't checked but there might be something in here http://poker-ai.org/archive/www.pokerai ... 22&p=42973
Author: amax [ Tue May 07, 2013 11:17 am ]
Post subject: Re: Public chance sampling
RiverSolver2.zip has the fast evaluators.
Author: proud2bBot [ Thu May 09, 2013 9:49 pm ]
Post subject: Re: Public chance sampling
Thanks spears and amax. I looked into the code and I think I figured out how it's calculated. However, if my thoughts are right, there is no handling of ties in the code, which makes the EV incorrect. For instance, assume we have a 100% range and a hand which never wins, but ties in 50% of cases (and loses the remaining 50%). If we just look at wins and losses, we'd assume that the EV is 0, but we actually have a positive EV due to the ties. It looks, however, like the implementation is for holdem, so I wonder if this case is handled differently, if the EV is not accurate, or if I just didn't understand the algorithm completely?
Author: spears [ Fri May 10, 2013 6:17 am ]
Post subject: Re: Public chance sampling
I haven't read amax's code but this makes sense to me: http://webdocs.cs.ualberta.ca/~bowling/papers/11ijcai-rgbr.pdf wrote:
Example 3. Suppose we can sort each players’ information sets by “rank”, and the utility only depends upon the relative ordering of the players’ ranks. This is exactly the situation that occurs in poker. For the moment, let us assume the distribution of the players’ ranks are independent. In this case, evaluating each of our information sets requires only O(n) work. We know that our weakest information set will be weaker than some of the opponent’s hands, equal to some, and better than some. We keep indices into the opponent’s ordered list of information sets to mark where these changes occur. To evaluate our information set, we only need to know the total probability of the opponent’s information sets in these three sections. After we evaluate one of our information sets and move to a stronger one, we just adjust these two indices up one step in rank.
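The passage above translates into a two-pointer scan over rank-sorted hands. Here is a minimal Python sketch of the idea (function and variable names are mine; card-removal effects are ignored; ties contribute zero):

```python
def terminal_ev(ranks_us, ranks_opp, reach_opp, payoff=1.0):
    """O(n + m) showdown evaluation over ascending-rank hand lists.

    reach_opp[j] is the opponent's reach probability for ranks_opp[j].
    Each of our hands earns +payoff per unit of opponent mass it beats
    and -payoff per unit of mass that beats it; ties contribute 0.
    """
    ev = [0.0] * len(ranks_us)
    lo = hi = 0                  # pointers into the opponent's sorted list
    win_mass = 0.0               # opponent mass strictly weaker than us
    lose_mass = sum(reach_opp)   # opponent mass strictly stronger than us
    for i, r in enumerate(ranks_us):
        while lo < len(ranks_opp) and ranks_opp[lo] < r:
            win_mass += reach_opp[lo]   # these hands now lose to us
            lo += 1
        while hi < len(ranks_opp) and ranks_opp[hi] <= r:
            lose_mass -= reach_opp[hi]  # these hands no longer beat us
            hi += 1
        ev[i] = (win_mass - lose_mass) * payoff
    return ev
```

For example, with equal rank lists on both sides and opponent reach [0.25, 0.5, 0.25], the middle hand ties the bulk of the range and wins as much mass as it loses, so its EV is 0.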
Author: proud2bBot [ Fri May 10, 2013 7:22 am ]
Post subject: Re: Public chance sampling
Yes, that was my understanding too: you need to know which combos win, lose, and tie. In the code, however, only wins and losses are handled (including card removal):
Code:
public override double[] TrainPublicChanceSampling(int trainplayer, PublicIteration iteration, double[] p, double[] op)
{
    var pholes = iteration.GetHoles(trainplayer);
    var oholes = iteration.GetHoles(trainplayer ^ 1);
    var ev = new double[p.Length];

    // Forward pass: accumulate opponent mass with lower rank (hands we beat),
    // with per-card sums for the card-removal correction.
    var wincr = new double[52];
    double winsum = 0;
    int j = 0;
    for (int i = 0; i < p.Length; i++)
    {
        while (oholes[j].Rank < pholes[i].Rank)
        {
            winsum += op[j];
            wincr[oholes[j].Card1] += op[j];
            wincr[oholes[j].Card2] += op[j];
            j++;
        }
        ev[i] = (winsum - wincr[pholes[i].Card1] - wincr[pholes[i].Card2]) * value;
    }

    // Backward pass: accumulate opponent mass with higher rank (hands that beat us).
    var losecr = new double[52];
    double losesum = 0;
    j = op.Length - 1;
    for (int i = p.Length - 1; i >= 0; i--)
    {
        while (oholes[j].Rank > pholes[i].Rank)
        {
            losesum += op[j];
            losecr[oholes[j].Card1] += op[j];
            losecr[oholes[j].Card2] += op[j];
            j--;
        }
        ev[i] -= (losesum - losecr[pholes[i].Card1] - losecr[pholes[i].Card2]) * value;
    }
    return ev;
}
Author: spears [ Fri May 10, 2013 8:16 am ]
Post subject: Re: Public chance sampling
I've not put much effort into this, but observe that if negative ev indicates a loss and positive ev indicates a win, then that code would be right.
Author: amax [ Fri May 10, 2013 3:24 pm ]
Post subject: Re: Public chance sampling
There's code there that verifies it's correct. (EV of a tie = 0)
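A quadratic reference evaluator makes that tie convention explicit and is the kind of correctness check a fast evaluator can be verified against (a sketch with names of my own; card removal ignored):

```python
def naive_ev(ranks_us, ranks_opp, reach_opp, payoff=1.0):
    # O(n * m) reference: +payoff per unit of opponent mass we beat,
    # -payoff per unit that beats us, and 0 for ties.
    evs = []
    for r in ranks_us:
        v = 0.0
        for q, mass in zip(ranks_opp, reach_opp):
            if r > q:
                v += mass * payoff
            elif r < q:
                v -= mass * payoff
            # tie: contributes 0 by convention
        evs.append(v)
    return evs
```

A hand that never wins but ties half the range then scores only the win-lose difference, e.g. naive_ev([1], [1, 2], [0.5, 0.5]) gives [-0.5].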
Author: DreamInBinary [ Sat Apr 15, 2017 10:52 am ]
Post subject: Re: Public chance sampling
I have been interested in vectorising CFR for a while now and finally got PCS implemented. However, I have trouble implementing an efficient terminal node evaluator in vector form. Currently I have just the naive form:
terminal_ev = ( non_conflict_matrix * payoffs ) dot reach_opp
Is a vector form of the efficient version even possible?
Author: spears [ Tue Apr 18, 2017 8:12 am ]
Post subject: Re: Public chance sampling
Not sure if this answers your question:
- Ahead of time I find the vector of integers that sorts ranks into ascending order.
- At runtime I use this sort vector to rearrange the opponent's probabilities.
- Then, in Scala:
Code:
val ranks = Array(1,2,3,4,5)
val oppReach = Array(0.1,0.2,0.3,0.1,0.2)
val payoff = 1.3
val strengths = Array.ofDim[Double](ranks.length)
strengths(0) = oppReach(0) - oppReach.sum
for(i <- 1 until ranks.length)
  strengths(i) = strengths(i - 1) + oppReach(i - 1) + oppReach(i)
val evs = strengths.map { s => s * payoff }
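The loop above admits a closed form via a prefix sum, which is what makes it a candidate for full vectorisation: strengths(i) is (opponent mass below rank i) minus (mass above it). A quick pure-Python check under the same assumptions as the Scala snippet (helper names are mine):

```python
def strengths_loop(opp_reach):
    # Direct translation of the Scala recurrence.
    s = [0.0] * len(opp_reach)
    s[0] = opp_reach[0] - sum(opp_reach)
    for i in range(1, len(opp_reach)):
        s[i] = s[i - 1] + opp_reach[i - 1] + opp_reach[i]
    return s

def strengths_prefix(opp_reach):
    # Closed form: strengths[i] = cum[i] + cum[i-1] - total, with cum[-1] = 0,
    # where cum is the inclusive prefix sum of opp_reach.
    total = sum(opp_reach)
    out, cum, prev = [], 0.0, 0.0
    for x in opp_reach:
        cum += x
        out.append(cum + prev - total)
        prev = cum
    return out
```

Both produce the same strengths vector for the example reach [0.1, 0.2, 0.3, 0.1, 0.2].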
Author: DreamInBinary [ Tue Apr 18, 2017 10:22 am ]
Post subject: Re: Public chance sampling
Thanks spears, that does shed some light on the matter. Also, I think your proposition could be vectorised further by replacing the for loop with something along the lines of:
Code:
strengths = oppreach[0:-1].cumsum() - oppreach[1:]
assuming Scala allows for element-wise subtraction and Python-like subindexing of arrays. Correct me if I'm wrong, but your code seems to assume:
1. oppReach is indexed by ranks, i.e. own and opponent rank vectors are the same
2. Ranks do not repeat. Maybe oppReach summed over the same ranks?
3. The code does not allow for dependent ranks. E.g. suppose if own rank is 5 then the opp can't have 5 and 3
1 and 2 I see how to handle, but I am puzzled by the third one. I wonder if something like this would work:
Code:
for(i <- 1 until ranks.length)
  strengths(i) = strengths(i - 1) + oppReach(i - 1) - oppReach(i) - sum_{i conflicts with j} oppReach(j)
Author: spears [ Tue Apr 18, 2017 12:34 pm ]
Post subject: Re: Public chance sampling
DreamInBinary wrote: ... I think your proposition could be vectorised further by replacing the for loop with something along the lines of: strengths = oppreach[0:-1].cumsum() - oppreach[1:] ...
Sorry, can't figure out the Python fast enough to comment.
DreamInBinary wrote: Correct me if I'm wrong, but your code seems to assume: ... 1 and 2 I see how to handle, but I am puzzled by the third one. ...
You've got the idea. Ranks repeat a lot, and potentially this allows us to shorten the loop, though I didn't do that myself. I deal with conflicts approximately, and only when oppReach is very uneven.
Author: DreamInBinary [ Wed May 17, 2017 6:23 pm ]
Post subject: Re: Public chance sampling
A month has passed and I can conclude that it's very tricky to make it parallel and efficient at the same time. However, I have dropped the parallelism and am using it as a serial evaluator. Plenty of things to parallelise anyway. Has anyone tried extending this to more than 2 players?
Author: FlashPlayer [ Sun Apr 14, 2019 7:39 pm ]
Post subject: Re: Public chance sampling
Hi guys. I tried to find a topic for my question and it looks like this is the correct one. My question is: how do I correctly use a non-zero starting pot when calculating CFR (I am using the CS variant) from the flop? Let's look at a push/checkdown tree from the flop, where the first player can check or push, and the second can check or push after the first player's check. If they both check, they check down the turn and river to showdown. So if both players have stacks of 100 at the start of the flop and a zero pot (yeah, I know this is impossible, but for example), the utilities in terminal nodes are 100 or 0: 100 when there was an all-in and a call, 0 when both checked or one went all-in and the other folded. After a lot of iterations, this type of tree ends up with best responses around zero for both players, which looks OK. But what if the starting pot equals 40, for example? How do I need to change the utilities, and what best response should I achieve after training? If the 1st player goes all-in and the second folds, what is the utility here? 40 for the 1st and 0 for the 2nd? But this is a zero-sum game, so the sum of utilities in each terminal node must be zero. Or not? I also thought that the best response here (after training) should be 20 for both, because with a symmetrical tree and a non-zero starting pot both players must print money in the long run, and logic (and commercial solvers) says that the EV (and best response) is half the starting pot. But my experiments end with something like BR1 = 31 and BR2 = 29. This may be a result of incorrect utilities; that's why I'm looking for help.
Author: spears [ Mon Apr 15, 2019 10:07 am ]
Post subject: Re: Public chance sampling
Quote: So my question is: how to correctly use non zero starting pot when calculating CFR (i am using CS variant) from flop?
Quote: If 1st player went All In and second folds - what is utility here? 40 for 1st and 0 for 2nd?
You just need to figure out a convention for how the pot grows and how winnings are allocated. So here is one suggestion, from many possibilities:
- Assume starting stacks are both 100.
- Increment the pot by 2 * the amount to call when a call or raise occurs.
- At the start of the flop the pot is 2*40.
- Player 1 goes all in, i.e. bets 60. Player 2 calls. The pot is now 200. Player 2 has the best hand and gets half of the pot, i.e. 100. Terminal utility is +100 and goes to the last player to act, player 2.
- or -
- Player 1 goes all in, i.e. bets 60. Player 2 folds. The pot is still 80. Player 2 loses half of the pot, i.e. -40. Terminal utility is -40 and goes to the last player to act, player 2.
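A minimal Python sketch of that convention (function names are mine; utilities are expressed for the last player to act, so the numbers match the two branches above):

```python
def showdown_utility(pot_before, bet, last_to_act_wins):
    # Call: the pot grows by 2 * the amount to call; the winner's
    # utility is +pot/2 and the loser's is -pot/2 (zero-sum).
    pot = pot_before + 2 * bet
    return pot / 2 if last_to_act_wins else -pot / 2

def fold_utility(pot_before, last_to_act_folds):
    # Fold: the pot is unchanged; the folding player forfeits half of it.
    return -pot_before / 2 if last_to_act_folds else pot_before / 2
```

With a starting flop pot of 80 (2 * 40) and a 60-chip all-in: showdown_utility(80, 60, True) gives +100 for player 2, and fold_utility(80, True) gives -40, matching the two examples.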
Author: FlashPlayer [ Thu Apr 18, 2019 3:12 pm ]
Post subject: Re: Public chance sampling
Yeah, thanks! I finally solved this problem and my CFR algo now produces results close to the famous solvers.
Author: FlashPlayer [ Sat Apr 20, 2019 8:51 am ]
Post subject: Re: Public chance sampling
Got a new question. Let's assume we have a HU game, we have a big (for example FULL) tree of this game, and we have calculated GTO for it. Perfect situation, so we have a strategy for every situation in this tree. But in this tree, in some opponent node, there are a few actions, and one of them (like a bet with sizing x100500, or folding with the nuts only) has 0% probability of being chosen by the opponent during GTO play. What do we do if, during a real game, our opponent chooses this 0%-probability action? In our tree, if we used the CS CFR algo, we visited this node maybe one time, got a very negative regret for the opponent, set the probability to zero, and then didn't visit this node again, so we have no strategy for this node and all nodes after it. But logic says that if the opponent chose such an unlikely action, we should be able to exploit him heavily in this situation if we play GTO (and we have this FULL tree with calculated GTO). But how can we do this?
Author: spears [ Sun Apr 21, 2019 8:47 am ]
Post subject: Re: Public chance sampling
FlashPlayer wrote: ... What do we do if, during a real game, our opponent chooses this 0%-probability action? ... But how can we do this?
I don't really know, but maybe this might help. There are two cases:
1. Villain takes an action that is legal with some hand but not the one he holds. This is non-optimal, so he will lose in the long term against a hero that continues to play according to the actions that villain takes.
2. Villain takes an action that is illegal with any hand. Does this ever happen? If it does, I'd be inclined to proceed assuming he took the most infrequently taken action.
It might be useful to look at https://www.youtube.com/watch?v=qndXrHcV1sM Libratus suggests you don't really have to exploit to win conclusively. Nevertheless, my opinion is that you have to:
1. Record your opponent's action frequencies and showdown hands, then determine his strategy from that info. This takes too many hands to be useful, so you could accelerate it by seeing if the statistics you do collect correspond to previously identified patterns of poor play.
2. Exploit his strategy, while not being too exploitable yourself.
I've done some experiments on toy games that suggest you can do this by finding the pure strategy that exploits his strategy and mixing it with your NE strategy. The results I got suggest that small deviations from NE pay off quite well. I guess you could monitor villain statistics for signs that he reacts to your exploitation too. The University of Alberta did some work on this: Data Biased Robust Counter Strategies. From the work I did on toy games I don't completely agree with their conclusion that you have to recompute NE for every opponent, but I could be wrong.
Author: FlashPlayer [ Sun Apr 21, 2019 11:06 am ]
Post subject: Re: Public chance sampling
spears wrote: 2. Villain takes an action that is illegal with any hand. Does this ever happen? If it does, I'd be inclined to proceed assuming he took the most infrequently taken action.
Thanks. Can we discuss this option more closely? An illegal action here means that every hand in the GTO strategy has zero probability of choosing this action. And if my opponent is too far from GTO play and chooses this zero-probability action, I can't do anything with my GTO strategy, because I have no strategy for this case. Zero probability of an action means that a zero range of the opponent chooses this action, so nothing can be calculated here for my GTO strategy (utilities, CFR values etc.). That's why I don't understand how you want to proceed here, given that we have no strategy for this case. The core problem is that although we have the full tree and have calculated the GTO strategy, there may be nodes unreachable under GTO, while in real play villain can obviously reach any node. Or did I miss something? This is a very confusing case for me, because on the one hand I have a full GTO strategy, but there are situations where I can't do anything with the opponent's play.
Author: spears [ Sun Apr 21, 2019 12:14 pm ]
Post subject: Re: Public chance sampling
FlashPlayer wrote: ... That's why I don't understand how you want to proceed here ...
You must play. If you don't play, the casino will assume you fold, and I doubt that is optimal. I'm assuming that we are talking about the case where villain is making a mistake, rather than taking an action that you haven't modelled. Do you know for sure that there are modelled actions (other than folds) for which there are no hands that should play them? Assuming there are, my suggestions:
1. Play assuming the closest case, i.e. assume villain took the least probable non-zero action.
2. Take a look at the game tree. Depending on the algorithm you used, there might be a strategy for both you and villain remaining from earlier in the solution process, when villain's chance of taking this action was greater than zero.
3. Call to the end of the hand.
Powered by phpBB® Forum Software © phpBB Group http://www.phpbb.com/