Poker-AI.org Poker AI and Botting Discussion Forum
Thread: Statistics geeks here? Testing different bots

Re: Statistics geeks here? Testing different bots
Posted by proud2bBot — Mon Mar 25, 2013 1:00 am

Re: Statistics geeks here? Testing different bots
Posted by Heuristics — Sun Mar 24, 2013 7:37 pm
When working with standard deviations to get confidence intervals, you usually rely on the properties of the normal distribution of the entire population. And that's where the problem lies here: what is the entire population? If you had access to a huge statistics database, the likes of pokertableratings.com, you could get a pretty good estimate, but then again, it's very possible that differences between bots are larger than differences between humans, and that differences between your own bot versions are larger than differences between unrelated bots.

To work with a normal distribution, you obviously need a mean and a standard deviation, but where do you get those from? I would suggest doing some empirical measurements as follows:

1) Run one version of your bot through a sample of 10000 random hands, repeat this 1000 times, and record the win rate each time.
2) Run the second version of your bot through the *same* 1000 samples of 10000 hands - this reduces random interference - and record the win rate each time.
3) Calculate the mean and variance over the 1000 win rates for each version.
4) Calculate Welch's t statistic and use it to test the null hypothesis that both versions have the same mean win rate.

Steps 3-4 can be done much more easily by entering all your values into SPSS (statistics software) and running the test there. Obviously, the larger the number of samples and the number of hands per sample, the more clearly any real difference will show up and the higher the resulting confidence.
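For anyone who prefers code to SPSS, here is a minimal sketch of steps 3-4 in Python; the file names are placeholders for wherever the per-sample win rates end up, and scipy/numpy is just one way to do it:

Code:
# Minimal sketch of steps 3-4: Welch's t-test on two sets of measured win rates.
# Assumes the 1000 win rates per bot version (e.g. in BB/100) have already been
# written to the two text files below (placeholder names), one value per line.
import numpy as np
from scipy import stats

winrates_v1 = np.loadtxt("bot_v1_winrates.txt")   # one win rate per 10k-hand sample
winrates_v2 = np.loadtxt("bot_v2_winrates.txt")

print("v1: mean %.3f, var %.3f" % (winrates_v1.mean(), winrates_v1.var(ddof=1)))
print("v2: mean %.3f, var %.3f" % (winrates_v2.mean(), winrates_v2.var(ddof=1)))

# equal_var=False gives Welch's t-test (no equal-variance assumption)
t, p = stats.ttest_ind(winrates_v1, winrates_v2, equal_var=False)
print("Welch's t = %.3f, two-sided p = %.4f" % (t, p))
# Reject the null hypothesis of equal mean win rates at the 1% level if p < 0.01.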

From your first post, I gather that you want a running computation of the confidence while the hands are being played. You would proceed in pretty much the same way, except that every hand would be one sample and its value the per-hand winnings/losses. Obviously, the values would then range from -stacksize to +stacksize and the variance would be huge; I'm not sure you would ever get a difference between the distributions that is significant enough.

Re: Statistics geeks here? Testing different bots
Posted by spears — Sat Mar 23, 2013 6:15 pm

- Assume that variance applies in games between different bots
- Use standard statistical techniques to estimate the number of hands required for a given confidence

http://web.archive.org/web/200711111524 ... php?t=1872
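For reference, the usual way to estimate that number of hands is to invert the normal-approximation confidence interval; a minimal sketch with placeholder figures (measure the per-hand standard deviation from your own hand histories):

Code:
# Rough estimate of the number of hands needed to pin the win rate down to a
# given margin at a given confidence, assuming i.i.d. hands and the normal
# approximation. sigma is a placeholder; measure it from your own histories.
from scipy.stats import norm

sigma_bb_per_hand = 10.0      # per-hand standard deviation in big blinds (placeholder)
margin_bb_per_100 = 1.0       # want the true win rate within +/- 1 BB/100
confidence = 0.99

z = norm.ppf(1 - (1 - confidence) / 2)            # about 2.576 for 99%
margin_per_hand = margin_bb_per_100 / 100.0
hands_needed = (z * sigma_bb_per_hand / margin_per_hand) ** 2
print("hands needed: %.0f" % hands_needed)        # ~6.6 million with these placeholder numbers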

Re: Statistics geeks here? Testing different bots
Posted by cantina — Fri Mar 22, 2013 9:42 pm
For HUL (heads-up limit) I just played against a baseline EQ for something like 64 million games of duplicate poker to get a good estimate. It took a while, but it gave me a fairly accurate picture. This was with strategies where just a few millibets/hand meant a great deal.

Re: Statistics geeks here? Testing different bots
Posted by proud2bBot — Fri Mar 22, 2013 9:27 pm
Regarding sample size: yes, I know; the problem is knowing in which range my real WR lies with a certainty of, say, 99%. Currently, I just print out the WR every 10k hands and "watch it" converge, but a more mathematical solution would be nice.

Randomness is reduced similarly in my current approach: after each hand, the hole cards are switched between the bots (they have no memory to exploit this and don't adjust).

I checked my Evaluator implementation and found that one method, which could have been avoided, was taking nearly all the time. After changing it, I can now play 1.7 million hands per minute, which gives a solid result after not too much time. However, if someone knows how to compute the confidence intervals properly, I'd still appreciate the information.
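One way to compute that range is a standard confidence interval around the observed mean; a minimal sketch under the usual normal approximation, where the input file is a placeholder for however the simulator logs per-hand profits:

Code:
# 99% confidence interval for the true win rate, from the per-hand results so far.
# Assumes independent hands and the normal approximation; the input file is a
# placeholder containing one profit/loss per hand, in big blinds.
import numpy as np

results = np.loadtxt("per_hand_results.txt")

n = len(results)
mean = results.mean()
sem = results.std(ddof=1) / np.sqrt(n)        # standard error of the mean

z99 = 2.576                                   # two-sided z for 99% confidence
low, high = mean - z99 * sem, mean + z99 * sem
print("hands: %d" % n)
print("win rate: %.3f BB/100, 99%% CI [%.3f, %.3f] BB/100"
      % (mean * 100, low * 100, high * 100))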

Re: Statistics geeks here? Testing different bots
Posted by Heuristics — Fri Mar 22, 2013 8:42 pm
1) When it comes to sample size, the smaller the difference in win rate, the bigger the sample you will need. If you are talking about differences of less than 1 BB/100, you will definitely need several hundred thousand hands to reach any reasonable confidence with random testing.

2) On the other hand, you could drastically reduce the impact of randomness by doing the following: have the bots play each other for, say, 10k hands (personally I'd never go below 100k for a significant sample, but anyway). Save each hand in an easy-to-store format. Then run the test again with the hands inverted, so each bot plays through exactly the same situations from the other side; the result will be much more accurate, I reckon.
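A rough sketch of that pairing idea in Python; play_hand and the bot objects are hypothetical stand-ins for whatever interface the simulator actually exposes:

Code:
# Duplicate-style variance reduction: every deck is played twice, with the two
# bots swapping seats, and the two results are paired per deck.
# play_hand(deck, seat0, seat1) -> profit of seat0 (heads-up, zero-sum) is a
# hypothetical interface; bot1/bot2 are whatever your bot objects look like.
import random

def duplicate_match(play_hand, bot1, bot2, num_decks=10_000, seed=42):
    rng = random.Random(seed)
    paired = []
    for _ in range(num_decks):
        deck = list(range(52))
        rng.shuffle(deck)                               # same deck reused for both passes
        p_a = play_hand(deck, seat0=bot1, seat1=bot2)   # bot1 in seat 0
        p_b = play_hand(deck, seat0=bot2, seat1=bot1)   # seats inverted, same cards
        # bot1's edge on this deck: what it made from seat 0 minus what bot2
        # made from the same seat and cards. Card/position luck largely cancels.
        paired.append(p_a - p_b)
    return paired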

Re: Statistics geeks here? Testing different bots
Posted by Coffee4tw — Fri Mar 22, 2013 7:22 pm

http://poker.cs.ualberta.ca/publication ... -icgaj.pdf

Statistics geeks here? Testing different bots
Posted by proud2bBot — Fri Mar 22, 2013 5:45 pm
Now, I remember that statistics can help in these cases, but I always hated it at university, so someone might have better input on whether my idea is feasible and how to execute it. Let's assume we want to test that Bot1 is better than Bot2, so our hypothesis is WR-b1 > WR-b2. The results of a bot should follow a distribution which, given a large number of hands, can be approximated by a normal distribution. Given that, we should be able to calculate the confidence interval for the win-rate difference (e.g., at 99%) after every n hands and check whether a) 0 is within it (we need to let it run longer, or the change to the abstraction did not lead to a significant difference in win rate), b) the upper bound is below zero (we screwed up and made the bot worse), or c) the lower bound is above zero (we found a better abstraction).
If we used this approach, we could calculate the confidence interval after, say, every 10k games the bots played and, as soon as we hit case b or c, end the evaluation, even after only a low number of test games.
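A minimal sketch of that stopping rule, under one reading of the setup: the two streams of per-hand profits (in BB) are independent, e.g. each bot playing the same fixed opponent, and the arrays and names below are hypothetical:

Code:
# Sketch of the stopping rule described above: after every block of hands,
# compute a 99% CI for the win-rate difference WR-b1 - WR-b2 and stop as soon
# as zero falls outside it. Call check() after every 10k-hand block.
import numpy as np

Z99 = 2.576            # two-sided z for 99% confidence

def check(results1, results2):
    """Return case 'a', 'b' or 'c' plus the current CI for WR-b1 - WR-b2."""
    diff = results1.mean() - results2.mean()
    se = np.sqrt(results1.var(ddof=1) / len(results1)
                 + results2.var(ddof=1) / len(results2))
    low, high = diff - Z99 * se, diff + Z99 * se
    if high < 0:
        return "b (Bot1 is worse)", low, high
    if low > 0:
        return "c (Bot1 is better)", low, high
    return "a (not significant yet, keep running)", low, high

One caveat: checking the interval repeatedly and stopping at the first significant result inflates the nominal error rate somewhat (the usual sequential-testing issue), so the 99% should be treated as approximate unless a proper sequential test is used.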

Has anyone tried such an approach, or does anyone have ideas on how to calculate the confidence interval given that we don't know the exact variance of the population?
