When working with standard deviations to get confidence intervals, you usually have the properties of the normal distribution for the entire population. And that's where the problem lies here: what is the entire population? If you had access to a huge statistics database, the likes of pokertableratings.com, you could get a pretty good estimate, but then again, it's quite possible that the differences between bots are larger than the differences between humans, and that the differences between your own bot versions are larger than the differences between different bots.
To work with a normal distribution, you obviously need a mean and a standard deviation - but where do you get those from? I would suggest doing some empirical measurements as follows:
1) Run one version of your bot through a sample of 10000 random hands, repeat this 1000 times, and record the win-rate each time.
2) Run the second version of your bot through the *same* 1000 samples of 10000 hands - this reduces random interference - and record the win-rate each time.
3) Calculate the mean & variance of the former 1000 win-rates and of the latter 1000 win-rates.
4) Calculate Welch's t and use it to test the null hypothesis that both versions have the same win-rate.
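To make steps 3-4 concrete, here's a minimal sketch in Python using only the standard library. The win-rate numbers are made up (drawn from a normal distribution just to have something to compute on) - in practice the two lists would hold the 1000 measured win-rates from steps 1 and 2:

```python
import math
import random
from statistics import mean, variance

random.seed(0)
# Hypothetical win-rates (e.g. bb/100) for 1000 samples of 10000 hands each.
# Real values would come from your two bot runs in steps 1 and 2.
bot_a = [random.gauss(2.0, 5.0) for _ in range(1000)]
bot_b = [random.gauss(2.5, 5.0) for _ in range(1000)]

m_a, v_a, n_a = mean(bot_a), variance(bot_a), len(bot_a)  # variance() is the
m_b, v_b, n_b = mean(bot_b), variance(bot_b), len(bot_b)  # sample variance

# Welch's t statistic (does not assume equal variances)
t = (m_a - m_b) / math.sqrt(v_a / n_a + v_b / n_b)

# Welch-Satterthwaite approximation for the degrees of freedom
df = (v_a / n_a + v_b / n_b) ** 2 / (
    (v_a / n_a) ** 2 / (n_a - 1) + (v_b / n_b) ** 2 / (n_b - 1)
)

print(f"t = {t:.3f}, df = {df:.1f}")
```

At these sample sizes, an |t| well above roughly 1.96 would let you reject the null hypothesis of equal win-rates at the 5% level.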
Steps 3-4 can be done much more easily by entering all your values into SPSS (statistics software) and running the test on the measurements. Obviously, the larger the sample size and the number of hands, the more significant the differences become and, as a consequence, the higher the confidence.
From your first post, I gather that you want a continuous computation of the confidence while you are running your hands. You would proceed in much the same way, except that the per-hand winnings/losses would be the values and every hand would be a sample. Obviously, the values would then range from -stacksize to +stacksize and you would have a huge variance. I'm not sure you would ever get a difference between the distributions that's significant enough.

Statistics: Posted by Heuristics — Sun Mar 24, 2013 7:37 pm
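As a footnote to the continuous approach above: you don't need to store every hand to keep a running confidence interval. A sketch using Welford's online algorithm (single pass, numerically stable) - the per-hand winnings here are simulated with a made-up distribution just for illustration:

```python
import math
import random

class RunningStats:
    """Welford's online algorithm: running mean and variance in one pass."""
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def push(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Sample variance (Bessel's correction)
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

    def ci_halfwidth(self, z=1.96):
        """Approximate 95% CI half-width for the mean per-hand winnings."""
        return z * math.sqrt(self.variance / self.n)

random.seed(1)
running = RunningStats()
# Hypothetical per-hand winnings: small edge, huge per-hand variance,
# as you'd expect when values can swing from -stacksize to +stacksize.
for _ in range(10000):
    running.push(random.gauss(0.05, 20.0))

print(f"mean = {running.mean:.3f} +/- {running.ci_halfwidth():.3f}")
```

After each hand you can compare the interval width to the difference you care about - which also makes the point above visible: with per-hand variance this large, the interval shrinks very slowly.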