When working with standard deviations to get confidence intervals, you usually have the properties of the normal distribution for the entire population. And that's where the problem lies here: what is the entire population? If you had access to a huge statistics database, the likes of pokertableratings.com, you could get a pretty good estimate, but then again, it's quite possible that the differences between bots are larger than the differences between humans, and that the differences between your own bot versions are larger than the differences between different bots.
To work with a normal distribution, you obviously need a mean and a standard deviation - but where do you get those from? I would suggest doing some empirical measurements as follows:
1) Run one version of your bot through a sample of 10000 random hands, repeat this 1000 times, and record the win-rate each time.
2) Run the second version of your bot through the *same* 1000 samples of 10000 hands - this reduces random interference - and record the win-rate each time.
3) Calculate the mean & variance of the former 1000 win-rates and of the latter 1000 win-rates.
4) Calculate Welch's t and use it to test the null hypothesis that both versions have the same win-rate.
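To make steps 3-4 concrete, here's a minimal sketch in Python using only the standard library. The win-rate numbers are made up (drawn from a normal distribution just to have something to compute on) - in practice the two lists would hold the 1000 measured win-rates from steps 1 and 2:

```python
import math
import random
from statistics import mean, variance

random.seed(0)
# Hypothetical win-rates (e.g. bb/100) for 1000 samples of 10000 hands each.
# Real values would come from your two bot runs in steps 1 and 2.
bot_a = [random.gauss(2.0, 5.0) for _ in range(1000)]
bot_b = [random.gauss(2.5, 5.0) for _ in range(1000)]

m_a, v_a, n_a = mean(bot_a), variance(bot_a), len(bot_a)  # variance() is the
m_b, v_b, n_b = mean(bot_b), variance(bot_b), len(bot_b)  # sample variance

# Welch's t statistic (does not assume equal variances)
t = (m_a - m_b) / math.sqrt(v_a / n_a + v_b / n_b)

# Welch-Satterthwaite approximation for the degrees of freedom
df = (v_a / n_a + v_b / n_b) ** 2 / (
    (v_a / n_a) ** 2 / (n_a - 1) + (v_b / n_b) ** 2 / (n_b - 1)
)

print(f"t = {t:.3f}, df = {df:.1f}")
```

At these sample sizes, an |t| well above roughly 1.96 would let you reject the null hypothesis of equal win-rates at the 5% level.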
Steps 3-4 can be done much more easily by entering all your values into SPSS (statistics software) and running the test on the measurements. Obviously, the larger the sample size and the number of hands, the more significant the differences become and, as a consequence, the higher the confidence.
From your first post, I gather that you want a continuous computation of the confidence while you are running your hands. You would proceed in much the same way, except that the per-hand winnings/losses would be the values and every hand would be a sample. Obviously, the values would then range from -stacksize to +stacksize and you would have a huge variance. I'm not sure you would ever get a difference between the distributions that's significant enough.

Statistics: Posted by Heuristics — Sun Mar 24, 2013 7:37 pm
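As a footnote to the continuous approach above: you don't need to store every hand to keep a running confidence interval. A sketch using Welford's online algorithm (single pass, numerically stable) - the per-hand winnings here are simulated with a made-up distribution just for illustration:

```python
import math
import random

class RunningStats:
    """Welford's online algorithm: running mean and variance in one pass."""
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def push(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        # Sample variance (Bessel's correction)
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

    def ci_halfwidth(self, z=1.96):
        """Approximate 95% CI half-width for the mean per-hand winnings."""
        return z * math.sqrt(self.variance / self.n)

random.seed(1)
running = RunningStats()
# Hypothetical per-hand winnings: small edge, huge per-hand variance,
# as you'd expect when values can swing from -stacksize to +stacksize.
for _ in range(10000):
    running.push(random.gauss(0.05, 20.0))

print(f"mean = {running.mean:.3f} +/- {running.ci_halfwidth():.3f}")
```

After each hand you can compare the interval width to the difference you care about - which also makes the point above visible: with per-hand variance this large, the interval shrinks very slowly.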