Poker-AI.org http://poker-ai.org/phpbb/ |
Iteratively Modeling Irregularly Distributed Data Streams? http://poker-ai.org/phpbb/viewtopic.php?f=24&t=2494 |
Author: | cantina [ Thu May 16, 2013 1:54 am ] |
Post subject: | Iteratively Modeling Irregularly Distributed Data Streams? |
The title says it all... How do you deal with regressing a model from data that is not uniformly distributed? In poker, for example, you see a lot of hands with an average value, but comparatively fewer hands with a great value. If I were to train a model to recognize hand strength given some pattern in the cards as they're observed at random, the function that is learned would be "compacted" towards the frequently observed, average hands in a nonlinear fashion. <-- What is the best way to avoid this? What if the real distribution isn't completely known? What if the frequency of values changes over time? |
Author: | Magnum [ Thu May 16, 2013 3:31 am ] |
Post subject: | Re: Iteratively Modeling Irregularly Distributed Data Stream |
I think the exact method will depend on what type of model you are using. You can either generate duplicates of the "great hands" or remove a portion of the "average value hands". For a regression problem you might be able to do something like the following: 1. Find the distribution of your training data. 2. Take a random sample of your training data, weighted by the inverse of this distribution. AFAIK, as long as your resampled training data is uniform, you should be able to predict equally well regardless of what the "real" distribution is. |
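The two steps Magnum lists can be sketched roughly as follows. This is only an illustration, assuming the data fits in memory as a list of (x, y) pairs and that the "distribution" is estimated with a simple equal-width histogram over the targets y; the function name and the `n_bins` parameter are made up for this example.

```python
import random
from collections import Counter

def inverse_density_resample(samples, n_out, n_bins=10, rng=None):
    """Resample (x, y) pairs so the targets y are roughly uniform over their range.

    Assumption of this sketch: density is estimated by binning on y only.
    """
    rng = rng or random.Random()
    ys = [y for _, y in samples]
    lo, hi = min(ys), max(ys)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if all y are equal

    def bin_of(y):
        # Clamp so that y == hi lands in the last bin.
        return min(int((y - lo) / width), n_bins - 1)

    # Step 1: estimate the distribution of the targets with a histogram.
    counts = Counter(bin_of(y) for y in ys)
    # Step 2: weight each sample by the inverse of its bin's count,
    # then draw with replacement according to those weights.
    weights = [1.0 / counts[bin_of(y)] for y in ys]
    return rng.choices(samples, weights=weights, k=n_out)
```

Every non-empty bin ends up with the same total sampling probability, so rare values are duplicated and common ones are thinned out. It does not address the streaming or drifting case raised later in the thread.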
Author: | longshot [ Thu May 16, 2013 4:33 am ] |
Post subject: | Re: Iteratively Modeling Irregularly Distributed Data Stream |
I'm not really clear on what you're trying to predict here. What are the desired inputs and outputs of your regression model? |
Author: | cantina [ Thu May 16, 2013 5:22 am ] |
Post subject: | Re: Iteratively Modeling Irregularly Distributed Data Stream |
longshot wrote: What are the desired inputs and outputs of your regression model? The inputs are numbers from 0..1, outputs are numbers from 0..1. I thought about caching the stream, taking maybe 100k instances at a time, then using a model/training method that considered the global error rate of the data, like annealing or RPROP. |
Author: | cantina [ Thu May 16, 2013 5:30 am ] |
Post subject: | Re: Iteratively Modeling Irregularly Distributed Data Stream |
Magnum wrote: 1. Find the distribution of your training data 2. Take a random sample of your training data, weighted by the inverse of this distribution Well, it's a stream, so I can't do that outright. And, the distribution changes. Caching is the best thing I could think of ATM. I saw various papers on using interpolation methods for irregular point data distributions. |
Author: | longshot [ Thu May 16, 2013 6:40 am ] |
Post subject: | Re: Iteratively Modeling Irregularly Distributed Data Stream |
Nasher wrote: longshot wrote: What are the desired inputs and outputs of your regression model? The inputs are numbers from 0..1, outputs are numbers from 0..1. I thought about caching the stream, taking maybe 100k instances at a time, then using a model/training method that considered the global error rate of the data, like annealing or RPROP. I don't see how annealing or RPROP would help with this. Based on my 30-second reading of the Wikipedia page, it seems like RPROP is good when you have the correct polarity and frequency of training, but maybe the actual signal is noisy. That doesn't seem to be your problem. So if I understand, you're trying to predict an opponent's HS using a neural network, where you take in some features about the hand and output what the expected HS is. So what are you doing with folded hands? Are you just ignoring them, which is what's creating your bias? If so, then what you want is to basically marginalize over all the possible hands. You probably would want to adjust your backprop learning rate to alpha * p(HC | History). Wouldn't that correct for the bias? |
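One way to read the "alpha * p(HC | History)" suggestion is as a per-sample learning rate: each update is scaled by a weight attached to that sample. The sketch below shows the idea on a one-input linear model rather than a neural network. The `weight` argument is a stand-in for whatever probability or importance weight is available; how to obtain it is not specified in the post.

```python
def sgd_step(params, x, target, alpha, weight):
    """One weighted SGD step for a linear model y = w * x + b with squared error.

    `weight` is a hypothetical per-sample factor (e.g. a probability or an
    inverse-frequency weight) that scales the base learning rate `alpha`.
    """
    w, b = params
    error = (w * x + b) - target
    step = alpha * weight  # per-sample learning rate
    return (w - step * error * x, b - step * error)
```

With all weights equal to 1 this is ordinary SGD, and the fit drifts towards the frequently seen targets. With weights proportional to the inverse frequency of each kind of sample, the same loop settles near the weighted average instead.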
Author: | cantina [ Thu May 16, 2013 7:21 am ] |
Post subject: | Re: Iteratively Modeling Irregularly Distributed Data Stream |
longshot wrote: I don't see how annealing or RPROP would help with this. It works better than something like simple back-prop. longshot wrote: So if I understand, you're trying to predict an opponent's HS using a neural network, where you take in some features about the hand and output what the expected HS is. No, it's a different problem. I might try splitting up the model. But, I really hate to do that. Maybe just keep a set of slots for the various distribution intervals and wait for each slot to be filled? But, again, I don't know what the upper/lower bounds will be, and they change. |
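The "set of slots" idea could look something like the sketch below. To sidestep the unknown upper/lower bounds, slots are created on demand from a fixed interval width, and each slot keeps only its most recent samples so that old data ages out as the stream drifts. The class name, the fixed `slot_width`, and slotting on a scalar y are all assumptions made for the illustration, not something stated in the thread.

```python
import math
import random
from collections import defaultdict, deque

class SlotBuffer:
    """Keep the most recent samples per value interval ("slot")."""

    def __init__(self, slot_width=0.1, per_slot=100):
        self.slot_width = slot_width
        # Slots appear when first needed, so no bounds are required up front.
        # A bounded deque drops its oldest entry when full.
        self.slots = defaultdict(lambda: deque(maxlen=per_slot))

    def add(self, x, y):
        key = math.floor(y / self.slot_width)
        self.slots[key].append((x, y))

    def balanced_batch(self, per_slot, rng=random):
        """Draw up to `per_slot` samples from every non-empty slot."""
        batch = []
        for items in self.slots.values():
            k = min(per_slot, len(items))
            batch.extend(rng.sample(list(items), k))
        return batch
```

Instead of waiting for every slot to fill, `balanced_batch` simply takes what each slot has, so sparse regions are under-represented at first but never block training.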
Author: | longshot [ Thu May 16, 2013 7:37 am ] |
Post subject: | Re: Iteratively Modeling Irregularly Distributed Data Stream |
Nasher wrote: No, it's a different problem. So... the $64,000 question: what's the problem? Nasher wrote: I might try splitting up the model. But, I really hate to do that. Maybe just keep a set of slots for the various distribution intervals and wait for each slot to be filled? But, again, I don't know what the upper/lower bounds will be, and they change. So in the general case, you have some skewed distribution and you want to learn the regression model for the uniform distribution, online, in an incremental way so that you can do some small computation after seeing each sample from the skewed distribution. Is that right? If so, then why not simply correct it using a fictitious sampling approach like Magnum suggested? |
Author: | spears [ Thu May 16, 2013 7:38 am ] |
Post subject: | Re: Iteratively Modeling Irregularly Distributed Data Stream |
Weight the data according to pot or winnings? |
Author: | cantina [ Thu May 16, 2013 3:28 pm ] |
Post subject: | Re: Iteratively Modeling Irregularly Distributed Data Stream |
Good idea, spears. But, the winnings may not be uniformly distributed based on hand value. Or would they... |
Author: | cantina [ Thu May 16, 2013 3:42 pm ] |
Post subject: | Re: Iteratively Modeling Irregularly Distributed Data Stream |
longshot wrote: So in the general case, you have some skewed distribution and you want to learn the regression model for the uniform distribution, online, in an incremental way so that you can do some small computation after seeing each sample from the skewed distribution. Is that right? If so, then why not simply correct it using a fictitious sampling approach like Magnum suggested? I think you're misunderstanding. Think of it like various points on a 2D map. I get x/y coordinates, and the associated elevation for that point, one at a time (in a streamed fashion). I'm trying to model the elevation based on the coordinates; however, most of the samples are from one small area of the map, and the others are from the various surrounding regions. Let's say 90% of the point data represents 10% of the area. <-- That's the irregular distribution I'm talking about. Now, consider that my point data is from the melting polar cap in Antarctica, where the elevation slowly changes over time. Also, consider that the sample locations change as well, so the 90% that represents 10% is no longer in the same x/y region it was before. Now, also consider that we're in a different Universe, where point data isn't 2D but 34D and elevation isn't 1D but 12D. This is the problem I'm working on. |
Author: | trojanrabbit [ Thu May 16, 2013 4:46 pm ] |
Post subject: | Re: Iteratively Modeling Irregularly Distributed Data Stream |
Maybe use kNN with the weights decreasing with distance? Tysen |
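A minimal sketch of kNN regression with weights that decrease with distance, here using inverse-distance weighting. It assumes the points are held in memory as (coords, value) pairs with a scalar value; the 12D outputs mentioned above would need the same weighted average applied per output dimension. The function name and the `eps` guard are choices made for this example.

```python
import math

def knn_predict(points, query, k=5, eps=1e-9):
    """Inverse-distance-weighted kNN regression.

    points: list of (coords, value) pairs, coords being equal-length tuples.
    eps keeps the weight finite when the query coincides with a stored point.
    """
    nearest = sorted(points, key=lambda p: math.dist(p[0], query))[:k]
    weights = [1.0 / (math.dist(coords, query) + eps) for coords, _ in nearest]
    total = sum(weights)
    return sum(w * value for w, (_, value) in zip(weights, nearest)) / total
```

Because the prediction only looks at the neighbours of the query, a dense cluster elsewhere on the map does not pull the estimate towards its values, which is the property that matters for unevenly distributed samples.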
Author: | cantina [ Thu May 16, 2013 7:07 pm ] |
Post subject: | Re: Iteratively Modeling Irregularly Distributed Data Stream |
Will that work for streamed data? I suppose I could: cache, cluster, train. |
Author: | sn0w [ Thu May 16, 2013 7:54 pm ] |
Post subject: | Re: Iteratively Modeling Irregularly Distributed Data Stream |
You could use a Kalman filter (or any other Bayes filter: variations of the KF, particle filters, grid estimators, etc.); it works pretty well with streamed data. However, all those smart filters require a motion/observation model of the system, which is hard to derive analytically. But even with a poor (e.g. constant or linear) motion model, you will get a good estimate of the posterior distribution in some cases, just because of the nature of recursive estimation. |
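For reference, the recursive estimation sn0w describes looks like this in the simplest case: a one-dimensional Kalman filter with a constant motion model (the state is assumed to stay where it is, plus some process noise). The variable names and the choice of a scalar state are assumptions of this sketch; it tracks a single drifting quantity from noisy observations.

```python
def kalman_step(mean, var, z, process_var, obs_var):
    """One predict/update cycle of a 1D Kalman filter with a constant motion model."""
    # Predict: the state is assumed unchanged, only the uncertainty grows.
    var = var + process_var
    # Update: blend the prediction and the observation z using the Kalman gain.
    gain = var / (var + obs_var)
    mean = mean + gain * (z - mean)
    var = (1.0 - gain) * var
    return mean, var
```

Calling `kalman_step` once per incoming observation keeps a running estimate and its variance without storing the stream; a larger `process_var` makes the filter follow changes faster at the cost of a noisier estimate.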
Author: | cantina [ Thu May 16, 2013 10:12 pm ] |
Post subject: | Re: Iteratively Modeling Irregularly Distributed Data Stream |
I don't receive a stream of information about a single point, rather I receive a stream of information about random points. So, I don't see how I would apply the Kalman filter? It would eventually just give me the (weighted) average of all point coordinates, would it not? The point coordinates (inputs) are not noisy, they are precise, they're just unevenly distributed. |
Author: | ibot [ Sat May 18, 2013 10:58 pm ] |
Post subject: | Re: Iteratively Modeling Irregularly Distributed Data Stream |
trojanrabbit wrote: Maybe use kNN with the weights decreasing with distance? Tysen "An Adaptive Nearest Neighbor Classification Algorithm for Data Streams": I only had a quick glance over it, but it seems quite clear and may have some useful information. What about PCA or similar techniques for the dimensionality problem? It looks like the problem is the rarity of some of the data; it also looks like the main research on rare data comes from the medical side of things. "Mining With Rare Cases" is more general but has a few ideas that could be looked into. |
Author: | cantina [ Sat May 18, 2013 11:42 pm ] |
Post subject: | Re: Iteratively Modeling Irregularly Distributed Data Stream |
Found this: http://sourceforge.net/projects/moa-datastream/ |
All times are UTC |