Poker-AI.org

Re: Iteratively Modeling Irregularly Distributed Data Stream

2013-05-18T23:42:21+00:00

Found this:
http://sourceforge.net/projects/moa-datastream/

Statistics: Posted by cantina — Sat May 18, 2013 11:42 pm

Re: Iteratively Modeling Irregularly Distributed Data Stream

2013-05-18T22:58:08+00:00

trojanrabbit wrote:

Maybe use kNN with the weights decreasing with distance?

Tysen

An Adaptive Nearest Neighbor Classification Algorithm for Data Streams
I had a quick glance over it; it seems quite clear and may have some useful information.
What about PCA or similar techniques for dealing with the dimensionality problem?

It looks like the problem is the rarity of some of the data. It also looks like most of the research on rare data comes from the medical side of things.
Mining With Rare Cases is more general, but it has a few ideas that could be looked into.

Statistics: Posted by ibot — Sat May 18, 2013 10:58 pm

Re: Iteratively Modeling Irregularly Distributed Data Stream

2013-05-16T22:12:50+00:00

I don't receive a stream of information about a single point; rather, I receive a stream of information about random points. So I don't see how I would apply a Kalman filter. Wouldn't it eventually just give me the (weighted) average of all the point coordinates? The point coordinates (inputs) are not noisy. They are precise, just unevenly distributed.

Statistics: Posted by cantina — Thu May 16, 2013 10:12 pm

Re: Iteratively Modeling Irregularly Distributed Data Stream

2013-05-16T19:54:35+00:00

You could use a Kalman filter (or any other Bayes filter: KF variants, particle filters, grid estimators, etc.); these work pretty well with streamed data. However, all of these filters require a motion/observation model of the system, which is hard to derive analytically. But even with a poor (e.g. constant or linear) motion model, in some cases you will get a good estimate of the posterior distribution simply because of the nature of recursive estimation.
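A minimal sketch of the simplest case described above: a scalar Kalman filter with a constant (random-walk) motion model, written in Python. The function name `kalman_random_walk` and the noise variances `q` and `r` are illustrative assumptions, not values from the thread. Note that it tracks one slowly drifting quantity from noisy readings; it does not by itself model a function over many different input points.

```python
def kalman_random_walk(observations, q=1e-3, r=0.1, x0=0.0, p0=1.0):
    """Scalar Kalman filter under a random-walk ("constant") motion model.

    Assumed model (hypothetical parameters):
      state:        x_t = x_{t-1} + process noise with variance q
      observation:  z_t = x_t + measurement noise with variance r
    Returns the posterior mean after each observation.
    """
    x, p = x0, p0                  # prior mean and variance
    estimates = []
    for z in observations:
        p = p + q                  # predict: mean unchanged, uncertainty grows by q
        k = p / (p + r)            # Kalman gain: how much to trust the new reading
        x = x + k * (z - x)        # update: move the estimate toward the reading
        p = (1.0 - k) * p          # posterior variance shrinks after the update
        estimates.append(x)
    return estimates
```

With a larger `q` the filter forgets faster and follows drift more closely; with a larger `r` it trusts each new reading less.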

Statistics: Posted by sn0w — Thu May 16, 2013 7:54 pm

Re: Iteratively Modeling Irregularly Distributed Data Stream

2013-05-17T00:23:08+00:00

Will that work for streamed data?

I suppose I could cache, cluster, then train.
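One way to sketch the cache, cluster, train idea, assuming plain k-means is an acceptable clustering step: buffer a chunk of the stream, cluster the cached inputs, then build a training batch with the same number of samples from each cluster. All function names (`nearest`, `kmeans`, `balanced_batch`) are hypothetical helpers written for this illustration.

```python
import math
import random


def nearest(p, centers):
    """Index of the center closest to point p (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))


def kmeans(points, k, iters=20):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from every center chosen so far.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [tuple(sum(col) / len(g) for col in zip(*g)) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers


def balanced_batch(cache, k, per_cluster, seed=0):
    """cache: list of (x, y) pairs, x a tuple of floats.
    Clusters the cached inputs, then draws per_cluster samples (with replacement)
    from every non-empty cluster so dense regions no longer dominate training."""
    centers = kmeans([x for x, _ in cache], k)
    clusters = [[] for _ in range(k)]
    for x, y in cache:
        clusters[nearest(x, centers)].append((x, y))
    rng = random.Random(seed)
    batch = []
    for members in clusters:
        if members:
            batch.extend(rng.choices(members, k=per_cluster))
    return batch
```

Drawing equally per cluster amounts to oversampling the sparse regions; choosing `k`, and re-clustering each time the cache is refreshed, is left open here.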

Statistics: Posted by cantina — Thu May 16, 2013 7:07 pm

Re: Iteratively Modeling Irregularly Distributed Data Stream

2013-05-16T16:46:32+00:00

Maybe use kNN with the weights decreasing with distance?

Tysen
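A minimal Python sketch of distance-weighted kNN regression for a stream, assuming inverse-distance weights (one common choice of decreasing weight) and a fixed-size sliding window so that old samples are forgotten as the data drifts. The class name and its parameters are hypothetical.

```python
import math
from collections import deque


class StreamingWeightedKNN:
    """Distance-weighted kNN regression over a sliding window of recent samples.

    The window (maxlen) is an assumption added for drifting streams: the oldest
    samples are forgotten automatically. Outputs y are scalars here; a
    multi-dimensional output would repeat the weighted average per component.
    """

    def __init__(self, k=5, window=10_000, eps=1e-9):
        self.k = k
        self.eps = eps                        # avoids division by zero on exact matches
        self.samples = deque(maxlen=window)

    def add(self, x, y):
        self.samples.append((tuple(x), y))

    def predict(self, x):
        if not self.samples:
            raise ValueError("no samples seen yet")
        # Brute-force neighbour search over the whole window.
        neighbours = sorted(self.samples, key=lambda s: math.dist(x, s[0]))[: self.k]
        # Inverse-distance weights: closer neighbours count more.
        weights = [1.0 / (math.dist(x, xi) + self.eps) for xi, _ in neighbours]
        return sum(w * yi for w, (_, yi) in zip(weights, neighbours)) / sum(weights)
```

As long as the k nearest samples lie in the sparse region being queried, a dense region elsewhere does not pull the prediction toward its own values. The brute-force search costs O(window) per prediction.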

Statistics: Posted by trojanrabbit — Thu May 16, 2013 4:46 pm

Re: Iteratively Modeling Irregularly Distributed Data Stream

2013-05-16T15:42:46+00:00

longshot wrote:

So in the general case, you have some skewed distribution and you want to learn the regression model for the uniform distribution, online, in an incremental way so that you can do some small computation after seeing each sample from the skewed distribution. Is that right?

If so, then why not simply correct it using a fictitious sampling approach like Magnum suggested?

I think you're misunderstanding. Think of it like various points on a 2D map. I get x/y coordinates and the associated elevation for each point, one at a time (in a streamed fashion). I'm trying to model the elevation as a function of the coordinates; however, most of the samples come from one small area of the map, and the rest come from the surrounding region. Let's say 90% of the point data covers 10% of the area. <-- That's the irregular distribution I'm talking about. Now consider that my point data comes from the melting polar cap in Antarctica, where the elevation slowly changes over time. Also consider that the sample locations change as well, so the dense region (the 90% of samples covering 10% of the area) is no longer at the same x/y location it was before. Finally, consider that we're in a different universe, where the point data isn't 2D but 34D, and elevation isn't 1D but 12D. This is the problem I'm working on.

Statistics: Posted by cantina — Thu May 16, 2013 3:42 pm

Re: Iteratively Modeling Irregularly Distributed Data Stream

2013-05-16T15:28:45+00:00

Good idea, spears. But the winnings may not be uniformly distributed with respect to hand value. Or would they...

Statistics: Posted by cantina — Thu May 16, 2013 3:28 pm

Re: Iteratively Modeling Irregularly Distributed Data Stream

2013-05-16T07:38:36+00:00

Weight the data according to pot or winnings?

Statistics: Posted by spears — Thu May 16, 2013 7:38 am

Re: Iteratively Modeling Irregularly Distributed Data Stream

2013-05-16T07:37:03+00:00

Nasher wrote:

No, it's a different problem.

So... the $64,000 question: what's the problem?

Nasher wrote:

I might try splitting up the model, but I really hate to do that. Maybe just keep a set of slots for the various distribution intervals and wait for each slot to be filled? But again, I don't know what the upper/lower bounds will be, and they change.

Statistics: Posted by longshot — Thu May 16, 2013 7:37 am

Re: Iteratively Modeling Irregularly Distributed Data Stream

2013-05-16T07:21:59+00:00

longshot wrote:

I don't see how annealing or RPROP would help with this.

It works better than something like simple back-prop.

longshot wrote:

So if I understand, you're trying to predict an opponent's HS using a neural network, where you take in some features about the hand and output what the expected HS is.

No, it's a different problem.

I might try splitting up the model, but I really hate to do that. Maybe just keep a set of slots for the various distribution intervals and wait for each slot to be filled? But again, I don't know what the upper/lower bounds will be, and they change.
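A sketch of the slots idea, under two assumptions not stated in the post: slots are equal-width intervals of the (scalar) target value, and the unknown, changing bounds are simply taken as the min and max over a window of recent samples, recomputed whenever a batch is built. `SlotBalancer` and its methods are hypothetical names.

```python
import random
from collections import deque


class SlotBalancer:
    """Keeps a window of recent (x, y) samples and builds training batches that
    draw the same number of samples from each of n_slots equal-width intervals
    of y. Interval bounds come from the current window, so they move with the data."""

    def __init__(self, n_slots=10, window=100_000, seed=0):
        self.n_slots = n_slots
        self.window = deque(maxlen=window)   # oldest samples drop out automatically
        self.rng = random.Random(seed)

    def add(self, x, y):
        self.window.append((x, y))

    def _slots(self):
        lo = min(y for _, y in self.window)
        hi = max(y for _, y in self.window)
        width = (hi - lo) / self.n_slots or 1.0   # all y equal: avoid dividing by zero
        slots = [[] for _ in range(self.n_slots)]
        for x, y in self.window:
            i = min(int((y - lo) / width), self.n_slots - 1)   # y == hi goes in the last slot
            slots[i].append((x, y))
        return slots

    def ready(self, min_per_slot):
        """True once every slot holds at least min_per_slot samples."""
        return bool(self.window) and all(len(s) >= min_per_slot for s in self._slots())

    def batch(self, per_slot):
        """per_slot samples (with replacement) from every non-empty slot."""
        out = []
        for s in self._slots():
            if s:
                out.extend(self.rng.choices(s, k=per_slot))
        return out
```

`ready(m)` interprets "wait for each slot to be filled" as at least `m` samples per slot. If some interval of values never occurs it will never return True, so a fuller version would probably need a timeout, or simply skip empty slots as `batch` already does.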

Statistics: Posted by cantina — Thu May 16, 2013 7:21 am

Re: Iteratively Modeling Irregularly Distributed Data Stream

2013-05-16T06:40:14+00:00

Nasher wrote:

longshot wrote:

What are the desired inputs and outputs of your regression model?

The inputs are numbers from 0..1, outputs are numbers from 0..1.

I thought about caching the stream, taking maybe 100k instances at a time, then using a model/training method that considered the global error rate of the data, like annealing or RPROP.

I don't see how annealing or RPROP would help with this. Based on my 30-second reading of the Wikipedia page, it seems like RPROP is good when you have the correct polarity and frequency of training, but maybe the actual signal is noisy. That doesn't seem to be your problem.
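For reference, a minimal sketch of iRprop-, one common RPROP variant: each weight keeps its own step size, which grows while the gradient sign stays the same and shrinks when it flips. Only the sign of the full-batch gradient is used, which fits the cache-then-train plan quoted above. The function name and the default constants (growth 1.2, shrink 0.5, step bounds) are the usual textbook values but should be treated as assumptions here.

```python
def irprop_minus(grad_fn, w, iters=100, step0=0.1, eta_plus=1.2, eta_minus=0.5,
                 step_min=1e-6, step_max=50.0):
    """Minimise a function with iRprop-, given grad_fn(w) -> list of partials.
    Each weight has its own step size; only gradient signs are used."""
    w = list(w)
    steps = [step0] * len(w)
    prev = [0.0] * len(w)
    for _ in range(iters):
        g = list(grad_fn(w))
        for i in range(len(w)):
            if prev[i] * g[i] > 0:          # same sign as last time: speed up
                steps[i] = min(steps[i] * eta_plus, step_max)
            elif prev[i] * g[i] < 0:        # sign flipped: we overshot, slow down
                steps[i] = max(steps[i] * eta_minus, step_min)
                g[i] = 0.0                  # iRprop-: skip this weight's update once
            if g[i] > 0:
                w[i] -= steps[i]
            elif g[i] < 0:
                w[i] += steps[i]
            prev[i] = g[i]
    return w
```

For example, minimising (w0 - 3)^2 + (w1 + 2)^2 from the origin with `iters=200` ends up close to `[3, -2]`.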

So if I understand, you're trying to predict an opponent's HS using a neural network, where you take in some features about the hand and output what the expected HS is. So what are you doing with folded hands? Are you just ignoring them, which is what's creating your bias?

If so, then what you want is to basically marginalize over all the possible hands. You probably would want to adjust your backprop learning rate to alpha * p(HC | History). Wouldn't that correct for the bias?
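A sketch of the learning-rate adjustment suggested here, assuming a linear model trained online on squared error: each step uses `alpha * weight`, where `weight` could be `p(HC | History)` as proposed, a pot/winnings weight, or an inverse-frequency weight. `weighted_sgd_step` is a hypothetical helper.

```python
def weighted_sgd_step(theta, x, y, alpha, weight):
    """One SGD step for linear regression (squared error) where the learning rate
    is scaled per sample: effective rate = alpha * weight."""
    pred = sum(t * xi for t, xi in zip(theta, x))
    err = pred - y                      # gradient of 0.5 * err**2 w.r.t. pred
    lr = alpha * weight                 # per-sample learning rate
    return [t - lr * err * xi for t, xi in zip(theta, x)]
```

For a small `alpha`, a step with weight 2 has roughly the same effect as presenting the sample twice, which is why per-sample weights can stand in for resampling on a stream.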

Statistics: Posted by longshot — Thu May 16, 2013 6:40 am

Re: Iteratively Modeling Irregularly Distributed Data Stream

2013-05-16T05:30:27+00:00

Magnum wrote:

1. Find the distribution of your training data
2. Take a random sample of your training data, weighted by the inverse of this distribution

Well, it's a stream, so I can't do that outright. And the distribution changes. Caching is the best thing I could think of at the moment. I've seen various papers on using interpolation methods for irregularly distributed point data.
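A streaming stand-in for the two-step inverse-weighting approach, sketched under assumptions: values lie in [0, 1] (the range cantina gives for inputs and outputs), the distribution is estimated with a fixed-bin histogram whose counts decay exponentially so that it follows drift, and each sample is weighted by the inverse of its bin's estimated probability instead of being resampled. `DecayingHistogram` is a hypothetical name.

```python
class DecayingHistogram:
    """Online density estimate over [0, 1] with exponential forgetting,
    used to derive inverse-density sample weights."""

    def __init__(self, n_bins=20, decay=0.999):
        self.n_bins = n_bins
        self.decay = decay              # < 1 fades old evidence; 1.0 never forgets
        self.counts = [0.0] * n_bins
        self.total = 0.0

    def _bin(self, v):
        return min(int(v * self.n_bins), self.n_bins - 1)   # v == 1.0 goes in the last bin

    def update(self, v):
        # Fade all previous counts, then add the new observation.
        self.counts = [c * self.decay for c in self.counts]
        self.total = self.total * self.decay + 1.0
        self.counts[self._bin(v)] += 1.0

    def weight(self, v):
        """Inverse of v's estimated bin probability, scaled so a uniform stream
        gets weights near 1. Call update(v) first so the bin is never empty."""
        c = self.counts[self._bin(v)]
        if c == 0.0:
            raise ValueError("bin never observed; call update(v) first")
        return self.total / (self.n_bins * c)
```

The returned weight can be fed straight into a per-sample learning rate. With `decay` close to 1 the estimate is stable but slow to follow changes; smaller values adapt faster but are noisier.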

Statistics: Posted by cantina — Thu May 16, 2013 5:30 am

Re: Iteratively Modeling Irregularly Distributed Data Stream

2013-05-16T05:22:08+00:00

longshot wrote:

What are the desired inputs and outputs of your regression model?

Statistics: Posted by cantina — Thu May 16, 2013 5:22 am

Re: Iteratively Modeling Irregularly Distributed Data Stream

2013-05-16T04:33:30+00:00

I'm not really clear on what you're trying to predict here. What are the desired inputs and outputs of your regression model?

Statistics: Posted by longshot — Thu May 16, 2013 4:33 am

Re: Iteratively Modeling Irregularly Distributed Data Stream

2013-05-16T03:31:45+00:00

I think the exact method will depend on what type of model you are using. You can either generate duplicates of the "great hands" or remove a portion of the "average value hands". For a regression problem you might be able to do something like the following...

1. Find the distribution of your training data
2. Take a random sample of your training data, weighted by the inverse of this distribution

AFAIK, as long as your training data is uniform, your predictions should be the same regardless of what the "real" distribution is.
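A minimal Python sketch of the two steps for a batch of cached data, assuming the samples can be grouped into discrete bins (for example, rounded hand-strength buckets) to estimate the distribution. `inverse_frequency_resample` is a hypothetical helper; `random.Random.choices` draws with replacement, so this is the "generate duplicates of the great hands" variant.

```python
import random
from collections import Counter


def inverse_frequency_resample(data, key, n, seed=0):
    """Resample `data` so every bin (as labelled by key(sample)) is about equally
    represented: each sample is drawn with probability proportional to
    1 / (number of samples in its bin)."""
    freq = Counter(key(s) for s in data)              # step 1: empirical distribution
    weights = [1.0 / freq[key(s)] for s in data]      # step 2: inverse weights
    rng = random.Random(seed)
    return rng.choices(data, weights=weights, k=n)    # draws with replacement


# Usage: 90% "average" hands, 10% "great" hands.
hands = [("average", i) for i in range(90)] + [("great", i) for i in range(10)]
sample = inverse_frequency_resample(hands, key=lambda h: h[0], n=10_000)
```

With 90% "average" and 10% "great" samples, each bin ends up with roughly half of the resampled set.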

Statistics: Posted by Magnum — Thu May 16, 2013 3:31 am

Iteratively Modeling Irregularly Distributed Data Streams?

2013-05-16T01:54:34+00:00

The title says it all... How do you deal with fitting a regression model to data that is not uniformly distributed? In poker, for example, you see a lot of hands with an average value, but comparatively few hands with a great value. If I were to train a model to recognize hand strength from some pattern in the cards as they're observed at random, the learned function would be "compacted" towards the frequently observed, average hands in a nonlinear fashion. <-- What is the best way to avoid this? What if the real distribution isn't completely known? What if the frequency of values changes over time?

Statistics: Posted by cantina — Thu May 16, 2013 1:54 am