Poker-AI.org

Poker AI and Botting Discussion Forum

All times are UTC




Posted: Thu May 16, 2013 1:54 am
Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 437
The title says it all... How do you deal with regressing a model from data that is not uniformly distributed? In poker, for example, you see a lot of hands with an average value, but comparatively fewer hands with a great value. If I were to train a model to recognize hand strength given some pattern in the cards as they're observed at random, the function that is learned would be "compacted" towards the frequently observed, average hands in a nonlinear fashion. <-- What is the best way to avoid this? What if the real distribution isn't completely known? What if the frequency of values changes over time?


Posted: Thu May 16, 2013 3:31 am
New Member

Joined: Sun Mar 10, 2013 12:18 am
Posts: 3
I think the exact method will depend on what type of model you are using. You can either generate duplicates of the "great hands" or remove a portion of the "average value hands". For a regression problem you might be able to do something like the following...

1. Find the distribution of your training data
2. Take a random sample of your training data, weighted by the inverse of this distribution

AFAIK, as long as the resampled training data is uniform, the model should predict equally well across the whole range, regardless of what the "real" distribution is.
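A rough sketch of the two steps above, assuming the skew is in the target value, the targets lie in 0..1, and a plain histogram is a good enough estimate of the distribution. The function name and the bin count are made up for illustration:

```python
import random
from collections import Counter

def inverse_density_resample(samples, n_bins=10, k=None, seed=0):
    """Resample (x, y) pairs so the targets y in [0, 1] come out roughly uniform.

    Each sample is weighted by 1 / (number of samples in its target bin).
    """
    rng = random.Random(seed)

    def bin_of(y):
        # Clamp so that y == 1.0 falls into the last bin.
        return min(int(y * n_bins), n_bins - 1)

    # Step 1: estimate the distribution of the training targets (histogram).
    counts = Counter(bin_of(y) for _, y in samples)
    # Step 2: sample with replacement, weighted by the inverse bin frequency.
    weights = [1.0 / counts[bin_of(y)] for _, y in samples]
    k = len(samples) if k is None else k
    return rng.choices(samples, weights=weights, k=k)
```

With 900 "average" samples and 100 "great" ones, each occupied bin ends up with the same total weight, so the resampled set is split about evenly between the two groups. Empty bins stay empty; this only rebalances what was actually observed.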


Posted: Thu May 16, 2013 4:33 am
Junior Member

Joined: Thu Apr 11, 2013 10:13 pm
Posts: 22
I'm not really clear on what you're trying to predict here. What are the desired inputs and outputs of your regression model?


Posted: Thu May 16, 2013 5:22 am
Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 437
longshot wrote:
What are the desired inputs and outputs of your regression model?

The inputs are numbers from 0..1, outputs are numbers from 0..1. :)

I thought about caching the stream, taking maybe 100k instances at a time, then using a model/training method that considered the global error rate of the data, like annealing or RPROP.


Posted: Thu May 16, 2013 5:30 am
Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 437
Magnum wrote:
1. Find the distribution of your training data
2. Take a random sample of your training data, weighted by the inverse of this distribution

Well, it's a stream, so I can't do that outright. And, the distribution changes. Caching is the best thing I could think of ATM. I saw various papers on using interpolation methods for irregular point data distributions.


Posted: Thu May 16, 2013 6:40 am
Junior Member

Joined: Thu Apr 11, 2013 10:13 pm
Posts: 22
Nasher wrote:
longshot wrote:
What are the desired inputs and outputs of your regression model?

The inputs are numbers from 0..1, outputs are numbers from 0..1. :)

I thought about caching the stream, taking maybe 100k instances at a time, then using a model/training method that considered the global error rate of the data, like annealing or RPROP.


I don't see how annealing or RPROP would help with this. Based on my 30-second reading of the Wikipedia page, it seems like RPROP is good when you have the correct polarity and frequency of training, but the actual signal may be noisy. That doesn't seem to be your problem.

So if I understand, you're trying to predict an opponent's HS using a neural network, where you take in some features about the hand and output what the expected HS is. So what are you doing with folded hands? Are you just ignoring them, which is what's creating your bias?

If so, then what you want is to basically marginalize over all the possible hands. You probably would want to adjust your backprop learning rate to alpha * p(HC | History). Wouldn't that correct for the bias?
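A minimal sketch of scaling the learning rate per sample, using a one-input linear model instead of a full network to keep it short. The `weight` argument stands in for p(HC | History); how to compute that probability is not shown here, and the function name is hypothetical:

```python
def weighted_sgd_step(params, x, y, weight, alpha=0.1):
    """One SGD step on squared error for the linear model y ~ w * x + b.

    The effective learning rate for this sample is alpha * weight.
    """
    w, b = params
    err = (w * x + b) - y      # prediction error on this sample
    lr = alpha * weight        # per-sample learning rate
    return (w - lr * err * x, b - lr * err)
```

A sample with weight 0 leaves the parameters untouched, and a sample with weight 1 gets the full step of size alpha.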


Posted: Thu May 16, 2013 7:21 am
Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 437
longshot wrote:
I don't see how annealing or RPROP would help with this.

It works better than something like simple back-prop.

longshot wrote:
So if I understand, you're trying to predict an opponent's HS using a neural network, where you take in some features about the hand and output what the expected HS is.

No, it's a different problem.

I might try splitting up the model. But, I really hate to do that. Maybe just keep a set of slots for the various distribution intervals and wait for each slot to be filled? But, again, I don't know what the upper/lower bounds will be, and they change.
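The slot idea could look roughly like this. It is only a sketch under a few assumptions: the slots are equal-width intervals over one scalar value per sample, the bounds are taken from the min/max of a sliding window of recent samples (so they follow the drift), and a batch is released once every slot holds enough samples. The class name and defaults are made up:

```python
from collections import deque

class SlotBuffer:
    """Collects streamed (x, y) samples until every value interval is filled."""

    def __init__(self, n_slots=4, per_slot=2, window=1000):
        self.n_slots = n_slots
        self.per_slot = per_slot
        # Old samples fall out of the window, so the bounds can move over time.
        self.recent = deque(maxlen=window)

    def add(self, x, y):
        """Store one sample; return a balanced batch if all slots are full, else None."""
        self.recent.append((x, y))
        lo = min(v for _, v in self.recent)
        hi = max(v for _, v in self.recent)
        if hi == lo:
            return None  # bounds not known yet
        # Re-bin everything against the current bounds.
        slots = [[] for _ in range(self.n_slots)]
        for sample in self.recent:
            i = int((sample[1] - lo) / (hi - lo) * self.n_slots)
            slots[min(i, self.n_slots - 1)].append(sample)
        if any(len(s) < self.per_slot for s in slots):
            return None  # still waiting for the rare intervals
        # Take the newest per_slot samples from each slot and start over.
        batch = [s for slot in slots for s in slot[-self.per_slot:]]
        self.recent.clear()
        return batch
```

Re-binning the whole window on every sample costs O(window) per call, which is fine for a sketch but would need something smarter at high stream rates.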


Posted: Thu May 16, 2013 7:37 am
Junior Member

Joined: Thu Apr 11, 2013 10:13 pm
Posts: 22
Nasher wrote:
No, it's a different problem.


So... the $64,000 question: what's the problem?

Nasher wrote:
I might try splitting up the model. But, I really hate to do that. Maybe just keep a set of slots for the various distribution intervals and wait for each slot to be filled? But, again, I don't know what the upper/lower bounds will be, and they change.


So in the general case, you have some skewed distribution and you want to learn the regression model for the uniform distribution, online, in an incremental way so that you can do some small computation after seeing each sample from the skewed distribution. Is that right?

If so, then why not simply correct it using a fictitious sampling approach like Magnum suggested?


Posted: Thu May 16, 2013 7:38 am
Site Admin

Joined: Sun Feb 24, 2013 9:39 pm
Posts: 642
Weight the data according to pot or winnings?


Posted: Thu May 16, 2013 3:28 pm
Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 437
Good idea, spears. But, the winnings may not be uniformly distributed based on hand value. Or would it... :twisted:


Posted: Thu May 16, 2013 3:42 pm
Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 437
longshot wrote:
So in the general case, you have some skewed distribution and you want to learn the regression model for the uniform distribution, online, in an incremental way so that you can do some small computation after seeing each sample from the skewed distribution. Is that right?

If so, then why not simply correct it using a fictitious sampling approach like Magnum suggested?


I think you're misunderstanding. Think of it like various points on a 2D map. I get x/y coordinates, and the associated elevation for that point, one at a time (in a streamed fashion). I'm trying to model the elevation based on the coordinates. However, most of the samples are from one small area of the map, and the others are from the surrounding regions. Let's say 90% of the point data represents 10% of the area. <-- That's the irregular distribution I'm talking about. Now, consider that my point data is from the melting polar cap in Antarctica, where the elevation slowly changes over time. Also, consider that the sample locations change as well, so the 90% of points that cover 10% of the area are no longer in the same x/y region as before. Now, also consider that we're in a different Universe, where point data isn't 2D but 34D and elevation isn't 1D but 12D. This is the problem I'm working on.


Posted: Thu May 16, 2013 4:46 pm
Junior Member

Joined: Fri Apr 05, 2013 2:21 am
Posts: 11
Maybe use kNN with the weights decreasing with distance?

Tysen
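For reference, distance-weighted kNN regression can be sketched in a few lines. This assumes Euclidean distance, a scalar output, and 1/distance weights; other decay functions would work too. The function name is hypothetical:

```python
import math

def knn_predict(train, query, k=3, eps=1e-9):
    """Predict a scalar for `query` from its k nearest (point, value) pairs.

    Each neighbour is weighted by 1 / distance, so closer points count more.
    """
    nearest = sorted(train, key=lambda pv: math.dist(pv[0], query))[:k]
    # eps avoids division by zero when the query coincides with a training point.
    weights = [1.0 / (math.dist(p, query) + eps) for p, _ in nearest]
    return sum(w * v for w, (_, v) in zip(weights, nearest)) / sum(weights)
```

Because each prediction only uses the neighbours around the query, a densely sampled region does not pull on predictions made elsewhere on the map. The brute-force sort is O(n log n) per query, so a stream would need a bounded cache or a spatial index.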


Posted: Thu May 16, 2013 7:07 pm
Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 437
Will that work for streamed data?

I suppose I could: cache, cluster, train.


Last edited by cantina on Fri May 17, 2013 12:23 am, edited 1 time in total.

Posted: Thu May 16, 2013 7:54 pm
Junior Member

Joined: Wed Mar 06, 2013 3:58 am
Posts: 10
You could use a Kalman filter (or any other Bayes filter: variations of the KF, particle filters, grid estimators, etc.); it works pretty well with streamed data. However, all those smart filters require a motion/observation model of the system, which is hard to derive analytically. But even with a poor (e.g. constant or linear) motion model, you will get a good estimate of the posterior distribution in some cases, just because of the nature of recursive estimation.
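To make the recursive estimation concrete, here is a minimal 1-D Kalman filter with a constant motion model (the state is assumed to stay put between observations, plus some process noise). The noise variances are arbitrary placeholder values, not tuned for any real data:

```python
def kalman_step(mean, var, z, process_var=1e-3, obs_var=0.1):
    """One predict/update cycle of a scalar Kalman filter.

    mean, var: current estimate and its variance
    z:         new noisy observation of the state
    """
    # Predict: constant motion model, so the mean is unchanged
    # and the uncertainty grows by the process noise.
    var = var + process_var
    # Update: blend prediction and observation using the Kalman gain.
    gain = var / (var + obs_var)
    mean = mean + gain * (z - mean)
    var = (1.0 - gain) * var
    return mean, var
```

Calling `kalman_step` once per incoming observation keeps only two numbers of state, which is why this family of filters suits streams. The process noise keeps the gain from shrinking to zero, so the estimate can still follow a slowly drifting value.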


Posted: Thu May 16, 2013 10:12 pm
Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 437
I don't receive a stream of information about a single point, rather I receive a stream of information about random points. So, I don't see how I would apply the Kalman filter? It would eventually just give me the (weighted) average of all point coordinates, would it not? The point coordinates (inputs) are not noisy, they are precise, they're just unevenly distributed.


Posted: Sat May 18, 2013 10:58 pm
Regular Member

Joined: Tue Mar 05, 2013 9:19 pm
Posts: 50
trojanrabbit wrote:
Maybe use kNN with the weights decreasing with distance?

Tysen

"An Adaptive Nearest Neighbor Classification Algorithm for Data Streams"
I only had a quick glance over it, but it seems quite clear and may have some useful information.
What about PCA or similar techniques for the dimensionality problem?

It looks like the problem is the rarity of some of the data. It also looks like the main research on rare data comes from the medical side of things.
"Mining With Rare Cases" is more general but has a few ideas that could be looked into.


Posted: Sat May 18, 2013 11:42 pm
Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 437
Found this:
http://sourceforge.net/projects/moa-datastream/

