 Post subject: Hand History Database (One billion hands for Research)
PostPosted: Tue Mar 31, 2009 5:05 pm 
Offline
PokerAI fellow
User avatar

Posts: 7731
Favourite Bot: V12
*
EDIT: See also http://pokerftp.com
*

Note: This is a pre-announcement, so if you decide to go for this at this point, you will partly be a "beta tester" for the whole process, software, and data involved.

In a nutshell

We are happy to announce that we are giving away, free and for research purposes, nearly one billion real-money poker hand histories played on some major poker sites this year and last, plus supporting software to read them.

In detail

We want to provide these hands strictly for research. Therefore, to be eligible to obtain them, one has to be a published author of at least one Poker AI conference paper, and it should be possible for us to verify that. Please PM me here, or e-mail me at findbg@gmail.com, for further details.

The billion-hand database contains Limit, Pot Limit and No Limit Texas Hold'em and Omaha cash game hand histories, with limits from NL2 to NL100000. The inflated size of these hands is in the range of one terabyte. Therefore the hands are offered in a proprietary parsed format, together with a Java library for reading them (the possibility to export them as plain text will come later). The source code of the Java library is offered for free as well (under GPL v3).

This format, together with our own tools, makes it possible to run analyses over hundreds of millions of hands on a mainstream PC, where otherwise one might need to spend up to a hundred thousand dollars on a setup able to handle the same task via a conventional RDBMS.

We support a no-opponent-profiling policy. Therefore we are taking measures to prevent the use of this database for opponent profiling. We are obfuscating the names of the poker sites from which these hands were obtained, as well as table names, player IDs and hand IDs. We are also randomly modifying the time each hand was played (the time is shifted by a few seconds, so that time-dependent player patterns can still be extracted).
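
A minimal sketch of what this kind of obfuscation could look like. This is not the actual pokerai.org implementation: the salted SHA-256 hashing, the `maxShiftSeconds` parameter, and all class and method names are assumptions made purely for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Random;

// Hypothetical helper; the technique and the names are illustrative assumptions.
class HandObfuscator {
  private final byte[] salt;
  private final Random random;
  private final int maxShiftSeconds;

  HandObfuscator(String secretSalt, long seed, int maxShiftSeconds) {
    this.salt = secretSalt.getBytes(StandardCharsets.UTF_8);
    this.random = new Random(seed);
    this.maxShiftSeconds = maxShiftSeconds;
  }

  // Maps an identifier (player, table, hand) to a stable 16-character pseudonym.
  String obfuscateId(String id) {
    try {
      MessageDigest md = MessageDigest.getInstance("SHA-256");
      md.update(salt);
      byte[] digest = md.digest(id.getBytes(StandardCharsets.UTF_8));
      StringBuilder sb = new StringBuilder();
      for (int i = 0; i < 8; i++) sb.append(String.format("%02x", digest[i]));
      return sb.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e); // SHA-256 ships with every Java platform
    }
  }

  // Shifts a timestamp by a random offset in [-maxShiftSeconds, +maxShiftSeconds].
  long shiftTime(long epochSeconds) {
    int offset = random.nextInt(2 * maxShiftSeconds + 1) - maxShiftSeconds;
    return epochSeconds + offset;
  }

  public static void main(String[] args) {
    HandObfuscator o = new HandObfuscator("example-salt", 1L, 30);
    System.out.println("player42 -> " + o.obfuscateId("player42"));
    System.out.println("shifted time: " + o.shiftTime(1238519100L));
  }
}
```

Because the same input always maps to the same pseudonym, per-player statistics stay computable inside the data set while the real names are not exposed. A small per-hand time shift keeps time-of-day patterns roughly intact.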

Note: The end user licence agreements (EULAs) of some of the poker sites from which these hands were obtained prohibit datamining. We have asked these poker sites for permission to redistribute the hands in the described manner. We have not yet received a response from all of the sites, but we have not been rejected either. Although we never formally and technically agreed to the EULAs of these poker sites, we believe we have done what is possible to comply with their intention and spirit. We will further cooperate with the poker sites, if they require it, to eliminate any doubt that the hands will be used exclusively for research purposes and not to augment real-money play.

If you choose to apply, please fill out the form below and send it back by PM or e-mail.

Application process

Please provide the following information.
  • Name:
  • University:
  • Position:
  • E-mail /university email/:
  • Paper published /you need to respond to a verification sent to the author's e-mail of that paper/:
  • Home page:
  • Purpose of request:

Alternative ways to verify your academic credibility would work as well, but will not be less demanding than the above.

Please, also indicate that you agree with the following terms:
  • All hand histories are provided to you for personal use. You must not redistribute them to third parties under any circumstances. You may use them personally without restrictions or obligations (we would be happy if you cite pokerai.org/pf3 as the source of these hands in your academic work).
  • All software for parsing the hand history database is provided under the GPL v3 license. Any redistribution is bound by this license.

I agree to these conditions/Type yes, and your first name as signature/:

_________________
indiana


 Post subject: Re: Hand History Database for Research (Beta)
PostPosted: Wed Apr 08, 2009 8:56 pm 
Offline
PokerAI fellow
User avatar

Posts: 7731
Favourite Bot: V12
Here is an example of the kind of thing you can easily do (this query took me 10 minutes to implement).
This and other examples are part of the software distribution.

Image

_________________
indiana


 Post subject: Re: Hand History Database for Research (Beta)
PostPosted: Sat Apr 25, 2009 12:01 pm 
Offline
PokerAI fellow
User avatar

Posts: 7731
Favourite Bot: V12
RustyBrooks (mod on 2+2) suggested that it would be useful to settle the question of which cards (if any) flop more often than others. Here is what I got. I have verified flopped ranks against only a portion of the database (I am in the middle of restructuring some code and rebuilding), but I still ran it over 75 million 6-max NL hands (all limits).

These are the results. I am also posting the code sample that calculates this, as an illustration of how one can use the DB programmatically.

Image

Image

Code:
/*
  This code is released under GPL v3.
  This example calculates which cards (if any) flop more often than others.
*/
package pokerai.hhex.examples;

import pokerai.hhex.*;
import java.io.File;

public class FlopCardsDistribution {

  // cards[r] counts how often rank r appeared among the three flop cards
  static long[] cards = new long[13];
  // long rather than int, so the counter cannot overflow on multi-billion-hand databases
  static long totalHands = 0;

  public static void main(String[] args) {
    String rootfolder = "C:\\hhdb\\";
    if (args.length > 0) rootfolder = args[0];
    if (!rootfolder.endsWith("\\")) rootfolder += "\\";
    // ---
    File dir = new File(rootfolder);
    String[] all = dir.list();
    if (all == null) {
      // dir.list() returns null if the folder does not exist or cannot be read
      System.err.println("Cannot list folder: " + rootfolder);
      return;
    }
    // Scan for all files from site "Default", Holdem NL, 6-seats, and aggregate stats for all found DBs
    String prefix = "pokerai.org.sample" + Consts.SITE_DEFAULT + "_" + Consts.HOLDEM_NL + "_6";
    for (int i = 0; i < all.length; i++) {
      if (all[i].startsWith(prefix) && all[i].endsWith(".hhex")) {
        scan(rootfolder, all[i]);
      }
    }
    // Printing results
    System.out.println("Number of hands parsed: " + totalHands);
    for (byte i = 0; i < cards.length; i++) {
      System.out.println(Consts.printRank(i) + ": " + cards[i]);
    }
  }

  public static void scan(String rootfolder, String fullName) {
    HandManagerNIO hm = new HandManagerNIO();
    hm.init(rootfolder, fullName);
    hm.reset();
    while (hm.hasMoreHands()) {
      PokerHand hand = hm.nextPokerHand();
      totalHands++;
      if (totalHands % 10000000 == 0) System.out.println(totalHands + " hands read.");
      byte[] boardCards = hand.getCommunityCardsBA();
      if (boardCards != null) {
        // The first three community cards are the flop; skip slots that were never dealt
        for (int c = 0; c < 3; c++) {
          if (boardCards[c] != Consts.INVALID_CARD) cards[Consts.getCardRank(boardCards[c])]++;
        }
      }
    }
    hm.closedb();
  }

}
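
To judge whether any rank flops more often than others, the raw counts printed by the example can be turned into frequencies and compared with the uniform baseline of 1/13 per rank. A minimal, self-contained sketch follows. It does not use the hhex library, and the class name and the sample counts are made up for illustration.

```java
import java.util.Arrays;

// Hypothetical post-processing of the per-rank counts; not part of the hhex distribution.
class RankFrequencies {

  // Converts raw per-rank counts into fractions of all counted cards.
  static double[] frequencies(long[] counts) {
    long total = 0;
    for (long c : counts) total += c;
    double[] freq = new double[counts.length];
    if (total == 0) return freq; // nothing counted yet: all zeros
    for (int i = 0; i < counts.length; i++) freq[i] = (double) counts[i] / total;
    return freq;
  }

  // Largest absolute deviation of any rank from a uniform distribution.
  static double maxDeviationFromUniform(long[] counts) {
    double uniform = 1.0 / counts.length;
    double max = 0;
    for (double f : frequencies(counts)) max = Math.max(max, Math.abs(f - uniform));
    return max;
  }

  public static void main(String[] args) {
    long[] sample = new long[13];
    Arrays.fill(sample, 1000); // made-up counts, uniform to start with
    sample[12] = 900;          // pretend one rank flops slightly less often
    System.out.println(Arrays.toString(frequencies(sample)));
    System.out.println("max deviation from 1/13: " + maxDeviationFromUniform(sample));
  }
}
```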

_________________
indiana


 Post subject: Re: Hand History Database for Research (Beta)
PostPosted: Sun Apr 26, 2009 3:11 am 
Offline
Newbie
User avatar

Posts: 373
Location: The Netherlands
Favourite Bot: The Crusher v2
Can you calculate what % of the time the third flush card falls on the turn? Should be 20% of course, but you never know :D


 Post subject: Re: Hand History Database for Research (Beta)
PostPosted: Sun Apr 26, 2009 2:37 pm 
Offline
PokerAI fellow
User avatar

Posts: 7731
Favourite Bot: V12
Image

_________________
indiana


 Post subject: Re: Hand History Database for Research (Beta)
PostPosted: Sun Apr 26, 2009 2:41 pm 
Offline
Senior member
User avatar

Posts: 465
Favourite Bot: Mine, of course!
newbee wrote:
Can you calculate what % of the time the third flush card falls on the turn? Should be 20% of course, but you never know :D

Ummmm, you mean 19% (assuming you include a hole hand that has no flush cards) or 18% (assuming you only consider board cards)? Maybe this is why you have trouble with your bots . . . too much rounding.
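
The figures in this exchange depend on which cards are counted as known. A minimal sketch of the arithmetic, assuming that the question is simply how likely the next card is to be of a suit of which some cards are already visible. The class and method names are illustrative, and the three scenarios are assumptions about what the posters may have meant.

```java
// Hypothetical helper for the arithmetic only; it makes no use of the hand database.
class FlushTurnOdds {

  // Chance that the next card dealt is of a given suit.
  // suitedSeen: visible cards of that suit; cardsSeen: all visible cards.
  static double nextCardOfSuit(int suitedSeen, int cardsSeen) {
    if (suitedSeen < 0 || suitedSeen > 13 || cardsSeen < suitedSeen || cardsSeen >= 52) {
      throw new IllegalArgumentException("impossible card counts");
    }
    return (13.0 - suitedSeen) / (52.0 - cardsSeen);
  }

  public static void main(String[] args) {
    // Two of the suit on the flop, only the board is known
    System.out.printf("board only:            %.3f%n", nextCardOfSuit(2, 3));
    // Two of the suit on the flop, plus two known hole cards of other suits
    System.out.printf("offsuit hole cards:    %.3f%n", nextCardOfSuit(2, 5));
    // Two of the suit on the flop and two more in your own hand
    System.out.printf("suited hole cards too: %.3f%n", nextCardOfSuit(4, 5));
  }
}
```

Under these assumptions the three cases come out at about 22.4% (11/49), 23.4% (11/47) and 19.1% (9/47), so which figure is "right" depends on the scenario one has in mind.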


 Post subject: Re: Hand History Database for Research (Beta)
PostPosted: Sun Apr 26, 2009 6:15 pm 
Offline
Regular member
User avatar

Posts: 81
Favourite Bot: ICM
indiana wrote:
RustyBrooks (mod on 2+2) suggested that it would be useful to settle the question of which cards (if any) flop more often than others.


Not very useful info: the cards between T and A will flop less often than the other cards, because people tend to be more willing to pay to see a flop with cards like JT, AT, Ax, T9 than with 72 and so on. Therefore I have proven a thesis about the uninteresting fact you presented :-D

However, I would be very interested in seeing a table showing how winning players raise their starting hands preflop (per hand, of course) compared with losing players, and if possible split the winning players from the losing players by using wins per hand or something like that. Also a table showing the differences in wins per showdown would be useful; my guess is that you will find a lot of losing players have a high win percentage.


 Post subject: Re: Hand History Database for Research (Beta)
PostPosted: Sun Apr 26, 2009 6:27 pm 
Offline
PokerAI fellow
User avatar

Posts: 7731
Favourite Bot: V12
The flopped cards thing was more of a dispute, which got resolved by this example, not something mega useful. In real play, you might already have factored in this effect by working with card removal effects and opponent ranges.

Trash wrote:
However, I would be very interested in seeing a table showing how winning players raise their starting hands preflop (per hand, of course) compared with losing players

Let's start with just that. Can you make it more definite? I.e. is it just about which hands the winning players are raising and how often, and the same for the losing players? How should the stats be combined? Something like for each preflop card combination -> % raised by winning players, and % by losing?

_________________
indiana


 Post subject: Re: Hand History Database for Research (Beta)
PostPosted: Sun Apr 26, 2009 9:06 pm 
Offline
Regular member
User avatar

Posts: 81
Favourite Bot: ICM
indiana wrote:
The flopped cards thing was more of a dispute, which got resolved by this example, not something mega useful. In real play, you might already have factored in this effect by working with card removal effects and opponent ranges.


These facts may not have been clear to everyone, that's why I over-clarified them :-D

indiana wrote:
Something like for each preflop card combination -> % raised by winning players, and % by losing?


That should be a sufficient start. My point with that kind of data is to determine which hands are actually lucrative to raise with and which hands should only be called or folded. With a bit of luck we might find that some hands are overrepresented in a way that takes advantage of the flops mentioned earlier...
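
The table discussed here (for each starting hand, % raised by winning players and % raised by losing players) could be accumulated roughly as follows. This is only a sketch: the class, the hand-class strings such as "AKs", and the way a player is classified as winning or losing are all assumptions, and nothing here uses the actual database library.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical aggregator for the proposed statistic.
class PreflopRaiseStats {
  // For each starting hand: counts[0] = times dealt, counts[1] = times raised preflop
  private final Map<String, long[]> winners = new TreeMap<>();
  private final Map<String, long[]> losers = new TreeMap<>();

  // startingHand is a hand class such as "AKs" or "72o" (the naming is an assumption)
  void record(String startingHand, boolean winningPlayer, boolean raisedPreflop) {
    Map<String, long[]> table = winningPlayer ? winners : losers;
    long[] counts = table.computeIfAbsent(startingHand, k -> new long[2]);
    counts[0]++;
    if (raisedPreflop) counts[1]++;
  }

  // Percentage of deals on which the hand was raised; NaN if it was never dealt.
  double raisePercent(String startingHand, boolean winningPlayer) {
    long[] counts = (winningPlayer ? winners : losers).get(startingHand);
    if (counts == null || counts[0] == 0) return Double.NaN;
    return 100.0 * counts[1] / counts[0];
  }

  public static void main(String[] args) {
    PreflopRaiseStats stats = new PreflopRaiseStats();
    stats.record("AKs", true, true);   // made-up observations
    stats.record("AKs", true, false);
    stats.record("AKs", false, false);
    System.out.println("AKs winners: " + stats.raisePercent("AKs", true) + "%");
    System.out.println("AKs losers:  " + stats.raisePercent("AKs", false) + "%");
  }
}
```

Whether a player counts as winning would have to be decided in a separate pass over the data (for example by total winnings per hand), since it is only known after all of that player's hands have been seen.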


 Post subject: Re: Hand History Database for Research (Beta)
PostPosted: Sun Apr 26, 2009 9:50 pm 
Offline
PokerAI fellow
User avatar

Posts: 7731
Favourite Bot: V12
What you want to do is a bit too specific, but I will get to it once I have a little time, and give it a try (and post the results).

_________________
indiana


 Post subject: Re: Hand History Database for Research (Beta)
PostPosted: Sun Apr 26, 2009 10:20 pm 
Offline
Regular member
User avatar

Posts: 81
Favourite Bot: ICM
indiana wrote:
What you want to do is a bit too specific, but I will get to it once I have a little time, and give it a try (and post the results).


Could you then please post info about the database layout and a couple of representative HHs, so that those of us who don't qualify for the data (I think I don't) could develop tools that investigate the data for you to present?

The thing for me is that I have a lot of questions about whether it is possible to create a baseline player based on a couple of statistics that will surely be findable (is that a real word?) in your database, and I am not interested in breaking your EULA or retrieving data illegally by downloading it from The Pirate Bay.

Some of the questions I have are about how mathematically correct you have to play to be a winner, how position actually affects the game for both winners and losers, whether there is a really effective way of putting someone on a range, and finally how you most effectively play a floater's game (a floater calls raises preflop, calls raises postflop and C-bets hard if the turn is checked) in the long term, since most strategies have mathematical flaws against them.

Look at it like this:
You will be given a lot of code that mines the data for you.
Some of the students mining the data will be given code and solutions to some of their problems.
Those of us who just sit here and watch will be given the opportunity to get an insight into how different players act in different situations.


 Post subject: Re: Hand History Database for Research (Beta)
PostPosted: Sun Apr 26, 2009 10:22 pm 
Offline
Regular member
User avatar

Posts: 58
Favourite Bot: Marvin
It would be cool to have a forum where people could ask interesting questions like this, I think.

Of course, only a few well-chosen members should have access to the evaluation data, and they should be capable of building rather complex queries. They would also need a lot of insight into datamining techniques as well as probability theory, to assess the value of the questions asked and reject them if needed.

That way you could effectively prevent opponent profiling. Furthermore, the gathered data would most likely be much more useful for examining many interesting things empirically if it were made accessible to more people.

Just some thoughts which come to mind...


 Post subject: Re: Hand History Database for Research (Beta)
PostPosted: Sun Apr 26, 2009 10:25 pm 
Offline
PokerAI fellow
User avatar

Posts: 7731
Favourite Bot: V12
Trash wrote:
Could you then please post info about the database layout and a couple of representative HHs, so that those of us who don't qualify for the data (I think I don't) could develop tools that investigate the data for you to present?

We thought about this: releasing software (in Java) that reads the database, as well as a very small sample database which one can use to test one's examples and programs; once ready, one submits the source code and we post the results.

We will most probably do that, but it may take a while (two weeks at least).

_________________
indiana


 Post subject: Re: Hand History Database for Research (Beta)
PostPosted: Sun Apr 26, 2009 10:35 pm 
Offline
Senior member
User avatar

Posts: 147
Location: Brazil
Favourite Bot: coded one
Can you separate the hands by site?

I would like to know if those myths about differences in the fields are true.
e.g. the AVG VPIP on each site at micro-stakes.

From my personal experience I would say the fish play the same, whatever the site: PS, FTP, iPoker...

_________________
English is not my main language. Sorry.


 Post subject: Re: Hand History Database for Research (Beta)
PostPosted: Sun Apr 26, 2009 10:41 pm 
Offline
Regular member
User avatar

Posts: 58
Favourite Bot: Marvin
Quote:
We thought about this: releasing software (in Java) that reads the database, as well as a very small sample database which one can use to test one's examples and programs; once ready, one submits the source code and we post the results.
I like that idea very much. :)
But what exactly do you mean by "reading the database"? How are the HHs stored?
Is it possible to execute SQL-like queries with your data representation?


 Post subject: Re: Hand History Database for Research (Beta)
PostPosted: Sun Apr 26, 2009 10:52 pm 
Offline
PokerAI fellow
User avatar

Posts: 7731
Favourite Bot: V12
There is nothing SQL-like. The approach is basically to enumerate all hands in the database (for the limits/games you're interested in), then filter out programmatically which hands you need, and do whatever calculations on top.

See the code in this example for how you can use the DB ...
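
The enumerate, filter, then calculate pattern described above can be sketched in a generic way as follows. The `Hand` class here is a hypothetical stand-in with made-up fields, not the library's `PokerHand`, and the average-pot calculation is just an example of "whatever calculations on top".

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

class ScanFilterAggregate {

  // Hypothetical stand-in for a parsed hand; the fields are assumptions.
  static final class Hand {
    final int seats;
    final boolean noLimit;
    final long potInCents;

    Hand(int seats, boolean noLimit, long potInCents) {
      this.seats = seats;
      this.noLimit = noLimit;
      this.potInCents = potInCents;
    }
  }

  // One sequential pass: skip hands that fail the filter, average the pot of the rest.
  static double averagePot(Iterator<Hand> hands, Predicate<Hand> filter) {
    long count = 0;
    long sum = 0;
    while (hands.hasNext()) {
      Hand h = hands.next();
      if (!filter.test(h)) continue;
      count++;
      sum += h.potInCents;
    }
    return count == 0 ? 0.0 : (double) sum / count;
  }

  public static void main(String[] args) {
    List<Hand> sample = Arrays.asList(            // made-up hands
        new Hand(6, true, 1000), new Hand(6, true, 3000), new Hand(9, true, 9000));
    double avg = averagePot(sample.iterator(), h -> h.seats == 6 && h.noLimit);
    System.out.println("average 6-max NL pot (cents): " + avg);
  }
}
```

Taking an `Iterator` rather than a list means the hands never have to fit in memory at once, which matters when the scan covers hundreds of millions of hands.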

hawkpkr wrote:
Can you separate the hands by site?
Yes. But that's pretty complex to get running now (I'll add it to my todo list).

_________________
indiana


 Post subject: Re: Hand History Database for Research (Beta)
PostPosted: Sun Apr 26, 2009 11:14 pm 
Offline
Regular member
User avatar

Posts: 81
Favourite Bot: ICM
indiana wrote:
There is nothing SQL-like. The approach is basically to enumerate all hands in the database (for the limits/games you're interested in), then filter out programmatically which hands you need, and do whatever calculations on top.


What if you released a couple of history files and someone (perhaps me) built an SQL database around them?

I have a finished database model designed to handle really complex queries. The model is basically:
Tables: Player, Action, Hand, Session, Statistic

One Session holds many hands, each hand holds many actions, each action has a player. The cards are represented by a 52-bit value. Statistic can hold a player and also statistics gathered based on actions taken. Ideally this would be an expandable view, but that is too slow, so I settle for the next best thing: a dynamic model allowing me to gather new statistics dynamically.
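
A 52-bit card representation of the kind mentioned here fits into a single Java `long`, with one bit per card. The sketch below is an assumption about how such an encoding might look (bit index = rank * 4 + suit); the post does not say which layout the model actually uses.

```java
// Hypothetical encoding: one bit per card in a 64-bit long, 52 bits used.
class CardMask {

  // rank 0..12 stands for 2..A, suit 0..3; the ordering is an arbitrary choice.
  static long bit(int rank, int suit) {
    if (rank < 0 || rank > 12 || suit < 0 || suit > 3) {
      throw new IllegalArgumentException("rank must be 0..12 and suit 0..3");
    }
    return 1L << (rank * 4 + suit);
  }

  static long add(long mask, int rank, int suit) {
    return mask | bit(rank, suit);
  }

  static boolean contains(long mask, int rank, int suit) {
    return (mask & bit(rank, suit)) != 0;
  }

  // Number of distinct cards in the set.
  static int size(long mask) {
    return Long.bitCount(mask);
  }

  public static void main(String[] args) {
    long board = 0L;
    board = add(board, 12, 0); // an ace
    board = add(board, 11, 0); // a king of the same suit
    board = add(board, 0, 3);  // a deuce of another suit
    System.out.println("cards on board: " + size(board));
    System.out.println("contains the ace: " + contains(board, 12, 0));
  }
}
```

A set of cards stored this way takes one integer column in a table, and checks such as "do these two hands share a card" reduce to a single bitwise AND.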


 Post subject: Re: Hand History Database for Research (Beta)
PostPosted: Sun Apr 26, 2009 11:51 pm 
Offline
PokerAI fellow
User avatar

Posts: 7731
Favourite Bot: V12
@Trash, I'm not interested in that. You can't put 1 or 2 or 5 billion hands (which is the long-term goal) in a relational database and query that successfully, except with some very complex setup or $100k hardware. I'm also not releasing any data (be it in obfuscated proprietary format or obfuscated plain text) to anyone except as specified in the original post of this thread.

_________________
indiana


 Post subject: Re: Hand History Database for Research (Beta)
PostPosted: Mon Apr 27, 2009 6:46 am 
Offline
Regular member
User avatar

Posts: 81
Favourite Bot: ICM
@Indiana: I respect your unwillingness to release the data and won't nag about it again.

But I disagree when it comes to storing the data. I have successfully stored and queried a month's worth of stock trades using an MS-SQL 6 database in a banking system I worked on, using quite cheap hardware (back in 2002). The amount of data exceeded 5000 rows each second on a working day (a month means over 57.6 billion rows worth of data), so in my mind it's just a question of a smart layout and well-implemented indexes. Yes, the queries did take some time to finish and reindexing everything was a bitch, but going from there to needing $100k worth of equipment is a big step. Even though I've been a consultant for banks, telecom companies, insurance companies and at one point even poker sites, all the time working with large databases, I have never seen the need for such expensive equipment for storing and retrieving data.


 Post subject: Re: Hand History Database for Research (Beta)
PostPosted: Mon Apr 27, 2009 10:53 am 
Offline
PokerAI fellow
User avatar

Posts: 7731
Favourite Bot: V12
@Trash - make a proof of concept showing that you can store and handle 5 billion hands in a DB (retaining full information about the hands). If you don't optimize the way I did (in the proprietary format), you might easily end up needing 10 terabytes of disk space just to store that.

Your DB layout is irrelevant for the important scenarios I am interested in. Indices and layout would help you filter out hands easily (which I can do as well), but in scenarios where you really need to consider every single hand that was played, you have no option other than to read it all, all 5 TB of data on disk. Now consider how long it takes just to read 5 TB once from disk, and you have the minimum time to execute your query (assuming RDBMS overhead is zero). The proprietary format that I have allows me to shrink the necessary reading to less than 100 GB (which is still a lot, but for under $1k you can put this on an SSD or even alternative disks with read speeds up to 1 GB/sec, which brings your query into the range of 1-2 minutes). On a normal HDD it is 10-15 minutes.
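
The lower bound argued here is simply the data size divided by the sequential read throughput. A small sketch of that arithmetic follows; the 0.15 GB/sec figure for an ordinary hard disk is an assumption chosen for illustration, and the class and method names are made up.

```java
// Hypothetical helper for the back-of-the-envelope estimate only.
class ScanTimeEstimate {

  // Minimum time for a full scan: bytes to read divided by sequential read throughput.
  static double secondsToRead(double gigabytes, double gigabytesPerSecond) {
    if (gigabytesPerSecond <= 0) {
      throw new IllegalArgumentException("throughput must be positive");
    }
    return gigabytes / gigabytesPerSecond;
  }

  public static void main(String[] args) {
    double fastDisk = 1.0;   // GB/sec, the "up to 1 GB/sec" case from the post
    double plainHdd = 0.15;  // GB/sec, assumed typical hard disk throughput
    System.out.printf("5 TB   at 1 GB/sec:    %.0f s%n", secondsToRead(5000, fastDisk));
    System.out.printf("100 GB at 1 GB/sec:    %.0f s%n", secondsToRead(100, fastDisk));
    System.out.printf("100 GB at 0.15 GB/sec: %.0f s%n", secondsToRead(100, plainHdd));
  }
}
```

With these inputs, 100 GB at 1 GB/sec takes 100 seconds, in line with the 1-2 minutes quoted, while 5 TB at the same speed takes 5000 seconds, roughly 83 minutes. At the assumed hard disk speed the 100 GB scan takes about 11 minutes.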

_________________
indiana

