Poker-AI.org

Poker AI and Botting Discussion Forum

All times are UTC




 Post subject: screen scraping
Posted: Sun Apr 15, 2018 3:50 pm
New Member

Joined: Thu Apr 05, 2018 2:57 pm
Posts: 5
Hi,

I am starting to code a scraper. I code on a Linux machine, so I can't use the OpenScrape project. I searched this forum for how to start, but I wanted to check with you guys first whether my approach is okay:

- Cards + buttons + checkboxes: take sample images of these, then take a screenshot and look for these images in certain regions.
- Names + stacks: use OCR for these. Tesseract and OpenOCR seem to be the choices.

Is this the way to go, scrape-wise?

Another option is to run OpenScrape on a Windows machine, create a tablemap with that program, and then parse that tablemap in my bot code.


 Post subject: Re: screen scraping
Posted: Sun Apr 15, 2018 5:30 pm
New Member

Joined: Thu Apr 05, 2018 2:57 pm
Posts: 5
About scraping the stacks:

Is it also possible to create images of all the digits used in the poker client, then just search for those images and construct the stack size that way? Or would this take too much time?


 Post subject: Re: screen scraping
Posted: Sun Apr 15, 2018 6:56 pm
New Member

Joined: Thu Feb 22, 2018 2:09 pm
Posts: 9
Jannus wrote:
About scraping the stacks:

Is it also possible to create images of all the digits used in the poker client, then just search for those images and construct the stack size that way? Or would this take too much time?


I think this is the wrong way because:
1) if you resize the table, your saved RGB pixel images no longer match;
2) the computation is too slow;
3) it's possible that the screen colors generated by the Windows/Linux system are "different" for some reason.

For me the best choice is a neural network with normalized input parameters (in other words, OCR, but the standard OCR programs have problems with small characters).
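To illustrate what "normalization of input parameters" can mean here, here is a sketch in Python (images as nested lists of 0-255 greyscale values, no image library). Each cut-out glyph is resampled to a fixed grid and min-max scaled to 0..1, so a classifier always gets the same number of inputs and shifts in brightness or colour matter less. The 8x8 grid and the name `normalize_glyph` are hypothetical choices, and the network itself is not shown.

```python
def normalize_glyph(glyph, size=8):
    """Resample a greyscale glyph (list of rows, values 0-255) to size x size with
    nearest-neighbour sampling and min-max scale it to 0.0-1.0.
    Returns a flat list of size * size floats for use as network input."""
    h, w = len(glyph), len(glyph[0])
    lo = min(min(row) for row in glyph)
    hi = max(max(row) for row in glyph)
    span = (hi - lo) or 1  # a blank glyph would otherwise divide by zero
    inputs = []
    for y in range(size):
        src_y = y * h // size          # nearest source row
        for x in range(size):
            src_x = x * w // size      # nearest source column
            inputs.append((glyph[src_y][src_x] - lo) / span)
    return inputs


# Usage: the same shape drawn in two different palettes gives identical inputs
dark = [[0, 255], [255, 0]]
light = [[100, 200], [200, 100]]
print(normalize_glyph(dark, size=2) == normalize_glyph(light, size=2))  # True
```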


Last edited by Timmy1992 on Sun Apr 15, 2018 7:29 pm, edited 1 time in total.

 Post subject: Re: screen scraping
Posted: Sun Apr 15, 2018 7:04 pm
New Member

Joined: Thu Apr 05, 2018 2:57 pm
Posts: 5
But if you keep all the poker tables the same size, it could work, couldn't it? (I don't understand your third point.)

I really don't know if speed is going to be an issue.

I am running a Linux host with a Windows 7 VirtualBox guest, and I scrape the virtual Windows 7 system.


 Post subject: Re: screen scraping
Posted: Sun Apr 15, 2018 7:35 pm
New Member

Joined: Thu Feb 22, 2018 2:09 pm
Posts: 9
Quote:
But if you keep all the poker tables the same size, it could work, couldn't it? (I don't understand your third point.)


Sorry for my bad English.
For example: if you need to scrape the stack size "$ 200.57", you will have many problems, because the pixels change: the same digits can be drawn with different pixels at different times.
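A common workaround for digits whose pixels vary slightly is to accept the closest stored template within a tolerance instead of requiring an exact pixel match. Here is a minimal Python sketch, assuming same-size greyscale images stored as nested lists. The names and the `max_diff=20` threshold are hypothetical and would need tuning per site.

```python
def mean_abs_diff(a, b):
    """Average absolute per-pixel difference between two same-size greyscale images."""
    total = count = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += abs(pa - pb)
            count += 1
    return total / count


def best_match(patch, templates, max_diff=20):
    """Return the label of the closest template, or None if even the closest
    one differs by more than max_diff on average."""
    best_label, best_score = None, None
    for label, tmpl in templates.items():
        score = mean_abs_diff(patch, tmpl)
        if best_score is None or score < best_score:
            best_label, best_score = label, score
    if best_score is None or best_score > max_diff:
        return None
    return best_label


templates = {"1": [[0, 255], [0, 255]], "7": [[255, 0], [255, 0]]}
print(best_match([[10, 250], [5, 255]], templates))  # slightly different pixels, still "1"
```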

Quote:
I really don't know if speed is going to be an issue.

I think so. But try...


 Post subject: Re: screen scraping
Posted: Tue Apr 17, 2018 4:35 am
Site Admin

Joined: Sun Feb 24, 2013 9:39 pm
Posts: 520
I've not done it myself but I've read that Tesseract can be made to work quite well by scaling up small characters, increasing contrast, and converting to greyscale.
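Here is a rough sketch of those three steps (greyscale, contrast stretch, enlarge) in plain Python. It assumes the crop is a nested list of (r, g, b) tuples. In practice an image library or ImageMagick would do this, and the function names here are hypothetical. The greyscale weights are the standard ITU-R BT.601 luma coefficients.

```python
def to_grey(rgb_image):
    """Convert rows of (r, g, b) tuples to greyscale using BT.601 luma weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_image]


def stretch_contrast(grey):
    """Linearly map the darkest pixel to 0 and the brightest to 255."""
    lo = min(min(row) for row in grey)
    hi = max(max(row) for row in grey)
    if hi == lo:
        return [row[:] for row in grey]  # flat image: nothing to stretch
    return [[(v - lo) * 255 // (hi - lo) for v in row] for row in grey]


def upscale(grey, factor=4):
    """Nearest-neighbour enlargement by an integer factor (4 = 400%)."""
    out = []
    for row in grey:
        wide = [v for v in row for _ in range(factor)]
        out.extend(wide[:] for _ in range(factor))
    return out


def prepare_for_ocr(rgb_image, factor=4):
    """Greyscale -> contrast stretch -> enlarge, ready to hand to an OCR engine."""
    return upscale(stretch_contrast(to_grey(rgb_image)), factor)


print(prepare_for_ocr([[(100, 100, 100), (150, 150, 150)]], factor=2))
# [[0, 0, 255, 255], [0, 0, 255, 255]]
```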


 Post subject: Re: screen scraping
Posted: Fri Apr 20, 2018 3:39 am
New Member

Joined: Fri Apr 20, 2018 3:17 am
Posts: 3
Hi, I actually made an account just to join in this discussion.

I was working on a project and lost my data, so I'm starting from scratch. My old project and my current one both follow logic very similar to yours.

I used Tesseract for my OCR and I was very close to having it perfect, but the small fonts would get me. I haven't gotten back to that part of the code yet, as I just started the new version this morning. Just as spears replied, I have been told that increasing the image size and sharpening/adding contrast will help. You can also convert to pure black and white, but sometimes this hurt more than it helped, because some of the spaces turned into parts of the characters. I'm sure there's a way to fix that; I'm fairly new to processing images in C#.

Here's a summary of what I'm going to do in my new scraper (I use C#/.NET). FYI, I'm no expert; this is just what made sense to me, and I'm always looking for better methods:

1. Capture the screen.
2. Use ImageMagick to sharpen the image (may also convert to pure B/W).
3. Cut out the areas I need for OCR (bets, pot).
4. Enlarge each section for better OCR results (I was told 400-500% is a good place to start).
5. Save each section as a temp image to use for training Tesseract (once I have the accuracy dialed in, I won't save temp images).
6. Parse the cropped image with Tesseract and compare the results. If the results don't match, continue training Tesseract. I have been using jTessBoxEditor for my training; it's very easy to use.
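Steps 2 and 3 (the optional black-and-white conversion and cutting out the OCR areas) might look roughly like this. The sketch is in Python rather than C# and skips sharpening. It assumes the screenshot is already greyscale (nested lists of 0-255 values), and the region names and coordinates are hypothetical placeholders for one table layout. The B/W threshold is where spaces can merge into characters, so it is exposed as a parameter worth tuning.

```python
# Hypothetical regions for one table layout: name -> (left, top, width, height)
REGIONS = {
    "pot": (300, 200, 80, 14),
    "bet_seat1": (150, 260, 60, 14),
}


def crop(image, box):
    """Cut a (left, top, width, height) rectangle out of a 2D pixel list."""
    left, top, width, height = box
    return [row[left:left + width] for row in image[top:top + height]]


def to_black_white(grey, threshold=128):
    """Map each greyscale pixel to pure black (0) or pure white (255)."""
    return [[255 if v >= threshold else 0 for v in row] for row in grey]


def cut_regions(screenshot, regions=REGIONS, threshold=128):
    """Return {region name: black-and-white crop}, ready to enlarge and pass to OCR."""
    return {name: to_black_white(crop(screenshot, box), threshold)
            for name, box in regions.items()}
```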

Again, without enlarging, sharpening, or converting to B/W, I was able to get very good but not perfect accuracy in Tesseract (the errors were mostly punctuation). I'm hopeful the added steps will get me to where I want to be.

I hope some of this helps you out, even though it's not very detailed. If you want to compare notes, let me know; I'm always up for sharing information.

Good luck.


Edit: As far as window size is concerned, I detect my windows by partial window title (.Contains() in C#). Since the default size is known, I then resize the window to the default size so I am sure my coordinates are correct.

As for the colors, AFAIK, unless you are altering the colors sent to the screen, red is red. I'm pretty sure RGB 255,0,0 isn't interpreted differently between operating systems. Also, I use image comparison rather than pixel detection in those cases.

Example: Is the check box checked? Take a cropped picture of the checked box, then look within the coordinates of the box (with some padding). Does the checked-box image exist within those coordinates? Yes: the box is checked. No: the box is not checked.
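The checkbox test above amounts to a sub-image search inside a padded rectangle. Here is a minimal sketch in Python, assuming exact pixel equality and images stored as nested lists of pixel values (greyscale ints or RGB tuples both work). The name `contains_template` and the default padding of 3 pixels are hypothetical.

```python
def contains_template(screen, template, region, padding=3):
    """Return True if `template` appears exactly anywhere inside `region`
    (left, top, width, height) of `screen`, expanded by `padding` pixels."""
    left, top, width, height = region
    th, tw = len(template), len(template[0])
    # Clamp the padded search window to the screenshot bounds
    x0 = max(left - padding, 0)
    y0 = max(top - padding, 0)
    x1 = min(left + width + padding, len(screen[0]))
    y1 = min(top + height + padding, len(screen))
    for y in range(y0, y1 - th + 1):
        for x in range(x0, x1 - tw + 1):
            # Compare the template row by row against the screen at (x, y)
            if all(screen[y + dy][x:x + tw] == template[dy] for dy in range(th)):
                return True
    return False


# Usage: is_checked = contains_template(screenshot, checked_box_image, box_region)
```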

I also use this logic for card detection. I have a saved image for each card value (2-10, J, Q, K, A) as well as a saved image for each suit. I scan my hand, then the community cards one by one, to identify value and suit.
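Card reading can reuse the same idea: crop the rank box and suit box of each card slot and look them up among the saved images. Here is a sketch in Python under the same assumptions (exact matches, nested-list images). The slot coordinates and template dictionaries would come from your own captures, and all names are hypothetical.

```python
def crop(image, box):
    """Cut a (left, top, width, height) rectangle out of a 2D pixel list."""
    left, top, width, height = box
    return [row[left:left + width] for row in image[top:top + height]]


def identify(patch, templates):
    """Return the label whose saved image equals the patch exactly, else None."""
    for label, tmpl in templates.items():
        if patch == tmpl:
            return label
    return None


def read_cards(screen, slots, rank_templates, suit_templates):
    """slots: list of (rank_box, suit_box), hole cards first, then the board.
    Returns card strings like 'Ah', skipping slots that hold no recognised card."""
    cards = []
    for rank_box, suit_box in slots:
        rank = identify(crop(screen, rank_box), rank_templates)
        suit = identify(crop(screen, suit_box), suit_templates)
        if rank is None or suit is None:
            continue  # e.g. a board card that has not been dealt yet
        cards.append(rank + suit)
    return cards
```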

I feel like I'm rambling now. If I didn't make anything clear, or if I'm wrong about something, let me know.


 Post subject: Re: screen scraping
Posted: Mon Apr 23, 2018 11:46 am
New Member

Joined: Thu Apr 05, 2018 2:57 pm
Posts: 5
Thanks for your post, incredibly helpful :)

But here is what I was wondering: is it possible to remove the OCR part completely? Take stack size scraping, for example. Just take screenshots of all 10 digits and store them as single image files (one per digit). Then look for those images in the right places of the screenshot and reconstruct the stack size from the digits found there. If this works (and is fast enough), you will have 100% accuracy and no need for OCR whatsoever. I am going to try this approach first; if it doesn't work, I will try Tesseract or OpenCV.
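The digit-template idea can be sketched as a left-to-right scan over the stack-size strip. At each column, try every saved glyph; on an exact match, append that character and jump past it. Here is a minimal Python version with several assumptions. Every glyph must be cropped tightly to the same height as the strip, and no glyph may be a sub-pattern of another (the first match wins). Names are hypothetical, and anti-aliased fonts can break exact matching.

```python
def read_number(strip, glyphs):
    """Read characters from `strip` (rows of pixels, same height as every glyph)
    by sliding left to right and taking the first glyph that matches exactly."""
    text = []
    width = len(strip[0])
    x = 0
    while x < width:
        for char, glyph in glyphs.items():
            gw = len(glyph[0])
            if x + gw <= width and all(strip[r][x:x + gw] == glyph[r]
                                       for r in range(len(glyph))):
                text.append(char)
                x += gw        # jump past the matched glyph
                break
        else:
            x += 1             # no glyph starts here (background column)
    return "".join(text)


# Tiny 2-row example: "1" is one column wide, "0" two columns, "." one column
glyphs = {"1": [[1], [1]], "0": [[2, 2], [2, 2]], ".": [[0], [3]]}
strip = [[1, 0, 2, 2, 0, 0, 1],
         [1, 0, 2, 2, 0, 3, 1]]
print(read_number(strip, glyphs))  # 10.1
```

With glyphs for "$" and "," added, something like `float(read_number(strip, glyphs).lstrip("$").replace(",", ""))` would then give the stack as a number.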

Will let you know how it went


 Post subject: Re: screen scraping
Posted: Mon Apr 23, 2018 5:09 pm
Senior Member

Joined: Wed Mar 20, 2013 1:43 am
Posts: 204
This works on some sites, but not on others. On some sites the digits look different depending on context, for example when they are next to a different digit, so the 1 in 10 will look different from the 1 in 11.
The easiest sites are those that don't use anti-aliasing for the digits; the technique will work there.


 Post subject: Re: screen scraping
Posted: Mon Apr 23, 2018 8:45 pm
New Member

Joined: Fri Apr 20, 2018 3:17 am
Posts: 3
Yes, like HontoNiBaka said, each site will be different in font style and text layout. Some will align the text left, some right, and some will center it. If it's centered, the digits will shift position depending on the length of the stack size.

Regarding anti-aliasing (including ClearType), I've been looking into disabling it while I capture my initial image and then immediately re-enabling it. I'm not sure whether this is a good idea, but here are the two .reg instructions to disable and enable it:

Disable ClearType and antialiasing
Code:
Windows Registry Editor Version 5.00

[HKEY_CURRENT_USER\Control Panel\Desktop]
"FontSmoothing"="0"
"FontSmoothingType"=dword:00000000

Enable ClearType and antialiasing
Code:
Windows Registry Editor Version 5.00

[HKEY_CURRENT_USER\Control Panel\Desktop]
"FontSmoothing"="2"
"FontSmoothingType"=dword:00000002


To the best of my knowledge, in C# it would be something like this:
Code:
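using Microsoft.Win32;

// Open HKCU\Control Panel\Desktop with write access
using (RegistryKey key = Registry.CurrentUser.OpenSubKey(@"Control Panel\Desktop", true))
{
    // Disable AA & ClearType (FontSmoothing is a string value, FontSmoothingType a DWORD)
    key.SetValue("FontSmoothing", "0", RegistryValueKind.String);
    key.SetValue("FontSmoothingType", 0, RegistryValueKind.DWord);
    // Note: running apps may not pick this up until the change is broadcast
    // (e.g. SystemParametersInfo with SPI_SETFONTSMOOTHING) and the table repaints.
    // Grab screenshot (placeholder for your own capture method)
    YourScreenshotMethod();
    // Re-enable AA & ClearType
    key.SetValue("FontSmoothing", "2", RegistryValueKind.String);
    key.SetValue("FontSmoothingType", 2, RegistryValueKind.DWord);
}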


For reading centered text one character at a time, one idea to work around the shifting positions would be to grab the rectangle for the entire stack size, then run color detection pixel by pixel (I'm pretty sure the best way to do this is to convert the image to a byte array) to get a grid of which pixels are text and which are background. Then scan the coordinates and find the first pixel that matches the text color (the lowest X value that matches). That is where your first digit starts, and you can make your first single-character box (be sure to subtract one from the X value). Then, knowing the width of each character, you can keep making boxes, checking each one for the text color, until you get a box with no text color, which means the previous box held the last character.
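The column-scanning approach described above might look like this in Python, with the byte array replaced by a nested list of (r, g, b) tuples. It assumes a fixed character width, which breaks on proportional fonts where "1" is narrower. It also uses a per-channel colour tolerance to absorb anti-aliased edge pixels. `split_characters`, `tol` and `margin` are hypothetical names. The post's advice to start one pixel to the left corresponds to `margin=1`.

```python
def is_text(pixel, text_color, tol=40):
    """True if an (r, g, b) pixel is within `tol` of the text colour on every channel."""
    return all(abs(p - t) <= tol for p, t in zip(pixel, text_color))


def split_characters(region, text_color, char_width, tol=40, margin=0):
    """Split a stack-size rectangle into fixed-width character boxes.
    Returns a list of (x_start, x_end) column ranges, x_end exclusive."""
    height, width = len(region), len(region[0])
    # Which columns contain at least one text-coloured pixel?
    has_text = [any(is_text(region[y][x], text_color, tol) for y in range(height))
                for x in range(width)]
    if True not in has_text:
        return []  # nothing drawn here (e.g. empty seat)
    # Lowest X that matches = left edge of the first digit (optionally minus a margin)
    x = max(has_text.index(True) - margin, 0)
    boxes = []
    # Keep making boxes until one contains no text-coloured column
    while x < width and any(has_text[x:x + char_width]):
        boxes.append((x, min(x + char_width, width)))
        x += char_width
    return boxes
```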

You could also try to find the memory address for each stack using memory-scanning software like Cheat Engine. But each time the program updates, the memory addresses will change and you would have to find them again. With OCR, once you have it working, the locations most likely won't change unless a major update revamps the table layout.

Again, I'm not an expert, I just enjoy playing with this kind of stuff. Double-check any information for yourself. Good luck!

