Poker-AI.org

Poker AI and Botting Discussion Forum
All times are UTC




 Post subject: screen scraping
PostPosted: Sun Apr 15, 2018 3:50 pm 
Offline
Junior Member

Joined: Thu Apr 05, 2018 2:57 pm
Posts: 23
Hi,

I am starting to code a scraper. I code on a Linux machine, so I can't use the OpenScrape project. I searched this forum for how to start, but I wanted
to check with you guys first whether my approach is okay.

- cards + buttons + checkboxes: take sample images of these, then take a screenshot and look for the sample images in certain regions
- names + stacks: use OCR for these. Tesseract and openocr seem to be the choices.

Is this the way to go, scraping-wise?

Another option is to run OpenScrape on a Windows machine, create a table map using that program, and then parse that table map in
my bot code.


 Post subject: Re: screen scraping
PostPosted: Sun Apr 15, 2018 5:30 pm 
Offline
Junior Member

Joined: Thu Apr 05, 2018 2:57 pm
Posts: 23
About scraping the stacks:

Is it also possible to create images of all the digits used in the poker client, then just search for those images and construct
the stack size that way? Or does this take too much time?


 Post subject: Re: screen scraping
PostPosted: Sun Apr 15, 2018 6:56 pm 
Offline
Junior Member

Joined: Thu Feb 22, 2018 2:09 pm
Posts: 10
Jannus wrote:
About scraping the stacks:

Is it also possible to create images of all the digits used in the poker client, then just search for those images and construct
the stack size that way? Or does this take too much time?


I think this is the wrong way because:
1) if you resize the table window, your stored RGB palette no longer applies
2) the computation is too slow
3) it is possible that the screen colors rendered by Windows and Linux differ for some reason

For me the best choice is a neural network with normalized input parameters (in other words: OCR, though the standard OCR software has problems with small characters).


Last edited by Timmy1992 on Sun Apr 15, 2018 7:29 pm, edited 1 time in total.

 Post subject: Re: screen scraping
PostPosted: Sun Apr 15, 2018 7:04 pm 
Offline
Junior Member

Joined: Thu Apr 05, 2018 2:57 pm
Posts: 23
But if you keep all the poker tables the same size, it could work, couldn't it? (I don't understand your third point.)

I really don't know if the speed is going to be an issue.

I am running a Linux host with a Windows 7 guest in VirtualBox, and I scrape the virtual Windows 7 system.


 Post subject: Re: screen scraping
PostPosted: Sun Apr 15, 2018 7:35 pm 
Offline
Junior Member

Joined: Thu Feb 22, 2018 2:09 pm
Posts: 10
Quote:
But if you keep all the poker tables the same size, it could work, couldn't it? (I don't understand your third point.)


Sorry for my bad English.
For example: if you need to scrape the stack size "$ 200.57", you will have many problems because the pixels change, and it is possible that the same digits have different pixels at different times.

Quote:
I really don't know if the speed is going to be an issue.

I think so. But try...


 Post subject: Re: screen scraping
PostPosted: Tue Apr 17, 2018 4:35 am 
Offline
Site Admin

Joined: Sun Feb 24, 2013 9:39 pm
Posts: 642
I've not done it myself but I've read that Tesseract can be made to work quite well by scaling up small characters, increasing contrast, and converting to greyscale.


 Post subject: Re: screen scraping
PostPosted: Fri Apr 20, 2018 3:39 am 
Offline
New Member

Joined: Fri Apr 20, 2018 3:17 am
Posts: 3
Hi, actually made an account just to join in this discussion.

I was working on a project and lost my data, so I'm starting from scratch. My old project and the current one both follow logic very similar to yours.

I used Tesseract for my OCR and I was so close to having it perfect, but the small fonts would get me. I haven't gotten back to that part of the code, as I just started the new version this morning. Just like spears replied, I have been told that increasing the image size and sharpening / adding contrast will help. You can also convert to pure black and white, but sometimes this did more harm than good because some of the gaps turn into part of the characters. I'm sure there's a way to fix that; I'm sort of new to processing images in C#.

Here's a summary of what I'm going to be doing in my new scraper (I use C#/.NET). FYI I'm no expert, this is just what made sense to me, and I'm always looking for better methods.

1. Capture the screen
2. Use ImageMagick to sharpen the image (may also convert to pure B+W)
3. Cut out the areas I need for OCR: bets, pot
4. Enlarge each section for better OCR results (I was told 400-500% is a good place to start)
5. Save each section as a temp image to use for training Tesseract (once I have the accuracy dialed in I won't be saving temp images)
6. Parse the cropped image with Tesseract and compare results. If the results don't match, continue to train Tesseract. I have been using jTessBoxEditor for my training; it's very easy to use.
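
For anyone who wants to see the crop / enlarge / black-and-white part of this list in code form, here is a minimal sketch. It is in Python rather than C# only to keep it short. It treats an image as a nested list of (r, g, b) tuples so it runs without any imaging library, and every function name, plus the threshold of 128, is an assumption made for illustration. In a real scraper ImageMagick or a similar library would do this work, usually with smoother interpolation than the nearest-neighbour enlargement shown here.

```python
# Sketch only: images are nested lists of (r, g, b) tuples, one list per row.
# All names here are hypothetical helpers, not part of any library.

def crop(img, x, y, w, h):
    """Cut out the w-by-h rectangle whose top-left corner is (x, y)."""
    return [row[x:x + w] for row in img[y:y + h]]

def to_grey(img):
    """Convert (r, g, b) pixels to 0-255 grey values (integer luma weights)."""
    return [[(r * 299 + g * 587 + b * 114) // 1000 for (r, g, b) in row]
            for row in img]

def upscale(grey, factor):
    """Nearest-neighbour enlargement: repeat each pixel and each row."""
    out = []
    for row in grey:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out

def binarise(grey, threshold=128):
    """Pure black and white: 1 for bright pixels, 0 for dark ones."""
    return [[1 if v >= threshold else 0 for v in row] for row in grey]
```

A factor of 4 or 5 corresponds to the 400-500% mentioned in step 4. Whether the bright pixels are the text or the background depends on the client's colour scheme.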

Again, without enlarging, sharpening, or converting to B/W I was able to get very good but not perfect accuracy in Tesseract (the errors were mostly with punctuation). I'm hopeful the added steps will get me to where I want to be.

I hope some of this helps you out, even though it is really not that detailed. If you want to compare notes let me know, I'm always up for sharing information.

good luck.


Edit: as far as the window size is concerned, I detect my windows by partial window title with .Contains() in C#. Then, since the default size is known, I resize the window to the default size so I am sure my coordinates are correct.

As far as the colors go, AFAIK unless you are altering the colors sent to the screen, red is red, etc. I'm pretty sure RGB 255,0,0 isn't interpreted any differently between operating systems. Also, I use image comparison rather than pixel detection in those cases.

EX: Is the check box checked? Take a cropped picture of the checked box, then look within the coordinates of the box with some padding. Does the checked-box image exist within those coordinates? Yes: the box is checked. No: the box is not checked.

I also use this logic for card detection. I have a saved image for each card value (2-10, J, Q, K, A) as well as a saved image for each suit. I scan my hand and then the community cards one by one to identify value and suit.
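
A rough sketch of that "does the saved image exist inside the padded box" check, in Python for brevity. Images are nested lists of pixel values, the match is exact (every pixel must be equal), and the names `find_template` and `is_checked` and the default padding of 2 are assumptions for illustration only.

```python
def find_template(screen, template, region):
    """Return (x, y) of the first exact match of `template` inside `region`
    of `screen`, or None. region is (x, y, width, height)."""
    rx, ry, rw, rh = region
    th, tw = len(template), len(template[0])
    # Clamp the search area so it never runs off the screenshot.
    y_end = min(ry + rh, len(screen)) - th + 1
    x_end = min(rx + rw, len(screen[0])) - tw + 1
    for y in range(ry, y_end):
        for x in range(rx, x_end):
            if all(screen[y + dy][x:x + tw] == template[dy] for dy in range(th)):
                return (x, y)
    return None

def is_checked(screen, checked_img, box, padding=2):
    """True if the saved 'checked' image appears in the box plus padding."""
    x, y, w, h = box
    region = (max(x - padding, 0), max(y - padding, 0),
              w + 2 * padding, h + 2 * padding)
    return find_template(screen, checked_img, region) is not None
```

The same `find_template` call could be run once per saved card value and once per saved suit to identify a card. Exact matching only works if the client draws those images pixel-for-pixel the same every time.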

I feel like I'm rambling now. If I didn't make anything clear, or if I'm wrong about something, let me know.


 Post subject: Re: screen scraping
PostPosted: Mon Apr 23, 2018 11:46 am 
Offline
Junior Member

Joined: Thu Apr 05, 2018 2:57 pm
Posts: 23
Thanks for your post. Incredibly helpful :)

But what I was wondering: is it possible to completely remove the OCR part? For example, for the stack size scraping, just take screenshots of all
10 digits and store these as single image files (one for each digit). Then look for those images in the right places of the screenshot and reconstruct the stack size based on the digits that are there. If this works (and is fast enough) you will have 100% accuracy and no need for OCR whatsoever. I am going to try this approach first. If it doesn't work I will try Tesseract or OpenCV.

Will let you know how it went.
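
The "reconstruct the stack size" step could look roughly like this in Python. The sketch assumes the image search has already produced a list of (x position, character) hits inside the stack region, and that a comma, if present, is a thousands separator. The name `build_stack` is made up for this example.

```python
from decimal import Decimal, InvalidOperation

def build_stack(matches):
    """matches: (x, char) pairs found in the stack region, in any order.
    Returns the stack as a Decimal, or None if the characters read left
    to right do not form a number."""
    text = "".join(ch for _, ch in sorted(matches))  # order by x position
    text = text.replace(",", "")  # assumption: "," separates thousands
    try:
        return Decimal(text)
    except InvalidOperation:
        return None
```

Returning None instead of a guess lets the bot skip a frame when a digit was missed, rather than act on a wrong stack size.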


 Post subject: Re: screen scraping
PostPosted: Mon Apr 23, 2018 5:09 pm 
Offline
Veteran Member

Joined: Wed Mar 20, 2013 1:43 am
Posts: 267
This works on some sites, but not on others. On some sites a digit will look different depending on its context, for example when it is next to a different digit, so the 1 in 10 will look different from the 1 in 11.
The easiest sites are those that don't use anti-aliasing for the digits; the technique will work there.
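
One possible way to cope with small pixel differences like these is to score how close a patch is to each saved digit instead of demanding an exact match. This is not something tested on any particular site, just a minimal Python sketch. It assumes greyscale patches (nested lists of 0-255 values) that are the same size as the templates, and the cut-off `max_diff=40` is an arbitrary value that would need tuning.

```python
def mean_abs_diff(patch, template):
    """Average absolute grey-level difference between two same-size images."""
    total = 0
    count = 0
    for prow, trow in zip(patch, template):
        for p, t in zip(prow, trow):
            total += abs(p - t)
            count += 1
    return total / count

def classify_glyph(patch, templates, max_diff=40):
    """templates: dict mapping a character to its grey template image.
    Returns the closest character, or None if nothing is close enough."""
    best_char, best_score = None, None
    for ch, tpl in templates.items():
        score = mean_abs_diff(patch, tpl)
        if best_score is None or score < best_score:
            best_char, best_score = ch, score
    if best_score is not None and best_score <= max_diff:
        return best_char
    return None
```

This tolerates slightly different edge pixels, but it still needs the patch to be cut at the right position, and it can confuse digits whose shapes are close to each other.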


 Post subject: Re: screen scraping
PostPosted: Mon Apr 23, 2018 8:45 pm 
Offline
New Member

Joined: Fri Apr 20, 2018 3:17 am
Posts: 3
Yes, like HontoNiBaka said, each site will be different in font style and text layout. Some will align the text left, some right, and some will center it. If it's centered, the digits will change coordinates based on the length of the stack size.

Regarding anti-aliasing (ClearType on Windows), I've been looking into possibly disabling AA while I capture my initial image and then immediately re-enabling it. I'm not sure if this is a good idea or not, but here are the two .reg instructions to enable and disable it:

Disable ClearType and antialiasing
Code:
[HKEY_CURRENT_USER\Control Panel\Desktop]
"FontSmoothing"="0"
"FontSmoothingType"=dword:00000000

Enable cleartype and antialiasing
Code:
[HKEY_CURRENT_USER\Control Panel\Desktop]
"FontSmoothing"="2"
"FontSmoothingType"=dword:00000002


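To the best of my knowledge, in C# it would be something like this:
Code:
using Microsoft.Win32;

// Open the key writable; the using block closes it afterwards
using (RegistryKey key = Registry.CurrentUser.OpenSubKey(@"Control Panel\Desktop", true))
{
    // Disable AA & ClearType ("FontSmoothing" is a string value)
    key.SetValue("FontSmoothing", "0", RegistryValueKind.String);
    key.SetValue("FontSmoothingType", 0, RegistryValueKind.DWord);
    // Grab screenshot (placeholder for your own capture method)
    YourScreenshotMethod();
    // Re-enable AA & ClearType
    key.SetValue("FontSmoothing", "2", RegistryValueKind.String);
    key.SetValue("FontSmoothingType", 2, RegistryValueKind.DWord);
}
// Untested: a registry write alone may not change rendering in running
// programs right away, so check the screenshot really has no smoothing.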


For getting centered text one character at a time, one idea to work around the shifting locations would be to grab the rectangle for the entire stack size, then run color detection pixel by pixel (pretty sure the best way to do this is to convert the image to a byte array) to get a grid of which pixels are text and which are background. Then go through all the coordinates and find the first pixel that has the text color (the lowest X value that matches). That is where your first digit starts, and you can make your first single-character box (be sure to subtract one from the X value). Then, knowing the size of each character, you can keep making boxes and checking each one for the text color until you get a box that has no text color, which means the previous box held the last character.
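
Here is roughly what that box-by-box idea looks like as a Python sketch. It assumes the colour detection has already produced a grid of 0/1 values (1 = text-coloured pixel) and that every character has the same width, which will not hold for proportional fonts or a narrow decimal point. `split_glyph_boxes` is a name made up for the example.

```python
def split_glyph_boxes(mask, glyph_width):
    """mask: rows of 0/1 values, 1 meaning a text-coloured pixel.
    Returns (x_start, x_end) column ranges, one per character box."""
    width = len(mask[0])

    def has_text(x0, x1):
        # True if any pixel in columns x0..x1-1 has the text colour.
        return any(row[x] for row in mask
                   for x in range(max(x0, 0), min(x1, width)))

    # Lowest X value that contains a text pixel.
    first = next((x for x in range(width) if has_text(x, x + 1)), None)
    if first is None:
        return []
    boxes = []
    x = max(first - 1, 0)  # start one pixel to the left of the first digit
    # Keep making fixed-width boxes until one contains no text colour.
    while x < width and has_text(x, x + glyph_width):
        boxes.append((x, min(x + glyph_width, width)))
        x += glyph_width
    return boxes
```

Each returned box can then be cropped out and compared against the saved digit images, so it no longer matters where the centered text starts.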

Also, you could possibly try to find the memory address for each stack using memory scanning software like Cheat Engine, but each time the program updates, the memory addresses will change and you would have to find them again. With OCR, once you have it working, the locations more than likely won't change unless a major update occurs and they revamp their table layout.

Again, not an expert, I just enjoy playing with this kind of stuff. Double-check any information for yourself. Good luck!


 Post subject: Re: screen scraping
PostPosted: Sat Aug 04, 2018 1:22 pm 
Offline
Junior Member

Joined: Thu Apr 05, 2018 2:57 pm
Posts: 23
Hi,

I tried my approach (scanning for images of the digits in the screenshot), but as some have pointed out, that does not work. I will try
Tesseract next.

I did some Google searching, but it's not entirely clear to me: there are Tesseract, tess4j, and jTessBoxEditor. How do all these programs fit into the big picture? What do I need?

I have a training set consisting of several files for each digit.

Thanks!

PS: do you also use Tesseract to recognize cards? Or do you just image-search for those?


 Post subject: Re: screen scraping
PostPosted: Sat Aug 04, 2018 2:09 pm 
Offline
Senior Member

Joined: Sun Mar 10, 2013 10:31 am
Posts: 139
To recognize a poker table in an image you must do several steps.
1. Find your table's position and size.
2. Decide whether it is a table you want to recognize (maybe you don't want to recognize all tables).
3. Decide how many people sit there and where (a 2, 3, 4, 6, or 9-player table, or something else).
4. Depending on the table size and table type (6 players, for example), you need to know where the nicks, stacks, cards, etc. are.
Only then do you need to recognize each of them.
Tesseract is very slow and I don't know why you would use it.
Show what you cannot recognize (stack, nick, card, etc.).
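
As an illustration of step 4, one simple way to store "where things are" is a per-table-type layout whose regions are given relative to the table size, then converted to pixels once the table position is known. This is only a Python sketch: the layout numbers are invented, not measured from any real client, and the names `LAYOUTS` and `region_in_pixels` are hypothetical.

```python
# Hypothetical layout data. Each region is (x, y, width, height) in
# thousandths of the table's width/height, keyed by number of seats.
LAYOUTS = {
    6: {
        "hero_stack": (450, 800, 100, 40),
        "pot": (440, 350, 120, 40),
    },
}

def region_in_pixels(table_rect, seats, name):
    """table_rect: (x, y, width, height) of the table on screen.
    Returns the named region as absolute pixel coordinates."""
    tx, ty, tw, th = table_rect
    fx, fy, fw, fh = LAYOUTS[seats][name]
    return (tx + fx * tw // 1000, ty + fy * th // 1000,
            fw * tw // 1000, fh * th // 1000)
```

For example, `region_in_pixels((100, 50, 1000, 500), 6, "pot")` gives the rectangle to crop before any recognition of the pot amount is attempted.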


 Post subject: Re: screen scraping
PostPosted: Sat Aug 04, 2018 2:24 pm 
Offline
Junior Member

Joined: Thu Apr 05, 2018 2:57 pm
Posts: 23
@nefton

Thanks for your reply. I understand all that. So I know, for instance, where (x,y coordinates) the stack sizes of all the players
are displayed on screen. If you don't use Tesseract to recognize each digit of the stack size, what method
do you use instead?

For instance, say my own stack is 10.45. I assume your bot has to recognize each digit separately and then combine those
digits to create the stack size.

Any help appreciated :)

Thanks!


 Post subject: Re: screen scraping
PostPosted: Sun Aug 05, 2018 6:01 am 
Offline
Veteran Member

Joined: Wed Mar 20, 2013 1:43 am
Posts: 267
Jannus wrote:
Hi,

I tried my approach (scanning for images of the digits in the screenshot), but as some have pointed out, that does not work. I will try
Tesseract next.

I did some Google searching, but it's not entirely clear to me: there are Tesseract, tess4j, and jTessBoxEditor. How do all these programs fit into the big picture? What do I need?

I have a training set consisting of several files for each digit.

Thanks!

PS: do you also use Tesseract to recognize cards? Or do you just image-search for those?


Tesseract is the name of the project and of its library (written in C++, with a C API). Tess4j is a Java wrapper around that library; you can think of it as Tesseract for Java, so if you use Java you will need tess4j. jTessBoxEditor apparently lets you train the Tesseract model with your own images. I have always only used the pretrained models, which can be downloaded from the Tesseract site.
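
If you just want to try Tesseract before picking a wrapper, the project also ships a command-line program that any language can call. A minimal Python sketch, assuming Tesseract 4 or newer is installed and on the PATH (older 3.x versions spell the option `-psm`), and with a character whitelist chosen here only as an example for stack sizes. The function names are made up for this post.

```python
import subprocess

def tesseract_command(image_path, whitelist="0123456789.,$"):
    """Build the command line for reading one line of text from an image."""
    return [
        "tesseract", image_path, "stdout",  # "stdout" = print the result
        "--psm", "7",                       # treat the image as a single text line
        "-c", "tessedit_char_whitelist=" + whitelist,
    ]

def read_text(image_path):
    """Run Tesseract on the image and return the recognised text."""
    result = subprocess.run(tesseract_command(image_path),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

Note that the whitelist setting is reported to be ignored by some Tesseract 4.0 builds when the LSTM engine is used, so check the output on your own version.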


 Post subject: Re: screen scraping
PostPosted: Sun Aug 05, 2018 8:37 am 
Offline
Site Admin

Joined: Sun Feb 24, 2013 9:39 pm
Posts: 642
To get the best out of Tesseract check out old posts https://www.google.co.uk/search?q=site% ... e&ie=UTF-8


 Post subject: Re: screen scraping
PostPosted: Sun Aug 05, 2018 8:41 pm 
Offline
Junior Member

Joined: Thu Apr 05, 2018 2:57 pm
Posts: 23
HontoNiBaka wrote:
Jannus wrote:
Hi,

I tried my approach (scanning for images of the digits in the screenshot), but as some have pointed out, that does not work. I will try
Tesseract next.

I did some Google searching, but it's not entirely clear to me: there are Tesseract, tess4j, and jTessBoxEditor. How do all these programs fit into the big picture? What do I need?

I have a training set consisting of several files for each digit.

Thanks!

PS: do you also use Tesseract to recognize cards? Or do you just image-search for those?


Tesseract is the name of the project and of its library (written in C++, with a C API). Tess4j is a Java wrapper around that library; you can think of it as Tesseract for Java, so if you use Java you will need tess4j. jTessBoxEditor apparently lets you train the Tesseract model with your own images. I have always only used the pretrained models, which can be downloaded from the Tesseract site.



Thanks for explaining this. Very helpful. But nefton said that Tesseract should not be used because it's too slow. What's a faster and better alternative, according to you guys?


 Post subject: Re: screen scraping
PostPosted: Mon Aug 06, 2018 8:10 am 
Offline
Site Admin

Joined: Sun Feb 24, 2013 9:39 pm
Posts: 642
If you don't use Tesseract, then you have to use some other off-the-shelf OCR solution or write your own. Writing your own will take longer. You might be able to make it faster by specialising in particular fonts and vocabularies, but that is quite a lot of work. If you are to have a realistic chance of completing this project, trying out Tesseract should not be a big deal.


 Post subject: Re: screen scraping
PostPosted: Mon Aug 06, 2018 9:41 am 
Offline
Senior Member

Joined: Sun Mar 10, 2013 10:31 am
Posts: 139
nefton wrote:
Show what you cannot recognize.


Jannus wrote:
So I know, for instance, where (x,y coordinates) the stack sizes of all the players
are displayed on screen. If you don't use Tesseract to recognize each digit of the stack size, what method
do you use instead? For instance, say my own stack is 10.45


I am asking you to show me an example of a stack, as a *.png image (not JPEG).
And a "stack" has not only x,y coordinates, but also a height and width.


 Post subject: Re: screen scraping
PostPosted: Mon Aug 06, 2018 3:27 pm 
Offline
Junior Member

Joined: Thu Apr 05, 2018 2:57 pm
Posts: 23
I don't really understand why you need an image of the stack, but here it is. Sorry if I don't understand what you want, I'm new to scraping :lol:

https://ibb.co/n9X0mK


 Post subject: Re: screen scraping
PostPosted: Mon Aug 06, 2018 4:36 pm 
Offline
Senior Member

Joined: Sun Mar 10, 2013 10:31 am
Posts: 139
Perfect! Here is your stack.
It will be hard to recognize..
I will write my steps here.


Attachments:
example.png [ 4.47 KiB ]