Poker-AI.org

Poker AI and Botting Discussion Forum

All times are UTC




 Post subject: Rule Based OCR
PostPosted: Tue Apr 07, 2015 8:26 am 
Offline
Junior Member

Joined: Wed Aug 27, 2014 12:15 pm
Posts: 12
Hello. Right now I am facing the problem of developing a robust OCR system. I have read about and tried many of the widely known approaches, but in the end none of them is very robust. My idea now is to limit my OCR to numbers and write custom rules for them. As far as I can see this will be a pain in the ass at best and may turn out to be impossible, especially since I did not find anything written about this approach. I hope that is only because most OCR aims to recognize the entire alphabet, which would make a rule-based approach unfeasible.
What do you think about this?
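
To make the idea of a custom rule concrete, here is a minimal sketch of one possible rule: counting enclosed holes in a binarized digit. Nothing in this thread specifies any particular rule, so the feature choice, the `count_holes` and `parse` helpers, and the 5x7 sample glyphs are all assumptions made for illustration. A hole count would only be one rule among several, since for example 0, 6 and 9 typically all have one hole.

```python
from collections import deque


def parse(rows):
    """Turn rows of '#' (ink) and '.' (background) into a 0/1 grid."""
    return [[1 if ch == "#" else 0 for ch in row] for row in rows]


def count_holes(glyph):
    """Count background regions fully enclosed by ink (4-connected)."""
    h, w = len(glyph), len(glyph[0])
    # Pad with a background border so the outside is one connected region.
    padded = [[0] * (w + 2)]
    padded += [[0] + list(row) + [0] for row in glyph]
    padded += [[0] * (w + 2)]
    seen = [[False] * (w + 2) for _ in range(h + 2)]

    def flood(start_r, start_c):
        queue = deque([(start_r, start_c)])
        seen[start_r][start_c] = True
        while queue:
            r, c = queue.popleft()
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                inside = 0 <= nr < h + 2 and 0 <= nc < w + 2
                if inside and not seen[nr][nc] and padded[nr][nc] == 0:
                    seen[nr][nc] = True
                    queue.append((nr, nc))

    flood(0, 0)  # mark everything reachable from outside the glyph
    holes = 0
    for r in range(h + 2):
        for c in range(w + 2):
            if padded[r][c] == 0 and not seen[r][c]:
                holes += 1  # unreached background must be enclosed
                flood(r, c)
    return holes


# Hypothetical 5x7 sample glyphs, not taken from any real poker client.
EIGHT = parse([".###.", "#...#", "#...#", ".###.", "#...#", "#...#", ".###."])
THREE = parse([".###.", "....#", "....#", ".###.", "....#", "....#", ".###."])
ZERO = parse([".###.", "#...#", "#...#", "#...#", "#...#", "#...#", ".###."])
```

On these sample glyphs the rule separates the 8 (two holes) from the 3 (no holes), which happens to be the pair that comes up as a problem later in the thread. It depends on clean binarization: a single missing ink pixel can open a hole to the outside.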


 Post subject: Re: Rule Based OCR
PostPosted: Tue Apr 07, 2015 3:57 pm 
Offline
Veteran Member

Joined: Mon Mar 04, 2013 9:40 pm
Posts: 269
The Tesseract OCR engine has a built-in rule for recognizing only numbers, if that is your problem. You do have to create a few rules yourself unless you train it specifically (I avoided that as it's too complicated). If you're still having problems, my advice is to scale the image 300% before running it through the OCR engine. This will help quite a bit. The other thing I have noticed is that sometimes, if you distort the image, say 80% horizontally, it reads it better. No idea why.

OCR can be a pain. Mine has suddenly started having problems with the number 8. It thinks it's a 3, but not every time, which makes it a major headache...
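
A minimal sketch of the scaling step, assuming the image is held as a list of pixel rows and that nearest-neighbour resampling is good enough. The post does not say which resampling method or library was used, so `resize_nearest` is a hypothetical helper, and reading "distort 80% horizontally" as squeezing the width to 80% is also an assumption.

```python
def resize_nearest(pixels, x_factor, y_factor):
    """Resize a 2D pixel grid by independent horizontal/vertical factors.

    Uses nearest-neighbour sampling: each output pixel copies the source
    pixel that its coordinates map back to.
    """
    src_h, src_w = len(pixels), len(pixels[0])
    dst_h = max(1, round(src_h * y_factor))
    dst_w = max(1, round(src_w * x_factor))
    return [
        [
            pixels[min(src_h - 1, int(y / y_factor))][
                min(src_w - 1, int(x / x_factor))
            ]
            for x in range(dst_w)
        ]
        for y in range(dst_h)
    ]


small = [[1, 2], [3, 4]]

# "Scale the image 300%": three times larger in both directions.
enlarged = resize_nearest(small, 3.0, 3.0)

# One reading of "distort 80% horizontally": keep the height and
# squeeze the width to 80% of the original.
wide = [list(range(10)) for _ in range(4)]
squeezed = resize_nearest(wide, 0.8, 1.0)
```

In a real bot an imaging library with smoother interpolation would likely give the OCR engine a better input than nearest-neighbour copies of the same pixels.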


 Post subject: Re: Rule Based OCR
PostPosted: Tue Apr 07, 2015 5:49 pm 
Offline
Junior Member

Joined: Wed Aug 27, 2014 12:15 pm
Posts: 12
Hey, thanks for your reply. I tried Tesseract, also with scaling, the numbers-only rule, and manually binarizing the image, but it was still regularly confusing 3 and 8, among other things. Today I was playing around with the trial of ABBYY and got better results, but it is impossible to incorporate into my bot. I will probably have to go with some less robust methods for now.


 Post subject: Re: Rule Based OCR
PostPosted: Wed Apr 08, 2015 8:26 am 
Offline
Junior Member

Joined: Thu Nov 14, 2013 2:56 pm
Posts: 12
This PhD thesis might be of interest: Recognition of ultra low resolution, anti-aliased text with small font sizes

Juk :)


 Post subject: Re: Rule Based OCR
PostPosted: Wed Apr 08, 2015 1:02 pm 
Offline
Site Admin

Joined: Sun Feb 24, 2013 9:39 pm
Posts: 642
Interesting. Have you come over to the dark side Juk?


 Post subject: Re: Rule Based OCR
PostPosted: Wed Apr 08, 2015 3:35 pm 
Offline
Junior Member

Joined: Thu Nov 14, 2013 2:56 pm
Posts: 12
spears wrote:
Interesting. Have you come over to the dark side Juk?

LOL no - I don't even play poker any more, but do still maintain an interest in the AI side of things.

As for that thesis: I just remembered seeing it posted somewhere (possibly even here?) and thought it might be of interest.

Juk :)


 Post subject: Re: Rule Based OCR
PostPosted: Wed Jul 22, 2015 4:59 pm 
Offline
Veteran Member

Joined: Mon Mar 04, 2013 9:40 pm
Posts: 269
I revamped my OCR with a long overdue rewrite and I am finally getting 100% accuracy on numbers with Tesseract. The first thing I had to do was initialize Tesseract so that it only recognizes numbers, commas and periods, via a whitelist. Tesseract was designed for black letters on a white background, so to give it that you have to do a few things before running the image through the engine:

1. Convert to greyscale
2. Invert the image (i.e. make black areas white and vice versa)
3. Apply a threshold filter (very high, like over 200)
4. Scale the image 100%

That solved all my issues with 5 and 8 for good. Tesseract does work if you give it a good image to work with.
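
The first three steps above can be sketched in plain Python as follows. This is an illustration under assumptions, not the poster's code: pixels are taken to be 8-bit `(r, g, b)` tuples, the greyscale weights are the common BT.601 luma weights, and `preprocess` and `tesseract_command` are hypothetical helper names. The scaling step is left out because the post does not say how it was done. The command line is one way to apply a whitelist through the `tesseract` CLI. The post does not say whether the CLI or an API binding was used, and `--psm 7` (treat the image as a single line of text) is an added assumption.

```python
def to_grey(rgb):
    """Integer luma approximation using BT.601 weights."""
    r, g, b = rgb
    return (299 * r + 587 * g + 114 * b) // 1000


def preprocess(rgb_pixels, threshold=200):
    """Greyscale, invert, then threshold to pure black (0) or white (255).

    Light text on a dark table ends up as black text on white.
    """
    result = []
    for row in rgb_pixels:
        new_row = []
        for pixel in row:
            inverted = 255 - to_grey(pixel)
            new_row.append(255 if inverted > threshold else 0)
        result.append(new_row)
    return result


def tesseract_command(image_path, whitelist="0123456789.,"):
    """Build a tesseract CLI call restricted to digits, comma and period."""
    return [
        "tesseract", image_path, "stdout",
        "--psm", "7",  # assumption: the number sits on a single text line
        "-c", f"tessedit_char_whitelist={whitelist}",
    ]


# One bright text pixel and one dark background pixel.
sample = [[(240, 240, 240), (20, 20, 20)]]
binary = preprocess(sample)
command = tesseract_command("stack.png")  # "stack.png" is a placeholder path
```

The command list can be passed to `subprocess.run(command, capture_output=True, text=True)` once the preprocessed image has been saved to disk. Whitelist support varies between Tesseract versions and engine modes, so it is worth checking that the installed version honours it.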


 Post subject: Re: Rule Based OCR
PostPosted: Thu Jul 23, 2015 6:38 am 
Offline
Veteran Member

Joined: Thu Feb 28, 2013 2:39 am
Posts: 437
If you can find the character edges why not use hashes? If they're anti-aliased, you can train your own NNs with Encog or just do a difference map if you want to keep it simple (combined with a hash cache). Tesseract is ok, but bulky and slow.
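
A rough sketch of the hash cache combined with a difference map, assuming the characters have already been cut out as equally sized greyscale grids. The `GlyphReader` class, the SHA-1 choice and the tiny templates are hypothetical, chosen only to illustrate the idea, and the neural-network option is not shown.

```python
import hashlib


def glyph_hash(glyph):
    """Hash the exact pixel content of a glyph (rows of ints 0-255)."""
    digest = hashlib.sha1()
    digest.update(f"{len(glyph)}x{len(glyph[0])}:".encode())
    for row in glyph:
        digest.update(bytes(row))
    return digest.hexdigest()


def difference(a, b):
    """Sum of absolute pixel differences between two same-sized glyphs."""
    return sum(
        abs(pa - pb) for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b)
    )


class GlyphReader:
    def __init__(self, templates):
        # templates maps a character to its reference glyph
        self.templates = templates
        self.cache = {glyph_hash(g): ch for ch, g in templates.items()}

    def read(self, glyph):
        key = glyph_hash(glyph)
        if key in self.cache:  # pixel-identical glyph seen before
            return self.cache[key]
        # Fall back to the closest template of the same size.
        candidates = {
            ch: t
            for ch, t in self.templates.items()
            if len(t) == len(glyph) and len(t[0]) == len(glyph[0])
        }
        if not candidates:
            return None
        best = min(candidates, key=lambda ch: difference(candidates[ch], glyph))
        self.cache[key] = best  # remember this variant for next time
        return best


# Tiny made-up 3x2 templates, for illustration only.
reader = GlyphReader({
    "1": [[0, 255, 0], [0, 255, 0]],
    "7": [[255, 255, 255], [0, 0, 255]],
})
exact = reader.read([[0, 255, 0], [0, 255, 0]])
noisy_seven = [[255, 250, 255], [10, 0, 255]]
nearest = reader.read(noisy_seven)
```

An exact hash only matches pixel-identical glyphs, which is why the difference map is kept as a fallback for anti-aliased or slightly shifted renderings. A production version would probably also reject matches whose difference exceeds some limit rather than always returning the closest template.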


 Post subject: Re: Rule Based OCR
PostPosted: Sun Jul 26, 2015 2:56 pm 
Offline
Veteran Member

Joined: Mon Mar 04, 2013 9:40 pm
Posts: 269
Nasher wrote:
If you can find the character edges why not use hashes? If they're anti-aliased, you can train your own NNs with Encog or just do a difference map if you want to keep it simple (combined with a hash cache). Tesseract is ok, but bulky and slow.


Hmm... Hashing is an idea I never thought about before. That would definitely help, but as you know, that NN stuff is way above my head...

