Poker-AI.org
http://poker-ai.org/phpbb/

Rule Based OCR
http://poker-ai.org/phpbb/viewtopic.php?f=26&t=2897
Page 1 of 1

Author:  notSoEasy [ Tue Apr 07, 2015 8:26 am ]
Post subject:  Rule Based OCR

Hello, right now I am facing the problem of developing a robust OCR system. I have read about and tried many of the widely known approaches, but in the end none of them is very robust. My idea now is to limit my OCR to numbers and write custom rules for them. As far as I can see this will be at least a pain in the ass and possibly even impossible, especially since I did not find anything about this approach. I hope that is because most OCR aims to recognize the entire alphabet, which would make a rule-based approach unfeasible.
What do you think about this?

Author:  shalako [ Tue Apr 07, 2015 3:57 pm ]
Post subject:  Re: Rule Based OCR

The Tesseract OCR engine has a built-in rule for recognizing only numbers, if that is your problem. You do have to create a few rules unless you train it specifically (I avoided that as it's too complicated). If you're still having problems, my advice is to scale the image 300% before running it through the OCR engine. This will help quite a bit. The other thing I have noticed is that sometimes if you distort the image, say 80% horizontally, it reads it better. No idea why.

OCR can be a pain. Mine all of a sudden has problems with the number 8. It thinks it's a 3, but not all the time, which makes it a major headache...

Author:  notSoEasy [ Tue Apr 07, 2015 5:49 pm ]
Post subject:  Re: Rule Based OCR

Hey, thanks for your reply. I tried Tesseract, also with scaling, the numbers-only rule and manually making the image binary, but it was still regularly confusing 3 and 8, and some other stuff. Today I was playing around with the trial of ABBYY and got better results, but it's impossible to incorporate into my bot. Probably got to go with some less robust methods for now.

Author:  jukofyork [ Wed Apr 08, 2015 8:26 am ]
Post subject:  Re: Rule Based OCR

This PhD thesis might be of interest: Recognition of ultra low resolution, anti-aliased text with small font sizes

Juk :)

Author:  spears [ Wed Apr 08, 2015 1:02 pm ]
Post subject:  Re: Rule Based OCR

Interesting. Have you come over to the dark side Juk?

Author:  jukofyork [ Wed Apr 08, 2015 3:35 pm ]
Post subject:  Re: Rule Based OCR

spears wrote:
Interesting. Have you come over to the dark side Juk?

LOL no - I don't even play poker any more, but do still maintain an interest in the AI side of things.

As for that thesis: I just remembered seeing it posted somewhere (possibly even here?) and thought it might be of interest.

Juk :)

Author:  shalako [ Wed Jul 22, 2015 4:59 pm ]
Post subject:  Re: Rule Based OCR

I revamped my OCR after a long overdue rewrite and I am finally getting 100% accuracy with Tesseract on numbers. The first thing I had to do was init Tesseract so it only recognizes numbers, commas and periods via a whitelist. Tesseract was designed for black letters on white, so you have to do a few things to the image before running it through the engine:

1. Convert to Greyscale
2. Invert the image (i.e. make black areas white and vice versa)
3. Apply a Threshold filter (very high like over 200)
4. Scale the image 100%

That solved all my issues with 5 and 8 for good. Tesseract does work if you give it a good image to work with.
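
For readers who want to try the four steps above, here is a minimal sketch in plain Python. It is not the poster's code: the function names, the list-of-rows image format and the default values are assumptions made for illustration. In particular, "scale 100%" is read here as doubling the size, and the cutoff of 200 is taken from the "over 200" remark. The whitelist itself is set separately through Tesseract's `tessedit_char_whitelist` variable.

```python
# Sketch of the four preprocessing steps on a plain list-of-rows image.
# Pixels are (r, g, b) tuples in 0..255. All names and defaults are
# illustrative assumptions, not taken from the original post.

def to_greyscale(rgb_rows):
    """Step 1: convert RGB pixels to 0..255 grey levels (weighted luma)."""
    return [[(299 * r + 587 * g + 114 * b) // 1000 for (r, g, b) in row]
            for row in rgb_rows]

def invert(grey_rows):
    """Step 2: swap dark and light, so light-on-dark text becomes dark-on-light."""
    return [[255 - v for v in row] for row in grey_rows]

def threshold(grey_rows, cutoff=200):
    """Step 3: binarize; values above cutoff become white (255), the rest black (0)."""
    return [[255 if v > cutoff else 0 for v in row] for row in grey_rows]

def scale_nearest(rows, factor=2):
    """Step 4: enlarge by an integer factor using nearest-neighbour replication."""
    out = []
    for row in rows:
        wide = [v for v in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out

def preprocess(rgb_rows, cutoff=200, factor=2):
    """Run steps 1 to 4 in the order listed in the post."""
    grey = to_greyscale(rgb_rows)
    return scale_nearest(threshold(invert(grey), cutoff), factor)
```

With light text on a dark table background, a high cutoff after inversion turns the half-bright anti-aliased edge pixels black as well, which thickens the strokes. That may be part of why a very high threshold helps, but that is a guess.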

Author:  cantina [ Thu Jul 23, 2015 6:38 am ]
Post subject:  Re: Rule Based OCR

If you can find the character edges why not use hashes? If they're anti-aliased, you can train your own NNs with Encog or just do a difference map if you want to keep it simple (combined with a hash cache). Tesseract is ok, but bulky and slow.
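
A rough sketch of the hash plus difference-map idea, assuming each character has already been cut out as a small greyscale bitmap (rows of 0..255 ints) and that you have one reference bitmap per digit. The class name, the `max_diff` tolerance and the choice of SHA-1 are assumptions for illustration only.

```python
import hashlib

def glyph_key(rows):
    """Hash a glyph bitmap (rows of 0..255 ints), including its dimensions."""
    h = hashlib.sha1()
    h.update(f"{len(rows)}x{len(rows[0])}:".encode())
    for row in rows:
        h.update(bytes(row))
    return h.hexdigest()

def difference(a, b):
    """Sum of absolute pixel differences; infinite when the sizes differ."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        return float("inf")
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

class GlyphReader:
    """Hypothetical helper: exact hash lookup first, difference map as fallback."""

    def __init__(self, templates, max_diff=200):
        # templates: dict mapping a character to its reference bitmap
        self.templates = templates
        self.max_diff = max_diff
        self.cache = {glyph_key(bmp): ch for ch, bmp in templates.items()}

    def read(self, glyph):
        key = glyph_key(glyph)
        if key in self.cache:              # exact match: a single dict lookup
            return self.cache[key]
        # Fallback: compare against every template and keep the closest one.
        best_ch, best_d = None, float("inf")
        for ch, bmp in self.templates.items():
            d = difference(glyph, bmp)
            if d < best_d:
                best_ch, best_d = ch, d
        if best_d > self.max_diff:
            return None                    # unknown glyph: let the caller decide
        self.cache[key] = best_ch          # remember this anti-aliased variant
        return best_ch
```

When the client renders digits pixel-identically, every lookup hits the hash cache. The slower difference map only runs the first time a new anti-aliased variant shows up.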

Author:  shalako [ Sun Jul 26, 2015 2:56 pm ]
Post subject:  Re: Rule Based OCR

Nasher wrote:
If you can find the character edges why not use hashes? If they're anti-aliased, you can train your own NNs with Encog or just do a difference map if you want to keep it simple (combined with a hash cache). Tesseract is ok, but bulky and slow.


Hmm... Hashing is an idea I never thought about before. That would definitely but as you know that NN stuff is way above my head..

All times are UTC