What thresholding (binarization) algorithm is used in Tesseract OCR?


I am working on a project that needs accurate OCR results for images with a rich background. I am comparing the results of two OCR engines (one of them Tesseract) to make a choice. The point is that the results are strongly affected by the pre-processing step, in particular image binarization. I extracted the binarized image from the other OCR engine and passed it to Tesseract, which improved Tesseract's results by 30-40%.

I have two questions, and answers to them would help me a lot:

  1. What binarization algorithm does Tesseract use, and is it configurable?
  2. Is there a way to extract Tesseract's binarized image, so that I can test the other OCR engine with it?

Thanks in advance :)

I think I have found the answers to my questions:

1- The binarization algorithm used is Otsu thresholding. You can see it here, at line 179.
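For readers unfamiliar with Otsu thresholding, here is a minimal sketch of the global variant on an 8-bit grayscale buffer. It is not Tesseract's own code: the names `OtsuThreshold` and `Binarize` are hypothetical, and the sketch assumes a single threshold for the whole image, picked to maximize the between-class variance of the histogram.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of Tesseract.
// Returns a threshold t in [0, 255]: pixels <= t form one class,
// pixels > t the other. Returns 0 for empty or single-valued input.
int OtsuThreshold(const std::vector<std::uint8_t>& pixels) {
    // Build the 256-bin intensity histogram.
    std::array<long long, 256> hist{};
    for (std::uint8_t p : pixels) ++hist[p];

    const double total = static_cast<double>(pixels.size());
    double sum_all = 0.0;
    for (int i = 0; i < 256; ++i) sum_all += static_cast<double>(i) * hist[i];

    double w0 = 0.0;    // number of pixels in the lower class
    double sum0 = 0.0;  // sum of intensities in the lower class
    double best_var = -1.0;
    int best_t = 0;
    for (int t = 0; t < 256; ++t) {
        w0 += static_cast<double>(hist[t]);
        sum0 += static_cast<double>(t) * hist[t];
        if (w0 == 0.0) continue;     // lower class still empty
        const double w1 = total - w0;
        if (w1 == 0.0) break;        // upper class empty from here on
        const double mean0 = sum0 / w0;
        const double mean1 = (sum_all - sum0) / w1;
        // Between-class variance, up to a constant factor of 1 / total^2.
        const double between = w0 * w1 * (mean0 - mean1) * (mean0 - mean1);
        if (between > best_var) {
            best_var = between;
            best_t = t;
        }
    }
    return best_t;
}

// Hypothetical helper: maps pixels above the threshold to 255, the rest to 0.
std::vector<std::uint8_t> Binarize(const std::vector<std::uint8_t>& pixels) {
    const int t = OtsuThreshold(pixels);
    std::vector<std::uint8_t> out(pixels.size());
    for (std::size_t i = 0; i < pixels.size(); ++i) {
        out[i] = pixels[i] > t ? 255 : 0;
    }
    return out;
}
```

Because a single global threshold is derived from the histogram alone, uneven or textured backgrounds can easily push it to a poor value, which may be why a different binarization gave better results on images with a rich background.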

2- To get the binarized image, this method of the Tesseract API can be called:

Pix* thresholded = api->GetThresholdedImage(); // thresholded must be freed by the caller, e.g. pixDestroy(&thresholded)
