What thresholding (binarization) algorithm is used in Tesseract OCR?
I am working on a project that needs accurate OCR results for images with a rich background. I am comparing the results of two OCR engines (one of them is Tesseract) to make my choice. The point is that the results are strongly affected by the pre-processing step, especially image binarization. I extracted the binarized image from the other OCR engine and passed it to Tesseract, which improved Tesseract's results by 30-40%.
I have two questions, and answers to them would help me:
- What binarization algorithm does Tesseract use, and is it configurable?
- Is there a way to extract the binarized image from Tesseract so that I can test the other OCR engine with it?
Thanks in advance :)
I think I have found the answers to my questions:
1- The binarization algorithm used is Otsu thresholding. You can see it here in line 179.
2- To get the binarized image, the following method in the Tesseract API can be called:
Pix* thresholded = api->GetThresholdedImage(); // thresholded must be freed by the caller, e.g. with pixDestroy(&thresholded)
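For anyone wondering what Otsu thresholding actually does, here is a minimal sketch of the textbook method on a flat buffer of 8-bit grayscale pixels. It is not Tesseract's own code, which is organized differently. The names `otsuThreshold` and `binarize` are made up for this illustration, and so is the convention that pixels at or below the threshold become 0 and the rest become 255. The idea is to try every gray level as a split point and keep the one that maximizes the between-class variance of the two resulting groups.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: returns the Otsu threshold t in [0, 255].
// Pixels with value <= t form one class, pixels with value > t the other.
int otsuThreshold(const std::vector<std::uint8_t>& pixels) {
    assert(!pixels.empty() && "otsuThreshold needs at least one pixel");

    // Histogram of gray levels.
    std::vector<std::size_t> hist(256, 0);
    for (std::uint8_t p : pixels) ++hist[p];

    const double total = static_cast<double>(pixels.size());
    double sumAll = 0.0;  // sum of all gray values
    for (int i = 0; i < 256; ++i) sumAll += i * static_cast<double>(hist[i]);

    double weightLow = 0.0;      // number of pixels with value <= t
    double sumLow = 0.0;         // sum of gray values of those pixels
    double bestVariance = -1.0;  // best between-class variance so far
    int bestThreshold = 0;

    for (int t = 0; t < 256; ++t) {
        weightLow += static_cast<double>(hist[t]);
        sumLow += t * static_cast<double>(hist[t]);
        if (weightLow == 0.0) continue;  // low class still empty

        const double weightHigh = total - weightLow;
        if (weightHigh == 0.0) break;    // high class empty from here on

        const double meanLow = sumLow / weightLow;
        const double meanHigh = (sumAll - sumLow) / weightHigh;
        const double diff = meanLow - meanHigh;

        // Between-class variance, up to a constant factor of 1 / total^2.
        const double variance = weightLow * weightHigh * diff * diff;
        if (variance > bestVariance) {
            bestVariance = variance;
            bestThreshold = t;
        }
    }
    return bestThreshold;
}

// Hypothetical helper: maps pixels <= threshold to 0 and the rest to 255.
std::vector<std::uint8_t> binarize(const std::vector<std::uint8_t>& pixels,
                                   int threshold) {
    std::vector<std::uint8_t> out;
    out.reserve(pixels.size());
    for (std::uint8_t p : pixels) {
        out.push_back(p <= threshold ? 0 : 255);
    }
    return out;
}
```

Calling `binarize(pixels, otsuThreshold(pixels))` gives a black-and-white version of the buffer. Since this computes one global threshold for the whole image, it tends to struggle when the background brightness varies a lot, which may explain why swapping in another engine's binarized image changed the results so much.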