Tesseract OCR tips: useful commands for the command-line tool
Here is my second post on OCR with Tesseract. This time I would like to share some commands I found useful when working with the Tesseract command-line tool.
First of all, the main help for the Tesseract command line can be displayed with the arguments “--help” and “--help-extra”. “--help” shows a brief version of the user manual:
Usage:
tesseract --help | --help-extra | --version
tesseract --list-langs
tesseract imagename outputbase [options...] [configfile...]

OCR options:
-l LANG[+LANG] Specify language(s) used for OCR.
NOTE: These options must occur before any configfile.

Single options:
--help Show this help message.
--help-extra Show extra help for advanced users.
--version Show version information.
--list-langs List available languages for tesseract engine.
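To put the basic form into practice, here is a minimal shell sketch. It assumes “tesseract” is on your PATH; the helper names “has_lang” and “ocr_to_text” and the file names “scan.png” and “result” are my own hypothetical examples, not part of Tesseract. It relies on “--list-langs” printing one header line followed by one language code per line.

```shell
#!/bin/sh
# Minimal sketch of the basic form: tesseract imagename outputbase [options...]
# Assumes tesseract is on your PATH. Function names are hypothetical helpers.

# Succeeds if the given language code is installed.
# --list-langs prints a header line, then one language code per line.
has_lang() {
  tesseract --list-langs 2>/dev/null | tail -n +2 | grep -qx -- "$1"
}

# OCR one image into "<outputbase>.txt" (Tesseract adds the .txt extension).
# $3 is optional and defaults to English; "eng+deu" combines two languages.
ocr_to_text() {
  tesseract "$1" "$2" -l "${3:-eng}"
}
```

For example, “has_lang eng && ocr_to_text scan.png result eng” should leave the recognized text in result.txt.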
“--help-extra” shows a more detailed version:
Usage:
tesseract --help | --help-extra | --help-psm | --help-oem | --version
tesseract --list-langs [--tessdata-dir PATH]
tesseract --print-parameters [options...] [configfile...]
tesseract imagename|imagelist|stdin outputbase|stdout [options...] [configfile...]

OCR options:
--tessdata-dir PATH Specify the location of tessdata path.
--user-words PATH Specify the location of user words file.
--user-patterns PATH Specify the location of user patterns file.
-l LANG[+LANG] Specify language(s) used for OCR.
-c VAR=VALUE Set value for config variables.
Multiple -c arguments are allowed.
--psm NUM Specify page segmentation mode.
--oem NUM Specify OCR Engine mode.
NOTE: These options must occur before any configfile.

Page segmentation modes:
...

OCR Engine modes: (see https://github.com/tesseract-ocr/tesseract/wiki#linux)
...

Single options:
-h, --help Show minimal help message.
--help-extra Show extra help for advanced users.
--help-psm Show page segmentation modes.
--help-oem Show OCR Engine modes.
-v, --version Show version information.
--list-langs List available languages for tesseract engine.
--print-parameters Print tesseract parameters.
I have skipped some sections (marked with “...”) to save space.
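Here is a sketch of how some of these extra options combine in practice. It assumes “tesseract” is on your PATH; the helper names are hypothetical. “--psm 7” (single text line) and “--oem 1” (LSTM engine only) are standard mode numbers, but whether “tessedit_char_whitelist” is honoured by the LSTM engine depends on your version (I believe 4.1 and later), so treat that part as an assumption.

```shell
#!/bin/sh
# Sketch of the extra options listed by --help-extra. Assumes tesseract is on
# your PATH; function names are hypothetical helpers, not part of Tesseract.

# OCR an image that holds a single line of digits and print the text to stdout.
#   --psm 7  treat the image as a single text line
#   --oem 1  use the LSTM (neural net) engine only
#   -c       set a config variable; the whitelist limits recognized characters
#            (assumption: your version honours the whitelist with LSTM)
ocr_digits_line() {
  tesseract "$1" stdout -l eng --psm 7 --oem 1 \
    -c tessedit_char_whitelist=0123456789
}

# Read image data from stdin and write the recognized text to stdout.
ocr_pipe() {
  tesseract stdin stdout -l "${1:-eng}"
}

# List the languages available in a specific tessdata directory.
list_langs_in() {
  tesseract --list-langs --tessdata-dir "$1"
}
```

For example, “cat scan.png | ocr_pipe eng > scan.txt” pipes an image through Tesseract without an intermediate output base, and “list_langs_in ./tessdata” checks a custom (hypothetical) tessdata folder.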
You can set special parameters when running Tesseract OCR on an image, and there is an argument that lists every available parameter: “--print-parameters”. A nice thing about this option is that the output shows the default value of each parameter. The full list is long; here are its first few lines:
Tesseract parameters:
editor_image_xpos 590 Editor image X Pos
editor_image_ypos 10 Editor image Y Pos
editor_image_menuheight 50 Add to image height for menu bar
editor_image_word_bb_color 7 Word bounding box colour
editor_image_blob_bb_color 4 Blob bounding box colour
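Since the list is long, filtering it is handy. The sketch below assumes “tesseract” is on your PATH and uses hypothetical helper and file names (“find_param”, “write_digits_config”, “digits.cfg”). A config file holds one “name value” pair per line; as far as I know, Tesseract looks for a config name in the configs/ folder of tessdata first and otherwise opens it as a plain path.

```shell
#!/bin/sh
# Sketch: look up parameters by name, then override one via a config file.
# Assumes tesseract is on your PATH; names below are hypothetical helpers.

# Print every parameter line matching $1 (case-insensitive).
# Each line holds a parameter name, its default value and a description.
find_param() {
  tesseract --print-parameters | grep -i -- "$1"
}

# Write a config file with one "name value" pair per line.
write_digits_config() {
  printf '%s\n' 'tessedit_char_whitelist 0123456789' > "$1"
}
```

For example, “find_param whitelist” shows the whitelist-related parameters, and after “write_digits_config digits.cfg” you can run “tesseract scan.png result digits.cfg”, keeping the config file last since options must come before any configfile.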
That’s all for this time. Enjoy OCR with Tesseract, and see you soon.