How to pass --psm to detect text as a single column of text? #58
Comments
Hello - There isn't currently an option to configure the page segmentation mode (psm). It would make sense to expose this configuration though. The API could look something like:
ocrClient.loadImage(image, {
  segmentationMode: mode,
});
Do you have any examples of images where the text columns are incorrectly recognized? |
@dheimoz this should currently work if you are using the engine API: engine.setVariable("tessedit_pageseg_mode", "4"); |
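For readers who prefer names over bare numbers, the `setVariable` call above can be wrapped in a small helper. This is only a sketch: `PageSegMode` and `setPageSegMode` are hypothetical names, not part of the library, and the only thing assumed about `engine` is that it has the `setVariable(name, value)` method shown above. The numeric values are Tesseract's standard page segmentation modes, the same ones accepted by the `--psm` command-line flag.

```javascript
// Tesseract page segmentation modes; values match the `--psm` CLI flag.
const PageSegMode = Object.freeze({
  OSD_ONLY: 0,
  AUTO_OSD: 1,
  AUTO_ONLY: 2,
  AUTO: 3,
  SINGLE_COLUMN: 4,
  SINGLE_BLOCK_VERT_TEXT: 5,
  SINGLE_BLOCK: 6,
  SINGLE_LINE: 7,
  SINGLE_WORD: 8,
  CIRCLE_WORD: 9,
  SINGLE_CHAR: 10,
  SPARSE_TEXT: 11,
  SPARSE_TEXT_OSD: 12,
  RAW_LINE: 13,
});

// Hypothetical helper (not a library function): accepts a mode name such as
// "SINGLE_COLUMN" or a number such as 4, validates it, and forwards it to
// engine.setVariable as a string, as in the comment above.
function setPageSegMode(engine, mode) {
  const value = typeof mode === "string" ? PageSegMode[mode] : mode;
  if (!Number.isInteger(value) || value < 0 || value > 13) {
    throw new RangeError(`Unknown page segmentation mode: ${mode}`);
  }
  engine.setVariable("tessedit_pageseg_mode", String(value));
  return value;
}
```

With an engine created through the engine API, `setPageSegMode(engine, "SINGLE_COLUMN")` would be equivalent to the `--psm 4` case asked about in this issue. Whether the variable has to be set before the image is loaded is not stated in this thread, so setting it first is the cautious choice.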
Thanks, I will give it a try. |
Hi, I'm not using the engine API because I want the option to use wasm and wasm-fallback from the web worker. If I make the change to send options to the engine in the ocrClient and send you a PR, would you be interested in this change? |
Yes, I'd be willing to accept that. |
@robertknight I'm seeing that embind doesn't support overloaded functions. I changed the
and this one to support LoadImage with only one argument:
This page says it doesn't support overloaded functions |
I can see a few options:
The API that lib.cpp exposes to JS does not expose Tesseract internal types/enums directly, but rather abstracts them into something that is more convenient to use from the JS side and allows Tesseract version changes to be handled entirely in lib.cpp. See for example the |
@robertknight I see that you are using the function |
@robertknight, here's the PR: #67 |
Never mind, I saw that you used this function to convert between types |
Hey @robertknight ,
Great work you have been doing here. It performs excellently in Vue 3 with Vite.
I would like to pass the --psm 4 parameter to the Tesseract engine so that it treats the text as a single column.
Sometimes the engine interprets the text as 2 or 3 columns, and the recognized text does not make sense.
More info:
https://stackoverflow.com/questions/44619077/pytesseract-ocr-multiple-config-options
I looked through the source code but could not find how to pass that option.
Thanks.