[Q] OCR for proper images only #96

Open
jetnet opened this issue May 17, 2019 · 2 comments

jetnet commented May 17, 2019

Hello Pascal,
is it possible to configure the Document Parser to apply OCR only to images above a given size or dimension? There is some metadata that could be checked:

tiff:ImageLength = [756] // pixels
Content-Length = [93191] //bytes

Thanks!

@essiembre
Contributor

Unfortunately, out of the box, OCR is only supported as part of parsing a file, and the image size is typically only known while parsing. I am marking this as a feature request to allow specifying minimum/maximum dimensions for OCR.

In the meantime, you can look at creating your own parser or using an ExternalTransformer to perform OCR, if applicable.

If you do not want to keep images that are not eligible for OCR, you could also write an IDocumentFilter that extracts the image dimensions and rejects those that do not match what you want.
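
As a rough illustration of the dimension check such a filter could perform, here is a minimal sketch using only the JDK's `javax.imageio`. The class name `OcrSizeCheck`, the method `isEligibleForOcr`, and the thresholds are hypothetical and not part of the Importer API. Wiring the check into an actual IDocumentFilter is left out, since the exact interface signature depends on the Importer version you use. TIFF input also assumes a JDK (9+) or plugin that provides a TIFF reader.

```java
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Iterator;

// Hypothetical helper: not part of Norconex Importer.
final class OcrSizeCheck {

    /**
     * Returns true when the stream holds a recognized image that is at least
     * minWidth x minHeight pixels. Only the image header is read; the pixel
     * data is not decoded.
     */
    static boolean isEligibleForOcr(InputStream in, int minWidth, int minHeight)
            throws IOException {
        try (ImageInputStream iis = ImageIO.createImageInputStream(in)) {
            if (iis == null) {
                return false; // no stream provider available for this input
            }
            Iterator<ImageReader> readers = ImageIO.getImageReaders(iis);
            if (!readers.hasNext()) {
                return false; // not an image format the JDK can read
            }
            ImageReader reader = readers.next();
            try {
                reader.setInput(iis, true, true);
                return reader.getWidth(0) >= minWidth
                        && reader.getHeight(0) >= minHeight;
            } finally {
                reader.dispose();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Build an 800x756 PNG in memory to exercise the check.
        BufferedImage img = new BufferedImage(800, 756, BufferedImage.TYPE_INT_RGB);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ImageIO.write(img, "png", out);
        byte[] bytes = out.toByteArray();

        System.out.println(isEligibleForOcr(new ByteArrayInputStream(bytes), 500, 500));   // prints true
        System.out.println(isEligibleForOcr(new ByteArrayInputStream(bytes), 1000, 1000)); // prints false
    }
}
```

Inside a filter, the document would be rejected when this check returns false. Note that the check consumes the stream, so the filter would need a stream it is allowed to read fully or re-open.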

@jetnet
Author

jetnet commented Jun 13, 2019

Thank you!
