[Q] OCR for proper images only #96

jetnet · 2019-05-17T11:22:29Z

Hello Pascal,
Is it possible to configure the Document Parser to apply OCR processing only to images of a given size / dimension or larger? There is some metadata that could be checked:

tiff:ImageLength = [756] // pixels
Content-Length = [93191] //bytes

Thanks!


essiembre · 2019-05-27T02:09:17Z

Unfortunately, out of the box, OCR is only supported as part of parsing a file, and the image size is typically known only while parsing. I am marking this as a feature request to allow specifying minimum/maximum dimensions for OCR.

In the meantime, you can look at creating your own parser or use an ExternalTransformer to perform OCR if applicable.

If you do not want to keep images that are not eligible for OCR, you could also write an IDocumentFilter that extracts the image dimensions and rejects those not matching what you want.
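
As a rough illustration of the dimension check such a filter could delegate to, here is a minimal sketch using only the JDK's `javax.imageio`. The class `OcrSizeCheck`, its method names, and the threshold parameters are hypothetical and not part of the Importer API; wiring this into an actual IDocumentFilter is left out. It also assumes ImageIO has a reader for the format (TIFF is supported out of the box only on Java 9 and later).

```java
import java.awt.Dimension;
import java.io.IOException;
import java.io.InputStream;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

// Hypothetical helper (not part of the Importer API): decides whether an
// image is large enough to be worth sending to OCR.
final class OcrSizeCheck {

    private OcrSizeCheck() {
    }

    // Reads width and height from the image header without decoding pixels.
    // Returns null when no installed ImageIO reader recognizes the format.
    static Dimension readDimensions(InputStream in) throws IOException {
        try (ImageInputStream iis = ImageIO.createImageInputStream(in)) {
            if (iis == null) {
                return null;
            }
            Iterator<ImageReader> readers = ImageIO.getImageReaders(iis);
            if (!readers.hasNext()) {
                return null;
            }
            ImageReader reader = readers.next();
            try {
                // seekForwardOnly=true, ignoreMetadata=true: header info only.
                reader.setInput(iis, true, true);
                return new Dimension(reader.getWidth(0), reader.getHeight(0));
            } finally {
                reader.dispose();
            }
        }
    }

    // True when both dimensions meet the (assumed) minimums, in pixels.
    static boolean isLargeEnough(int width, int height,
                                 int minWidth, int minHeight) {
        return width >= minWidth && height >= minHeight;
    }

    // Convenience wrapper: unreadable or unrecognized images are rejected.
    static boolean isEligible(InputStream in, int minWidth, int minHeight)
            throws IOException {
        Dimension d = readDimensions(in);
        return d != null && isLargeEnough(d.width, d.height, minWidth, minHeight);
    }
}
```

A custom filter would call something like `OcrSizeCheck.isEligible(stream, 500, 500)` on the document stream and reject the document when it returns false. The 500-pixel thresholds are placeholders; pick values that suit your content.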

jetnet · 2019-06-13T17:54:57Z

Thank you!

essiembre added the feature-request label May 27, 2019
