What is the suggested approach for ocr pdf? #2099

nissansz · 2022-12-02T12:58:50Z

nissansz
Dec 2, 2022

What is the suggested approach for running OCR on a PDF?

JorjMcKie · 2022-12-02T17:21:29Z

JorjMcKie
Dec 2, 2022
Maintainer

Can you please be more specific? Are you asking which tool to use, or how to decide whether to OCR at all? ...
As a start, you may want to read this: https://medium.com/p/e455465acb03, and also the posts here: https://artifex.com/blog.
PyMuPDF's Discord channel is another good place to reach other users.

nissansz · 2022-12-02T20:28:45Z

nissansz
Dec 2, 2022
Author

I just want to convert an image-only PDF into a dual-layer PDF (page image plus searchable text) that keeps a good layout.

JorjMcKie · 2022-12-02T20:41:21Z

JorjMcKie
Dec 2, 2022
Maintainer

Try OCRmyPDF. If you want the OCR step integrated into your PyMuPDF script, use PyMuPDF's built-in Tesseract access. There are example scripts in the utilities repo.
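
For readers who want the PyMuPDF route, here is a minimal sketch of what the integrated Tesseract access could look like. It assumes a recent PyMuPDF, Tesseract language data that PyMuPDF can locate (for example through the TESSDATA_PREFIX environment variable), and an input whose pages are scanned images. The names `ocr_pdf_with_tesseract` and `page_needs_ocr`, and the `min_chars` threshold, are illustrative and not part of PyMuPDF; `Pixmap.pdfocr_tobytes` is the library call that performs the OCR.

```python
def page_needs_ocr(extracted_text, min_chars=10):
    """Hypothetical heuristic: OCR a page only if it has almost no real text."""
    return len(extracted_text.strip()) < min_chars


def ocr_pdf_with_tesseract(src_path, dst_path, language="eng", dpi=300):
    """Write a searchable copy of src_path; pages that already have text are copied."""
    import fitz  # PyMuPDF, imported here so page_needs_ocr stays usable without it

    src = fitz.open(src_path)
    out = fitz.open()  # new, empty PDF
    for page in src:
        if not page_needs_ocr(page.get_text()):
            # The page already carries a text layer: copy it unchanged.
            out.insert_pdf(src, from_page=page.number, to_page=page.number)
            continue
        pix = page.get_pixmap(dpi=dpi)  # rasterize the page for Tesseract
        # pdfocr_tobytes returns a one-page PDF: the image plus an OCR text layer.
        ocr_page = fitz.open("pdf", pix.pdfocr_tobytes(language=language))
        out.insert_pdf(ocr_page)
    out.save(dst_path)
```

A call would look like `ocr_pdf_with_tesseract("scan.pdf", "scan-ocr.pdf", language="eng")`, where both file names are placeholders.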

nissansz · 2022-12-02T21:03:27Z

nissansz
Dec 2, 2022
Author

Tesseract is hard to train and its accuracy is low. I want to switch to PaddleOCR. Is there a way to do that?
Or is there a way to take the JSON result of another OCR engine and use it to build the final PDF?
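
To make the JSON idea concrete, here is a rough sketch of one way it might be done with PyMuPDF, not a confirmed recipe from this thread. It assumes the OCR engine was run on page images rendered at a known `dpi`, and that its output has already been converted to a hypothetical shape: one list per page of items like `{"box": [[x, y], ...four corners...], "text": "..."}` in pixel coordinates. The function names and that JSON shape are made up for illustration; `Page.insert_text` with `render_mode=3` (invisible text) is the PyMuPDF call used. Page rotation and non-zero page origins are ignored.

```python
def box_to_pdf_rect(box, dpi):
    """Convert a 4-corner pixel box to (x0, y0, x1, y1) in PDF points (1/72 inch)."""
    scale = 72.0 / dpi
    xs = [corner[0] for corner in box]
    ys = [corner[1] for corner in box]
    return (min(xs) * scale, min(ys) * scale, max(xs) * scale, max(ys) * scale)


def add_text_layer(pdf_path, ocr_pages, out_path, dpi=300):
    """Overlay invisible OCR text on the pages of an image-only PDF.

    ocr_pages[i] is the list of {"box": ..., "text": ...} items for page i
    (an assumed layout, see the note above).
    """
    import fitz  # PyMuPDF, imported here so box_to_pdf_rect works without it

    doc = fitz.open(pdf_path)
    for page, items in zip(doc, ocr_pages):
        for item in items:
            text = item["text"].strip()
            if not text:
                continue
            x0, y0, x1, y1 = box_to_pdf_rect(item["box"], dpi)
            # Rough guess: font size is a fraction of the box height.
            fontsize = max(1.0, (y1 - y0) * 0.8)
            # Use the bottom-left corner of the box as an approximate baseline start.
            page.insert_text((x0, y1), text, fontsize=fontsize, render_mode=3)
    doc.save(out_path)
```

The default font only covers Latin text, so OCR output in other scripts would need a suitable font passed to `insert_text`. Matching the width of each text run to its box is also left out of this sketch.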

1 reply

JorjMcKie Dec 2, 2022
Maintainer

> Tesseract is hard to train and its accuracy is low. I want to switch to PaddleOCR. Is there a way to do that? Or is there a way to take the JSON result of another OCR engine and use it to build the final PDF?

You obviously know better than me, so please communicate your findings in this channel.

nissansz · 2022-12-02T22:13:08Z

nissansz
Dec 2, 2022
Author

ok

