'Flattening' Annotation text into the searchable layer of a document #1181
Unanswered
petertennis
asked this question in
Looking for help
Here is an example for reference. Notice how the table at the bottom of the document has lots of annotations, but you cannot easily search the text via Ctrl-F, etc.
Hm, I understand.
I am processing a diverse set of architectural documents with the aim of making them more searchable in PDF viewers. Some have native PDF text, some don't, and often they have only partial native text. So I OCR each page and add the text elements that are not already present (I check for overlaps with the existing text to accomplish this).
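The overlap check described above could be sketched roughly as follows. This is only an illustration, not the poster's actual code: rectangles are assumed to be plain `(x0, y0, x1, y1)` tuples, and `overlap_ratio`, `missing_ocr_words`, and the 0.5 threshold are hypothetical names and values chosen for the example.

```python
def overlap_ratio(a, b):
    """Fraction of rectangle a's area that is covered by rectangle b.

    Rectangles are (x0, y0, x1, y1) tuples (an assumption for this sketch).
    """
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area = (a[2] - a[0]) * (a[3] - a[1])
    return inter / area if area > 0 else 0.0


def missing_ocr_words(ocr_words, native_rects, threshold=0.5):
    """Keep OCR words whose box is not mostly covered by any native text box.

    ocr_words is a list of (rect, text) pairs; threshold is a hypothetical cutoff.
    """
    return [
        (rect, text)
        for rect, text in ocr_words
        if all(overlap_ratio(rect, n) < threshold for n in native_rects)
    ]
```

Only the words returned by `missing_ocr_words` would then be written to the page, so text that is already natively searchable is not duplicated.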
Now I notice that some of my documents have annotations that effectively contain the text for a particular part of the page. The quality of this content is often better than my OCR results, and the annotation bounding boxes seem to line up with the visual text.
So I would like to push this text into the natively searchable layer, ideally without removing the annotations themselves. Is there any function to do this automatically in PyMuPDF? Or any other thoughts on what I am trying to accomplish?
Thanks for reading!
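One possible way to approach this, sketched under assumptions rather than offered as a confirmed answer: read each annotation's `content` field and write it onto the page as invisible text, leaving the annotations in place. The helper names `collect_annotation_text` and `flatten_annotations`, the file paths, and the font size are hypothetical. The sketch also assumes the annotation text lives in `annot.info["content"]`, and it places the text baseline at the bottom-left corner of the annotation rectangle, which is only an approximation.

```python
def collect_annotation_text(annots):
    """Return [(rect, text)] for annotations whose 'content' field is non-empty."""
    items = []
    for annot in annots:
        text = (annot.info.get("content") or "").strip()
        if text:
            items.append((annot.rect, text))
    return items


def flatten_annotations(src_path, dst_path, fontsize=8):
    """Copy annotation text into the page as invisible text; annotations are kept."""
    import fitz  # PyMuPDF

    doc = fitz.open(src_path)
    for page in doc:
        # Collect everything first, then write, so the page is not
        # modified while its annotations are being iterated.
        for rect, text in collect_annotation_text(page.annots()):
            # render_mode=3 makes the inserted text invisible but still
            # searchable; the baseline position is a rough guess.
            page.insert_text(rect.bl, text, fontsize=fontsize, render_mode=3)
    doc.save(dst_path)
```

A call such as `flatten_annotations("plans.pdf", "plans_searchable.pdf")` (file names made up) would write a new file and leave the original untouched. Page rotation, long multi-line content, and text that is already present natively are not handled here and would need extra checks.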