Replies: 1 comment
https://pypdf2.readthedocs.io/en/latest/user/extract-text.html
Some PDF files are scanned documents that consist of images instead of text. Others contain images alongside text. In both cases, extracting only the text loses some information.
Could the extract_text() function of PyPDF2 be extended to process images automatically as well as ordinary text? That would make our lives easier. I don't know how the backend works, though. Is this feasible to implement?
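To make the request concrete, here is a rough sketch of what I have in mind, done outside the library for now. It is not how PyPDF2 works internally. It assumes a PyPDF2 version whose page objects expose `images` (each with raw bytes in `.data`), and it assumes the third-party packages `pytesseract` and Pillow plus a local Tesseract install for the OCR step. The names `extract_text_with_ocr`, `ocr_with_tesseract`, and `read_pdf` are made up for illustration.

```python
import io
from typing import Callable, Iterable, List


def extract_text_with_ocr(pages: Iterable, ocr: Callable[[bytes], str]) -> List[str]:
    """Return one string per page: the text layer followed by OCR'd image text."""
    results = []
    for page in pages:
        # extract_text() may return nothing useful for a purely scanned page.
        parts = [page.extract_text() or ""]
        # Assumption: the page exposes `images`, each with raw bytes in `.data`.
        for image in getattr(page, "images", []):
            parts.append(ocr(image.data))
        # Drop empty pieces so scanned-only pages do not start with a blank line.
        results.append("\n".join(p.strip() for p in parts if p.strip()))
    return results


def ocr_with_tesseract(data: bytes) -> str:
    """Hypothetical helper: OCR one image (needs pytesseract, Pillow, Tesseract)."""
    from PIL import Image
    import pytesseract

    return pytesseract.image_to_string(Image.open(io.BytesIO(data)))


def read_pdf(path: str) -> List[str]:
    """Hypothetical helper: run the combined extraction over every page of a PDF."""
    from PyPDF2 import PdfReader

    return extract_text_with_ocr(PdfReader(path).pages, ocr_with_tesseract)
```

Calling `read_pdf("example.pdf")` would then give one string per page. The OCR function is passed in as a parameter so that a different OCR engine could be swapped in without touching the extraction loop.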