Missing image/text from getText('dict') #902

Answered by JorjMcKie
mani2106 asked this question in Q&A

What looks like text here is really a plethora of drawings:
The PDF creator has synthesized every single letter as a vector drawing: a "D" is drawn as a right half circle closed on the left, an "o" is a small circle, and so on.
Their motivation? Who knows! Maybe to make things difficult for you and me. If you extract the page's contents via page.read_contents() and store them in a file, you will get this:
cont.zip
Since the page contains no real text objects, getText('dict') has nothing to return. The only way to get that text is via some OCR tool.
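
For anyone hitting the same situation, here is a minimal sketch of how one might detect such a page and then fall back to OCR. It assumes a recent PyMuPDF where the snake_case names (get_text, get_drawings, get_textpage_ocr) are available; older releases use camelCase names such as getText. The helper names, the threshold of 50 drawings, and the 300 dpi default are illustrative assumptions, not part of PyMuPDF, and the OCR step needs Tesseract language data installed.

```python
def looks_like_drawn_text(text_chars: int, drawing_count: int,
                          min_drawings: int = 50) -> bool:
    """Heuristic (assumption): no extractable text, but many vector drawings."""
    return text_chars == 0 and drawing_count >= min_drawings


def inspect_page(pdf_path: str, page_number: int = 0) -> dict:
    """Count extractable characters and vector drawings on one page."""
    import fitz  # PyMuPDF, imported lazily so the heuristic above has no dependency

    with fitz.open(pdf_path) as doc:
        page = doc[page_number]
        text_chars = len(page.get_text("text").strip())
        drawing_count = len(page.get_drawings())
    return {
        "text_chars": text_chars,
        "drawing_count": drawing_count,
        "needs_ocr": looks_like_drawn_text(text_chars, drawing_count),
    }


def ocr_page_text(pdf_path: str, page_number: int = 0,
                  language: str = "eng", dpi: int = 300) -> str:
    """OCR the whole rendered page; requires Tesseract data to be installed."""
    import fitz  # PyMuPDF

    with fitz.open(pdf_path) as doc:
        page = doc[page_number]
        # full=True renders the entire page and runs OCR on the image.
        textpage = page.get_textpage_ocr(language=language, dpi=dpi, full=True)
        return page.get_text("text", textpage=textpage)
```

Calling inspect_page on the affected file and checking "needs_ocr" before calling ocr_page_text avoids running OCR on pages that already contain real text. The threshold is only a rough guess and may need tuning for other documents.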

Answer selected by mani2106