Making a compressed grayscale PDF version #769

Answered by JorjMcKie
bserg66 asked this question in Q&A
Can I extract font information from this pdf (bold for digits)?

Use page.getText("dict", flags=0)["blocks"]. This is a list of block dictionaries (text blocks only, because of the flags value). Each block dictionary contains a list of line dictionaries, each of which in turn contains a list of text "span" dictionaries. Consult the TextPage section of the documentation for the details.
The important point is that a span contains text with completely identical font properties: name, font size, color, and font characteristics (bold, italic, mono, ...) are all identical. So you should receive a span containing "46)" followed by a span with the text "Велосипедист...".
If this is not the case (like here), then the creator coded the …
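
To illustrate the block, line, and span nesting described above, here is a minimal sketch that walks such a structure and reports each span's font properties. It runs on a hand-made sample rather than a real PDF, so the font names and sizes are made up for illustration. The helper names `iter_spans` and `describe` are hypothetical, and the bold test assumes that bit 4 (value 16) of a span's "flags" entry marks bold, as the PyMuPDF documentation describes.

```python
# Assumption: bit 4 (value 16) of a span's "flags" marks bold text.
BOLD = 16


def iter_spans(blocks):
    """Yield every span dict from a list of block dicts."""
    for block in blocks:
        # Blocks without a "lines" key (e.g. image blocks) are skipped.
        for line in block.get("lines", []):
            for span in line["spans"]:
                yield span


def describe(span):
    """Return (text, font name, font size, is_bold) for one span."""
    return (span["text"], span["font"], span["size"], bool(span["flags"] & BOLD))


# Hand-made sample shaped like page.getText("dict", flags=0)["blocks"].
# The font names and sizes here are hypothetical.
sample_blocks = [
    {
        "type": 0,
        "lines": [
            {
                "spans": [
                    {"text": "46)", "font": "Arial-BoldMT", "size": 11.0, "flags": 16},
                    {"text": "Велосипедист...", "font": "ArialMT", "size": 11.0, "flags": 0},
                ]
            }
        ],
    }
]

for span in iter_spans(sample_blocks):
    print(describe(span))
```

With a real document, `sample_blocks` would be replaced by `page.getText("dict", flags=0)["blocks"]`. Newer PyMuPDF releases spell the method `get_text`.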

Answer selected by JorjMcKie
This discussion was converted from issue #769 on December 16, 2020 13:07.