Extract text in the proper order from multiple lines #1158
ervinwirth started this conversation in Ask for help with specific PDFs
Can you share the (redacted) PDF? It will be difficult to suggest a solution without that. In general, though, the output is what I would expect, since |
Of course, |
I have a PDF in which the consumption address ('Felhasználási hely címe' in Hungarian) is written across multiple lines, so the extracted text, which follows the page coordinates, is not in the right order:
The output I get:
xxxxx ; 1881 Budapest Csontváry Kosztka
Felhasználási hely címe:
Tivadar utca xxxxx
The output I would like:
Felhasználási hely címe:
xxxxx ; 1881 Budapest Csontváry Kosztka
Tivadar utca xxxxx
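A plausible explanation, sketched as a toy model with made-up coordinates: `extract_text` forms lines by clustering words on their vertical position (`top`) within `y_tolerance`, while `x_tolerance` (set to 1 here) mainly controls how characters are joined into words, not how lines are formed. If the label is vertically centred next to the two address lines, its `top` falls between theirs, so it becomes a separate line in the middle. The `group_into_lines` function and the coordinates below are illustrative assumptions, not pdfplumber's actual code.

```python
# Toy model of line grouping, assumed to be roughly similar to what
# extract_text does internally; this is NOT pdfplumber's actual code.
# Each word is a dict with "text", "x0" and "top" (points from the top of
# the page), loosely mirroring the dicts returned by page.extract_words().

def group_into_lines(words, y_tolerance=3):
    """Cluster words into lines by `top`, then read each line left to right."""
    lines = []
    for word in sorted(words, key=lambda w: (w["top"], w["x0"])):
        # Join the current line if this word starts close enough below the
        # previous word; otherwise start a new line.
        if lines and word["top"] - lines[-1][-1]["top"] <= y_tolerance:
            lines[-1].append(word)
        else:
            lines.append([word])
    return [" ".join(w["text"] for w in sorted(line, key=lambda w: w["x0"]))
            for line in lines]


# Hypothetical coordinates: the label is vertically centred between the
# two lines of the address, which sit further to the right.
words = [
    {"text": "Felhasználási hely címe:", "x0": 20, "top": 106},
    {"text": "xxxxx ; 1881 Budapest Csontváry Kosztka", "x0": 150, "top": 100},
    {"text": "Tivadar utca xxxxx", "x0": 150, "top": 112},
]

for line in group_into_lines(words):
    print(line)
```

This reproduces the order from the question. In this model, raising the tolerance to 6 merges all three fragments into one line with the label first; whether a larger `y_tolerance` helps on the real PDF depends on its actual coordinates.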
I use this method:
text_all_pages += page.dedupe_chars().extract_text(x_tolerance=1)
Any idea how to solve this?
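One possible workaround, sketched under assumptions: treat labels and values as two columns split at an x coordinate (`split_x`, hypothetical; it would have to be measured on the actual PDF), group each column into lines separately, and attach every value line to the label line whose vertical centre is closest. The function names and coordinates are made up for illustration; this is a heuristic, not a pdfplumber feature.

```python
# Hypothetical workaround (not a pdfplumber feature): read a two-column
# "label | value" layout as label first, then its value lines.
# Words are dicts with "text", "x0", "top" and "bottom".

def lines_from_words(words, y_tolerance=3):
    """Cluster words into lines by `top`; return text plus vertical extent."""
    groups = []
    for word in sorted(words, key=lambda w: (w["top"], w["x0"])):
        if groups and word["top"] - groups[-1][-1]["top"] <= y_tolerance:
            groups[-1].append(word)
        else:
            groups.append([word])
    return [
        {
            "text": " ".join(w["text"] for w in sorted(g, key=lambda w: w["x0"])),
            "top": min(w["top"] for w in g),
            "bottom": max(w["bottom"] for w in g),
        }
        for g in groups
    ]


def centre(line):
    return (line["top"] + line["bottom"]) / 2


def label_first_text(words, split_x, y_tolerance=3):
    """Emit each label line followed by the value lines nearest to it vertically."""
    labels = lines_from_words([w for w in words if w["x0"] < split_x], y_tolerance)
    values = lines_from_words([w for w in words if w["x0"] >= split_x], y_tolerance)
    if not labels:
        return "\n".join(v["text"] for v in values)
    assigned = [[] for _ in labels]
    for value in values:  # values are already in top-to-bottom order
        nearest = min(range(len(labels)),
                      key=lambda i: abs(centre(labels[i]) - centre(value)))
        assigned[nearest].append(value["text"])
    out = []
    for label, value_texts in zip(labels, assigned):
        out.append(label["text"])
        out.extend(value_texts)
    return "\n".join(out)


# Hypothetical coordinates for the example from the question.
words = [
    {"text": "Felhasználási hely címe:", "x0": 20, "top": 106, "bottom": 114},
    {"text": "xxxxx ; 1881 Budapest Csontváry Kosztka", "x0": 150, "top": 100, "bottom": 108},
    {"text": "Tivadar utca xxxxx", "x0": 150, "top": 112, "bottom": 120},
]

print(label_first_text(words, split_x=140))
```

With pdfplumber, the input could presumably come from `page.dedupe_chars().extract_words(x_tolerance=1)`, whose word dicts include `text`, `x0`, `top` and `bottom`. A simpler variant of the same idea is to `page.crop(...)` the label area and the value area and call `extract_text()` on each crop. The nearest-centre rule will misplace values on layouts where labels are not aligned with the middle of their value block.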