Trying to extract data from a PDF file #625
jakobdo
started this conversation in
Ask for help with specific PDFs
-
Hi @jakobdo, and glad to hear that you've found the library useful! For your particular example, I'd recommend getting the position of the PDF's missing final line this way, by identifying the bottom-most extremity of the existing rects:

    line_pos = max(r["bottom"] for r in page.rects)
    table = page.extract_table({
        "explicit_horizontal_lines": [ line_pos ]
    })

Demonstrating via the visual debugger:

    im = page.to_image()
    line_pos = max(r["bottom"] for r in page.rects)
    im.reset().debug_tablefinder({
        "explicit_horizontal_lines": [ line_pos ]
    })
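If it helps, the same idea could be wrapped in a small helper along these lines. This is only a sketch: the function names `bottom_line_settings` and `extract_first_page_table` are made up for illustration and are not part of pdfplumber, and it assumes the missing row always ends at the bottom-most rect on the page, which holds for this PDF but may not for others.

```python
def bottom_line_settings(rects):
    """Build table settings with one explicit horizontal line placed at the
    bottom-most edge of the given rects (dicts shaped like page.rects)."""
    if not rects:
        # No rects to anchor on, so fall back to the default settings.
        return {}
    line_pos = max(r["bottom"] for r in rects)
    return {"explicit_horizontal_lines": [line_pos]}


def extract_first_page_table(pdf_path):
    """Hypothetical wrapper: extract the table on page 1 of pdf_path,
    adding the missing bottom line before running the table finder."""
    import pdfplumber  # imported lazily so the helper above has no dependency

    with pdfplumber.open(pdf_path) as pdf:
        page = pdf.pages[0]
        return page.extract_table(bottom_line_settings(page.rects))
```

Calling `extract_first_page_table` with the path to a local copy of the PDF should then return the rows including the last one, provided the assumption about the rects holds.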
-
Hello, I have used this library before and I am really amazed at how "easy" it is to extract data. But!
When data is not easily extracted, I find it hard to tweak the settings to get the data I need.
Rather than just getting a working solution, I would like to understand how to extract the data from this PDF: https://www.taggmbh.at/fileadmin/content/TAG-Website-Content-SM/2022_Maintenance_PROD_PDF.pdf
When using the debug-table-finder, the last row on page 1 is missing:
How do I tweak the table settings to get the last row/line?
I have read about using explicit_lines, but first I need a way to find the line positions.