No tables detected on LibreofficeDraw generated pdf #418
-
Hi @carl-krikorian, and thanks for your interest in this library. This issue stems from how [...] In the meantime, you should be able to extract the tables this way:
import pdfplumber
pdf = pdfplumber.open('./pdfs/submit.pdf')
page = pdf.pages[0]
# Pass the detected curves as explicit table lines in both directions
tables = page.extract_tables({
    "explicit_vertical_lines": page.curves,
    "explicit_horizontal_lines": page.curves,
})
print(tables)
-
The Problem
I created an edited version of a PDF file while keeping the same format and tried extracting the tables, but the extraction failed completely. Strangely, extraction from the original PDF worked perfectly fine with the same code, and Tabula was able to extract the tables from the edited version with no problem. I suspect the issue is with how the file was generated (I used LibreOffice Draw to export it as PDF), or with the fact that the outline of the page is being detected, as shown in the screenshots. I just want to know if there are any other fixes or reasons I'm missing.
Code to reproduce the problem
import pdfplumber
pdf = pdfplumber.open('./pdfs/submit.pdf')
page = pdf.pages[0]
tables = page.extract_tables()
print(tables)
PDF file
submit.pdf
Screenshots
The curves also seem to be detected properly, but strangely always together with the outline of the page, even after cropping (this may also be the problem).
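One way to test whether the page outline matters is to drop any curve whose bounding box covers the whole page before handing the curves to the table extractor. The following is only a minimal sketch: `drop_page_outline` and its `tolerance` parameter are hypothetical names made up for this illustration, not part of pdfplumber, and the sample data is invented. It assumes each curve is a dict with `x0`, `x1`, `top` and `bottom` keys, as pdfplumber's curve objects are.

```python
def drop_page_outline(curves, page_width, page_height, tolerance=1.0):
    """Return the curves whose bounding box does not cover the whole page.

    Hypothetical helper, not a pdfplumber function. Each curve is expected
    to be a dict with "x0", "x1", "top" and "bottom" keys.
    """
    kept = []
    for curve in curves:
        spans_width = (
            curve["x0"] <= tolerance and curve["x1"] >= page_width - tolerance
        )
        spans_height = (
            curve["top"] <= tolerance and curve["bottom"] >= page_height - tolerance
        )
        if spans_width and spans_height:
            continue  # bounding box matches the page, so treat it as the outline
        kept.append(curve)
    return kept


# Stand-in data shaped like pdfplumber curve objects (values are made up).
sample_curves = [
    {"x0": 0, "x1": 612, "top": 0, "bottom": 792},       # page-sized outline
    {"x0": 50, "x1": 300, "top": 100, "bottom": 100.5},  # a table rule
]
filtered = drop_page_outline(sample_curves, page_width=612, page_height=792)
print(len(filtered))
```

With a real page, the call would presumably look like `drop_page_outline(page.curves, page.width, page.height)`, with the result passed as `explicit_vertical_lines` and `explicit_horizontal_lines`. Whether this changes the outcome for submit.pdf has not been verified here.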
Environment