This repository has been archived by the owner on Apr 15, 2024. It is now read-only.
I am facing the same issue with pdfminer. For small PDFs it works well.
When I try to parse the output with lxml.etree, it raises a missing-tag error:
lxml.etree.XMLSyntaxError: Premature end of data in tag pages line 2, line 1594542, column 1
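The error means the parser reached the end of the input while the `pages` element opened on line 2 was still open. A minimal sketch for checking already-written output, using Python's standard-library `xml.etree.ElementTree` instead of lxml (an assumption, to avoid a dependency) and assuming the file is read as bytes, e.g. with `open(path, "rb").read()`. The helper `parse_pdfminer_xml` is hypothetical: it parses the output and, only when a `<pages>` root was opened but never closed, retries with `</pages>` appended. This is a stopgap for existing files, not a fix for whatever produced them.

```python
import xml.etree.ElementTree as ET


def parse_pdfminer_xml(data: bytes) -> ET.Element:
    """Hypothetical helper: parse pdfminer XML output, tolerating a
    missing closing </pages> tag at the end of the document."""
    try:
        return ET.fromstring(data)
    except ET.ParseError:
        # Only repair the specific symptom reported here: a <pages> root
        # that was opened but never closed. Any other error is re-raised.
        if b"<pages" in data and not data.rstrip().endswith(b"</pages>"):
            return ET.fromstring(data.rstrip() + b"\n</pages>\n")
        raise
```

If the appended tag is not enough to make the document well-formed, the second `fromstring` call still raises `ET.ParseError`, so genuinely corrupt output is not silently accepted.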
side2k pushed a commit to side2k/pdfminer that referenced this issue on Jul 14, 2019.
When converting the attached PDF file to XML using the code below, there should be a closing tag at the end, but that tag is omitted.
pdf.pdf
Last 5 lines of the extracted XML:
</textgroup>
</textgroup>
</textgroup>
</layout>
</page>
This happens with every single PDF. The problem doesn't show up when using pdf2txt.py.
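One plausible explanation for the pdf2txt.py difference, offered as an assumption rather than a confirmed diagnosis: pdfminer's `XMLConverter` writes the XML declaration and the opening `<pages>` tag when it is constructed, and writes the closing `</pages>` only in its `close()` method, which pdf2txt.py calls after processing the pages. A script that never calls `device.close()` would end exactly as in the output above, after the last `</page>`. The sketch below does not use pdfminer; `HypotheticalXMLWriter` and `convert` are hypothetical stand-ins that mimic that header/footer behavior to show why the writer should be closed in a `finally` block.

```python
import io


class HypotheticalXMLWriter:
    """Hypothetical stand-in mimicking a converter that writes its header
    on construction and its footer only when close() is called."""

    def __init__(self, outfp):
        self.outfp = outfp
        # Header: XML declaration plus the opening <pages> root tag.
        self.outfp.write('<?xml version="1.0" encoding="utf-8" ?>\n<pages>\n')

    def write_page(self, pageno):
        self.outfp.write(f'<page id="{pageno}">\n</page>\n')

    def close(self):
        # Footer: the closing root tag. Skipping close() leaves <pages> open.
        self.outfp.write('</pages>\n')


def convert(num_pages, outfp):
    """Write num_pages pages, always emitting the footer."""
    writer = HypotheticalXMLWriter(outfp)
    try:
        for pageno in range(1, num_pages + 1):
            writer.write_page(pageno)
    finally:
        # The footer is written even if a page raises partway through.
        writer.close()


buf = io.StringIO()
convert(2, buf)
print(buf.getvalue())
```

Constructing `HypotheticalXMLWriter` and writing pages without calling `close()` reproduces the "Premature end of data in tag pages" symptom. In a real pdfminer script the same idea would apply to the converter device: call `device.close()`, ideally in a `finally` block, after the `PDFPageInterpreter` loop.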