pdf plumb returning "none" #450
-
Hi @gmwu843, I appreciate your interest in the library and wish you well in your Python journey.

When dealing with text extraction issues, the first step would be to check if [...]. The reason could be that font information (the character mapping, or CMap) is missing from the PDF. When you view the file in a PDF reader, the text is copyable because the reader might be substituting the missing mappings with a default font.

To extract the text correctly, you can repair the PDF using Ghostscript like so:

    gs -o output.pdf -sDEVICE=pdfwrite input.pdf

With the repaired PDF, you'll be able to extract the text properly. I'm attaching the repaired PDF here for your reference.
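If you'd rather do the repair from Python than from the shell, here is a minimal sketch of the same workflow. It assumes Ghostscript is installed and reachable as `gs` on your PATH (on Windows the executable is usually named differently, for example `gswin64c`). The helper names `build_gs_command`, `repair_pdf`, and `pages_without_text` are made up for this example and are not part of pdfplumber or Ghostscript.

```python
import shutil
import subprocess


def build_gs_command(src, dst, gs_binary="gs"):
    """Return the argument list for the Ghostscript repair command above."""
    return [gs_binary, "-o", str(dst), "-sDEVICE=pdfwrite", str(src)]


def repair_pdf(src, dst, gs_binary="gs"):
    """Rewrite `src` through Ghostscript's pdfwrite device and save it as `dst`."""
    # Assumption: the Ghostscript executable is on PATH under the name `gs_binary`.
    if shutil.which(gs_binary) is None:
        raise FileNotFoundError(f"{gs_binary!r} was not found on PATH")
    subprocess.run(build_gs_command(src, dst, gs_binary), check=True)
    return str(dst)


def pages_without_text(pdf_path):
    """Return the 1-based numbers of pages where extract_text() yields nothing."""
    import pdfplumber  # imported here so the helpers above work without it

    empty = []
    with pdfplumber.open(str(pdf_path)) as pdf:
        for number, page in enumerate(pdf.pages, start=1):
            # Treat both None and an empty string as "no extractable text".
            if not page.extract_text():
                empty.append(number)
    return empty


# Example usage (file names are placeholders):
# repaired = repair_pdf("input.pdf", "output.pdf")
# print(pages_without_text(repaired))
```

Running `pages_without_text` on both the original and the repaired file is a quick way to see whether the repair helped for the pages that were coming back empty.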
-
I'm trying to use pdfplumber to extract text from a PDF, but I'm getting a return value of "None" for certain pages. For other pages, the code below works fine. I suspect this has something to do with the way the PDF is set up, and I'm wondering if there is an easy workaround. My code is below, and a sample PDF is attached.
import pdfplumber

test_pdf = "test_pdf.pdf"  # path to the attached sample PDF

with pdfplumber.open(test_pdf) as pdf:
    page = pdf.pages[0]  # first page of the document
    text = page.extract_text()
    print(text)
test_pdf.pdf
I'm pretty new to coding, outside of a few Python classes in college.