Repeat characters and unable to extract tables #662
zmingxie started this conversation in Ask for help with specific PDFs
-
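Hi @zmingxie, and very interesting! I examined the page and tried keeping only the objects whose `size` is below 33.84:

```python
# Keep objects under 33.84pt; objects with no "size" (lines, rects) default to 0 and stay.
p2_fixed = p2.filter(lambda obj: obj.get("size", 0) < 33.84)
# Render the filtered page, then overlay the table finder's detected lines and cells.
im2_fixed = p2_fixed.to_image()
im2_fixed.reset().debug_tablefinder()
```

Does that give you the results you expect?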
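One way to find a cutoff like 33.84 is to tally the font sizes of the page's characters and look for an outlier size. The sketch below is an assumption about how such a threshold could be chosen, not necessarily how it was found here; `size_histogram` and `below_size` are hypothetical helpers, and the sample chars are synthetic stand-ins for `page.chars`.

```python
from collections import Counter


def size_histogram(chars, ndigits=2):
    """Tally characters by font size, rounded to `ndigits` decimals.

    `chars` is a list of pdfplumber-style char dicts (each has a "size" key).
    """
    return Counter(round(c["size"], ndigits) for c in chars)


def below_size(obj, max_size):
    """Same test as the lambda above: keep objects smaller than max_size.

    Objects with no "size" key (e.g. lines, rects) count as size 0 and are kept.
    """
    return obj.get("size", 0) < max_size


if __name__ == "__main__":
    # Synthetic chars standing in for page.chars (hypothetical values).
    chars = [
        {"text": "A", "size": 9.0},
        {"text": "B", "size": 9.0},
        {"text": "A", "size": 33.84},
    ]
    print(size_histogram(chars).most_common())
```

On a real page you would pass `p2.chars` to `size_histogram`, and the predicate plugs into the same call as above, e.g. `p2.filter(lambda obj: below_size(obj, 33.84))`.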
-
Hi all,
I've been using pdfplumber to extract data from the TREB monthly reports for personal use. It has worked well on past reports (e.g. April 2022), but I'm having issues with the May 2022 report.
After some digging, it looks like pdfplumber has trouble extracting words and table data from page 3. Here is the code from my troubleshooting:
May report word extracts:
April report word extracts:
May report table:
April report table:
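The original snippets did not survive in this copy of the thread. The following is a hypothetical sketch of such a side-by-side check, not the poster's code: the file paths are placeholders, and `is_doubled` and `compare_page` are illustrative helpers. It assumes repeated characters show up as doubled letters in `extract_words()` output (e.g. "TTRREEBB"), a common symptom of overlaid duplicate characters but unconfirmed for this report.

```python
def is_doubled(text):
    """Rough heuristic: True if every character appears twice in a row.

    Requires at least 4 characters so short tokens like "11" are not flagged.
    """
    return len(text) >= 4 and len(text) % 2 == 0 and all(
        text[i] == text[i + 1] for i in range(0, len(text), 2)
    )


def compare_page(path, page_number=3):
    """Print a quick summary of the words and first table on one page.

    `path` is a hypothetical local copy of a report. pdfplumber is imported
    here so `is_doubled` can be used without it installed.
    """
    import pdfplumber

    with pdfplumber.open(path) as pdf:
        page = pdf.pages[page_number - 1]  # pdf.pages is 0-indexed
        words = page.extract_words()
        doubled = [w["text"] for w in words if is_doubled(w["text"])]
        table = page.extract_table()  # None if no table is found
        print(f"{path}: {len(words)} words, {len(doubled)} look doubled")
        print("first doubled words:", doubled[:10])
        print("table rows:", 0 if table is None else len(table))


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file names): python compare.py april-2022.pdf may-2022.pdf
    for p in sys.argv[1:]:
        compare_page(p)
```

Running it on both reports makes the difference easy to spot: a healthy page should report few or no doubled words and a non-zero table row count.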
I can't figure out what changed to cause this extraction issue. Could anyone give me some hints? Thanks!