Space is not present between words in extracted table dataframe. #1194
AyushaKadi started this conversation in Ask for help with specific PDFs
-
Hi @AyushaKadi, have you tried adjusting the |
-
Hi, I am using the pdfplumber library with 'extract_vertical_lines' and 'extract_horizontal_lines' as table-extraction settings to capture transactions that span multiple lines. The tables and the transactions themselves are captured correctly.
The only issue is the spacing between the captured words.
I am unable to work out why words that are separated by spaces in the PDF do not come out the same way in the extracted text or dataframe.
Example: PointOf SaleWithdrawal PUBLIXSUPERMA 6901TAFT ST HOLLYWOOD FLUS
Expected: Point Of Sale Withdrawal PUBLIX SUPER MA 6901 TAFT ST HOLLYWOOD FLUS
Can this be resolved in any way? I have tried adjusting the tolerances and writing explicit code to manage tolerances and spacing.
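For context on why a tolerance can swallow spaces, here is a minimal, self-contained sketch of gap-based word joining. It is not pdfplumber's actual implementation; it only illustrates the documented idea behind `x_tolerance` (a space is added when the gap between one character's `x1` and the next character's `x0` exceeds the tolerance). The function names, glyph width, and gap values below are hypothetical and chosen only for illustration.

```python
def join_chars(chars, x_tolerance=3.0):
    """Join characters left to right, adding a space whenever the horizontal
    gap between the previous x1 and the next x0 exceeds x_tolerance."""
    ordered = sorted(chars, key=lambda c: c["x0"])
    pieces = []
    prev = None
    for ch in ordered:
        if prev is not None and ch["x0"] - prev["x1"] > x_tolerance:
            pieces.append(" ")
        pieces.append(ch["text"])
        prev = ch
    return "".join(pieces)


def make_chars(words, char_width=5.0, word_gap=2.0):
    """Hypothetical helper: lay words out on one line with a fixed glyph
    width, no gap inside a word, and a narrow gap between words."""
    chars = []
    x = 0.0
    for word in words:
        for letter in word:
            chars.append({"text": letter, "x0": x, "x1": x + char_width})
            x += char_width
        x += word_gap
    return chars


chars = make_chars(["Point", "Of", "Sale"])
print(join_chars(chars, x_tolerance=3.0))  # word gap 2.0 is not above 3.0, so words run together
print(join_chars(chars, x_tolerance=1.0))  # word gap 2.0 is above 1.0, so spaces are kept
```

If the PDF uses a narrow space between words, a smaller horizontal tolerance may restore the spaces. As far as I understand the pdfplumber documentation, `text_`-prefixed keys such as `text_x_tolerance` in `table_settings` control this for table extraction, but the right value depends on the specific PDF.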