Pdfplumber misses first column and last row for all tables within a schematic #544

PabloABCD · 2021-11-22T10:27:25Z

pdfplumber does not extract the first column or the last row of any table in the document. I have tried tweaking several parameters in the table_settings variable, but unfortunately I haven't been able to get a better result (in my case, if I use the "text" strategy instead of "lines", the rest of the characters in the schematic are treated as a table).

Any help with this? I am using Python 3.9.8, and the PDF for testing can be found here: schematic.pdf

The source code for extracting the tables from the first page is:

import pdfplumber

pdf_file = "C:/Users/fariza.CITEF/Desktop/OUIGO/Schematic.pdf"

with pdfplumber.open(pdf_file) as pdf:
    # Extract every table on the first page with the default table settings.
    first_page = pdf.pages[0]
    tbl = first_page.extract_tables()

    print(tbl)

Thanks a lot for your help and your impressive library.

samkit-jain (Collaborator) · 2021-11-22T14:31:25Z

Hi @PabloABCD, I appreciate your interest in the library. Using the following table settings worked for me (here, page is the page being extracted):

{
    "vertical_strategy": "explicit",
    "horizontal_strategy": "explicit",
    "explicit_vertical_lines": page.curves+page.edges,
    "explicit_horizontal_lines": page.curves+page.edges,
    "intersection_tolerance": 15,
}
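For reference, here is a minimal sketch of how these settings might be passed to extract_tables() for the first page. The helper names build_table_settings and extract_first_page_tables are hypothetical and not part of pdfplumber; only pdfplumber.open, page.curves, page.edges and page.extract_tables(table_settings=...) come from the library.

```python
def build_table_settings(page, intersection_tolerance=15):
    """Build the explicit-lines settings from the reply (hypothetical helper).

    `page` is any object exposing `.curves` and `.edges` lists,
    for example a pdfplumber page.
    """
    # The same set of curves and edges is offered as candidate lines
    # in both directions, as in the settings above.
    lines = page.curves + page.edges
    return {
        "vertical_strategy": "explicit",
        "horizontal_strategy": "explicit",
        "explicit_vertical_lines": lines,
        "explicit_horizontal_lines": lines,
        "intersection_tolerance": intersection_tolerance,
    }


def extract_first_page_tables(pdf_path):
    """Extract the tables of the first page using the settings above."""
    import pdfplumber  # imported here so the helper above has no dependency

    with pdfplumber.open(pdf_path) as pdf:
        page = pdf.pages[0]
        return page.extract_tables(table_settings=build_table_settings(page))
```

Calling extract_first_page_tables(pdf_file) with the path from the question should then return one list of rows per detected table. The tolerance of 15 is the value that worked for this schematic and may need adjusting for other documents.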

['(cid:47)(cid:44)(cid:54)(cid:55)(cid:36)(cid:3)(cid:39)(cid:40)(cid:3)(cid:39)(cid:40)(cid:54)(cid:57)(cid:203)(cid:50)(cid:54)', None, None, None, None, None]
['(cid:49)(cid:158)', 'PK', 'VEL.', '(cid:49)(cid:158)', 'PK', 'VEL.']
['A64', '3+100', '100 Km/h', 'A66', '3+365', '100 Km/h']
['A65', '3+189', '100 Km/h', 'S2MSU2', '5+884', '100 Km/h']
['A67', '3+363', '100 Km/h', 'S4MSU1', '6+052', '100 Km/h']
['', '', '', '', '', '']

['(cid:54)(cid:40)(cid:102)(cid:36)(cid:47)(cid:40)(cid:54)', None, None, None]
['NOMBRE', 'PK', 'NOMBRE', 'PK']
['E3', '3+720', 'EMSUF2', '5+766']
['E4', '3+784', 'EMSUF1', '5+766']
['B004F2', '4+295', 'SMSUM2', '6+185']
['B004F1', '4+295', 'SMSUM1', '6+188']
['', '', '', '']
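The extracted tables above still contain None cells (from the merged title row) and a trailing row of empty strings. As an optional post-processing step, a small cleanup along these lines could be applied; clean_table is a hypothetical helper, not something pdfplumber provides, and the sample rows are copied from the output above.

```python
def clean_table(rows):
    """Replace None cells with "" and drop rows whose cells are all empty."""
    cleaned = []
    for row in rows:
        cells = [(cell or "").strip() for cell in row]
        if any(cells):  # keep the row only if at least one cell has text
            cleaned.append(cells)
    return cleaned


table = [
    ["(cid:54)(cid:40)(cid:102)(cid:36)(cid:47)(cid:40)(cid:54)", None, None, None],
    ["NOMBRE", "PK", "NOMBRE", "PK"],
    ["E3", "3+720", "EMSUF2", "5+766"],
    ["", "", "", ""],
]
print(clean_table(table))
```

This does not decode the (cid:NN) sequences; those come from the way the PDF's fonts map characters and would need separate handling.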


PabloABCD Nov 22, 2021
Author

That was great.

Thank you for your support and your awesome library!
