This PDF has a complex layout with two tables on a single page (page 2). Right now the best result comes from setting the algorithm to `header-position`, but it seems that one still needs to extend it to accommodate the odd table format.
```python
import rows

tables = rows.import_from_pdf(
    "Ibama.pdf",
    page_numbers=[2],
    algorithm="header-position",  # `rects-boundaries` does not work and `y-groups` mixes the header with the entries
    backend="pymupdf",  # `pymupdf` yields the best results
)
row = tables[0]
row._asdict().keys()
```
```
dict_keys(
    [
        'praiadocarroquebrado',
        'barradesantoantonio',
        'field_2019_09_18',
        'al',
        'field_09203008s_35265532w_2019_10_21',
        'oleada_manchas',
        'barradoriocamaratuba',
        'mataraca',
        'field_2019_09_07',
        'pb',
        'field_06353346s_34575812w_2019_10_04',
        'oleo_naoobservado',
        'name',
        'municipio',
        'data_avist_estado_latitude',
        'longitude',
        'data_revis_status',
        'praiadocabobranco',
        'joaopessoa',
        'field_2019_09_01',
        'pb_2',
        'field_07084334s_34483384w_2019_10_01',
        'oleo_naoobservado_2'
    ]
)
```
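The keys above seem to show that data values (beach names, dates, coordinates, states) were used as field names, while pieces of the real header (`name`, `municipio`, `data_avist_estado_latitude` and so on) ended up in the middle. If the cells can be obtained as plain lists of strings, a small post-processing step could re-key them on the real header row. This is a minimal sketch, not part of `rows`: the function `rekey_by_header` and the sample values below are hypothetical.

```python
from typing import Dict, List, Sequence


def rekey_by_header(raw_rows: Sequence[Sequence[str]], first_header_cell: str) -> List[Dict[str, str]]:
    """Hypothetical helper: use the first row whose first cell matches
    `first_header_cell` (case-insensitive) as the header, and turn every
    other row, before or after it, into a dict keyed by that header."""
    wanted = first_header_cell.strip().lower()
    header_index = next(
        (i for i, row in enumerate(raw_rows) if row and row[0].strip().lower() == wanted),
        None,
    )
    if header_index is None:
        raise ValueError(f"no row starts with {first_header_cell!r}")
    header = [cell.strip() for cell in raw_rows[header_index]]

    records = []
    for i, row in enumerate(raw_rows):
        if i == header_index:
            continue  # the header itself is not a data row
        # Pad short rows with "" so every record has every field;
        # cells beyond the header width are dropped by zip().
        padded = list(row) + [""] * (len(header) - len(row))
        records.append(dict(zip(header, padded)))
    return records


# Hypothetical sample: a data row that came before the header, then a short row.
raw = [
    ["Praia A", "Cidade A", "2019-09-18"],
    ["Name", "Municipio", "Data"],
    ["Praia B", "Cidade B"],
]
print(rekey_by_header(raw, "name"))
```

Keeping rows that appear above the header (instead of discarding them) matters here, since in this PDF real entries seem to come before the header text.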
I'm kind of jealous of R for the first time, because this operation is a one-liner with `tabulizer` ;-p

I'll look into extending `header-position`, but if that is an exercise that should always be on the user side, feel free to just close this issue.
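As a starting point for such an extension, here is a simplified, hypothetical take on the idea the algorithm's name suggests: group words into lines by their vertical position, then put each word in the column whose header cell starts at or to the left of it. It does not reproduce the `rows` implementation; the `Word` type, `assign_columns`, and the coordinates are assumptions. One possible source of word positions is PyMuPDF's `Page.get_text("words")`, whose tuples start with each word's bounding box, but converting those tuples is left out here.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Word(NamedTuple):
    x0: float  # left edge of the word on the page
    y0: float  # top edge; words on one text line have roughly the same y0
    text: str


def assign_columns(words: List[Word], header_x: List[float], y_tolerance: float = 2.0) -> List[List[str]]:
    """Hypothetical sketch: one output row per text line, one cell per header."""
    header_x = sorted(header_x)
    lines: Dict[float, List[Word]] = defaultdict(list)
    for word in sorted(words, key=lambda w: (w.y0, w.x0)):
        # Snap the word to an existing line if its y0 is within y_tolerance.
        key = next((y for y in lines if abs(y - word.y0) <= y_tolerance), word.y0)
        lines[key].append(word)

    table = []
    for y in sorted(lines):
        row = [""] * len(header_x)
        for word in sorted(lines[y], key=lambda w: w.x0):
            # Column = last header starting at or left of the word;
            # words left of the first header fall into column 0.
            col = max(bisect_right(header_x, word.x0) - 1, 0)
            row[col] = (row[col] + " " + word.text).strip()
        table.append(row)
    return table


words = [
    Word(10, 0, "Name"), Word(100, 0, "Municipio"),
    Word(10, 20, "Praia"), Word(40, 20.5, "A"), Word(105, 19.5, "Mataraca"),
]
print(assign_columns(words, header_x=[10, 100]))
```

For two tables on the same page, each table's header cells would contribute their own positions, and each resulting row could then be sliced at the boundary between the two header sets; whether that fits this particular PDF is untested.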