combine multiple tables #786

88arvin · 2023-01-10T08:10:31Z

I have a PDF with 10 pages; some pages have one table and some have multiple tables. When I use "extract_table()", it extracts only the largest table on each page, but I want to extract all the tables and append them into a single dataframe.
How can I append every table into a single dataframe using "extract_tables()"?

jsvine · 2023-01-10T08:45:22Z

jsvine
Jan 10, 2023
Maintainer

Hi @88arvin, thanks for your interest in this library. If I'm understanding your question correctly, the answer has less to do with pdfplumber and is more generic to Python (or, since you used the phrase "dataframe", pandas). Here's what I'd suggest:

core = []  # Start with a blank list
for page in pdf.pages:  # Iterate over the pages
  core += page.extract_tables()  # In Python, += extends a list
df = pd.DataFrame(core)  # Assuming that, by "dataframe," you mean a pandas DataFrame

88arvin · 2023-01-10T10:50:24Z

88arvin
Jan 10, 2023
Author

Thank you for responding so quickly. I tried your suggested code; however, the data still does not come out in the form of a dataframe.
https://postimg.cc/3W86SH0d
If I use extract_table(), I can iterate through all the pages and append the data into a single dataframe, but I can only extract the largest table from each page.

jsvine Jan 10, 2023
Maintainer

Ah, I see what you mean. Try this:

core = []  # Start with a blank list
for page in pdf.pages:  # Iterate over the pages
  for table in page.extract_tables():
    core += table  # In Python, += extends a list; here it adds this table's rows
df = pd.DataFrame(core)  # Assuming that, by "dataframe," you mean a pandas DataFrame
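
To make the difference between the two snippets concrete, here is a minimal sketch that uses hand-written nested lists in the shape extract_tables() returns (a list of tables, each a list of rows, each a list of cell values) instead of a real PDF. The sample data and the helper name `flatten_tables` are illustrative assumptions, not part of pdfplumber.

```python
def flatten_tables(pages_tables):
    """Flatten [page][table][row] nesting into a single list of rows."""
    rows = []
    for tables in pages_tables:  # one entry per page
        for table in tables:     # one entry per table on that page
            rows.extend(table)   # same effect as `rows += table`
    return rows


# Hypothetical stand-in for [page.extract_tables() for page in pdf.pages].
pages_tables = [
    # Page 1: one table
    [[["id", "name"], ["1", "Ann"]]],
    # Page 2: two tables
    [[["id", "name"], ["2", "Bob"]], [["id", "name"], ["3", "Cy"]]],
]

rows = flatten_tables(pages_tables)
print(len(rows))  # 6: each table contributes its header row and one data row
```

Passing `rows` to pd.DataFrame(rows) should give one DataFrame row per table row. If every table repeats its header, as in this made-up sample, the header rows end up as data and would need to be dropped separately.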

88arvin Jan 10, 2023
Author

Oh yes! This worked well, but it extracted not only the tables but also all the other data.

88arvin Jan 12, 2023
Author

How can I extract only the tables?
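
One possible approach, offered as a sketch rather than a confirmed fix for this PDF: extract_tables() returns whatever pdfplumber's table finder detects, so text that happens to be boxed in by lines may come back as a very small "table". A simple post-filter is to keep only tables that meet a minimum size. The thresholds, the sample data, and the helper name `keep_real_tables` are assumptions for illustration; adjusting the `table_settings` argument of extract_tables() is another avenue worth trying.

```python
def keep_real_tables(tables, min_rows=2, min_cols=2):
    """Keep tables with at least min_rows rows and min_cols cells in every row."""
    kept = []
    for table in tables:
        if len(table) < min_rows:
            continue  # too few rows to look like a table
        if any(len(row) < min_cols for row in table):
            continue  # at least one row is too narrow
        kept.append(table)
    return kept


# Hypothetical output of page.extract_tables() for one page.
tables = [
    [["Some boxed paragraph text"]],               # 1x1: probably not a real table
    [["id", "name"], ["1", "Ann"], ["2", "Bob"]],  # 3x2: kept
]

filtered = keep_real_tables(tables)
print(len(filtered))  # 1
```

The filter would slot into the inner loop of the snippet above, applied to the result of page.extract_tables() before the rows are appended to `core`.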
