-
I have a PDF with 10 pages; some pages have one table and some have multiple tables. When I use `extract_table()`, it extracts only the largest table on each page, but I want to extract all the tables and append them into a single dataframe.
-
Hi @88arvin, thanks for your interest in this library. If I'm understanding your question correctly, the answer has less to do with this library than with Python itself:
core = []  # Start with a blank list
for page in pdf.pages:  # Iterate over the pages
    core += page.extract_tables()  # In Python, += extends a list
df = pd.DataFrame(core)  # Assuming that, by "dataframe," you mean a pandas DataFrame
-
Thank you for responding so quickly. I have tried your suggested code; however, the result still does not come out as a proper dataframe.
https://postimg.cc/3W86SH0d
If I use `extract_table()`, I can iterate through all the pages and append the data into a single dataframe, but I can only extract the largest table from each page.
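
One likely explanation, offered as a sketch rather than a confirmed diagnosis: `page.extract_tables()` returns a list of tables, and each table is itself a list of rows. Extending a list with that result therefore collects whole tables, so `pd.DataFrame(...)` sees one table per row instead of one record per row. Flattening one level further, so that the list holds rows, may give the expected shape. The helper names `collect_rows` and `pdf_to_dataframe` below are hypothetical, and the sketch assumes all tables share the same column layout and that header rows need no special handling.

```python
def collect_rows(pages):
    """Flatten every table on every page into one list of rows."""
    rows = []
    for page in pages:
        # extract_tables() returns a list of tables; each table is a list of rows
        for table in page.extract_tables():
            rows.extend(table)  # add the rows, not the table itself
    return rows


def pdf_to_dataframe(pdf_path):
    """Hypothetical helper: build one DataFrame from all tables in a PDF."""
    # Imported here so collect_rows can be used without these packages installed
    import pandas as pd
    import pdfplumber

    with pdfplumber.open(pdf_path) as pdf:
        rows = collect_rows(pdf.pages)
    return pd.DataFrame(rows)
```

Calling `pdf_to_dataframe("your_file.pdf")` (the path is a placeholder) should then return a single frame with one row per extracted table row. If the tables have different columns, building one DataFrame per table and combining them afterwards may be a better fit.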