Is it possible to measure progress of pandas.concat? #28
Comments
Might I suggest the following which should accomplish what pd.concat() does in one line, but will give you the ability to use this tool. Hope it helps. :)
@joshjacobson @milesgranger the real maintained repo is at https://github.com/tqdm/tqdm.
The only problem with this solution is that appending to a DataFrame is very slow. In my case, with ~700,000 dataframes, it takes over 6 hours, while the standard pd.concat() does its job in 20 minutes.
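A possible middle ground between the two, sketched here as an assumption rather than something proposed in this thread: concatenate the list in batches, so that each batch still goes through one fast pd.concat() call and tqdm advances once per batch. The names `concat_in_chunks` and `chunk_size` are hypothetical, and the final concat of the partial results is not covered by the bar.

```python
import pandas as pd
from tqdm import tqdm


def concat_in_chunks(frames, chunk_size=1000):
    """Concatenate `frames` batch by batch, with one tqdm tick per batch.

    `concat_in_chunks` and `chunk_size` are hypothetical names for this sketch.
    `frames` must be a non-empty list, since pd.concat rejects empty input.
    """
    partials = []
    for start in tqdm(range(0, len(frames), chunk_size), desc="concat"):
        # Each batch is combined with a single pd.concat call.
        partials.append(pd.concat(frames[start:start + chunk_size]))
    # This last step runs after the bar has finished.
    return pd.concat(partials)


# Small demo: 50 single-row frames with partly overlapping columns.
frames = [pd.DataFrame({"a": [i], f"b{i % 5}": [i * 2]}) for i in range(50)]
combined = concat_in_chunks(frames, chunk_size=10)
```

A smaller `chunk_size` gives a finer-grained bar but more intermediate frames to merge at the end, so the value likely needs tuning for the data at hand.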
Apart from the fact that this question should've been 1) asked on Stack Overflow, or 2) maybe opened as a feature request on the maintained repo (https://github.com/tqdm/tqdm), as requested a year ago (#28 (comment)), there's 3)
@casperdcl That doesn't actually measure the concat progress.
I think that the only way to measure the progress is by using Dask as a workaround:

import pandas as pd
import numpy as np
from tqdm import tqdm
import dask.dataframe as dd
n = 450000
maxa = 700
df1 = pd.DataFrame({'lkey': np.random.randint(0, maxa, n),'lvalue': np.random.randint(0,int(1e8),n)})
df2 = pd.DataFrame({'rkey': np.random.randint(0, maxa, n),'rvalue': np.random.randint(0, int(1e8),n)})
sd1 = dd.from_pandas(df1, npartitions=3)
sd2 = dd.from_pandas(df2, npartitions=3)
from tqdm.dask import TqdmCallback
from dask.diagnostics import ProgressBar
ProgressBar().register()
with TqdmCallback(desc="compute"):
    sd1.merge(sd2, left_on='lkey', right_on='rkey').compute()

Source: Progress Bar for Merge Or Concat Operation With tqdm in Pandas
I have 25,000 pandas dataframes, each with ~300 columns and a single data row (plus column labels). About 50% of each dataframe's column labels are unique to it, and the other 50% are shared with some of the other dataframes.
I have a list l containing these 25,000 dataframes, such that l[i] is a dataframe.
I need to combine these, and pd.concat(l) does that the way I'd like.
That said, the operation takes a long time. I'd like to estimate or measure how long it is going to take. I was hoping tqdm would allow for a progress bar, but I don't see how to implement one. It looks like support for this isn't there?
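For the specific case described above, where every dataframe holds exactly one row, one possible sketch is to collect each row as a dict under a tqdm bar and build the combined frame once at the end. This is an assumption about what might work here, not a tqdm feature: `combine_single_row_frames` is a hypothetical name, `frames` stands in for the list l, and the final DataFrame construction is not covered by the bar.

```python
import pandas as pd
from tqdm import tqdm


def combine_single_row_frames(frames):
    """Build one DataFrame from many single-row DataFrames.

    Hypothetical helper for this sketch; assumes each frame has exactly one row.
    """
    records = []
    for df in tqdm(frames, desc="collect rows"):
        # to_dict("records") returns one dict per row; take the only row.
        records.append(df.to_dict("records")[0])
    # Columns missing from a given row are filled with NaN.
    return pd.DataFrame.from_records(records)


# Small demo: one shared column plus one column unique to each frame.
frames = [pd.DataFrame({"shared": [i], f"only_{i}": [i * 10]}) for i in range(20)]
result = combine_single_row_frames(frames)
```

The result should hold the same values as pd.concat(frames, ignore_index=True), though dtypes and column order may not match exactly, so it is worth checking on a small sample first.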