Speed up feature extraction #100
Comments
To demonstrate there's quite some room for improvement:

import time

import audb
import audinterface
import audiofile
import numpy as np
import pandas as pd

db = audb.load(
    'emodb',
    version='1.3.0',
    format='wav',
    sampling_rate=16000,
    mixdown=True,
)
files = db.files

def process_func(x, sr):
    # Two toy features per file
    return [x.mean(), x.std()]

# slow: process all files with audinterface.Feature
feature = audinterface.Feature(
    ['mean', 'std'],
    process_func=process_func,
)
t = time.time()
df = feature.process_files(files)
print(time.time() - t)

# fast: pre-allocate a (files x features) matrix and assign rows directly
t = time.time()
data = np.empty(
    (len(files), 2),
    dtype=np.float32,
)
for idx, file in enumerate(files):
    signal, sampling_rate = audiofile.read(file)
    data[idx, :] = process_func(
        signal,
        sampling_rate,
    )
# Note: index and columns are reused from the slow result
df_fast = pd.DataFrame(
    data,
    index=df.index,
    columns=df.columns,
)
print(time.time() - t)

# Both approaches must yield identical results
pd.testing.assert_frame_equal(df, df_fast)
I guess the idea for a solution is to avoid this step?
Yes, especially the concatenation of the DataFrames seems awfully slow. So the idea would be to create a matrix of the expected size (samples x features) and directly assign the extracted features. This is of course only possible if no sliding window is selected, as otherwise we cannot know the shape of the final matrix in advance.
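The pre-allocation idea can be sketched without audb or audinterface. The following is only an illustration under stated assumptions: the file names and random signals are synthetic stand-ins, and the per-file pd.concat variant merely mimics the concatenation pattern discussed here, it is not the actual audinterface code.

```python
import numpy as np
import pandas as pd


def process_func(x, sr):
    # Same toy features as in the benchmark above
    return [x.mean(), x.std()]


# Synthetic stand-ins for audio files (hypothetical names, random signals)
rng = np.random.default_rng(0)
sampling_rate = 16000
signals = {
    f'file{idx}.wav': rng.standard_normal(sampling_rate).astype(np.float32)
    for idx in range(100)
}
columns = ['mean', 'std']

# Variant 1: one small DataFrame per file, combined with pd.concat
frames = [
    pd.DataFrame(
        np.array([process_func(x, sampling_rate)], dtype=np.float32),
        index=[file],
        columns=columns,
    )
    for file, x in signals.items()
]
df_concat = pd.concat(frames)

# Variant 2: pre-allocate (samples x features) and assign rows directly
data = np.empty((len(signals), len(columns)), dtype=np.float32)
for idx, x in enumerate(signals.values()):
    data[idx, :] = process_func(x, sampling_rate)
df_prealloc = pd.DataFrame(data, index=list(signals), columns=columns)

# Both variants should produce the same table
pd.testing.assert_frame_equal(df_concat, df_prealloc)
```

Variant 2 builds the DataFrame exactly once, which is why it needs the number of rows up front.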
I guess not. The comparison is also not 100% fair, as in the second case we rely on the index created by
I created #106 to track
When extracting features with Feature we currently rely on Process under the hood, which returns a pd.Series with feature vectors. We then convert these to a list and afterwards call pd.concat(list) to combine them into a single matrix. The last step can take quite long (sometimes as long as or longer than the feature extraction itself). We could speed this up if we pre-allocate a matrix beforehand and directly assign the values. At least when not processing with a sliding window this should be possible.
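To illustrate why the sliding-window case is harder: the number of rows per file then depends on the signal length, so the final shape is not known from the number of files alone. The sketch below is an assumption for illustration, not something proposed in this issue. The helper sliding_features, the window and hop sizes, and the file names are hypothetical. It shows one possible fallback: collect NumPy blocks per file and concatenate once at the end.

```python
import numpy as np
import pandas as pd


def sliding_features(x, win, hop):
    """Return one (mean, std) row per window; row count depends on len(x)."""
    starts = range(0, len(x) - win + 1, hop)
    rows = [[x[s:s + win].mean(), x[s:s + win].std()] for s in starts]
    return np.array(rows, dtype=np.float32).reshape(-1, 2)


rng = np.random.default_rng(0)
# Hypothetical signals of different lengths (in samples)
signals = {
    'a.wav': rng.standard_normal(16000),
    'b.wav': rng.standard_normal(24000),
}
win, hop = 8000, 4000

blocks = []
index = []
for file, x in signals.items():
    block = sliding_features(x, win, hop)
    blocks.append(block)
    index.extend((file, n) for n in range(len(block)))

# Single concatenation on NumPy level, then build one DataFrame
data = np.concatenate(blocks)
df = pd.DataFrame(
    data,
    index=pd.MultiIndex.from_tuples(index, names=['file', 'window']),
    columns=['mean', 'std'],
)
```

Here the two files yield a different number of windows, so a matrix sized by len(signals) would not fit.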