Speed up Feature.process_*() #103

Merged: 2 commits into main from speed-up-feature, Mar 13, 2023

Conversation

@frankenjoe (Collaborator) commented Mar 10, 2023

Relates to #100

Speeds up Feature.process_*() by avoiding calls to pd.concat() on many small pd.DataFrame objects. Instead, we now concatenate plain lists / numpy arrays and create the final pd.DataFrame from those in a single step.
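
For illustration, a minimal sketch of the idea (a simplification, not the actual feature.py code; the names before/after are made up):

import numpy as np
import pandas as pd

def before(rows, index):
    # old pattern: one tiny pd.DataFrame per file, concatenated repeatedly
    frames = [pd.DataFrame([row], index=[idx]) for row, idx in zip(rows, index)]
    return pd.concat(frames)

def after(rows, index):
    # new pattern: stack the plain lists / arrays first
    # and create the pd.DataFrame only once at the end
    return pd.DataFrame(np.vstack(rows), index=index)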

Benchmark

import time
import audb
import audinterface


db = audb.load(  # load the emodb database as 16 kHz mono WAV files
    'emodb',
    version='1.3.0',
    format='wav',
    sampling_rate=16000,
    mixdown=True,
)

def process_func(x, sr):
    # two features per signal: mean and standard deviation
    return [x.mean(), x.std()]

feature = audinterface.Feature(
    ['mean', 'std'],
    process_func=process_func,
)

t = time.time()
feature.process_files(db.files)
print(time.time() - t)  # elapsed time in seconds

  • main branch: ~8.5 s
  • this branch: ~4.5 s

@frankenjoe (Collaborator, Author) commented Mar 10, 2023

Together with #102 we should be down to ~0.35 s, which is close to the target set in #100.

@codecov bot commented Mar 10, 2023

Codecov Report

Merging #103 (c11ca68) into main (7ad15b7) will not change coverage.
The diff coverage is 100.0%.

Impacted Files                 Coverage Δ
audinterface/core/feature.py   100.0% <100.0%> (ø)

@frankenjoe requested a review from hagenw on March 10, 2023, 08:06
@hagenw (Member) commented Mar 13, 2023

You are not only concatenating lists, but also numpy arrays. But I guess the problem before was the concatenation with pandas, which seems to be much slower than concatenating with numpy (most likely due to index-related overhead).
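
A rough micro-benchmark of that difference (a hypothetical sketch; the numbers depend on the machine):

import time
import numpy as np
import pandas as pd

chunks = [np.random.rand(1, 2) for _ in range(10_000)]

t = time.time()
pd.concat([pd.DataFrame(c) for c in chunks])  # many small pandas objects
print('pd.concat:     ', time.time() - t)

t = time.time()
pd.DataFrame(np.concatenate(chunks))  # one numpy concatenation
print('np.concatenate:', time.time() - t)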

@frankenjoe (Collaborator, Author) commented Mar 13, 2023

You are not only concatenating lists, but also numpy arrays

Actually, in the win_dur=None case we even pre-allocate the numpy array and assign directly to it. When a sliding window is used this is not possible, because we cannot know the matrix size in advance, so there we rely on np.concatenate(), which is still much faster than pd.concat().
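
A hypothetical sketch of the two paths (variable names made up, not the actual audinterface internals):

import numpy as np

num_files, num_features = 100, 2

# win_dur=None: one row per file, so the output shape is known up front
# and we can pre-allocate once and assign into the array
features = np.empty((num_files, num_features))
for idx in range(num_files):
    features[idx, :] = [0.0, 0.0]  # stand-in for the process_func() result

# sliding window: the number of frames per file is unknown in advance,
# so we collect variable-size blocks and concatenate them once at the end
blocks = [
    np.zeros((np.random.randint(1, 10), num_features))
    for _ in range(num_files)
]
features = np.concatenate(blocks)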

@hagenw (Member) commented Mar 13, 2023

OK, thanks for the clarification.

@hagenw merged commit f4687e1 into main on Mar 13, 2023
@hagenw deleted the speed-up-feature branch on March 13, 2023, 09:58
@hagenw (Member) commented Mar 13, 2023

With the incorporation of #102 the benchmark is down to ~0.26 s for me.
