
ENH: Improve convolution performance for Sparse variables #411

Merged (18 commits) Mar 7, 2019

Conversation

adelavega (Collaborator)

Fixes #354; related to #356.

Profiling indicates that the slow function is actually np.convolve itself. It calls np.correlate, whose run time grows super-linearly with the length of the variable.
[profiling call graph: output.png]

(see: numpy/numpy#1858)

The good news is that by not densifying prior to convolution, things go much faster.
For example, a predictor with ~850 events takes about 15 ms to convolve as sparse, but 1.03 s when upsampled to 10 Hz; 50 Hz takes that up to about a minute (I did the profiling on that).
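
As a rough illustration of that scaling (a minimal sketch, not from the original profiling; numbers will vary by machine):

    import timeit

    import numpy as np

    kernel = np.ones(160)  # stand-in for an upsampled HRF kernel
    for n in (1_000, 10_000, 100_000):
        sig = np.random.rand(n)
        t = timeit.timeit(lambda: np.convolve(sig, kernel), number=5) / 5
        print(f"n={n:>7}: {t * 1e3:8.1f} ms")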

The question that remains is how to downsample at the end. As it is, it will use the original onsets as the frame_times; that is, it will resample only at those onsets. Does that make sense? Or would uniform resampling at the TR (or some factor above that) be better? Maybe we could even do 10 Hz resampling, although presumably this should be the final step in Transformations, and the TR should be OK.

@adelavega changed the title from "Do not convert to Dense if variable is Sparse" to "[WIP] HRF: Do not convert to Dense if variable is Sparse" on Mar 2, 2019
@yarikoptic (Collaborator) left a comment


Just had an urge to whine, I guess ;-)
Will check in more detail when home with a laptop.

            source=var.source, sampling_rate=var.sampling_rate)
    if isinstance(var, SparseRunVariable):
        return SparseRunVariable(
            name=var.name, values=convolved[0], onset=onsets,
Collaborator

Unrelated to the PR, but unchecked assumptions such as [0] are a recipe for trouble or a cryptic error. I believe we addressed some of those already, but it would be nice if new code came with checks and proper errors, or at least assert statements.

@adelavega (Collaborator, Author) Mar 2, 2019

You mean convolved[0]? I blame someone else! Maybe it could return a dictionary instead?

@yarikoptic (Collaborator)

Ah, and how do you get such a neat profile figure? Please teach me.


adelavega commented Mar 2, 2019

> Ah, and how do you get such a neat profile figure? Please teach me.

cProfile.run('hrf.compute_regressor(vals, model, onsets)', 'restats')

gprof2dot -f pstats restats | dot -Tpng -o output.png
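
(This assumes gprof2dot is installed, e.g. via pip install gprof2dot, along with a system Graphviz install to provide the dot command.)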


codecov bot commented Mar 2, 2019

Codecov Report

Merging #411 into master will decrease coverage by 0.05%.
The diff coverage is 33.33%.


@@            Coverage Diff             @@
##           master     #411      +/-   ##
==========================================
- Coverage    62.3%   62.25%   -0.06%     
==========================================
  Files          27       27              
  Lines        4555     4554       -1     
  Branches     1173     1173              
==========================================
- Hits         2838     2835       -3     
- Misses       1433     1434       +1     
- Partials      284      285       +1
Flag         Coverage Δ
#unittests   62.25% <33.33%> (-0.06%) ⬇️

Impacted Files                              Coverage Δ
bids/analysis/transformations/compute.py    85.57% <33.33%> (-1.09%) ⬇️
bids/analysis/analysis.py                   88.29% <0%> (-0.49%) ⬇️

Last update b4bb3cd...ffdaa01.


codecov bot commented Mar 2, 2019

Codecov Report

Merging #411 into master will increase coverage by 0.08%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master     #411      +/-   ##
==========================================
+ Coverage    62.3%   62.39%   +0.08%     
==========================================
  Files          27       27              
  Lines        4555     4560       +5     
  Branches     1173     1173              
==========================================
+ Hits         2838     2845       +7     
+ Misses       1433     1432       -1     
+ Partials      284      283       -1
Flag         Coverage Δ
#unittests   62.39% <100%> (+0.08%) ⬆️

Impacted Files                              Coverage Δ
bids/analysis/transformations/compute.py    88.18% <100%> (+1.51%) ⬆️
bids/analysis/transformations/munge.py      91.81% <0%> (+0.58%) ⬆️

Last update b4bb3cd...46f638e.


adelavega commented Mar 2, 2019

A few more profiling/timing results. This is passing Sparse variables as Sparse to compute_regressor.

I timed how long it takes to run .setup() on a model with 16 regressors that need to be convolved, while varying the sampling_rate of the frame_times. This still returns a Dense variable for a Sparse input, but only densifies after convolution.

Since the frame_times are used (in conjunction with the oversampling factor) to upsample the HRF model and the events, their sampling rate is the main determinant of how long convolve will take. It scales linearly:

1 Hz: 1.2 s
2 Hz: 2.14 s
5 Hz: 10 s
10 Hz: 20 s

Now the question is whether it's reasonable to downsample to the TR (or a small factor above that), assuming Convolve is the final transformation. I'll post graphs of the actual convolved variables at different sampling rates soon.


adelavega commented Mar 2, 2019

Speech at 0.5, 1, 5, and 10 Hz:

[four plots of the convolved Speech regressor at increasing sampling rates]

and Brightness at 0.5, 1, 5, and 10 Hz:

[four plots of the convolved Brightness regressor at increasing sampling rates]

It doesn't seem to make much of a difference, so I'd say we downsample to the TR or 1-5 Hz by default. You do see a slight degradation at 0.5 Hz, and below that it looks very poor, e.g. 0.1 Hz for Speech:

[plot of the Speech regressor convolved at 0.1 Hz]

@adelavega (Collaborator, Author)

I think if we want to be smart about it, then for no information loss to occur, you'd want to set the combination of oversampling and frame_times frequency to match the minimum distance between events.

For example, in the case of Speech, the minimum distance between events is 0.028 s, whereas other events are sampled at 1 Hz.

I tested out an adaptive oversampling, setting it to 1 / (minimum distance between events * sampling_rate of the frame_times), rounded up (sketched below). In my testing this seemed to work quite well, even when returning events at 10 Hz (which is what the previous behavior was).

This also took only about 1 second to run analysis.setup() on a run with 17 convolved variables.
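
A minimal sketch of that rule (illustrative helper name; it mirrors the diff excerpted later in this thread):

    import numpy as np

    def adaptive_oversampling(onsets, durations, sampling_rate):
        """Oversample just enough that the effective grid resolves the
        shortest gap between events (or the shortest event duration)."""
        min_interval = min(np.ediff1d(np.sort(onsets)).min(), durations.min())
        return int(np.ceil(1 / (min_interval * sampling_rate)))

    # E.g. speech-like events 0.028 s apart on a 1 Hz frame grid:
    adaptive_oversampling(np.array([0.0, 1.0, 1.028]),
                          np.array([0.028, 1.0, 1.0]), 1.0)  # -> 36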


tyarkoni commented Mar 4, 2019

Cool, glad that works. But we should probably use the minimum event duration rather than the distance between events (or the minimum of both). In the naturalistic context these will generally coincide (e.g., for uniformly sampled measurements, duration will generally match the distance to the next sample), but in many other contexts you can have widely spaced but very short events.

@effigies (Collaborator) left a comment


I think this makes sense... I definitely think scaling the oversampling inversely with the actual sampling rate is sensible.

This is going to make refactoring #376 pretty awful, but that's my fault for not getting that merged first.

(Two review comments on bids/analysis/transformations/compute.py: outdated, resolved)
@effigies (Collaborator) left a comment


This looks almost ready. Is there a hold-up I'm not aware of being signaled by [WIP]?

(Three review comments on bids/analysis/transformations/compute.py: outdated, resolved)
@adelavega (Collaborator, Author)

Ready to merge as soon as tests pass. Oddly, I can't edit the title...

@effigies changed the title from "[WIP] HRF: Do not convert to Dense if variable is Sparse" to "ENH: Improve convolution performance for Sparse variables" on Mar 5, 2019

effigies commented Mar 5, 2019

Yeah, GitHub's UI seems to be glitchy. I was able to edit it in a tab I had open from earlier.


tyarkoni commented Mar 5, 2019

Reviewing this now.

@tyarkoni (Collaborator) left a comment


Two minor comments, otherwise LGTM.

@@ -49,11 +54,18 @@ def _transform(self, var, model='spm', derivative=False, dispersion=False,
elif model != 'fir':
raise ValueError("Model must be one of 'spm', 'glover', or 'fir'.")

    convolved = hrf.compute_regressor(vals, model, onsets,
                                      fir_delays=fir_delays, min_onset=0)
    min_interval = min(np.ediff1d(np.sort(var.onset)).min(),
Collaborator

Maybe add a comment here for posterity explaining min_interval; it will reduce the maintenance burden a year or two down the line.

                                      fir_delays=fir_delays, min_onset=0)
    min_interval = min(np.ediff1d(np.sort(var.onset)).min(),
                       var.duration.min())
    oversampling = np.ceil(1 / (min_interval * sampling_rate))
Collaborator

We should probably add a test to make sure that oversampling is computed properly from the input parameters (which might require mocking compute_regressor).
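
A minimal sketch of such a check, exercising the illustrative adaptive_oversampling helper sketched earlier (a real test would mock compute_regressor, as suggested, and assert on the oversampling argument it receives):

    def test_adaptive_oversampling():
        # 1 / (0.028 * 1.0) = 35.7..., so we expect ceil -> 36.
        assert adaptive_oversampling(np.array([0.0, 1.0, 1.028]),
                                     np.array([0.028, 1.0, 1.0]), 1.0) == 36
        # Widely spaced, long events on a 10 Hz grid need no oversampling.
        assert adaptive_oversampling(np.array([0.0, 5.0]),
                                     np.array([4.0, 4.0]), 10.0) == 1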


adelavega commented Mar 5, 2019

Okay @tyarkoni, I added tests with dense and sparse variables, and mocked compute_regressor to check the oversampling.

Two notes:

  • I compute min_interval from unique onset values only, since more than one event can share the same onset (at least in the test data), which would result in min_interval=0 (see the sketch below).
  • When densifying, rounding errors can lead to an oversampling of 2.0 when it was computed to be 1.0000001. I think that's fine, but it is a slight performance penalty. Also, for many low-frequency events (e.g. 'RT' in the test data), the collection's sampling_rate would have to be very low before oversampling goes above 1. I'm not 100% confident this is acceptable, but it seems to make sense.
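
A sketch of the unique-onset guard from the first note (illustrative; var stands for the variable being convolved, as in the diff above):

    # Duplicate onsets would make the minimum difference 0, so reduce to
    # unique values first (np.unique also sorts, so no extra sort is needed).
    onsets = np.unique(var.onset)
    min_interval = min(np.ediff1d(onsets).min(), var.duration.min())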

@adelavega (Collaborator, Author)

Ugh, go away, Python 2. I guess we need to add the mock library.


codecov-io commented Mar 5, 2019

Codecov Report

Merging #411 into master will increase coverage by 0.08%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master     #411      +/-   ##
==========================================
+ Coverage    62.3%   62.39%   +0.08%     
==========================================
  Files          27       27              
  Lines        4555     4560       +5     
  Branches     1173     1173              
==========================================
+ Hits         2838     2845       +7     
+ Misses       1433     1432       -1     
+ Partials      284      283       -1
Flag         Coverage Δ
#unittests   62.39% <100%> (+0.08%) ⬆️

Impacted Files                              Coverage Δ
bids/analysis/transformations/compute.py    88.18% <100%> (+1.51%) ⬆️
bids/analysis/transformations/munge.py      91.81% <0%> (+0.58%) ⬆️

Last update b4bb3cd...0e18ff7.


codecov-io commented Mar 5, 2019

Codecov Report

Merging #411 into master will increase coverage by 0.07%.
The diff coverage is 86.66%.


@@            Coverage Diff             @@
##           master     #411      +/-   ##
==========================================
+ Coverage    62.3%   62.37%   +0.07%     
==========================================
  Files          27       27              
  Lines        4555     4564       +9     
  Branches     1173     1174       +1     
==========================================
+ Hits         2838     2847       +9     
  Misses       1433     1433              
  Partials      284      284
Flag         Coverage Δ
#unittests   62.37% <86.66%> (+0.07%) ⬆️

Impacted Files                              Coverage Δ
bids/analysis/transformations/compute.py    86.84% <86.66%> (+0.17%) ⬆️
bids/analysis/transformations/munge.py      91.81% <0%> (+0.58%) ⬆️

Last update b4bb3cd...916c8a3.


tyarkoni commented Mar 5, 2019

> Also, for many low-frequency events (e.g. 'RT' in the test data), the collection's sampling_rate would have to be very low before oversampling goes above 1. I'm not 100% confident this is acceptable, but it seems to make sense.

It's worth experimenting with the convolution code to make sure small oversampling values don't do wonky things.

That issue aside, we should probably also always double the value you're currently using. The minimum of event durations and onset deltas seems like a reasonable approximation of the highest-frequency signal in the timeseries, but we want to make sure we're above the Nyquist rate (i.e., 2 * the highest frequency). This should make a big difference in many cases.
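
In terms of the rule above, that doubling would read (illustrative, reusing the names from the diff):

    # Double the factor so the effective rate sits at or above the Nyquist
    # rate, i.e. twice the highest frequency implied by min_interval.
    oversampling = int(np.ceil(2 / (min_interval * sampling_rate)))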

@adelavega (Collaborator, Author)

In my short look at it, it didn't seem to make a difference, since oversampling is essentially already achieved by requesting high-frequency frame_times, but it's worth testing in more detail and keeping an eye on it.

I'm fine with doubling the oversampling rate.

@adelavega (Collaborator, Author)

Turns out this line is (potentially) inaccurate given a float duration:

    resample_frames = np.linspace(0, dur, dur * sampling_rate, endpoint=False)

/usr/local/bin/ipython:2: DeprecationWarning: object of type <class 'float'> cannot be safely interpreted as an integer.

Or at least it's potentially inconsistent with how DenseRunVariable is created.

I think #361 deserves its own fix (throw an error if index and values don't match at __init__), but here I'll aim to have resample_frames produce the same n as the created index.
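
A minimal sketch of that fix, with illustrative values for dur and sampling_rate (the same form appears in the review excerpt below):

    import math

    import numpy as np

    dur, sampling_rate = 2.5, 10.0  # illustrative values

    # Cast the sample count to an explicit int so np.linspace is not handed
    # a float count (the source of the DeprecationWarning above).
    n_frames = int(math.ceil(dur * sampling_rate))
    resample_frames = np.linspace(0, dur, n_frames, endpoint=False)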

@adelavega (Collaborator, Author)

Anybody want to give this a final review? If not, I will merge soon, as it's already been reviewed and seems to be working well for me.

@tyarkoni (Collaborator) left a comment


LGTM

    sampling_rate = self.collection.sampling_rate
    dur = var.get_duration()
    resample_frames = np.linspace(
        0, dur, int(math.ceil(dur * sampling_rate)), endpoint=False)
Collaborator

I'm wondering if we should center the sampled frames within the timeseries... e.g., suppose we have a 2-second dense timeseries, and we want to downsample to 2 Hz. Currently we would sample at 0, 0.5, 1, and 1.5. Probably we should do 0.25, 0.75, 1.25, and 1.75. Let's not change anything here, because if we were going to do this, we'd need to do it throughout the codebase for consistency. Mostly just making a mental note.
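
A tiny sketch of that centered alternative (illustrative only; per the comment above, the code was left unchanged):

    import math

    import numpy as np

    dur, sampling_rate = 2.0, 2.0  # the 2-second, 2 Hz example above
    n = int(math.ceil(dur * sampling_rate))

    current = np.linspace(0, dur, n, endpoint=False)   # [0.   0.5  1.   1.5 ]
    centered = current + 1 / (2 * sampling_rate)       # [0.25 0.75 1.25 1.75]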
