-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[GSOC] Add surrogate data generation for significant connectivity estimation #223
Conversation
Co-authored-by: Eric Larson <[email protected]>
Notes | ||
----- | ||
Surrogate data is generated by randomly shuffling the order of epochs, independently | ||
for each channel. This destroys the covariance of the data, such that connectivity | ||
estimates should reflect the null hypothesis of no genuine connectivity between | ||
signals (e.g., only interactions due to background noise) | ||
:footcite:`PellegriniEtAl2023`. | ||
|
||
For the surrogate data to properly reflect a null hypothesis, the data which is | ||
shuffled **must not** have a temporal structure that is consistent across epochs. | ||
Examples of this data include evoked potentials, where a stimulus is presented or an | ||
action performed at a set time during each epoch. Such data should not be used for | ||
generating surrogates, as even after shuffling the epochs, it will still show a high | ||
degree of residual connectivity between channels. As a result, connectivity | ||
estimates from your surrogate data will capture genuine interactions, instead of the | ||
desired background noise. Treating these estimates as a null hypothesis will | ||
increase the likelihood of a type II (false negative) error, i.e., that there is no | ||
significant connectivity in your data. | ||
|
||
Appropriate data for generating surrogates includes data from a resting state, | ||
inter-trial period, or similar. Here, a strong temporal consistency across epochs is | ||
not assumed, reducing the chances that connectivity information of interest is | ||
captured in your surrogate connectivity estimates. | ||
|
||
In situations where you want to assess whether evoked data has significant | ||
connectivity, you can generate your surrogate connectivity estimates from non-evoked | ||
data (e.g., rest data, inter-trial data) and compare this to your true connectivity | ||
estimates from the evoked data. | ||
|
||
Regardless of whether you are working with evoked or non-evoked data, **you should | ||
always compare true and surrogate connectivity estimates from epochs of the same | ||
duration**. This will ensure that spectral information is captured with the same | ||
accuracy in both sets of connectivity estimates. Ideally, **you should also compare | ||
true and surrogate connectivity estimates from the same number of epochs** to avoid | ||
biases from noise (fewer epochs gives noisier estimates) or finite sample sizes | ||
(e.g., in coherency, phase-locking value, etc... :footcite:`VinckEtAl2010`). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very long notes section, but like we discussed, there are important considerations as to what data surrogates should be generated from.
Hmm, building docs is failing here on the new example (fine for me locally), but doesn't give a reason beyond
Could it be that it doesn't like multiprocessing? I'm using |
Often that's a sign that you've used too much memory, and decreasing the number of jobs can help |
Okay, even using 1/4 the CPU count ends up using too much memory (find it surprising with 36 cores and 70 GB). Just reverting to a single job and taking the hit on runtime. |
Switching CI back to upstream since mne-tools/mne-python#12747 was merged. |
Wanted to make sure everything was green before submitting final GSoC stuff. Newly failing tests addressed in #228 |
@wmvanvliet want to look and merge if you're happy? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks so great! thanks for all the hard work @tsbinns
Few comments:
- would it be great to actually print the pvalue too potentially besides just writing that it's <0.05?
- When comparing significance across the frequency band, the major difference appears in the alpha band, but you test the beta band. Perhaps it's worth mentioning that you test the beta band specifically since the alpha band is clearly significantly different (and due to simplicity for running the example), but in practice one can test both. You do allude to this point in multiple comparisons discussion, but that's one question I had when reading through that section
Thanks for the review @adam2392! Very good points, will have a look at implementing them by the end of the week. Cheers! |
Comments should be addressed now @adam2392. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks @tsbinns ! |
Awesome! Thanks @larsoner 🚀 |
Follows from the changes in #220, so the diff will be quite smaller when that's merged.
In brief, adds a new function
make_surrogate_connectivity
in thedatasets
module (felt this was an appropriate place).Since
datasets
is more populous now I thought it could make sense to clean things up and bring the tests frommake_signals_in_freq_bands
from the generic tests folder to one within the datasets folder (where themake_surrogate_connectivity
tests also are).Also added an example demonstrating the use of the new function.