Splits are fetched by BIDSPath.fpath only when extension (and/or) suffix are missing #712

dmalt · 2021-02-20T12:57:44Z

Is there a reason for such behavior?
I'm asking because I'm using BIDSPath as a kind of template tool for paths in my dataset.
For instance, I like to use the same BIDSPath for writing a file and then loading it at the later processing steps.

The problem occurs when occasionally file sizes exceed the 2 Gb limit and have to be split. In this case, writing is still fine with split=None since MNE Python supports bids-style splits when saving raw data.
Reading on the other hand becomes problematic because I have to either keep track of all the files which happen to have splits and update my templates separately, which is tedious or use the template BIDSPaths with suffix or extension unset, which I didn't realize until looking at the source code and which seem a little random to guess.

mne-bids/mne_bids/path.py

Lines 414 to 446 in 74ac41a

    
           if self.suffix == 'meg' and self.extension == '.ds': 
        
               bids_fpath = op.join(data_path, self.basename) 
        
           elif self.suffix == 'meg' and self.extension == '.pdf': 
        
               bids_fpath = op.join(data_path, 
        
                                    op.splitext(self.basename)[0]) 
        
           else: 
        
               # if suffix and/or extension is missing, and root is 
        
               # not None, then BIDSPath will infer the dataset 
        
               # else, return the relative path with the basename 
        
               if (self.suffix is None or self.extension is None) and \ 
        
                       self.root is not None: 
        
                   # get matching BIDS paths inside the bids root 
        
                   matching_paths = \ 
        
                       _get_matching_bidspaths_from_filesystem(self) 
        
                   # FIXME This will break 
        
                   # FIXME e.g. with FIFF data split across multiple files. 
        
                   # if extension is not specified and no unique file path 
        
                   # return filepath of the actual dataset for MEG/EEG/iEEG data 
        
                   if self.suffix is None or self.suffix in ALLOWED_DATATYPES: 
        
                       # now only use valid datatype extension 
        
                       valid_exts = sum(ALLOWED_DATATYPE_EXTENSIONS.values(), []) 
        
                       matching_paths = [p for p in matching_paths 
        
                                         if _parse_ext(p)[1] in valid_exts] 
        
                   if (self.split is None and 
        
                           (not matching_paths or 
        
                            '_split-' in matching_paths[0])): 
        
                       # try finding FIF split files (only first one) 
        
                       this_self = self.copy().update(split='01') 
        
                       matching_paths = \ 
        
                           _get_matching_bidspaths_from_filesystem(this_self)

So my question is can splits be handled irrespective of suffix and extension?
And also can this fetching behavior of BIDSPath.fpath (and BIDSPath.__str__ for that matter) become more transparent? At least mentioning these rules in the docstrings would make a big difference I think.

Also maybe adding some flag like fetch_from_filesystem=True to make the behavior more explicit and controllable.

The text was updated successfully, but these errors were encountered:

dmalt · 2021-02-20T12:59:59Z

Discussion in #692 is relevant I think

hoechenberger · 2021-02-20T13:50:37Z

And also can this fetching behavior of BIDSPath.fpath (and BIDSPath.str for that matter) become more transparent? At least mentioning these rules in the docstrings would make a big difference I think.

Also maybe adding some flag like fetch_from_filesystem=True to make the behavior more explicit and predictable.

My personal opinion is that we should stop doing these heuristics altogether, but… well, that's just me :)

dmalt · 2021-02-20T14:05:20Z

Couldn't agree more. Debugging this was a pain.
Fetching splits as if there was a single file would be really useful though

agramfort · 2021-02-20T16:25:34Z

@dmalt can you profile a code snippet to demo the unnatural behaviour you observed?
it's not clear to me as the split should be handled transparently.

dmalt · 2021-02-20T17:27:08Z

@agramfort this is what I mean

from pathlib import Path
from mne_bids import BIDSPath

root_path = Path("bids_root")
subj_path = root_path / "sub-01" / "meg"
subj_path.mkdir(exist_ok=True, parents=True)

bp_template = BIDSPath(
    subject="01", root="bids_root", suffix="meg", extension=".fif"
)
print(bp_template.fpath)
# prints bids_root/sub-01/meg/sub-01_meg.fif
# which is what I want to write raw data with raw.save

# now let's emulate situation when the raw file was saved in splits:
(subj_path / "sub-01_split-01_meg.fif").touch(exist_ok=True)
(subj_path / "sub-01_split-02_meg.fif").touch(exist_ok=True)

print(bp_template.fpath)
# still prints bids_root/sub-01/meg/sub-01_meg.fif
# because both suffix and extension are set
# Ideally I would like this to pick the first split

# At the same time
bp_template = BIDSPath(subject="01", root="bids_root", suffix="meg")
print(bp_template.fpath)
# outputs
# bids_root/sub-01/meg/sub-01_split-01_meg.fif
# But this template won't work for saving data since it's missing extension

# Bottom line:
# 1) Can't use the same template for reading and writing data
# 2) Can't guess the output of fpath without looking at source code
# 3) It seems strange that I need suffix and extension to pick splits

agramfort · 2021-02-21T13:50:04Z

if the files are split then

bp_template.fpath is indeed
bids_root/sub-01/meg/sub-01_split-01_meg.fif

I don't understand why you say you need to update your template
manually. mne_bids should as shown above direct you to split-01
file automatically.

i've had to deal with split files in a real dataset using
https://github.com/mne-tools/mne-study-template/
and I had no problem after my last fix.

dmalt · 2021-02-22T09:58:16Z

No, I understand that it's possible to pick up splits of course. It's just that I had a hard time figuring out how to write the code properly for that because it's not mentioned in the docstring that BIDSPath.fpath would go to the filesystem to get the data. And my point is that it's not obvious why it would only work with one of extension and suffix unset.

dmalt · 2021-02-22T10:09:31Z

And I do care about setting both extension and suffix because I want to use the same BIDSPath for both reading and writing data. It feels logical and readable and requires very little code. I managed to make it work for my project, but the process was rough for me which is why I wanted to discuss it here.

hoechenberger · 2021-02-22T14:09:52Z

When writing, we return the (new? updated?) BIDSPath. Would using this one help in your case? (I actually haven't tried this out, so just leaving this here)

dmalt · 2021-02-22T14:35:05Z

You mean with write_raw_bids? Yes, looking at the docstring I think this might be an option. I need to test it out, thanks! I wrote my data with MNE Python. Since the writing is mostly for the derivatives, I don't care that much about all the metadata.

BTW, a kind of explanation about inferring the path there is exactly the thing that would be great to see in BIDSPath.fpath to avoid the surprises.

hoechenberger · 2021-02-22T15:01:08Z

Ok in this case I'm now confused and need to re-read the entire thread later tonight 😅

dmalt · 2021-02-22T19:52:29Z

I'm sorry I confused you :) Probably I'm not being clear enough, so I'll try to reformulate.

I don't like the fact that fpath getter does things behind the scenes and that its behavior changes depending on what files are present and which arguments were set in BIDSPath without saying anything about it in the docs and without any means to control it.
I also don't like what it does behind the scenes related to splits. In particular, I don't understand why it's required to unset either suffix or extension to pick up a path for a split file. It looks like the only requirement for fpath to pick the split should be that the fetched file can be uniquely identified in the filesystem.
I don't have a problem with making it work in my project now but it took some time to get there because of the first two points, so I wanted to discuss it with you to potentially improve fpath behavior and docs.

Does this make sense?

agramfort · 2021-02-28T11:24:00Z

ok got it ! yes this is clearly a bug !

this code block : https://github.com/mne-tools/mne-bids/blob/master/mne_bids/path.py#L423
is a mess. I edited this to support split files starting from @adam2392 's initial code and therefore
I miss the initial underlying motivations for certain cases.

I very open to clarify what fpath should be doing depending on:

if root is provided?
if exension is provided? clearly guessing split files are only necessary if extension in [None, '.fif'] and suffix in [None, 'meg']

@adam2392 can you comment in to clarify what you expect fpath to return in different situations? It will be the basis of an improved docstring

and sorry @dmalt for the slow reaction and the time it took me to understand the issue...

hoechenberger · 2021-02-28T11:43:23Z

clearly guessing split files are only necessary if extension in [None, '.fif'] and suffix in [None, 'meg']

suffix in [None, 'meg', 'eeg', 'ieeg'], no? iEEG datasets can be massive, and HD-EEG may also produce large files that may need to be split

agramfort · 2021-02-28T13:02:39Z

well fif is not part of standard beyond MEG but it's true that we use it in the study template ....

…

hoechenberger · 2021-02-28T13:45:16Z

Interesting thought!

adam2392 · 2021-02-28T16:52:19Z

@adam2392 can you comment in to clarify what you expect fpath to return in different situations? It will be the basis of an improved docstring

and sorry @dmalt for the slow reaction and the time it took me to understand the issue...

I think the documentation could definitely be improved.

Rn:

if root is not passed in, then fpath is just a relative path: e.g. sub-01/ses-01/<datatype>/sub-01_ses-01_task-01_run-01_<suffix>.<extension>
If root is passed in, and a full filename is defined (e.g. suffix and extension), then it will just be like the above, but with root prepended: <root>/sub-01/ses-01/<datatype>/sub-01_ses-01_task-01_run-01_<suffix>.<extension>
if root is passed in, then a full fpath will try to get inferred if one is missing the suffix and extension.

The third case is probably where it's potentially weird, and I also don't get what's going on w/ split files cuz I don't use them. So in this case, you might store data as BrainVision, and so you could theoretically leave out the extension='vhdr', and you can still use fpath to get the full filepath to the dataset (assuming you uniquely defined it with the other BIDS entities). Same w/ suffix.

agramfort · 2021-03-01T09:04:14Z

how about we disable fpath when root is None? it will simplify things

…

hoechenberger · 2021-03-01T10:52:08Z

how about we disable fpath when root is None? it will simplify things

I think we currently simply treat root as . if it's not been passed. We could actually SET it to Path('.').absolute() upon instantiation if it is None

agramfort · 2021-03-01T12:29:45Z

defaulting to '.' yes would simplify things

…

dmalt · 2021-03-04T14:33:23Z

Thanks for your replies, everybody!

I would like to point out that BIDSPath.__str__ also relies on fpath and it looks really weird when there are no complaints from MNE-bids until one calls str(bids_path) and then it breaks because fpath tries to infer path and something goes wrong.
Consider this. When BIDS_ROOT is a path to an existing BIDS dataset and there are multiple .fif files for sub-01, e.g. rest and task:

In [1]: bp = BIDSPath(root=BIDS_ROOT, subject="01")
In [2]: str(bp)  # simply checking what my BIDSPath looks like                                                                              
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-9-810b7d4e3b46> in <module>
----> 1 str(bp)

~/miniconda3/envs/mne_bids_latest/lib/python3.8/site-packages/mne_bids/path.py in __str__(self)
    346     def __str__(self):
    347         """Return the string representation of the path."""
--> 348         return str(self.fpath)
    349 
    350     def __repr__(self):

~/miniconda3/envs/mne_bids_latest/lib/python3.8/site-packages/mne_bids/path.py in fpath(self)
    463                            'BIDS dataset. Please run the BIDS validator on '
    464                            'your data.')
--> 465                     raise RuntimeError(msg)
    466                 else:
    467                     bids_fpath = matching_paths[0]

RuntimeError: Found more than one matching data file for the requested recording. Cannot proceed due to the ambiguity. This is likely a problem with your BIDS dataset. Please run the BIDS validator on your data.

Don't you think it might be reasonable to make fpath simply assemble path from the given entities and have a separate function for inferring paths from the dataset? For instance, there's BIDSPath.match() which already sort of does this.

adam2392 · 2021-03-06T20:59:22Z

^ I'm in favor of that.

simplify fpath/str
perhaps a function to try to "infer" raw dataset path (similar to match, but only tries to get the raw EEg/meg/ieeg dataset)

dmalt · 2021-03-06T21:06:20Z

Yes, this sounds great

agramfort · 2021-03-07T18:17:19Z

see #721 to start the ball rolling

…

agramfort linked a pull request Mar 7, 2021 that will close this issue

[WIP] : simplify fpath #721

Draft

sappelhoff mentioned this issue Jul 9, 2021

bidspath example contains runtimewarning #829

Closed

dmalt mentioned this issue Aug 9, 2023

Inconsistent splits naming between Raw and Epochs when using BIDS schema mne-tools/mne-python#11870

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Splits are fetched by BIDSPath.fpath only when extension (and/or) suffix are missing #712

Splits are fetched by BIDSPath.fpath only when extension (and/or) suffix are missing #712

dmalt commented Feb 20, 2021 •

edited

Loading

dmalt commented Feb 20, 2021

hoechenberger commented Feb 20, 2021

dmalt commented Feb 20, 2021 •

edited

Loading

agramfort commented Feb 20, 2021

dmalt commented Feb 20, 2021 •

edited

Loading

agramfort commented Feb 21, 2021

dmalt commented Feb 22, 2021

dmalt commented Feb 22, 2021 •

edited

Loading

hoechenberger commented Feb 22, 2021

dmalt commented Feb 22, 2021

hoechenberger commented Feb 22, 2021

dmalt commented Feb 22, 2021

agramfort commented Feb 28, 2021

hoechenberger commented Feb 28, 2021

agramfort commented Feb 28, 2021 via email

hoechenberger commented Feb 28, 2021

adam2392 commented Feb 28, 2021

agramfort commented Mar 1, 2021 via email

hoechenberger commented Mar 1, 2021

agramfort commented Mar 1, 2021 via email

dmalt commented Mar 4, 2021 •

edited

Loading

adam2392 commented Mar 6, 2021

dmalt commented Mar 6, 2021

agramfort commented Mar 7, 2021 via email

Splits are fetched by BIDSPath.fpath only when extension (and/or) suffix are missing #712

Splits are fetched by BIDSPath.fpath only when extension (and/or) suffix are missing #712

Comments

dmalt commented Feb 20, 2021 • edited Loading

dmalt commented Feb 20, 2021

hoechenberger commented Feb 20, 2021

dmalt commented Feb 20, 2021 • edited Loading

agramfort commented Feb 20, 2021

dmalt commented Feb 20, 2021 • edited Loading

agramfort commented Feb 21, 2021

dmalt commented Feb 22, 2021

dmalt commented Feb 22, 2021 • edited Loading

hoechenberger commented Feb 22, 2021

dmalt commented Feb 22, 2021

hoechenberger commented Feb 22, 2021

dmalt commented Feb 22, 2021

agramfort commented Feb 28, 2021

hoechenberger commented Feb 28, 2021

agramfort commented Feb 28, 2021 via email

hoechenberger commented Feb 28, 2021

adam2392 commented Feb 28, 2021

agramfort commented Mar 1, 2021 via email

hoechenberger commented Mar 1, 2021

agramfort commented Mar 1, 2021 via email

dmalt commented Mar 4, 2021 • edited Loading

adam2392 commented Mar 6, 2021

dmalt commented Mar 6, 2021

agramfort commented Mar 7, 2021 via email

dmalt commented Feb 20, 2021 •

edited

Loading

dmalt commented Feb 20, 2021 •

edited

Loading

dmalt commented Feb 20, 2021 •

edited

Loading

dmalt commented Feb 22, 2021 •

edited

Loading

dmalt commented Mar 4, 2021 •

edited

Loading