Improve identification of BIDS data files and sidecars #38

Open
wants to merge 11 commits into
base: main

clane9
Collaborator

@clane9 clane9 commented Jul 10, 2024

Fixes the misidentification of BIDS data files and JSON sidecars reported in #36:

  • Multiple non-sidecar JSONs in a session directory were incorrectly treated as sidecars due to bad file matching.
  • Files with compound JSON extensions like '.surf.json' were incorrectly treated as sidecars.
  • Data directories such as microscopy '.ome.zarr' directories and MEG '.ds' directories were missed.
  • Files contained in those directories were incorrectly included.
  • Metadata files like '*_scans.tsv' were incorrectly included.

clane9 added 11 commits July 9, 2024 15:58
In #36 @birajstha found two cases where the `is_associated_sidecar`
function produces incorrect results.

1. When there is more than one non-sidecar json file with the same
   suffix in a directory.
2. When a file has a compound extension like '.surf.json'.

Add (failing) test cases to cover these errors.
- Use `parse_bids_entities` to get full compound extension rather than
  the final part from `Path.suffix`.
- To determine if a file is a sidecar json, look for paths with same
  suffix that *don't* end in '.json'.
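As a rough illustration of the two points above, here is a minimal sketch. It does not use the repository's `parse_bids_entities` or `is_associated_sidecar`; the helpers `split_compound_extension` and `looks_like_sidecar` are hypothetical names, and matching on the full file stem is a simplifying assumption rather than the actual matching logic.

```python
from __future__ import annotations

from pathlib import Path


def split_compound_extension(name: str) -> tuple[str, str]:
    """Split a file name at its first dot (hypothetical helper).

    Unlike Path.suffix, this keeps compound extensions such as '.surf.json'.
    """
    stem, dot, ext = name.partition(".")
    return stem, dot + ext


def looks_like_sidecar(path: Path, siblings: list[Path]) -> bool:
    """Heuristic: a plain '.json' file is a sidecar only if some sibling
    with the same stem does NOT end in '.json' (i.e. is a data file)."""
    stem, ext = split_compound_extension(path.name)
    if ext != ".json":
        # Compound extensions like '.surf.json' are data files, not sidecars.
        return False
    for other in siblings:
        other_stem, other_ext = split_compound_extension(other.name)
        if other_stem == stem and not other_ext.endswith(".json"):
            return True
    return False


siblings = [
    Path("sub-01_task-rest_bold.nii.gz"),
    Path("sub-01_task-rest_bold.json"),
    Path("sub-01_hemi-L_pial.surf.json"),
    Path("sub-01_desc-extra_bold.json"),
]
```

With these siblings, only `sub-01_task-rest_bold.json` is reported as a sidecar: the `.surf.json` file has a compound extension, and `sub-01_desc-extra_bold.json` has no matching non-JSON data file.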
The test includes a failing case for the '.ome.zarr' directories, which
are in fact data "files".

NB: I don't really like this pattern. IMO these directories should be
e.g. tar archives. It doesn't seem worth breaking the rule that data
files are files.
It turns out not all BIDS data "files" are actually files. Some can be
directories, e.g. the '.ome.zarr' directories
(see [here](https://bids-specification.readthedocs.io/en/stable/modality-specific-files/microscopy.html#file-formats)).

However, it seems it may be safe to assume all BIDS files have a file
extension.

This is still a pretty poor test for whether a file is a valid BIDS
file. But I don't want to add a test that is overly restrictive or very
involved.
Rather than match against fixed datatypes, just match any lowercase word
in the correct path position.
- Make datatype and suffix required. This excludes metadata files like
  `*_scans.tsv` files.
- Exclude files contained in directories that are themselves treated as
  BIDS files.

Also remove the enforced `allowed_values` on the datatype entity.
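The path rules described in the last few commits can be sketched roughly as follows. The regular expression `BIDS_DATA_PATH` and the function `is_bids_data_path` are hypothetical and written only for illustration; the pattern used in the repository may differ in its details.

```python
import re
from pathlib import PurePosixPath

# Hypothetical pattern: sub-<label>/[ses-<label>/]<datatype>/<file name>
# - datatype is any lowercase word in the expected path position
# - the file name must end in a suffix followed by an extension
BIDS_DATA_PATH = re.compile(
    r"(?:^|/)sub-[a-zA-Z0-9]+/(?:ses-[a-zA-Z0-9]+/)?"
    r"(?P<datatype>[a-z]+)/"
    r"sub-[a-zA-Z0-9]+(?:_[a-zA-Z0-9]+-[a-zA-Z0-9]+)*"
    r"_(?P<suffix>[a-zA-Z0-9]+)(?P<ext>\.[^/]+)$"
)


def is_bids_data_path(path: str) -> bool:
    """Return True if the path looks like a BIDS data file or data directory."""
    p = PurePosixPath(path)
    if BIDS_DATA_PATH.search(p.as_posix()) is None:
        # Missing datatype directory, suffix, or extension (e.g. *_scans.tsv).
        return False
    # Defensive check: skip anything nested inside a directory that is
    # itself treated as a data "file" (e.g. '.ome.zarr' or '.ds').
    return not any(
        BIDS_DATA_PATH.search(parent.as_posix()) for parent in p.parents
    )
```

Under these assumptions, `sub-01/micr/sub-01_sample-A_SEM.ome.zarr` is accepted as a data "file", while `sub-01/ses-01/sub-01_ses-01_scans.tsv` (no datatype directory) and `sub-01/micr/sub-01_sample-A_SEM.ome.zarr/0/.zarray` (nested inside a data directory) are rejected.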
@clane9 clane9 marked this pull request as ready for review July 10, 2024 21:34