The originally proposed, partially implemented BidsComponent and BidsDataset API, proposed in #209, causes great complications when considering snakebids as a more generic library for manipulating BIDS paths. A BidsComponent is fundamentally a table of entities and values plus a template, allowing the storage, filtering, and retrieval of BIDS paths. As such, the API has been modeled around tabular access and indexing. However, BidsComponents are not primarily useful as tables, but as indexed, derivable lists of paths. With a table-centric API, snakebids attempts to recreate utility much better provided by Pandas and Polars. The following API proposal is intended to facilitate component derivation and ready access to component entries, the main strengths of snakebids.
Derivation methods
In particular, it would be very useful to have new API to facilitate the derivation of new components from old components. For instance, a method BidsComponent.derive() might be used to change the template. A method like BidsComponent.produce() might produce a new BidsComponent with additional wildcards. Similarly, BidsComponent.drop() might make a new component with a subset of wildcards.
Subset operations, such as the last method mentioned above, would lead to duplicate entries if implemented naively. For instance, dropping the run wildcard from a component with the entries (subject 01, run 1) and (subject 01, run 2) would leave two identical entries for subject 01.
This is a problem for two reasons:
Asymmetry: The .produce() method previously mentioned adds entries. This reasonably leads to the expectation that a subset method like .drop() removes entries.
Usability: If a component is derived via a subset and then looped over for some processing, one would need to remember to de-duplicate the component manually first.
Thus, all component deriving methods should deduplicate the entry list automatically, as necessary.
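The asymmetry and the duplicate problem described above can be sketched with plain dictionaries standing in for component entries. The helper names (produce, drop_naive, drop) are hypothetical illustrations of the proposed behaviour, not existing snakebids functions, and the sketch ignores templates entirely.

```python
from itertools import product

# Assumption for this sketch: each entry is a dict of entity -> value.
Entry = dict[str, str]


def produce(entries: list[Entry], **new: list[str]) -> list[Entry]:
    """Combine every existing entry with every combination of the new values."""
    keys = list(new)
    return [
        {**entry, **dict(zip(keys, combo))}
        for entry in entries
        for combo in product(*new.values())
    ]


def drop_naive(entries: list[Entry], *entities: str) -> list[Entry]:
    """Remove entities from each entry, leaving any duplicates in place."""
    return [{k: v for k, v in e.items() if k not in entities} for e in entries]


def drop(entries: list[Entry], *entities: str) -> list[Entry]:
    """Remove entities, then de-duplicate while preserving first-seen order."""
    seen: set[tuple[tuple[str, str], ...]] = set()
    result = []
    for entry in drop_naive(entries, *entities):
        key = tuple(sorted(entry.items()))
        if key not in seen:
            seen.add(key)
            result.append(entry)
    return result


subjects = [{"subject": "01"}, {"subject": "02"}]
with_runs = produce(subjects, run=["1", "2"])  # adds entries: 2 -> 4
naive = drop_naive(with_runs, "run")           # still 4 entries, with repeats
deduped = drop(with_runs, "run")               # back to the 2 original entries
```

With automatic de-duplication, drop() undoes produce(), which is the symmetry a user would reasonably expect.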
However, the current API uses bracket indexing, BidsComponent[entity], to select entities, which is effectively a subset operation. But this operation neither de-duplicates nor returns a new BidsComponent. Instead, it returns a BidsPartialComponent, without a template. We could add a method BidsComponent.pick() that returns a new, de-duplicated component with the selected entities, but we would then have two very similar methods with critical but subtle differences.
Since deriving new components is more useful than table-like access to entity-values, we will get rid of BidsComponent[entity]-style indexing.
More generic components
Additionally, it would be very desirable to make BidsComponent more generic on two counts:
Allowing templates with potentially missing entities. For instance, a component where some paths have an acquisition and others do not. A single coherent template could still be produced, for instance, as follows: "sub-{subject}/ses-{session}/anat/sub-{subject}_ses-{session}{acquisition}_T1w.nii.gz".
Components without any template. This would allow running on datasets with heterogeneous paths. The only requirement would be that each selected path have a unique set of entity-values. Note that such lack of restraint would likely need to be opt-in for most applications to maintain predictability.
Additional API is needed to ensure safe handling of the above allowances.
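One possible reading of the template above is that an absent entity expands to an empty string, while a present one carries its own tag and separator. The sketch below illustrates that reading with plain str.format. The render helper and the "_acq-" prefix convention are assumptions made for illustration and are not part of the proposal.

```python
from typing import Optional

TEMPLATE = (
    "sub-{subject}/ses-{session}/anat/"
    "sub-{subject}_ses-{session}{acquisition}_T1w.nii.gz"
)


def render(subject: str, session: str, acquisition: Optional[str] = None) -> str:
    """Fill the template, expanding a missing acquisition to an empty string."""
    # Assumption: an optional entity contributes its own "_acq-" prefix when present.
    acq = f"_acq-{acquisition}" if acquisition else ""
    return TEMPLATE.format(subject=subject, session=session, acquisition=acq)


with_acq = render("01", "01", "mprage")
without_acq = render("01", "01")
```

This also shows why a user-supplied string template is fragile here: a template that hard-codes "_acq-{acquisition}" would produce a dangling "_acq-" for paths that lack the entity.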
API
BidsComponent
BidsComponent[index]
Returns a BidsEntry if index is an int, or a new BidsComponent if index is a slice. Note that this no longer selects one or more entities, as introduced in Implement BidsPartialComponent #243. To deprecate, we recognize that current entity selection exclusively uses strings. Therefore, we can safely have both string and integer indexing simultaneously, with string indexing deprecated.
BidsComponent would fully implement the Sequence API as Sequence[BidsEntry].
BidsComponent.filter(**filters)
Filter the entries of a component. Covered by BidsComponent.filter #202 and Extension of BidsComponent.filter api #335. No significant changes.
BidsComponent.find(**entities)
Returns a single entry with the provided entities. Errors if no exact match is found.
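The indexing and lookup behaviour described above could look roughly like the following. This is a toy stand-in, not the snakebids implementation: the class name Component is hypothetical, entries are plain dictionaries, and raising KeyError on a failed find() is an assumption, since the proposal only says that it errors.

```python
from collections.abc import Sequence


class Component(Sequence):
    """Toy stand-in for the proposed BidsComponent; entries are plain dicts."""

    def __init__(self, entries):
        self._entries = list(entries)

    def __len__(self):
        return len(self._entries)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Slicing yields a new component rather than a bare list.
            return Component(self._entries[index])
        # Integer indexing yields a single entry.
        return self._entries[index]

    def find(self, **entities):
        """Return the single entry whose entities match exactly, else raise."""
        matches = [e for e in self._entries if e == entities]
        if len(matches) != 1:
            raise KeyError(
                f"expected one exact match for {entities}, found {len(matches)}"
            )
        return matches[0]


comp = Component(
    [
        {"subject": "01", "run": "1"},
        {"subject": "01", "run": "2"},
        {"subject": "02", "run": "1"},
    ]
)
first = comp[0]                         # a single entry
tail = comp[1:]                         # a new Component with two entries
hit = comp.find(subject="02", run="1")  # exact match on all entities
```

Subclassing collections.abc.Sequence supplies iteration, membership tests, index() and count() for free once __getitem__ and __len__ are defined.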
BidsComponent.entities
To be removed. The name implies that all entities in the component template are accessible, which is not true. Entities as such are a property of an individual BIDS path.
BidsComponent.zip_lists
Removed. The name is not intuitive outside of Snakemake (even within Snakemake it's confusing), and its implementation as a property hides the fact that it is somewhat computationally expensive.
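For context, zip_lists holds one list per entity, with values aligned by position across the lists. Building that layout from a list of entries requires a full pass over the component, which is the cost a property access tends to hide. Below is a minimal sketch with a hypothetical helper, using plain dictionaries as stand-ins for entries and assuming every entry has the same entities.

```python
def to_zip_lists(entries: list[dict[str, str]]) -> dict[str, list[str]]:
    """Transpose row-wise entries into column-wise, position-aligned lists."""
    columns: dict[str, list[str]] = {}
    for entry in entries:  # one full pass over every entry
        for entity, value in entry.items():
            columns.setdefault(entity, []).append(value)
    return columns


entries = [
    {"subject": "01", "run": "1"},
    {"subject": "01", "run": "2"},
    {"subject": "02", "run": "1"},
]
zip_lists = to_zip_lists(entries)
# Per-entity unique values, preserving first-seen order.
unique = {entity: list(dict.fromkeys(values)) for entity, values in zip_lists.items()}
```

Exposing this conversion as an explicit method call rather than a property makes the cost visible at the call site.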
BidsComponent.to_dict()
Renamed version of BidsComponent.zip_lists with a more intuitive name.
BidsComponent.unique()
Fulfills the current role of BidsComponent.entities, using a more appropriate name.
BidsComponent.wildcards
Unchanged. Ideally, this would be a simple list of the wildcards in the component, since the current implementation is very snakemake-centric. However, this change would severely impact a very central API, and is not a high priority.
BidsComponent.expand()
Unchanged. This method will be primarily kept for snakemake usage. The use of its arguments will be discouraged, as providing a string template would not safely handle components with optional entities.
BidsComponent.paths
Equivalent to .expand(), except that a list of Path is returned directly. Generally more appropriate than .expand().
BidsComponent.pick()
Returns a new component with only the selected entities. A new template is created and entries are de-duplicated.
BidsComponent.drop()
As above, but removes the selected entities.
BidsComponent.produce()
Returns a new component with provided entities as new wildcards. Values are combined with existing entries using product().
BidsComponent.derive()
Modifies the template with provided entities, similar to the bids() function. Errors if any wildcards are set.
BidsDataset
At this time, no significant deviations from #209 are planned. Some attributes, like zip_lists and entities, will likely be removed.
BidsEntry
BidsEntry[entity]: Implemented as Mapping[str, str].
BidsEntry.__fspath__: allows direct opening of the associated file.
BidsEntry.path: returns associated path as pathlib.Path
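The three BidsEntry behaviours listed above map onto standard Python protocols. The following is a minimal sketch under stated assumptions: the class name Entry and its constructor signature are hypothetical, and only collections.abc.Mapping and the os.PathLike protocol are taken from the standard library.

```python
import os
from collections.abc import Mapping
from pathlib import Path


class Entry(Mapping):
    """Toy stand-in for the proposed BidsEntry (names here are illustrative)."""

    def __init__(self, path, entities):
        self._path = Path(path)
        self._entities = dict(entities)

    # Mapping[str, str] interface: entity name -> value.
    def __getitem__(self, entity):
        return self._entities[entity]

    def __iter__(self):
        return iter(self._entities)

    def __len__(self):
        return len(self._entities)

    @property
    def path(self) -> Path:
        """The associated path as a pathlib.Path."""
        return self._path

    def __fspath__(self) -> str:
        # Lets open(), os.fspath(), and Path() accept the entry directly.
        return str(self._path)


entry = Entry("sub-01/anat/sub-01_T1w.nii.gz", {"subject": "01"})
```

Because the entry defines __fspath__, it can be passed anywhere a path-like object is accepted, for example open(entry) or Path(entry), without first reaching for .path.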