-
Just noting here that this is a long-term API vision, not something to implement in a single PR. I don't think it even needs to be finished before 1.0, but it would be good to at least have things nailed down.
-
Talked about this with a larger group at Dev Lunch today and got general buy-in, so I think we can move forward with this for 1.0.
-
I've been going over the API proposal and trying to recall where we landed before Christmas. I may be misremembering some details, but in rethinking the problem I'm landing somewhere slightly different from what the proposal above says, though still fairly close. We'll definitely need to discuss this again.
-
It also just occurred to me that, because the order of files is sometimes important (e.g. when running a regression across image files with a design matrix), we should offer
-
Just updated the post to reflect our current consensus on the issue.
-
A need has come up to enable better filtering of `BidsDataset`, especially getting the intersection of multiple components (see #208). I wanted to propose a basic API to start handling these cases. The overarching strategy here is to provide many atomic operations that can be chained to achieve arbitrary goals, similar to the approach taken in pandas and Xarray:
#241
BidsDataset.filter(**filters)
`**filters` refers to `entity=values` pairs. This achieves a similar outcome to the current `filter_list`, but acts on an entire dataset.
#202
BidsComponent.filter(**filters)
As above, but a single component
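As a rough illustration of the filtering semantics described above, here is a minimal sketch that works on plain dicts rather than on the real classes. The function name `filter_zip_lists`, the dict-of-lists representation, and the choice to ignore filters for entities the component lacks are all assumptions made for this example, not part of the proposal.

```python
from typing import Dict, List, Union

# Hypothetical stand-in for a component's zip lists: entity -> column of values
ZipLists = Dict[str, List[str]]


def filter_zip_lists(zip_lists: ZipLists, **filters: Union[str, List[str]]) -> ZipLists:
    """Keep only the rows whose entity values satisfy every filter."""
    # Normalise each filter to a set of allowed values
    allowed = {
        entity: {values} if isinstance(values, str) else set(values)
        for entity, values in filters.items()
    }
    n_rows = len(next(iter(zip_lists.values()), []))
    keep = [
        i
        for i in range(n_rows)
        if all(
            zip_lists[entity][i] in values
            for entity, values in allowed.items()
            if entity in zip_lists  # assumption: unknown entities are ignored
        )
    ]
    return {entity: [column[i] for i in keep] for entity, column in zip_lists.items()}


zip_lists = {
    "subject": ["01", "01", "02", "02"],
    "run": ["1", "2", "1", "2"],
}
filtered = filter_zip_lists(zip_lists, subject="01", run=["1", "2"])
```

Under the same assumptions, a dataset-level filter would simply apply this function to every component in turn.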
#240
BidsComponent.entities[<one or more entities>]
Allow the selection of multiple entities. No behaviour change if one entity is provided, but if multiple are given, a `dict: entity -> list[values]` will be returned.
BidsComponent.zip_lists[<one or more entities>]
As in `entities`, allow the selection of multiple entities.
BidsComponent.wildcards[<one or more entities>]
As in `entities`, allow the selection of multiple entities.
#243
BidsComponent[<one or more entities>] -> BidsPartialComponent
The main application motivating this is partial expansion, e.g.
Because removing entities from a component would disrupt some aspects, such as the `path`, it could only return a partial component that doesn't support the full API. We would also distinguish between a partial component with one entity and one with multiple entities so that the following would work:
My hope is that the internal complexity would not be apparent to the user, but that the syntax would work in an intuitive way.
BidsComponent.drop(*entities) -> BidsPartialComponent
As above, but drops instead of selects
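To make the select/drop pairing concrete, here is a minimal sketch on plain dicts. The helper names `select_entities` and `drop_entities` are hypothetical, and de-duplicating the remaining rows is an assumption of this example; the proposal itself only says that the result would be a partial component without the full API.

```python
from typing import Dict, List

# Hypothetical stand-in for a component's zip lists: entity -> column of values
ZipLists = Dict[str, List[str]]


def _dedupe_rows(zip_lists: ZipLists) -> ZipLists:
    """Remove repeated rows while preserving first-seen order."""
    rows = list(dict.fromkeys(zip(*zip_lists.values())))
    return {entity: [row[i] for row in rows] for i, entity in enumerate(zip_lists)}


def select_entities(zip_lists: ZipLists, *entities: str) -> ZipLists:
    """Keep only the named entity columns (a 'partial component')."""
    missing = [entity for entity in entities if entity not in zip_lists]
    if missing:
        raise KeyError(f"entities not in component: {missing}")
    # Assumption: rows that become identical after selection are merged
    return _dedupe_rows({entity: zip_lists[entity] for entity in entities})


def drop_entities(zip_lists: ZipLists, *entities: str) -> ZipLists:
    """Inverse of select_entities: keep every column except the named ones."""
    kept = [entity for entity in zip_lists if entity not in entities]
    return select_entities(zip_lists, *kept)


zip_lists = {
    "subject": ["01", "01", "02"],
    "session": ["a", "b", "a"],
    "run": ["1", "1", "1"],
}
selected = select_entities(zip_lists, "subject", "run")
dropped = drop_entities(zip_lists, "session")
```

In this sketch both calls give the same result, which is the sense in which `drop` is "as above, but drops instead of selects".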
#244
BidsDataset[<one or more components>]
Extends the current API, which allows the selection of just one component. If multiple components are provided, a new dataset is returned containing just those components. This is potentially useful in combination with intersection-type calculations.
BidsDataset.drop(*components)
As above, but drops components instead of selecting
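A minimal sketch of how single versus multiple component selection could behave, using a `dict` subclass as a stand-in. The class name `DatasetSketch` and the component names are hypothetical; this is not the actual `BidsDataset` implementation.

```python
from typing import Dict, List

# Hypothetical stand-in for a component's zip lists: entity -> column of values
ZipLists = Dict[str, List[str]]


class DatasetSketch(dict):
    """Hypothetical stand-in for BidsDataset: component name -> zip lists."""

    def __getitem__(self, key):
        # dataset["a", "b"] arrives as a tuple: return a new, smaller dataset
        if isinstance(key, tuple):
            return DatasetSketch({name: dict.__getitem__(self, name) for name in key})
        # dataset["a"] keeps the existing single-component behaviour
        return dict.__getitem__(self, key)

    def drop(self, *names: str) -> "DatasetSketch":
        """Return a new dataset without the named components."""
        return DatasetSketch(
            {name: comp for name, comp in self.items() if name not in names}
        )


dataset = DatasetSketch(
    {
        "t1w": {"subject": ["01", "02"]},
        "bold": {"subject": ["01", "02"], "run": ["1", "1"]},
        "dwi": {"subject": ["01"]},
    }
)
subset = dataset["t1w", "bold"]
without_dwi = dataset.drop("dwi")
```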
BidsDataset.with_entities(*entities, exact: bool = False) -> BidsDataset
Return a dataset containing only the components with the given entities. Setting `exact=True` also filters out components with extra entities beyond the selected `*entities`. The most likely use is in combination with `BidsDataset.expand`, to expand over a consensus of specific entities.
BidsDataset.without_entities(*entities) -> BidsDataset
As above, but inverse
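The two methods above could be sketched roughly as follows, again on plain dicts. The function names mirror the proposed methods, but the data representation is hypothetical, and reading "inverse" as "drop any component containing at least one of the listed entities" is an assumption of this example.

```python
from typing import Dict, List

# Hypothetical stand-ins: a dataset maps component name -> zip lists
ZipLists = Dict[str, List[str]]
Dataset = Dict[str, ZipLists]


def with_entities(dataset: Dataset, *entities: str, exact: bool = False) -> Dataset:
    """Keep components that contain all of the given entities."""
    wanted = set(entities)
    return {
        name: zip_lists
        for name, zip_lists in dataset.items()
        # exact=True additionally rejects components with extra entities
        if (set(zip_lists) == wanted if exact else wanted <= set(zip_lists))
    }


def without_entities(dataset: Dataset, *entities: str) -> Dataset:
    """Assumed inverse: keep components that contain none of the given entities."""
    unwanted = set(entities)
    return {
        name: zip_lists
        for name, zip_lists in dataset.items()
        if not unwanted & set(zip_lists)
    }


dataset = {
    "t1w": {"subject": ["01"]},
    "bold": {"subject": ["01"], "run": ["1"]},
    "dwi": {"subject": ["01"], "run": ["1"], "acq": ["ap"]},
}
loose = with_entities(dataset, "subject", "run")
strict = with_entities(dataset, "subject", "run", exact=True)
no_run = without_entities(dataset, "run")
```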
#201
BidsComponent.expand(path, **extra_entities)
Discussed at length in BidsComponent.expand() #201
#239
BidsDataset.path
The root path of the dataset.
BidsDataset.wildcards[<one or more entities>]
Return `{"wildcard": "{snakemake_wildcard}"}` pairings. Any selected entities not found in any component would be silently ignored, allowing a generic version of the current `BidsDataset.subj_wildcards`.
BidsDataset.entities[<one or more entities>]
An extension of `BidsComponent.entities`. In the simple case, with one entity in the selector, the values of that entity across all components that have it will be returned in a list. With multiple entities in the selector, a `dict[entity, list[values]]` will be returned. If an entity is not found in any component, it could raise an error, or the entity could be ignored.
If used as an iterator, or if `.items`, `.values`, or `.keys` is called, any entity appearing in at least one component will be considered. `dict(BidsDataset.entities)` will be equivalent to selecting every single available entity.
BidsDataset.zip_lists[<one or more entities>]
Returns the entity group consensus across all components. `itertools.product(*BidsDataset.entities[*selected_entities].values())` will be used as the baseline. In other words, all possible combinations of all values of the selected entities found across all components. Each such combination will be called a row. From this baseline, rows with values missing in one or more components will be filtered out. Components with just one of the selected entities will filter out all rows with entity values not found in the component. Components with several of the selected entities will filter out all rows with entity combinations not found in the component. Components not containing any of the selected entities will not be considered.
Lists are automatically de-duplicated prior to return. This is necessary because different components may have different numbers of entities, making meaningful comparison without de-duplication impossible:
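As a rough illustration of the consensus procedure described above, here is a minimal sketch on plain dicts. The function names `dataset_entities` and `dataset_zip_lists`, the dict-based dataset, and the decision to silently ignore entities found in no component are assumptions made for this example; the proposal leaves that last point open.

```python
import itertools
from typing import Dict, List

# Hypothetical stand-ins: a dataset maps component name -> zip lists
ZipLists = Dict[str, List[str]]
Dataset = Dict[str, ZipLists]


def dataset_entities(dataset: Dataset, *selected: str) -> Dict[str, List[str]]:
    """Unique values of each selected entity across all components."""
    result: Dict[str, List[str]] = {}
    for entity in selected:
        values: List[str] = []
        for zip_lists in dataset.values():
            for value in zip_lists.get(entity, []):
                if value not in values:
                    values.append(value)
        if values:  # assumption: entities found nowhere are ignored
            result[entity] = values
    return result


def dataset_zip_lists(dataset: Dataset, *selected: str) -> ZipLists:
    """Rows of selected entity values that every relevant component supports."""
    entities = dataset_entities(dataset, *selected)
    names = list(entities)
    # Baseline: every combination of every value of the selected entities
    rows = list(itertools.product(*entities.values()))
    for zip_lists in dataset.values():
        shared = [entity for entity in names if entity in zip_lists]
        if not shared:
            continue  # components with none of the entities are not considered
        # Entity combinations actually present in this component
        present = set(zip(*(zip_lists[entity] for entity in shared)))
        idx = [names.index(entity) for entity in shared]
        rows = [row for row in rows if tuple(row[i] for i in idx) in present]
    return {entity: [row[i] for row in rows] for i, entity in enumerate(names)}


dataset = {
    "t1w": {"subject": ["01", "02", "03"]},
    "bold": {"subject": ["01", "01", "02"], "run": ["1", "2", "1"]},
}
by_subject_run = dataset_zip_lists(dataset, "subject", "run")
by_subject = dataset_zip_lists(dataset, "subject")
```

Because the baseline is built from unique values, the returned rows are already de-duplicated: in this sketch, subject `01` appears twice in the `bold` component but only once in `by_subject`.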
Because of this, note that:
`dict(BidsDataset.zip_lists)` will be equivalent to `BidsDataset.zip_lists[<every single entity...>]`. If used as an iterator, or if any of `.keys`, `.values`, or `.items` is called, and no selection is made, it shall be treated as the `dict` case above.
#242
BidsDataset.expand(path, **extra_entities)
As with `BidsComponent.expand(...)`, this shall be a shorthand for `expand(path, **BidsDataset.zip_lists)`. Extra logic will be applied to ensure only the required wildcards are selected, and to allow the provision of additional wildcards.
Breaking Ideas
Some parts of the above API (`BidsDataset.entities` and `BidsDataset.zip_lists`) are breaking. Currently, `BidsDataset` allows two basic patterns of access:
The new API removes this redundancy and switches the first pattern to:
The legacy pattern stems from the pre-`0.5` days, when `generate_inputs` only returned a dict. While there's a good chance a lot of oldish code uses the pattern, the change will open up the API and improve consistency.