RFC: Bundle Types #86
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,90 @@ | ||
### DCP PR: | ||
|
||
***Leave this blank until the RFC is approved** then the **Author(s)** must create a link between the assigned RFC number and this pull request in the format:* | ||
|
||
`[dcp-community/rfc#](https://github.com/HumanCellAtlas/dcp-community/pull/<PR#>)` | ||
|
||
# RFC: Bundle Types | ||
|
||
## Summary | ||
|
||
Formalizes data bundle types, definitions, target users/consumers, and specific use cases to improve clarity and | ||
confidence among DCP developers when working with the DCP data model. | ||
|
||
## Author(s) | ||
|
||
* Mallory Freeberg | ||
* Brian Hannafious | ||
* Hannes Schmidt | ||
* Andrey Kislyuk | ||
|
||
## Shepherd | ||
***Leave this blank.** This role is assigned by DCP PM to guide the **Author(s)** through the RFC process.* | ||
|
||
*Recommended format for Shepherds:* | ||
|
||
`[Name](mailto:[email protected])` | ||
|
||
## Motivation | ||
|
||
The HCA DCP currently has no documented definition of a "bundle": what bundles are, what they contain, or whom they | ||
support. This lack of clarity confuses DCP developers and data consumers about how the data and metadata in the DCP are | ||
structured and organized. | ||
|
||
### User Stories | ||
|
||
1. As a tool developer, I would like to identify and get raw data in the DSS so that I can run my data processing and | ||
analysis tools. | ||
|
||
1. As a tool developer, I would like to identify and get alignment results and count matrices in order to produce | ||
expression matrices. | ||
|
||
1. As a computational biologist, I would like to get expression matrices for the Immune Cell Atlas project so that I can | ||
do my research. | ||
|
||
1. As an external user of the DSS who is developing analysis tools, I would like to browse, query, and get raw and | ||
processed data based on some rough criteria so that I can understand what the data are and develop my tool. | ||
|
||
1. As an Azul or Query Service developer, I want to be able to distinguish between different types of bundles in order to | ||
know which bundle to index, link, etc., or to provide bundle types to users as a search criterion. | ||
|
||
1. As a computational biologist, I would like to know where the reference data are. | ||
|
||
|
||
## Detailed Design | ||
|
||
A new field, `bundle-type`, will be introduced to the DSS bundle manifest (the response to `GET bundle` in DSS). The | ||
field value is a string formatted in a similar manner | ||
to [DCP Media Types](https://docs.google.com/document/d/1TqihrgXjct9aDmTJO52_gE2WlpFysB1OkG9C8exmWTw) and consistent | ||
with the syntax defined in [RFC 7231](https://tools.ietf.org/html/rfc7231#section-3.1.1.1). DSS must require the bundle | ||
type to be specified when creating a new bundle, but should not enforce the vocabulary. DSS must allow subscriptions | ||
and search results to be predicated on bundle type using the same string predicates available with other metadata and | ||
manifest string fields. | ||
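As an illustration of the field format, the sketch below parses a `bundle-type` string into a base type and its parameters. It is not part of the proposal: the function name `parse_bundle_type` is hypothetical, and the case handling (lowercasing the base type and parameter names while preserving parameter values) is an assumption borrowed from RFC 7231 media types. The example values in this RFC separate a parameter name from its value with `:`, whereas RFC 7231 uses `=`; the sketch accepts either rather than guessing which is intended.

```python
import re
from typing import Dict, Tuple


def parse_bundle_type(value: str) -> Tuple[str, Dict[str, str]]:
    """Split a bundle-type string into (base type, parameters).

    Hypothetical helper, not a DSS API. Raises ValueError on malformed input.
    """
    base, *raw_params = [part.strip() for part in value.split(";")]

    # The base type must look like "type/subtype" with both halves non-empty.
    if base.count("/") != 1 or not all(base.split("/")):
        raise ValueError(f"expected 'type/subtype', got {base!r}")

    params: Dict[str, str] = {}
    for raw in raw_params:
        if not raw:
            continue  # tolerate a trailing semicolon
        # Assumption: accept ':' (used by the examples in this RFC)
        # as well as '=' (used by RFC 7231 media type parameters).
        pieces = re.split(r"[:=]", raw, maxsplit=1)
        if len(pieces) != 2 or not all(piece.strip() for piece in pieces):
            raise ValueError(f"malformed parameter {raw!r}")
        name, param_value = pieces
        params[name.strip().lower()] = param_value.strip()

    return base.lower(), params


base, params = parse_bundle_type("hca/primary-data; hca-data-type:SmartSeq2")
print(base, params)
```

Because DSS is not expected to enforce the vocabulary, a parser like this would live on the client side and should treat unknown base types or parameters as valid input rather than as errors.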
|
||
### Registry of Bundle Types | ||
|
||
Introduce a registry of bundle types to be maintained by the Metadata team. The initial contents of the registry are: | ||
|
||
|
||
| Bundle Type | Example `bundle-type` value | | ||
|--------------------------|-----------------------------------------------------------| | ||
| Project Metadata Bundle | `hca/project` | | ||
It's my understanding that "Project Metadata Bundle" isn't just classifying an existing type of bundle, but introducing new bundling, right?
Yes. A bundle idea that has been kicked around for a long time. Nobody has tried to justify, tightly define or implement it yet to my knowledge. |
||
| Primary Sequence Bundle | `hca/primary-data; hca-data-type:SmartSeq2` | | ||
Is
Nevermind, I just saw the discussion about this field here: #86 (comment) |
||
| Primary Imaging Bundle | `hca/primary-data; hca-data-type:sptx` | | ||
| Secondary Analysis Bundle| `hca/analysis-output; hca-pipeline:snap-atac` | | ||
| Reference Resource Bundle| `hca/analysis-support; hca-resource-type:reference-genome`| | ||
What was the reasoning behind naming this bundle-type value
@samanehsan your suggestion sounds good to me, adopting. Thanks! |
||
| Expression Matrix Bundle | `hca/analysis-output; hca-resource-type:expression-matrix`| | ||
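To show how a consumer such as an indexer might use these values, here is a minimal sketch that groups `bundle-type` strings by base type and selects manifests by base type. The manifest shape (a dict with a `bundle-type` key and a `uuid` key) and all function names are assumptions for illustration, not the DSS API; the sample values are the example values from the table above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def base_type(value: str) -> str:
    """Return the part of a bundle-type string before the first ';'."""
    return value.split(";", 1)[0].strip().lower()


def group_by_base_type(values: Iterable[str]) -> Dict[str, List[str]]:
    """Group full bundle-type strings under their shared base type."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for value in values:
        groups[base_type(value)].append(value)
    return dict(groups)


def select_manifests(manifests: Iterable[dict], wanted_base: str) -> List[dict]:
    """Keep the manifests whose bundle-type has the requested base type."""
    wanted = wanted_base.strip().lower()
    return [m for m in manifests if base_type(m.get("bundle-type", "")) == wanted]


# Example values copied from the registry table in this RFC.
EXAMPLE_VALUES = [
    "hca/project",
    "hca/primary-data; hca-data-type:SmartSeq2",
    "hca/primary-data; hca-data-type:sptx",
    "hca/analysis-output; hca-pipeline:snap-atac",
    "hca/analysis-support; hca-resource-type:reference-genome",
    "hca/analysis-output; hca-resource-type:expression-matrix",
]

# Hypothetical manifests; real DSS manifests carry many more fields.
manifests = [
    {"uuid": f"bundle-{i}", "bundle-type": value}
    for i, value in enumerate(EXAMPLE_VALUES)
]

groups = group_by_base_type(EXAMPLE_VALUES)
primary = select_manifests(manifests, "hca/primary-data")
print(sorted(groups))
print([m["uuid"] for m in primary])
```

The six example values collapse to four base types, so a consumer that only needs to tell primary data from analysis output could match on the base type and ignore the parameters.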
|
||
### Unresolved Questions | ||
|
||
- Bundle degrees: Useful? Who assigns them? Who uses them? | ||
|
||
- Is there actually added value in not having to update all bundles when, e.g., the project is updated? | ||
- Versioned/unversioned references? | ||
|
||
### Prior Art | ||
|
||
- [Submission to bundles SOP](https://docs.google.com/document/d/1x8mYLU8ubpZtTrzkwJrft1heqReX-pJLjJfb2X0fd2w) (Apr 2018) | ||
- [Bundle metadata schema update ticket](https://github.com/HumanCellAtlas/metadata-schema/issues/986) | ||
|
||
### Alternatives | ||
|
||
- Status Quo: no well-defined bundle types. Existing DCP components continue to infer bundle types via ad hoc methods | ||
and included metadata. |
Just a syntax note, numbering is not incrementing.
@TimothyTickle that is intentional. Markdown supports automatic numbered list incrementing, so that items can be re-arranged more easily. The rendered version increments the list properly.