RFC: DSS Bundle Enumeration #101
base: master
### DCP PR:

***Leave this blank until the RFC is approved**, then the **Author(s)** must create a link between the assigned RFC number and this pull request in the format:*

`[dcp-community/rfc#](https://github.com/HumanCellAtlas/dcp-community/pull/<PR#>)`

# RFC: Bundle Enumeration of the DCP Data Store (DSS)

## Summary

This RFC proposes a bundle enumeration endpoint for the DSS that lists bundles directly from object storage.

## Author(s)

* [Brian Hannafious](mailto:[email protected])

## Shepherd
***Leave this blank.** This role is assigned by DCP PM to guide the **Author(s)** through the RFC process.*

*Recommended format for Shepherds:*

`[Name](mailto:[email protected])`

## Motivation

Currently, the only means of listing bundles stored in the DSS is the internal Elasticsearch (ES) metadata index, which
must be kept in sync with object storage. The DSS should provide bundle enumeration that is independent of the ES
metadata index, with an emphasis on consistency and scalability.

### User Stories

* As a downstream service developer, I would like to enumerate the bundles in the DSS so I can create my own
index.

* As a downstream service developer, I would like to check whether my index contains all the bundles in the DSS.

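The second user story reduces to a membership check between the DSS listing and a downstream index. The following is a minimal sketch, assuming both sides can be expressed as bundle FQIDs of the form `{uuid}.{version}` (an assumption; the RFC does not fix the response shape). The function name `missing_from_index` is hypothetical. It streams the DSS listing, which may be very long, and materializes only the downstream index as a set.

```python
from typing import Iterable, Iterator, Set


def missing_from_index(dss_bundles: Iterable[str], index_bundles: Set[str]) -> Iterator[str]:
    """Yield bundle FQIDs listed by the DSS that the downstream index lacks.

    `dss_bundles` may be any iterable, e.g. a generator that walks the pages
    of a bundle enumeration endpoint, so the full listing is never held in memory.
    """
    for fqid in dss_bundles:
        if fqid not in index_bundles:
            yield fqid


if __name__ == "__main__":
    # Hypothetical FQIDs for illustration only.
    dss = ["0a1b.2019-01-01T000000.000000Z", "0c3d.2019-01-15T000000.000000Z"]
    index = {"0a1b.2019-01-01T000000.000000Z"}
    print(list(missing_from_index(dss, index)))  # ['0c3d.2019-01-15T000000.000000Z']
```

An empty result means the index covers every bundle the DSS listed at the time of enumeration.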
## Detailed Design

A new bundle enumeration endpoint, `GET /bundles`, will be introduced, taking `replica` and `prefix` parameters. These
parameters will be used to return a paginated listing of bundles read directly from object storage. Pagination and all
other semantics of this route will follow the established conventions of the DSS API.

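To make the intended behaviour concrete, here is a minimal sketch of prefix-filtered, paginated listing over object storage keys, plus the client loop that follows continuation tokens. It is not the DSS implementation. The key layout `bundles/{uuid}.{version}`, the `per_page` page size, the response fields `bundles` and `token`, and the choice of the last returned key as the opaque cursor are all assumptions for illustration; an in-memory sorted list stands in for one replica's bucket.

```python
import bisect
from typing import Dict, Iterator, List, Optional

# Assumed key layout for bundle manifests: bundles/{uuid}.{version}
BUNDLE_PREFIX = "bundles/"

# Hypothetical stand-in for one replica's bucket. Keys are kept sorted because
# S3 and GCS both list object keys in lexicographic order.
EXAMPLE_KEYS = sorted([
    "bundles/0a1b.2019-01-01T000000.000000Z",
    "bundles/0a1b.2019-02-01T000000.000000Z",
    "bundles/0c3d.2019-01-15T000000.000000Z",
    "bundles/f9e8.2019-03-01T000000.000000Z",
    "files/0a1b.2019-01-01T000000.000000Z",
])


def list_bundles_page(keys: List[str], prefix: str = "", per_page: int = 2,
                      token: Optional[str] = None) -> Dict:
    """Return one page of bundle FQIDs whose UUID starts with `prefix`.

    `token` is an opaque cursor; in this sketch it is the last key of the
    previous page, and the returned token is None once the listing is exhausted.
    """
    if per_page < 1:
        raise ValueError("per_page must be at least 1")
    full_prefix = BUNDLE_PREFIX + prefix
    # Keys sharing a prefix are contiguous in sorted order, so start at the
    # first matching key, or just past the cursor if one was supplied.
    start = bisect.bisect_left(keys, full_prefix)
    if token is not None:
        start = max(start, bisect.bisect_right(keys, token))
    page: List[str] = []
    i = start
    while i < len(keys) and keys[i].startswith(full_prefix) and len(page) < per_page:
        page.append(keys[i])
        i += 1
    has_more = i < len(keys) and keys[i].startswith(full_prefix)
    return {
        "bundles": [key[len(BUNDLE_PREFIX):] for key in page],
        "token": page[-1] if has_more else None,
    }


def enumerate_bundles(keys: List[str], prefix: str = "", per_page: int = 2) -> Iterator[str]:
    """Client-side loop: request pages and follow tokens until the listing ends."""
    token: Optional[str] = None
    while True:
        response = list_bundles_page(keys, prefix, per_page, token)
        yield from response["bundles"]
        token = response["token"]
        if token is None:
            return


if __name__ == "__main__":
    print(list(enumerate_bundles(EXAMPLE_KEYS, prefix="0")))
```

Using the last key as the cursor lets a page be resumed without any server-side state, which suits a listing read straight from object storage; the actual token format would follow the existing DSS pagination conventions.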
Without some kind of filtering, this seems like a very heavyweight endpoint to use. A couple of things come to mind:

This endpoint is intended for heavyweight use by downstream indexers. Also, an incremental approach seems preferable: if filtering becomes desirable in the future, it can be added to the endpoint.

Agreed that this is easy to add downstream. Assuming a full dump is what existing users want, my speculative use is not a real use case ;-)

@diekhans you raise a good point, but as @xbrianh pointed out, the use case here is an unfiltered bulk pull of all bundle IDs for external indexing. We did look for a way to use a "lightweight" database to do filtering using our established filter language process (JMESPath), but didn't find any suitable database/indexing engine for such a task.

### Unresolved Questions

I was expecting something a bit more Swagger-y rather than a narrative description in Detailed Design for an API.

@brianraymor It's not clear to me how to address your comment. Perhaps you have something in mind similar to the API endpoint descriptions found in the Deletion RFC.
However, those additions will not necessarily improve the clarity and actionability of this document, which defines a simple extension to the DSS API in language that I believe is clear to the developers who will implement it. Do you feel there are technical details missing, or that there is ambiguity that needs to be cleared up?

The Deletion RFC more closely meets my expectations for an RFC as a design document. Another approach is how Azure defines their REST APIs. Any reviewer/developer should be able to read this RFC and understand the DSS API in detail, not just the implementers. This currently reads more like what we called one-pagers at Microsoft.