Feature/issue 334 drs bulk requests #365
@@ -103,8 +103,12 @@ x-tagGroups:
paths:
  /objects/{object_id}:
    $ref: ./paths/objects@{object_id}.yaml
  /bulk/objects:
How about making this endpoint just POST /objects rather than POST /bulk/objects?
OK, I'll make that change.
We do need to figure out how to enable multi-part POST here so it can be used with Passports. This is a TODO.
schema:
  type: object
  properties:
    object_ids:
How about renaming this property to selection to harmonize with the DRS + Passport downscoping thread?
@kwrodarmer @mbarkley WDYT?
So "object_ids" --> "selection"? I'm fine with that...
It started as selection and, with some discussion, we consciously went to object_ids as there may be other use cases than "selection" for how you get to a list of ids that you want to request. object_ids is neutral as to use case, rather than projecting a particular use case on to DRS which has some generality.
schema:
  type: object
  properties:
    drsobjects:
Some thoughts on the response payload of a bulk DRS request:
- Currently drsobjects is a flat list. Would it be advantageous to make it an object with the supplied IDs as keys and the corresponding DRS Object as the value?
- There could be a mix of valid and invalid IDs in the overall selection (e.g. no resource by an ID, client not authorized to access that DRS Object). Should we make a slightly more complex object that clearly outlines which IDs could not be returned as Objects and why?
Taking the above points together, here is a proposed example response payload:
{
"summary": {
"requested": 5,
"loaded": 2,
"unloaded": 3
},
"loadedDrsObjects": {
"123": {
"id": "123",
"name": "DRS Object 123",
...
},
"456": {
"id": "456",
"name": "DRS Object 456",
...
}
},
"unloadedDrsObjects": {
"777": 404,
"778": 404,
"779": 401
}
}
In the above example, the response payload summarizes how many IDs were passed, how many DRS Objects were successfully retrieved, and how many were not. Each retrieved DRS Object can be referenced by its ID. Each non-retrieved DRS Object comes with an explanation (in the form of an HTTP status code) of why it wasn't retrieved.
It's a more complex payload overall, but provides more information to the client about how the request was processed.
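To make the proposed shape concrete, here is a minimal Python sketch of how a server might assemble it. This is only an illustration of the payload above: `build_bulk_response` and the `lookup` callable (which returns either a DRS Object dict or an HTTP status code) are hypothetical names, not part of the DRS spec.

```python
from typing import Callable, Dict, List, Union

# Hypothetical lookup result: a DRS Object (dict) on success,
# or an HTTP status code (int) explaining why it could not be loaded.
LookupResult = Union[dict, int]


def build_bulk_response(object_ids: List[str],
                        lookup: Callable[[str], LookupResult]) -> dict:
    """Assemble the keyed payload proposed above from per-ID lookups."""
    loaded: Dict[str, dict] = {}
    unloaded: Dict[str, int] = {}
    for object_id in object_ids:
        result = lookup(object_id)
        if isinstance(result, dict):
            loaded[object_id] = result
        else:
            unloaded[object_id] = result
    return {
        "summary": {
            "requested": len(object_ids),
            "loaded": len(loaded),
            "unloaded": len(unloaded),
        },
        "loadedDrsObjects": loaded,
        "unloadedDrsObjects": unloaded,
    }
```

One thing the sketch surfaces: because results are keyed by ID, a repeated ID in the request collapses to a single entry, so `requested` can exceed `loaded + unloaded`.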
Let's discuss on today's call
A few comments on the proposed format:
Summary
Depending on the number of items that are requested and where we end up falling on pagination in a bulk request, the summary information may not be known ahead of time when the initial response is sent. It's possible this number may be per page, instead of in total.
Results Format
It may be nice to return the objects as an array whose ordering mirrors the order of the requested IDs. This leads to fewer surprises and allows the client to iterate over the objects in the order it expects. Additionally, since this is a bulk operation, we could interleave the errors that are encountered directly in the array at the appropriate index (this uses the error message format from Beacon).
For example, if I request both [456,457], the following may be a valid response:
{
"drsObjects": [
{
"id": 456,
"name": "..."
},
{
"id": 457,
"error": {
"errorCode": 404,
"errorMessage": "DRS object does not exist"
}
}
]
}
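A rough Python sketch of this ordered variant, covering both the server side (interleaving errors at the matching index) and the client side (pairing results with requested IDs by position). The function names, the `lookup` callable, and the fallback message are hypothetical; only the 404 message is taken from the example above.

```python
from typing import Callable, List, Tuple, Union

ObjectId = Union[int, str]

# Only the 404 text comes from the example above; other codes fall back
# to a placeholder message chosen for this sketch.
ERROR_MESSAGES = {404: "DRS object does not exist"}


def build_ordered_response(object_ids: List[ObjectId],
                           lookup: Callable[[ObjectId], Union[dict, int]]) -> dict:
    """Return one entry per requested ID, in request order."""
    entries = []
    for object_id in object_ids:
        result = lookup(object_id)
        if isinstance(result, dict):
            entries.append(result)
        else:
            # Failed lookups keep their slot so indexes still line up.
            entries.append({
                "id": object_id,
                "error": {
                    "errorCode": result,
                    "errorMessage": ERROR_MESSAGES.get(result, "Request failed"),
                },
            })
    return {"drsObjects": entries}


def split_results(requested_ids: List[ObjectId],
                  response: dict) -> Tuple[list, list]:
    """Client side: pair each requested ID with its entry by position."""
    ok, failed = [], []
    for object_id, entry in zip(requested_ids, response["drsObjects"]):
        (failed if "error" in entry else ok).append((object_id, entry))
    return ok, failed
```

Because every requested ID keeps its slot, the client never has to search the response for a missing ID.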
See also - #377 for a version of this which drops "contents" and "expand".
Comments are relative to the built doc: Same comments generally apply to POST /objects/access
Thanks @bcli4d this is great feedback. I'll make these changes/clarifications since they all sound totally reasonable.
pulling in the latest dev branch to this feature branch
FYI, this branch is building properly even though it's a PR. So I'm not going to use (and will delete) the v2 of this feature branch.
Added 413 error based on @bcli4d 's comment above
…e to require authorization
We took a look at this PR, and we like the general approach and how it is set up. One request we would like to advance, similar to what @bcli4d has mentioned and @briandoconnor started to solve, is the length of the We understand that 413 is a good error response; however, we would like to have it written as an integer value somewhere, so the client can figure out the allowed length before starting. We are happy to have this in service info or somewhere else, but it must be explicit and mandatory, so the client can build its own logic with that in mind. Do we know whether other GA4GH APIs have specified API-specific fields in the service_info response, and whether there is a canonical format for doing so?
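If the allowed length were published as an integer, a client could split its ID list up front instead of discovering the limit through a 413. A minimal Python sketch, assuming the client has already obtained that integer; `chunk_object_ids` and `max_length` are hypothetical names, not part of the DRS spec.

```python
from typing import Iterator, List


def chunk_object_ids(object_ids: List[str], max_length: int) -> Iterator[List[str]]:
    """Yield consecutive slices of object_ids, each at most max_length long."""
    if max_length < 1:
        raise ValueError("max_length must be a positive integer")
    for start in range(0, len(object_ids), max_length):
        yield object_ids[start:start + max_length]
```

Each yielded slice would then become the ID list of one bulk request.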
So looping back on the remaining issues/comments here:
@dglazer asked what happens when the DRS server doesn't support bulk. Is the field still required? Should it be "1"?
I think defaulting to 1 in the absence of the field is OK.
I'm "+0" on my own suggestion -- I think it's clean to say "you don't have to support batch, and if you don't you shouldn't have to specify a max batch size". But I'm only +0 since, as Brian pointed out in person, my initial concern about breaking backwards compatibility was overblown, since compatibility only kicks in after you've already decided to upgrade to a new API version.
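For illustration, the "default to 1 when absent" reading could look like this on the client side. This is a sketch under assumptions: the field name `max_bulk_request_length` is a placeholder, since this thread has not settled on a name or location for it, and `service_info` stands for the already-parsed service info response.

```python
def resolve_max_bulk_length(service_info: dict,
                            field: str = "max_bulk_request_length") -> int:
    """Return the advertised bulk limit, or 1 if the server does not state one."""
    value = service_info.get(field)
    if value is None:
        # No limit advertised: treat the server as accepting one ID per request.
        return 1
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int) or value < 1:
        raise ValueError(f"{field} must be a positive integer, got {value!r}")
    return value
```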
Just as a heads up, I'm planning on merging this into develop and opening a 1.4 release branch (which will have a request for comment period by drivers/implementers) by the end of GA4GH Connect 2023. Unless I hear an urgent problem that needs to be fixed of course...
Thanks everyone, I don't see any remaining issues. Merging into develop now. If you see more issues I'll open a release branch for comments. Always remember, this is going to develop now so if you see an issue you can always open up a new PR to fix it before we finalize 1.4 |
Goal
We want to have this merged into develop for DRS 1.4 for the 2023 Plenary
Background
See Issue #334
A pull request for bulk operations in DRS. As of 5/22/23 we have the complete set of bulk endpoints for authorization information, DRS IDs, and DRS access methods. This PR includes neither pagination nor explicit pairing of passports to the output of a bulk response. For pagination, I think we should move bulk forward and cleanly separate out pagination as a different feature/PR. For pairing, the feature branch now has bulk options support, so you can 1) ask the DRS server what authorization mechanisms are needed for a bulk list of DRS IDs and then 2) make one or more bulk requests in which you pair the correct passport(s) or bearer tokens with your request.
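The two-step flow described above implies a grouping step on the client: once step 1 reports what each DRS ID needs, IDs that share a requirement can go into the same bulk request in step 2. A minimal Python sketch of that grouping only; the input shape (a mapping from DRS ID to a `(mechanism, issuer)` tuple) is an assumption made for illustration and is not the actual response format of the bulk options endpoint.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical requirement key, e.g. ("PassportAuth", "https://issuer.example").
AuthRequirement = Tuple[str, str]


def group_ids_by_auth(
    auth_requirements: Dict[str, AuthRequirement],
) -> Dict[AuthRequirement, List[str]]:
    """Group DRS IDs so each group can share one passport or bearer token."""
    groups: Dict[AuthRequirement, List[str]] = defaultdict(list)
    for object_id, requirement in auth_requirements.items():
        groups[requirement].append(object_id)
    return dict(groups)
```

Each resulting group would then map to one bulk request (or several, if the server limits request size) carrying the matching credential.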
Built Docs
See the built doc here: https://ga4gh.github.io/data-repository-service-schemas/preview/feature/issue-334-drs-bulk-requests/docs/#tag/Objects/operation/GetBulkObjects/
For more information...
See the notes from our June FASP hackathon, including this gist.
See also Feature/issue 334 bulk requests nobundle from Ian and DRS bundle contents pagination from Jeremy