Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add total_runs in response of GET /runs endpoint #183

Open
wants to merge 3 commits into
base: develop
Choose a base branch
from

Conversation

Mifrill
Copy link

@Mifrill Mifrill commented Aug 1, 2022

The property total_runs allows to implement the pagination with the references for each page, includes the last page and showing number of pages.

Related to #159 (comment)

@Mifrill Mifrill force-pushed the add-total_runs-in-get-runs-endpoint-response branch 3 times, most recently from 611152b to b9f7d0b Compare August 1, 2022 17:21
@Mifrill Mifrill force-pushed the add-total_runs-in-get-runs-endpoint-response branch from b9f7d0b to ea05099 Compare August 1, 2022 17:21
@patmagee
Copy link
Contributor

@Mifrill thanks for the PR! I would be really interested to know your use case and what you are trying to solve with the total_runs (outside of the conversation on the linked issue). I do not think this would add TOO much of a burden on engines, although I will say in some cases it is simply not possible to know the total number of runs without performing an extremely expensive operation. So I wonder if this is a required field, or if maybe there is a generic way to represent optional summary metadata in the RunListResponse.

elasticsearch has the concept of "hits" which may not really be applicable here since this is not a search context, but maybe "metadata" or some other field:

{
   "runs": [],
   "metadata": {
     "total_runs": 5000
   }
}

@patmagee patmagee self-requested a review August 26, 2022 20:14
@cjllanwarne
Copy link

+1 this seems pretty important for neat pagination

total_runs:
type: integer
description: >-
Total count of workflow runs that the service has executed or is executing and the caller has permission to see.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion for brevity, and future-proofing (in case we ever add the ability to filter results from this endpoint). I'd also suggest replacing the words "workflow runs" with just "runs", per the WES terminology.

Suggested change
Total count of workflow runs that the service has executed or is executing and the caller has permission to see.
Total count of runs in the result set.

Copy link
Author

@Mifrill Mifrill Sep 13, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in case we ever add the ability to filter results from this endpoint

not sure I follow.
The sentence has executed or is executing is looks generic in terms of workflow-execution-service-schemas.
The sentence the caller has permission to see didn't set any constraints in case of policy scope implementation, the records might be across entire data as well.

"workflow runs" with just "runs", per the WES terminology

As I see in other examples, the workflow runs preferred:
Selection_11023
https://github.com/ga4gh/workflow-execution-service-schemas/blob/develop/openapi/components/schemas/RunId.yaml

Selection_11021
https://github.com/ga4gh/workflow-execution-service-schemas/blob/develop/openapi/components/schemas/RunListResponse.yaml

Selection_11022
https://github.com/ga4gh/workflow-execution-service-schemas/blob/develop/openapi/components/schemas/RunStatus.yaml

@Mifrill
Copy link
Author

Mifrill commented Sep 13, 2022

@patmagee thanks for the review!

what you are trying to solve with the total_runs

The total count required to implement "orthodox" pagination, by dividing the total count by page size we can provide a direct link for any page. Rather than in the current implementation only possible to go next page or the previous page.

it is simply not possible to know the total number of runs without performing an extremely expensive operation

In my understanding, the implementation of the counter is the responsibility of each particular service, hence the possible performance issues should be handled on the specific service side. There are many possibilities to avoid performing expensive operations like counter field implementation or last-document-index etc. it fully depends on a particular service's inner logic, so it's not related to the schema.

if maybe there is a generic way to represent optional summary metadata in the RunListResponse.

I don't think it is a good way to go because if we will follow that logic, the filed next_page_token also should be moved to some metadata wrapper field:
https://github.com/Mifrill/workflow-execution-service-schemas/blob/ea050992fd0d95a9624024bf2f14977a905ff9c7/openapi/components/schemas/RunListResponse.yaml#L9

@patmagee
Copy link
Contributor

@Mifrill thanks for the clarification. I think your suggestion makes a lot of sense, and I wonder if there is a way to frame this PR in a forward looking way and consider what we would need to improve pagination in general (starting with a total_runs). There is this open issue, but it has not really gotten any steam lately. I know trs (@denis-yuen) has a specific approach that the settled on which is externally documented, so I wonder if we should likewise investigate what paging schemas are there in the wild and what fits with WES in a holistic way

@patmagee
Copy link
Contributor

@Mifrill would you mind updating your changes against the current develop branch? We merged all of the openapi components into a single file to better work with our tooling

@Mifrill
Copy link
Author

Mifrill commented Nov 28, 2022

@patmagee thanks for the ping 👍. Updated

@uniqueg
Copy link
Contributor

uniqueg commented Jul 17, 2023

On this subject, I would like to draw attention to the draft (see link in next comment by @jaeddy) of the upcoming GA4GH pagination style guide (being drafted to address ga4gh/TASC#29).

The style guide is not very prescriptive at the moment (I think it's rather a survey of currently used pagination solutions across the GA4GH API landscape with a discussion of pros and cons) and is definitely open for feedback.

@jaeddy
Copy link
Member

jaeddy commented Jul 17, 2023

FYI @uniqueg - I believe this is the latest version of the pagination guide from @andrewyatz and @mamanambiya.

@andrewyatz
Copy link

FYI @uniqueg - I believe this is the latest version of the pagination guide from @andrewyatz and @mamanambiya.

Yes this is the right one

@uniqueg
Copy link
Contributor

uniqueg commented Jul 18, 2023

Thanks for pointing that out @jaeddy 🙏

@Mifrill
Copy link
Author

Mifrill commented May 15, 2024

Hey @patmagee, @cjllanwarne just a gentle reminder, please have a look at this PR when you get a time 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants