Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CI is re-running successful skipped jobs #6466

Open
mhofman opened this issue Oct 18, 2022 · 3 comments
Open

CI is re-running successful skipped jobs #6466

mhofman opened this issue Oct 18, 2022 · 3 comments
Assignees
Labels
bug Something isn't working tooling repo-wide infrastructure vaults_triage DO NOT USE

Comments

@mhofman
Copy link
Member

mhofman commented Oct 18, 2022

Describe the bug

Because integration tests are label triggered, it's possible for these actions to end up running concurrently on the exact same commit. To avoid duplicated work, we use a 3rd party action to detect duplicate jobs, and wait for the results of pending jobs on identical commits. If the previous check succeeds, the actual job should be skipped and outcome carried over.

For some reason the latter part seem to no longer work correctly. While we wait until a concurrent job completes, we no longer skip if successful, and end up re-running the job, creating a worse case scenario of checking the same thing multiple times sequentially.

To Reproduce

See jobs:

Expected behavior

Correctly skip job if concurrent check is successful.

Platform Environment

  • Github Actions
@mhofman mhofman added bug Something isn't working tooling repo-wide infrastructure labels Oct 18, 2022
@mhofman mhofman self-assigned this Oct 18, 2022
@ivanlei ivanlei added the vaults_triage DO NOT USE label Jan 4, 2023
@mhofman
Copy link
Member Author

mhofman commented Aug 29, 2023

To work around this issue #8253 kept the concurrent cancellation config introduced during our stint with GitHub merge queues. However this is also inefficient in some cases, e.g. when a PR label is changed while the integration tests are still running.

@frazarshad
Copy link
Contributor

@mhofman i've tried replicating this issue by removing the concurrency key we have in integration.yml
This results in the use of check_and_cancel job in pre-check-integration.yml. but it seems to working as expected.
Here is the first job which completes: https://github.com/Agoric/agoric-sdk/actions/runs/10303452336
And here is the job that runs after it: https://github.com/Agoric/agoric-sdk/actions/runs/10303457883
It waits for the result as expected.

On a side note: Is our goal with this job to save time or resources?
if our goal is to save resources spent on github actions, wouldnt it be more efficient to have the previous job cancel? since two jobs running side by side will use up more github action minutes

@mhofman
Copy link
Member Author

mhofman commented Aug 8, 2024

but it seems to working as expected.

Oh that's wonderful, it looks like it doesn't re-run the job! Maybe it was a fluke. We should keep an eye on this (do we have a graph showing the runtime of the integration checks we could monitor?)

On a side note: Is our goal with this job to save time or resources?

The goal was to save time at the expense of resources. Integration jobs are long, and let's say you're into 30 minutes of a 45 minute job, apply a label unrelated (e.g. a classification label), currently we would cancel the 30 min deep job and restart from scratch. The goal was to keep that going if the PR content itself didn't change. I believe this would also happen in a much more common case of updating a PR when entering the merge queue:

  • push a small fixup commit and immediately apply the automerge label
  • all regular and integration checks would start, this would run on a simulated merge of the PR into the target
  • once the regular required checks would pass, the PR would enter the merge queue, and mergify would rebase onto the target (rebase strategy) or merge the target branch (squash strategy)
  • all regular and integration checks would start on a simulated merge of this new commit into the target branch.
  • For integration checks, it would realize that the content is the exact same, and would simply "watch" the status of the still running check before the rebase, then carry over the result of that check, avoiding having to restart the check from the beginning.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working tooling repo-wide infrastructure vaults_triage DO NOT USE
Projects
None yet
Development

No branches or pull requests

4 participants