-
Notifications
You must be signed in to change notification settings - Fork 360
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add retries #15593
add retries #15593
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR Overview
This PR adds a retry mechanism to allow tasks affected by file locks during publishing to eventually succeed.
- Adds a retryCountOnTaskFailure setting with a value of 10
- Applies the change to both common and official job templates
Reviewed Changes
File | Description |
---|---|
eng/common/templates/job/job.yml | Adds a retry mechanism to artifact and log publishing steps |
eng/common/templates-official/job/job.yml | Adds a retry mechanism to artifact and log publishing steps |
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
@@ -31,6 +31,7 @@ jobs: | |||
PathtoPublish: '$(Build.ArtifactStagingDirectory)/artifacts' | |||
ArtifactName: ${{ coalesce(parameters.artifacts.publish.artifacts.name , 'Artifacts_$(Agent.Os)_$(_BuildConfig)') }} | |||
condition: always() | |||
retryCountOnTaskFailure: 10 # for any logs being locked |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need this in the non-log case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I figured it was another place where stuff is getting uploaded. I can remove it, as folks recommend
oh, Will pointed out I already put it in the aspire copies. So we just need to merge this to stop it getting overwritten |
@mmitche how can I repro this locally? there also seems to be no logs for the pipeline for me to look at. I haven't edited that file, so I assume it's because it's pulling in my change via 'template' but I don't see a chain that would do that. I don't know why it would be line 13, nor why it would be column 7 as the new lines are more than 7 indented. |
There's no local repo AFAIK. It's failing to parse and process the YAML |
I think the issue is that /eng/common/core-templates/steps/publish-build-artifacts.yml doesn't have the template parameter 'retryCountOnTaskFailure' |
Oddly, aspire official builds haven't been failing (we already added this same change to aspire's eng/common folder directly): https://dev.azure.com/dnceng/internal/_build?definitionId=1309 Where did the original suggestion to add "retryCountOnTaskFailure" come from? Is that a well-known azdo parameter? If so I guess we just need to get it passed thru to |
In dotnet/aspire, we have a log that is written during test runs and closed only at the end. It can take 10 sec-2 mins to unlock. Occasionally this causes
during artefact publishing. I added retries which should cause no effect in the normal case, but give it a chance of succeeding in this failure case.
I chose 10 because the time between retries would then have summed to over 2 minutes. https://learn.microsoft.com/en-us/azure/devops/pipelines/process/tasks?view=azure-devops&tabs=yaml#number-of-retries-if-task-failed
This assumes publish is idempotent, which @mmitche believes is true.