Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add retries #15593

Closed
wants to merge 2 commits into from
Closed

add retries #15593

wants to merge 2 commits into from

Conversation

danmoseley
Copy link
Member

In dotnet/aspire, we have a log that is written during test runs and closed only at the end. It can take 10 sec-2 mins to unlock. Occasionally this causes

System.IO.IOException: The process cannot access the file 'D:\a_work\1\a\artifacts\log\dcp\dcpctrl-1741041174-9644.log' because it is being used by another process.

during artefact publishing. I added retries which should cause no effect in the normal case, but give it a chance of succeeding in this failure case.

I chose 10 because the time between retries would then have summed to over 2 minutes. https://learn.microsoft.com/en-us/azure/devops/pipelines/process/tasks?view=azure-devops&tabs=yaml#number-of-retries-if-task-failed

This assumes publish is idempotent, which @mmitche believes is true.

@danmoseley danmoseley requested review from wtgodbe and Copilot March 5, 2025 17:04

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Overview

This PR adds a retry mechanism to allow tasks affected by file locks during publishing to eventually succeed.

  • Adds a retryCountOnTaskFailure setting with a value of 10
  • Applies the change to both common and official job templates

Reviewed Changes

File Description
eng/common/templates/job/job.yml Adds a retry mechanism to artifact and log publishing steps
eng/common/templates-official/job/job.yml Adds a retry mechanism to artifact and log publishing steps

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

@@ -31,6 +31,7 @@ jobs:
PathtoPublish: '$(Build.ArtifactStagingDirectory)/artifacts'
ArtifactName: ${{ coalesce(parameters.artifacts.publish.artifacts.name , 'Artifacts_$(Agent.Os)_$(_BuildConfig)') }}
condition: always()
retryCountOnTaskFailure: 10 # for any logs being locked
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this in the non-log case?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I figured it was another place where stuff is getting uploaded. I can remove it, as folks recommend

@danmoseley danmoseley requested a review from mmitche March 5, 2025 17:14
@danmoseley
Copy link
Member Author

@mmitche @radical would love to get this merged and flow today

@danmoseley
Copy link
Member Author

oh, Will pointed out I already put it in the aspire copies. So we just need to merge this to stop it getting overwritten

@mmitche mmitche closed this Mar 5, 2025
@mmitche mmitche reopened this Mar 5, 2025
@mmitche
Copy link
Member

mmitche commented Mar 5, 2025

@danmoseley:

image

@danmoseley
Copy link
Member Author

@mmitche how can I repro this locally? there also seems to be no logs for the pipeline for me to look at.

I haven't edited that file, so I assume it's because it's pulling in my change via 'template' but I don't see a chain that would do that. I don't know why it would be line 13, nor why it would be column 7 as the new lines are more than 7 indented.

@mmitche
Copy link
Member

mmitche commented Mar 6, 2025

@mmitche how can I repro this locally? there also seems to be no logs for the pipeline for me to look at.

I haven't edited that file, so I assume it's because it's pulling in my change via 'template' but I don't see a chain that would do that. I don't know why it would be line 13, nor why it would be column 7 as the new lines are more than 7 indented.

There's no local repo AFAIK. It's failing to parse and process the YAML

@mmitche
Copy link
Member

mmitche commented Mar 6, 2025

I think the issue is that /eng/common/core-templates/steps/publish-build-artifacts.yml doesn't have the template parameter 'retryCountOnTaskFailure'

@wtgodbe
Copy link
Member

wtgodbe commented Mar 10, 2025

I think the issue is that /eng/common/core-templates/steps/publish-build-artifacts.yml doesn't have the template parameter 'retryCountOnTaskFailure'

Oddly, aspire official builds haven't been failing (we already added this same change to aspire's eng/common folder directly): https://dev.azure.com/dnceng/internal/_build?definitionId=1309

Where did the original suggestion to add "retryCountOnTaskFailure" come from? Is that a well-known azdo parameter? If so I guess we just need to get it passed thru to publish-build-artifacts.yml ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants