Add retries to publish steps #7892

Merged
merged 1 commit into from
Mar 5, 2025

danmoseley
Member

Attempt to work around DCP log locking, which causes:

System.IO.IOException: The process cannot access the file 'D:\a_work\1\a\artifacts\log\dcp\dcpctrl-1741041174-9644.log' because it is being used by another process.

@danmoseley danmoseley requested review from radical and Copilot March 4, 2025 23:59
@danmoseley
Member Author

I chose 10 because the waits between retries then add up to more than 2 minutes (see https://learn.microsoft.com/en-us/azure/devops/pipelines/process/tasks?view=azure-devops&tabs=yaml#number-of-retries-if-task-failed).

PR Overview

This PR adds a retry mechanism to address file locking issues encountered during artifact and log publishing in the pipeline.

  • Introduces a new parameter (retryCountOnTaskFailure: 10) in the artifact publishing step.
  • Adds the same retry mechanism to the logs publishing step, in both the common and official job templates.
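
As a rough illustration of the change summarized above, a publish step with the retry setting might look like the sketch below. The input values, `condition`, and the `retryCountOnTaskFailure` line come from the diff in this PR; the task name `PublishBuildArtifacts@1` and the `displayName` are assumptions, since the hunk does not show them.

```yaml
steps:
  # Assumed task name and displayName; the diff hunk shows only the lines below them.
  - task: PublishBuildArtifacts@1
    displayName: Publish artifacts
    inputs:
      PathtoPublish: '$(Build.ArtifactStagingDirectory)/artifacts'
      ArtifactName: ${{ coalesce(parameters.artifacts.publish.artifacts.name , 'Artifacts_$(Agent.Os)_$(_BuildConfig)') }}
    condition: always()
    # Step-level setting: rerun this task up to 10 times if it fails.
    retryCountOnTaskFailure: 10 # for any logs being locked
```

Note that `retryCountOnTaskFailure` sits at the step level, next to `condition`, rather than under `inputs`.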

Reviewed Changes

File | Description
eng/common/templates/job/job.yml | Adds retryCountOnTaskFailure for both artifact and log publishing steps
eng/common/templates-official/job/job.yml | Adds retryCountOnTaskFailure for artifact and log publishing steps

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

Comments suppressed due to low confidence (4)

eng/common/templates/job/job.yml:49

  • [nitpick] Consider adding tests to verify that the new retry mechanism resolves file locking issues during artifact publishing.
retryCountOnTaskFailure: 10 # for any logs being locked

eng/common/templates/job/job.yml:60

  • [nitpick] Consider adding tests to verify that the new retry mechanism resolves file locking issues during log publishing.
retryCountOnTaskFailure: 10 # for any logs being locked

eng/common/templates-official/job/job.yml:33

  • [nitpick] Consider adding tests to verify that the new retry mechanism resolves file locking issues during artifact publishing in the official template.
retryCountOnTaskFailure: 10 # for any logs being locked

eng/common/templates-official/job/job.yml:42

  • [nitpick] Consider adding tests to validate that the retry mechanism properly handles file locking during log publishing in the official template.
retryCountOnTaskFailure: 10 # for any logs being locked
@danmoseley danmoseley enabled auto-merge (squash) March 5, 2025 01:06
@danmoseley danmoseley merged commit 33a7fcc into dotnet:main Mar 5, 2025
73 checks passed
@danmoseley danmoseley deleted the retries1 branch March 5, 2025 15:14
@@ -30,6 +30,7 @@ jobs:
PathtoPublish: '$(Build.ArtifactStagingDirectory)/artifacts'
ArtifactName: ${{ coalesce(parameters.artifacts.publish.artifacts.name , 'Artifacts_$(Agent.Os)_$(_BuildConfig)') }}
condition: always()
retryCountOnTaskFailure: 10 # for any logs being locked
Member

Changes to eng/common will get overwritten by the next update from dotnet/arcade - looks like that happens once per month in Aspire. Any idea why publishing is failing so frequently here?

Member

We can instead add a step to explicitly wait for dcp/dcpctrl processes to end.
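
One way the suggestion above could be sketched is a step that runs before the publish steps and waits for leftover processes to exit. This is an illustration under several assumptions, not what the PR merged: it assumes a Windows agent with PowerShell, that the processes are named `dcp` and `dcpctrl` as written in the comment, and an arbitrary 120-second timeout.

```yaml
steps:
  # Hypothetical step, not part of this PR: wait for DCP processes to exit
  # so their log files are no longer locked when publishing starts.
  - powershell: |
      $procs = Get-Process -Name dcp, dcpctrl -ErrorAction SilentlyContinue
      if ($procs) {
        # Wait up to 120 seconds (arbitrary value); continue either way.
        $procs | Wait-Process -Timeout 120 -ErrorAction SilentlyContinue
      }
    displayName: Wait for DCP processes to exit
    condition: always()
```

Unlike a retry count, a wait step like this would live in the repository's own pipeline files, so it would not be overwritten by updates to eng/common.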

@github-actions github-actions bot added the area-codeflow label ("for labeling automated codeflow. intentionally a different color!") Mar 10, 2025
4 participants