[8.18] (backport #8670) [testing] validate artifact hashes in artifact fetcher #8686

Open
wants to merge 1 commit into
base: 8.18

mergify[bot]

@mergify mergify bot commented Jun 26, 2025

What does this PR do?

This PR improves the robustness of the test artifact fetcher by validating the integrity of downloaded packages and introducing retry logic for both corrupted downloads and snapshot metadata fetch failures. Specifically, it:

  • Validates the .sha512 checksum of each downloaded artifact to detect corruption.
    • Automatically retries corrupted downloads, up to 3 attempts, using a constant backoff of 3 seconds.
  • Applies retry logic to snapshot metadata resolution (e.g., the .json file for snapshot build IDs), which previously failed outright on transient errors.

Why is it important?

This PR addresses two known failure cases in upgrade test pipelines:

  • Corrupted downloads: Previously, corrupted .tar.gz artifacts would not be detected, leading to tar extraction errors such as flate: corrupt input before offset.... This caused test flakiness and debugging overhead.
  • Transient 502 errors during metadata fetch: Snapshot builds occasionally return 502 errors when fetching the latest snapshot manifest. Without retry logic, these tests fail unnecessarily.

By introducing checksum validation and retry logic, this PR makes upgrade tests more reliable and deterministic.

Checklist

  • I have read and understood the pull request guidelines of this project.
  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

Disruptive User Impact

None. This change only affects internal test infrastructure and does not impact users of the Elastic Agent or its public APIs.

How to test this PR locally

mage unitTest

Related issues


This is an automatic backport of pull request #8670 done by [Mergify](https://mergify.com).

* feat: validate artifact hashes in artifact fetcher

* fix: do not load the whole file in memory

* fix: reuse existing VerifySHA512Hash to verify package hash

(cherry picked from commit 707c63f)
@mergify mergify bot added the backport label Jun 26, 2025
@mergify mergify bot requested a review from a team as a code owner June 26, 2025 07:07
@mergify mergify bot requested review from ycombinator and pchila and removed request for a team June 26, 2025 07:07
@github-actions github-actions bot added the Team:Elastic-Agent-Control-Plane and skip-changelog labels Jun 26, 2025
@elasticmachine

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)
