Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Different Augur versions between amd64 and arm64 images #128

Closed
victorlin opened this issue Feb 21, 2023 · 20 comments
Closed

Different Augur versions between amd64 and arm64 images #128

victorlin opened this issue Feb 21, 2023 · 20 comments
Assignees
Labels
bug Something isn't working

Comments

@victorlin
Copy link
Member

victorlin commented Feb 21, 2023

Current Behavior

It is possible that images of the same tag for different architectures have different Augur versions.

Expected behavior

Images of the same tag for different architectures should be guaranteed to have the same Augur version.

How to reproduce

Not sure that this is easily reproducible, but it was brought up by @corneliusroemer in Slack for the Augur 21.0.0 → 21.0.1 update.

In the triggered image rebuild, the two images pulled different versions of Augur:

2023-02-17T19:29:48.6526581Z #90 [linux/amd64 builder 42/47] RUN /builder-scripts/download-repo https://github.com/nextstrain/augur "$(/builder-scripts/latest-augur-release-tag)" .  && pip3 install --editable .
… nextstrain-augur==21.0.0

2023-02-17T19:30:52.5529403Z #99 [linux/arm64 builder 42/47] RUN /builder-scripts/download-repo https://github.com/nextstrain/augur "$(/builder-scripts/latest-augur-release-tag)" .  && pip3 install --editable .
… nextstrain-augur==21.0.1

A relevant line from the Augur 21.0.1 release job logs shows when the new version was published to PyPI:

2023-02-17T19:28:11.7789440Z https://pypi.org/project/nextstrain-augur/21.0.1/

My speculation is that the PyPI endpoint used to retrieve the version was only updated sometime within the 1 minute difference between the two invocations of latest-augur-release-tag.

Possible solution

  1. Determine the latest Augur version using the latest GitHub Release or by parsing __version__.py, instead of a PyPI endpoint that has some lag time.
  2. Only trigger the Docker image rebuild after the new Augur version is known to be available at the PyPI endpoint used to retrieve the version.
  3. Add tests to verify that Augur versions (and other software versions) are the same before publishing the public final images.
  4. Catch this sort of thing programmatically by testing more than just the zika workflow. I think we can pretty easily run the existing CI workflows using pathogen-repo-ci (monkeypox, ncov) with a bit of tinkering to allow specifying the Docker image used for the runtime.
@victorlin victorlin added the bug Something isn't working label Feb 21, 2023
@victorlin victorlin self-assigned this Feb 21, 2023
@corneliusroemer
Copy link
Member

corneliusroemer commented Feb 21, 2023

Agree with your analysis, that's also what I figured out. Though the logs show [linux/arm64 builder] twice, rather than the required amd once and arm the other time.

This is the AMD line: https://github.com/nextstrain/docker-base/actions/runs/4206926139/jobs/7301049323#step:8:812 (unfortunately no time stamp)

And this is the ARM line, later: https://github.com/nextstrain/docker-base/actions/runs/4206926139/jobs/7301049323#step:8:1070

Could be more than 10 seconds between them, in facts it's 1 minute between them.

Ah, here we go (https://pipelines.actions.githubusercontent.com/serviceHosts/ecb1df80-1e0b-475b-ba1c-8c7d279a5a11/_apis/pipelines/1/runs/859/signedlogcontent/2?urlExpires=2023-02-21T17%3A59%3A59.6428270Z&urlSigningMethod=HMACV1&urlSignature=YSQN5AfBl%2FDruTkTH8UfPd3LAHtPS3B2hop3Ak%2FLT6U%3D)

2023-02-17T19:29:48.6526581Z #90 [linux/amd64 builder 42/47] RUN /builder-scripts/download-repo https://github.com/nextstrain/augur "$(/builder-scripts/latest-augur-release-tag)" .  && pip3 install --editable .

2023-02-17T19:30:52.5529403Z #99 [linux/arm64 builder 42/47] RUN /builder-scripts/download-repo https://github.com/nextstrain/augur "$(/builder-scripts/latest-augur-release-tag)" .  && pip3 install --editable .

We should just not get the release version from Pypi, but use the release as tagged in Github releases in the Augur repo directly! Or use the latest tag.

The push to Pypi happened only at 19:28, around 1.5 minutes before amd64 looked for the latest version, so the mismatch is not quite unexpected (see https://pipelines.actions.githubusercontent.com/serviceHosts/886afdd6-e5c9-427c-ae26-e921526cf955/_apis/pipelines/1/runs/1128/signedlogcontent/2?urlExpires=2023-02-21T18%3A03%3A46.8786006Z&urlSigningMethod=HMACV1&urlSignature=Wm4lfBHpTiXQejLUsA2V%2F7Rs0BSPK%2FAjd%2FEN2aDCLq8%3D)

2023-02-17T19:28:08.3881622Z Uploading distributions to https://upload.pypi.org/legacy/

@huddlej
Copy link
Contributor

huddlej commented Feb 21, 2023

We should just not get the release version from Pypi, but use the release as tagged in Github releases in the Augur repo directly! Or use the latest tag.

Yeah, if we're pulling the code from GitHub, we might as well use the latest version on GitHub. The latest tag is probably too fragile, since we can add non-version tags at any time that could be technically the latest. We store the version info in Augur's code itself, which might be the most reliable way to find the latest version tag. We could modify latest-augur-release-tag to run download-repo and grep/head/awk/whatever the version from augur/__version__.py.

Or maybe we could pass the version information from the Augur release CI job to the docker-base rebuild CI job? Not sure if GH Action workflows accept arbitrary command line arguments...

@corneliusroemer
Copy link
Member

Yeah, if we're pulling the code from GitHub, we might as well use the latest version on GitHub.

Exactly! Why not query for the latest Github release? Since we now make these in an automated fashion that should work?

@huddlej
Copy link
Contributor

huddlej commented Feb 21, 2023

Do you mean the latest GitHub "Release"? Those aren't automated; we still manually copy/paste content to create them. Although, automating those and then pulling the latest release would be great!

@corneliusroemer
Copy link
Member

Oh I didn't know this wasn't automated. Then my proposal doesn't work, although automation of the GitHub release should not be too hard, we could easily create some release and edit the notes later.

@joverlee521
Copy link
Contributor

We should just not get the release version from Pypi, but use the release as tagged in Github releases in the Augur repo directly! Or use the latest tag.

Hmm, changing the release version source to GitHub wouldn't completely solve the issue. If someone created a new release on GitHub while the job is running then we could end up with the same issue.

I think the most stable path would be to add checks to verify the versions are the same between the two images. Either error if they differ for us to intervene or automatically re-run the build that has the earlier version.

@corneliusroemer
Copy link
Member

If someone created a new release on GitHub while the job is running then we could end up with the same issue.

But why would that happen? Why would someone create a new release minutes after a previous one? Even then, this would trigger another docker build which would use the latest release - so only the intermediate docker build (not the latest) would be at risk of having different versions.

A simple delay should fix it in almost all situations, using tags would be possible, just mean we can't use tags for anything else - though we've never used tags for anything but versions. Checking versions are the same between the images (not just augur but also other packages) is a bonus.

image

@joverlee521
Copy link
Contributor

But why would that happen? Why would someone create a new release minutes after a previous one?

I'm not saying it would happen, just that if it does happen we will see the same issue again.

Even then, this would trigger another docker build which would use the latest release - so only the intermediate docker build (not the latest) would be at risk of having different versions.

True, but I thought we want to avoid the two images differing at all, not just in the latest.
Ah, actually should we delete the nextstrain/base:build-20230217T192833Z images from docker hub or somehow label them as differing?

@victorlin
Copy link
Member Author

@corneliusroemer

Though the logs show [linux/arm64 builder] twice, rather than the required amd once and arm the other time.

Oops good catch! I'll update the issue description.

although automation of the GitHub release should not be too hard

I just picked up on an old PR which implements this: nextstrain/augur#957

@victorlin
Copy link
Member Author

@huddlej

if we're pulling the code from GitHub, we might as well use the latest version on GitHub.

This is a good point that I didn't realize! I agree, there is no good reason to use the version from PyPI unless Augur is being downloaded from there.

Once GitHub Releases are automated, I imagine pulling from that would be easier than parsing __version__.py.

@victorlin
Copy link
Member Author

@joverlee521

True, but I thought we want to avoid the two images differing at all, not just in the latest.

Yes, this is still a good aim! 2 separate issues are being discussed here:

  1. devel/latest-augur-release-tag unnecessarily sources the version from PyPI when Dockerfile does not use PyPI.
  2. Build commands in the Dockerfile can produce different results at different times. Checks should be put in place to reduce the effects of this when building a multi-arch image (which is really 2 images built separately).

(1) can be addressed by changing the release version source, and (2) can be addressed by adding tests before publishing the public final images.

Ah, actually should we delete the nextstrain/base:build-20230217T192833Z images from docker hub or somehow label them as differing?

I think it's best to leave it there, like how we don't "un-release" other software versions once tagged/released. Most use cases of the Docker image should automatically pull the latest, and it's recommended for others to manually pull the latest via nextstrain update.

@joverlee521
Copy link
Contributor

  1. devel/latest-augur-release-tag unnecessarily sources the version from PyPI when Dockerfile does not use PyPI.

Conversation from the original PR that added the PyPI release version source is relevant here. We should decide on if we want to stick with GitHub as the source for Augur in docker-base or move to pull Augur from PyPI to match the other runtimes.

@victorlin
Copy link
Member Author

@joverlee521 thanks for pointing that out, I had forgotten. Created a separate issue #129, since the topic is tangential from the original issue here.

@corneliusroemer
Copy link
Member

pull Augur from PyPI to match the other runtimes

Our conda-base environment pulls from bioconda 🙃 So there's already inconsistency anyways. Using Github means we don't rely on servers that may take variable amounts of time to show changes.

What other runtimes do we have?

@tsibley
Copy link
Member

tsibley commented Feb 27, 2023

@corneliusroemer But Bioconda is derived from PyPI. The point of using PyPI is that it's the most upstream release distribution point (c.f. my comment in the PR that @joverlee521 referenced).

@victorlin wrote:

Yes, this is still a good aim! 2 separate issues are being discussed here:

  1. devel/latest-augur-release-tag unnecessarily sources the version from PyPI when Dockerfile does not use PyPI.

  2. Build commands in the Dockerfile can produce different results at different times. Checks should be put in place to reduce the effects of this when building a multi-arch image (which is really 2 images built separately).

(1) can be addressed by changing the release version source, and (2) can be addressed by adding tests before publishing the public final images.

Spot on with breaking this out into two separate things. It's best we don't conflate them.

For (2), I'm in agreement that we should add tests to ensure that our essential software packages (at least first-party ones, e.g. Augur, Auspice, etc) are the same (or have an expected difference) between the image's supported platforms (amd64, arm64).

For (1), I think the issue is not that we get a version from PyPI and source code from GitHub, but that we don't pass an explicit Augur version to the image build when triggering it on Augur release. The Augur (and Auspice) release process that triggers an image build should pass the version it just released to the image build process. (This is what @huddlej suggested too.) The image build process would install that explicitly given version instead of querying PyPI. (PyPI would then only be used by builds not triggered by upstream releases.)

The principle here is to make an invariant out of our expectation of what it means to build a new image after an Augur (or Auspice) release. Querying the "latest" version from GitHub instead of PyPI doesn't make that invariant, it just narrows (but doesn't eliminate) the conditions in which our expectation can still fail (as @joverlee521 noted too).

@victorlin
Copy link
Member Author

The Augur (and Auspice) release process that triggers an image build should pass the version it just released to the image build process.

At first I thought this sounded good, but I'm having second thoughts now.

It only makes an invariant for the specified software, not others. Let's say we want to implement this for both Augur and Auspice. It would look something like this:

# Upon Augur release
gh workflow run ci.yml --repo nextstrain/docker-base -f augur_ref=21.0.1

# Upon Auspice release
gh workflow run ci.yml --repo nextstrain/docker-base -f auspice_ref=2.45.0

When triggering from a new Augur release, there would still be a need to determine the latest Auspice version (still variant), and vice-versa.

Since the specified version info is only applicable to the current build, this feels like "half-pinning" to me (where full pinning would be like pip3 install augur==21.0.1). I'd rather:

  1. Improve the way the latest version is fetched (only reduces failures, but they can be caught by checking across platforms), or
  2. Implement full pinning somehow where the specified version info is checked into the repo to be used by all builds.

@corneliusroemer
Copy link
Member

When triggering from a new Augur release, there would still be a need to determine the latest Auspice version (still variant), and vice-versa.

It's not necessary because in almost all cases, when we make an Augur release, npm will have the latest released Auspice version. -f is just a fix to not proceed with the docker build until that version is ready through npm.

This is the implementation of your "rather" point 1: "Improve the way the latest version is fetched (only reduces failures, but they can be caught by checking across platforms), or"

That's exactly what it does, reduce failure. Inconsistency will still be checked across platforms.

@tsibley
Copy link
Member

tsibley commented Mar 10, 2023

It only makes an invariant for the specified software, not others. … When triggering from a new Augur release, there would still be a need to determine the latest Auspice version (still variant).

Yes, but I think that'll be fine in almost all cases (as @corneliusroemer says too).

The invariant I'm suggesting we establish isn't on all versions, but on the just-released version that's the reason we're triggering the rebuild in the first place.

Implement full pinning somehow where the specified version info is checked into the repo to be used by all builds.

I do agree that this is a more robust alternative, but it complicates a bit the triggering of new builds after a version bump. Maybe worth it! (But the easier/quicker thing seemed like it reduce failures enough.)

@tsibley
Copy link
Member

tsibley commented Mar 10, 2023

We'll also want to consider that any solution we choose here we will also want to apply to conda-base (if not now, in the future).

@victorlin
Copy link
Member Author

Closing this issue since the original issue (different Augur versions between image variants) has been addressed by #130.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
No open projects
Development

No branches or pull requests

5 participants