-
Notifications
You must be signed in to change notification settings - Fork 13
CLOUDP-333692: Re-design images building #303
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
MCK 1.3.0 Release NotesOther Changes
|
d7ae339
to
6649987
Compare
Fix build scenario Remove create and push manifests Continue improvement to main Simplify main and build_context missed Pass Build Configuration object directly Use legacy and new pipeline Fix Remove --include Rename MCO test image Multi platform builds, with buildx TODOs Implement is_release_step_executed() Fix init appdb image Import sort black formatting Some cleaning and version adjustments Adapt main to new build config Add buildscenario to buildconfig Handle build env Renaming, usage of high level config All images build pass on EVG Lint Explicit image type, support custom build_path Replace old by new pipeline in EVG Add documentation Split in multiple files, cleanup WIP, passing builds on staging temp + multi arch manifests Replace usage of sonar Remove namespace Remove pin_at and build_id Copied pipeline, removed daily builds and --exclude
This reverts commit 426e522.
scripts/release/atomic_pipeline.py
Outdated
|
||
|
||
def build_operator_image_patch(build_configuration: BuildConfiguration): | ||
if not build_operator_image_fast(build_configuration): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we have two different operator build functions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an optimization we have in the pipeline, that I preserved, I don't know if it is still relevant. I can measure performance of both as a follow up
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nammn @Julien-Ben do we need to keep this? How much is this useful right now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't remember using this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll remove it then !
@classmethod | ||
def infer_scenario_from_environment(cls) -> "BuildScenario": | ||
"""Infer the build scenario from environment variables.""" | ||
git_tag = os.getenv("triggered_by_git_tag") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
guiding question - should we handle this via env vars or should we have all of that as args instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can always override the scenario by passing the --scenario
arg. I'm fine with env vars if they are at least documented in the command help
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thats what I was worrying, right now pipeline is a mess partly due to the fact that we have multiple entrypoints. Which are env vars, arguments and files. This adds complexity as we need to codify argument hierachy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, but triggered_by_git_tag
, version_id
, RUNNING_IN_EVG
and is_patch
are internals of pipeline.py
and I don't like the option to pass those values as cmd args. For example it will be hard to motivate why developer need to provide them when running locally.
Another way is to have a separate cmd tool i.e. calculate_build_scenario
that calculates the build scenario for given args triggered_by_git_tag
, version_id
, RUNNING_IN_EVG
and is_patch
. The result will be scenario
output that can be provided to atomic_pipeline.py
as arg. The only issue is that evg functions don't support sharing outputs and we would need to somehow embed it in env vars 😭
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another improvement is to move all env var definitions to separate python file and hide the values behind python functions. This way at least we will know all env var usages and it will be easier to refactor later
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think calculating the scenario and passing it to atomic_pipeline.py
is the way to go
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be discussed in the TD. The point of the build_context
and the pipeline_main
files is to give full flexibility and a central place to decide how we pass arguments.
Once we decide the source of truth (e.g env vars vs JSON config files) we can modify it.
In the meantime I kept the behavior of the pipeline.py
, but made it easy to change the configurations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can discuss this in the td, but i rather give up flexibility in favour of clarity
from typing import Dict | ||
|
||
import boto3 | ||
import python_on_whales |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we describe/compare python_on_whales in the td compared to other wrappers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add a section, but TL;DR it is lower level and have 100% parity with docker cli. Especially for using buildx
scripts/release/build_images.py
Outdated
platforms=platforms, | ||
) | ||
|
||
if sign: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here and all related functions
scripts/release/atomic_pipeline.py
Outdated
return json.load(release) | ||
|
||
|
||
@TRACER.start_as_current_span("sonar_build_image") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we rename the span names to match the function name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will this not break existing dashboards (if we have them?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nope we don't have them
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed ✅
scripts/release/atomic_pipeline.py
Outdated
|
||
|
||
def build_operator_image_patch(build_configuration: BuildConfiguration): | ||
if not build_operator_image_fast(build_configuration): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't remember using this
@@ -0,0 +1,552 @@ | |||
#!/usr/bin/env python3 | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how do we test this? Do we have a related unit test file integrated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The previous tests we had in pipeline_test.py
were mostly testing some internals that are not used anymore i.e. run_command_with_retries
, is_version_in_range
or test_is_release_step_executed
(sonar stuff)
Apart from e2e tests that verify image creation I would also ask for:
- moving some tests that still qualify for
atomic_pipeline.py
:test_build_latest_agent_versions
test_get_versions_to_rebuild_same_version
- possibly others
- add tests for
process_image
and verify what commands are actually invoked using mocks?
Most complex building process is for agent
image, but this will greatly change after non-matrix is merged, so I would wait with adding tests for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with what Maciej said, all unit tests were for low level internals. I don't think the pipeline is very relevant to test with unit tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How will this be merged with Maciej's changes (the release_info file)?
scripts/release/atomic_pipeline.py
Outdated
|
||
|
||
@TRACER.start_as_current_span("sign_image_in_repositories") | ||
def sign_image_in_repositories(args: Dict[str, str], arch: str = None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this used anywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, not anymore
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
scripts/release/atomic_pipeline.py
Outdated
logger.info(f"Building Operator args: {args}") | ||
|
||
image_name = "mongodb-kubernetes" | ||
build_image_generic( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a bit too many methods that could easily be merged. There is pipeline_process_image
, build_image_generic
, process_image
. They should be only one function.
I even notice that both build_image_generic
and process_image
sign the image
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
""" | ||
agent_versions_to_build = list() | ||
agent_versions_to_build.append( | ||
( | ||
release["supportedImages"]["mongodb-agent"]["opsManagerMapping"]["cloud_manager"], | ||
release["supportedImages"]["mongodb-agent"]["opsManagerMapping"]["cloud_manager_tools"], | ||
) | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we reuse the method above for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't really touched agent related stuff, once both Nam's PR (matrix removal) and mine are merged, we will make the necessary changes to the atomic pipeline.
Re-design images building
Note for review:
Since
atomic_pipeline.py
is largely a refactored version ofpipeline.py
, it’s much clearer to review their side-by-side diff than to wade through GitHub’s “all new lines” view.Here's the diff:
https://gist.github.com/Julien-Ben/3698d532d17bafb380f2e4f05b5db153
You can also take a look at the related TD Section
Changes
The PR refactors our Docker image build system. Most notably by replacing
pipeline.py
along with other components, detailed below.Usage of standalone Dockerfiles
Added in a previous PR, they eliminate the need for templating, and make it possible to retire Sonar once the Atomic Releases Epic is completed.
Building with docker buildx, multi-platform builds
In
build_images.py
we use docker buildx through a python API. It eliminates the need for building images separately for each platform (ARM/AMD), and then manually bundling them in a manifest.Handle build environments explicitly
We’ve introduced a framework that centralizes build configuration by scenario (e.g local development, staging releases etc) so the pipeline automatically picks sensible defaults (registry, target platforms, signing flags, and more) based on where you’re running.
In
pipeline_main.py
(with support frombuild_configuration.py
andbuild_context.py
) we treat each execution context (local dev, merge to master, release etc...) as an explicit, top-level environment.It infers defaults automatically but lets you override any value via CLI flags, ensuring all build parameters live in one single source of truth rather than scattered through pipeline scripts.
CLI usage
Proof of work
CI is building images with the new pipeline, and tests pass.
Note
For the duration of the Atomic Releases epic, both pipelines will be in the repository, until we are done with the staging and promotion process. This new pipeline will only be used for Evergreen patches.
This PR also heavily depends on changes that are introduced by the agent matrix removal, and the multi-platform support epic.
The existing Evergreen function, that uses
pipeline.py
has been renamedlegacy_pipeline
, and is used for release and periodic builds tasks.A new one has been created, calling the new pipeline.
Once the Atomic Release Epic is complete, we'll be able to remove:
pipeline.py
Follow up ticket to this PR: https://jira.mongodb.org/browse/CLOUDP-335471
Checklist
skip-changelog
label if not needed