CLOUDP-333692: Re-design images building #303

Julien-Ben · 2025-07-29T15:28:56Z

Re-design images building

Note for review:

Since atomic_pipeline.py is largely a refactored version of pipeline.py, it’s much clearer to review their side-by-side diff than to wade through GitHub’s “all new lines” view.
Here's the diff:

https://gist.github.com/Julien-Ben/3698d532d17bafb380f2e4f05b5db153

You can also take a look at the related TD Section

Changes

This PR refactors our Docker image build system, most notably by replacing pipeline.py and several related components, as detailed below.

Usage of standalone Dockerfiles

Added in a previous PR, standalone Dockerfiles eliminate the need for templating and make it possible to retire Sonar once the Atomic Releases epic is completed.

Building with docker buildx, multi-platform builds

In build_images.py we use docker buildx through a Python API (python_on_whales). This removes the need to build images separately for each platform (ARM/AMD) and then manually bundle them into a manifest.
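To make the single-invocation idea concrete, here is a minimal sketch that assembles the equivalent `docker buildx build` command line. The helper name `buildx_build_argv`, the registry and the tag are hypothetical; `--file`, `--builder`, `--platform`, `--tag` and `--push` are standard `docker buildx build` flags. This is not how build_images.py does it (it goes through the Python API), only an illustration of what one multi-platform call covers.

```python
from typing import List, Optional


def buildx_build_argv(
    context_path: str,
    dockerfile: str,
    tags: List[str],
    platforms: List[str],
    push: bool = True,
    builder: Optional[str] = None,
) -> List[str]:
    """Assemble one `docker buildx build` invocation covering every target platform.

    Hypothetical helper for illustration. With --push, buildx publishes a single
    multi-arch manifest list, so no per-platform build or manual manifest step is needed.
    """
    if not platforms:
        raise ValueError("at least one target platform is required")
    argv = ["docker", "buildx", "build", "--file", dockerfile]
    if builder:
        # Multi-platform builds typically need a builder using the docker-container driver.
        argv += ["--builder", builder]
    # buildx takes all platforms as one comma-separated value.
    argv += ["--platform", ",".join(platforms)]
    for tag in tags:
        argv += ["--tag", tag]
    if push:
        argv.append("--push")
    argv.append(context_path)
    return argv


if __name__ == "__main__":
    argv = buildx_build_argv(
        context_path=".",
        dockerfile="Dockerfile",
        tags=["registry.example.com/mongodb-kubernetes:dev"],
        platforms=["linux/amd64", "linux/arm64"],
    )
    print(" ".join(argv))
```

Multi-platform output generally requires a builder backed by the `docker-container` driver (or the containerd image store), since the classic Docker image store cannot hold a multi-platform image.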

Handle build environments explicitly

We’ve introduced a framework that centralizes build configuration by scenario (e.g. local development, staging releases, etc.), so the pipeline automatically picks sensible defaults (registry, target platforms, signing flags, and more) based on where it is running.

In pipeline_main.py (with support from build_configuration.py and build_context.py), we treat each execution context (local dev, merge to master, release, etc.) as an explicit, top-level environment.
Defaults are inferred automatically, but any value can be overridden via CLI flags, so all build parameters live in a single source of truth rather than being scattered across pipeline scripts.
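A minimal sketch of the "defaults per scenario, explicit overrides on top" resolution described above. `BuildScenario` mirrors the scenario names from the CLI help and `BuildConfiguration` echoes the type used in the pipeline, but the fields, the default registries and the `resolve_build_configuration` helper are assumptions for illustration only.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Optional, Tuple


class BuildScenario(str, Enum):
    RELEASE = "release"
    PATCH = "patch"
    STAGING = "staging"
    DEVELOPMENT = "development"


@dataclass(frozen=True)
class BuildConfiguration:
    scenario: BuildScenario
    registry: str
    platforms: Tuple[str, ...]
    sign: bool
    version: Optional[str] = None


# Hypothetical defaults: the real registries, platforms and signing rules will differ.
SCENARIO_DEFAULTS = {
    BuildScenario.DEVELOPMENT: {"registry": "localhost:5000", "platforms": ("linux/amd64",), "sign": False},
    BuildScenario.PATCH: {"registry": "dev.registry.example.com", "platforms": ("linux/amd64",), "sign": False},
    BuildScenario.STAGING: {
        "registry": "staging.registry.example.com",
        "platforms": ("linux/amd64", "linux/arm64"),
        "sign": True,
    },
    BuildScenario.RELEASE: {
        "registry": "release.registry.example.com",
        "platforms": ("linux/amd64", "linux/arm64"),
        "sign": True,
    },
}


def resolve_build_configuration(scenario: BuildScenario, **overrides) -> BuildConfiguration:
    """Scenario defaults first, then explicit CLI overrides on top.

    Only None means "flag not given"; False or "" are honoured as real overrides.
    """
    config = BuildConfiguration(scenario=scenario, **SCENARIO_DEFAULTS[scenario])
    explicit = {name: value for name, value in overrides.items() if value is not None}
    return replace(config, **explicit)
```

Treating only `None` as "not provided" means boolean CLI flags should default to `None` rather than `False`, otherwise they would always override the scenario default.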

CLI usage

usage: pipeline_main.py [-h] [--parallel] [--debug] [--sign] [--scenario {BuildScenario.RELEASE,BuildScenario.PATCH,BuildScenario.STAGING,BuildScenario.DEVELOPMENT}]
                        [--platform PLATFORM] [--version VERSION] [--registry REGISTRY] [--parallel-factor PARALLEL_FACTOR]
                        image

Build container images.

positional arguments:
  image                 Image to build.

options:
  -h, --help            show this help message and exit
  --parallel            Build images in parallel.
  --debug               Enable debug logging.
  --sign                Sign images.
  --scenario {BuildScenario.RELEASE,BuildScenario.PATCH,BuildScenario.STAGING,BuildScenario.DEVELOPMENT}
                        Override the build scenario instead of inferring from environment. Options: release, patch, master, development
  --platform PLATFORM   Target platforms for multi-arch builds (comma-separated). Example: linux/amd64,linux/arm64. Defaults to linux/amd64.
  --version VERSION     Override the version/tag instead of resolving from build scenario
  --registry REGISTRY   Override the base registry instead of resolving from build scenario
  --parallel-factor PARALLEL_FACTOR
                        Number of builds to run in parallel, defaults to number of cores
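The `{BuildScenario.RELEASE,...}` rendering in the help above is what argparse prints when enum members are passed as `choices`, because it formats each choice with `str()`. A minimal sketch of a parser exposing the same flags, assuming `BuildScenario` is a `str`-valued enum with lowercase values (the values are an assumption), shows one way to get plain choices.

```python
import argparse
import os
from enum import Enum


class BuildScenario(str, Enum):
    RELEASE = "release"
    PATCH = "patch"
    STAGING = "staging"
    DEVELOPMENT = "development"

    def __str__(self) -> str:
        # argparse renders choices with str(); returning the value makes the help
        # show {release,patch,staging,development} instead of BuildScenario.RELEASE.
        return self.value


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Build container images.")
    parser.add_argument("image", help="Image to build.")
    parser.add_argument("--parallel", action="store_true", help="Build images in parallel.")
    parser.add_argument("--debug", action="store_true", help="Enable debug logging.")
    parser.add_argument("--sign", action="store_true", help="Sign images.")
    parser.add_argument(
        "--scenario",
        type=BuildScenario,  # "staging" -> BuildScenario.STAGING
        choices=list(BuildScenario),
        help="Override the build scenario instead of inferring it from the environment.",
    )
    parser.add_argument(
        "--platform",
        default="linux/amd64",
        help="Target platforms for multi-arch builds (comma-separated).",
    )
    parser.add_argument("--version", help="Override the version/tag.")
    parser.add_argument("--registry", help="Override the base registry.")
    parser.add_argument(
        "--parallel-factor",
        type=int,
        default=os.cpu_count() or 1,
        help="Number of builds to run in parallel; defaults to the number of cores.",
    )
    return parser
```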

Proof of work

CI is building images with the new pipeline, and tests pass.

Note

For the duration of the Atomic Releases epic, both pipelines will remain in the repository until the staging and promotion process is done. The new pipeline will only be used for Evergreen patches.
This PR also heavily depends on changes introduced by the agent matrix removal and the multi-platform support epic.

The existing Evergreen function that uses pipeline.py has been renamed legacy_pipeline and is still used for release and periodic build tasks.
A new function has been created that calls the new pipeline.

Once the Atomic Releases epic is complete, we'll be able to remove:

Sonar
Inventories
Periodic builds
pipeline.py

Follow up ticket to this PR: https://jira.mongodb.org/browse/CLOUDP-335471

Checklist

Have you linked a jira ticket and/or is the ticket in the title?
Have you checked whether your jira ticket required DOCSP changes?
Have you added changelog file?
- use skip-changelog label if not needed
- refer to Changelog files and Release Notes section in CONTRIBUTING.md for more details

github-actions · 2025-07-29T15:29:41Z

⚠️ (this preview might not be accurate if the PR is not rebased on current master branch)

MCK 1.3.0 Release Notes

Other Changes

Optional permissions for PersistentVolumeClaim moved to a separate role. When managing the operator with Helm, it is possible to disable permissions for PersistentVolumeClaim resources by setting the operator.enablePVCResize value to false (true by default). Previously, when enabled, these permissions were part of the primary operator role. With this change, they have a separate role.
subresourceEnabled Helm value was removed. This setting used to be true by default and made it possible to exclude subresource permissions from the operator role by specifying false as the value. We are removing this configuration option, making the operator roles always have subresource permissions. This setting was introduced as a temporary solution for this OpenShift issue. The issue has since been resolved and the setting is no longer needed.

Fix build scenario
Remove create and push manifests
Continue improvement to main
Simplify main and build_context
missed
Pass Build Configuration object directly
Use legacy and new pipeline
Fix
Remove --include
Rename MCO test image
Multi platform builds, with buildx
TODOs
Implement is_release_step_executed()
Fix init appdb image
Import sort
black formatting
Some cleaning and version adjustments
Adapt main to new build config
Add buildscenario to buildconfig
Handle build env
Renaming, usage of high level config
All images build pass on EVG
Lint
Explicit image type, support custom build_path
Replace old by new pipeline in EVG
Add documentation
Split in multiple files, cleanup
WIP, passing builds on staging
temp + multi arch manifests
Replace usage of sonar
Remove namespace
Remove pin_at and build_id
Copied pipeline, removed daily builds and --exclude

MaciejKaras · 2025-08-04T14:32:13Z

scripts/release/atomic_pipeline.py

+
+
+def build_operator_image_patch(build_configuration: BuildConfiguration):
+    if not build_operator_image_fast(build_configuration):


why do we have two different operator build functions?

This is an optimization from the existing pipeline that I preserved; I don't know if it is still relevant. I can measure the performance of both as a follow-up.

@nammn @Julien-Ben do we need to keep this? How useful is this right now?

I don't remember using this.

I'll remove it then!

nammn · 2025-08-07T11:15:23Z

scripts/release/build_context.py

+    @classmethod
+    def infer_scenario_from_environment(cls) -> "BuildScenario":
+        """Infer the build scenario from environment variables."""
+        git_tag = os.getenv("triggered_by_git_tag")


guiding question - should we handle this via env vars or should we have all of that as args instead?

We can always override the scenario by passing the --scenario arg. I'm fine with env vars if they are at least documented in the command help

That's what I was worried about. Right now the pipeline is a mess, partly because we have multiple entry points: env vars, arguments and files. This adds complexity because we need to codify an argument hierarchy.

I agree, but triggered_by_git_tag, version_id, RUNNING_IN_EVG and is_patch are internals of pipeline.py, and I don't like the option of passing those values as command-line args. For example, it would be hard to justify why a developer needs to provide them when running locally.

Another way is to have a separate command-line tool, e.g. calculate_build_scenario, that calculates the build scenario from triggered_by_git_tag, version_id, RUNNING_IN_EVG and is_patch. Its output would be a scenario that can be passed to atomic_pipeline.py as an argument. The only issue is that Evergreen functions don't support sharing outputs, so we would need to somehow embed it in env vars 😭
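A minimal sketch of the proposed standalone tool: a pure function maps CI facts to a scenario, and a thin CLI prints the result so it can be forwarded as `--scenario`. The precedence rules and the flag names are assumptions for illustration; version_id is left out because its role in the decision isn't described here.

```python
import argparse
from enum import Enum
from typing import List, Optional


class BuildScenario(str, Enum):
    RELEASE = "release"
    PATCH = "patch"
    STAGING = "staging"
    DEVELOPMENT = "development"


def calculate_build_scenario(
    triggered_by_git_tag: Optional[str],
    is_patch: bool,
    running_in_evg: bool,
) -> BuildScenario:
    """Pure mapping from CI facts to a scenario; reads no environment itself.

    The precedence below is an illustrative assumption, not the pipeline's actual rules.
    """
    if triggered_by_git_tag:
        return BuildScenario.RELEASE  # tagged commit
    if is_patch:
        return BuildScenario.PATCH  # Evergreen patch build
    if running_in_evg:
        return BuildScenario.STAGING  # e.g. merge to master
    return BuildScenario.DEVELOPMENT  # local run


def main(argv: Optional[List[str]] = None) -> str:
    # Hypothetical flag names for a standalone calculate_build_scenario tool.
    parser = argparse.ArgumentParser(description="Print the build scenario for the given CI context.")
    parser.add_argument("--triggered-by-git-tag", default=None)
    parser.add_argument("--is-patch", action="store_true")
    parser.add_argument("--running-in-evg", action="store_true")
    args = parser.parse_args(argv)
    scenario = calculate_build_scenario(args.triggered_by_git_tag, args.is_patch, args.running_in_evg)
    print(scenario.value)  # captured by the caller and forwarded as --scenario
    return scenario.value


if __name__ == "__main__":
    main()
```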

Another improvement would be to move all env var definitions to a separate Python file and hide the values behind Python functions. This way we would at least know every env var usage, and it would be easier to refactor later.
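A minimal sketch of that idea, assuming a hypothetical module (e.g. env.py) that is the only place allowed to read environment variables. The variable names come from this thread; treating "1", "true" and "yes" as truthy is an assumption. Accepting an optional mapping keeps the accessors testable without touching `os.environ`.

```python
import os
from typing import Mapping, Optional

# Hypothetical env.py: the only module allowed to read environment variables.
# Variable names come from the review thread; the parsing rules are assumptions.


def _environ(environ: Optional[Mapping[str, str]]) -> Mapping[str, str]:
    # Accept an explicit mapping so callers and tests need not touch os.environ.
    return os.environ if environ is None else environ


def _truthy(value: Optional[str]) -> bool:
    return (value or "").strip().lower() in {"1", "true", "yes"}


def triggered_by_git_tag(environ: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Tag that triggered the build, or None for non-tag builds."""
    return _environ(environ).get("triggered_by_git_tag") or None


def version_id(environ: Optional[Mapping[str, str]] = None) -> Optional[str]:
    return _environ(environ).get("version_id") or None


def running_in_evg(environ: Optional[Mapping[str, str]] = None) -> bool:
    return _truthy(_environ(environ).get("RUNNING_IN_EVG"))


def is_patch(environ: Optional[Mapping[str, str]] = None) -> bool:
    return _truthy(_environ(environ).get("is_patch"))
```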

I think calculating the scenario and passing it to atomic_pipeline.py is the way to go

I think this should be discussed in the TD. The point of the build_context and the pipeline_main files is to give full flexibility and a central place to decide how we pass arguments.

Once we decide on the source of truth (e.g. env vars vs JSON config files), we can modify it.
In the meantime I kept the behavior of pipeline.py, but made the configuration easy to change.

We can discuss this in the TD, but I'd rather give up flexibility in favour of clarity.

nammn · 2025-08-07T11:17:04Z

scripts/release/build_images.py

+from typing import Dict
+
+import boto3
+import python_on_whales


Do we compare python_on_whales with other wrappers in the TD?

I'll add a section, but TL;DR: it is lower level and has 100% parity with the docker CLI, especially for buildx.
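To illustrate the CLI parity point, a minimal sketch that maps familiar `docker buildx build` flags onto keyword arguments of python_on_whales' `docker.buildx.build()`. The helper names `buildx_kwargs` and `run_buildx` are hypothetical, and only a handful of arguments are shown; the library is imported lazily so the mapping itself can be exercised without it installed.

```python
from typing import Any, Dict, List, Optional


def buildx_kwargs(
    dockerfile: str,
    tags: List[str],
    platforms: List[str],
    build_args: Optional[Dict[str, str]] = None,
    push: bool = True,
) -> Dict[str, Any]:
    """Keyword arguments for python_on_whales' docker.buildx.build().

    Each key corresponds to the docker CLI flag in the trailing comment.
    """
    return {
        "file": dockerfile,  # --file
        "tags": tags,  # --tag (repeated)
        "platforms": platforms,  # --platform a,b
        "build_args": build_args or {},  # --build-arg KEY=VALUE
        "push": push,  # --push
    }


def run_buildx(context_path: str, builder: Optional[str] = None, **kwargs: Any) -> None:
    # Imported lazily so this module still loads where python_on_whales is not installed.
    from python_on_whales import docker

    docker.buildx.build(context_path, builder=builder, **buildx_kwargs(**kwargs))
```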

nammn · 2025-08-07T11:18:47Z

scripts/release/build_images.py

+        platforms=platforms,
+    )
+
+    if sign:


same here and all related functions

nammn · 2025-08-07T11:20:29Z

scripts/release/atomic_pipeline.py

+        return json.load(release)
+
+
+@TRACER.start_as_current_span("sonar_build_image")


can we rename the span names to match the function name?

Won't this break existing dashboards (if we have any)?

nope we don't have them

Renamed ✅
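One way to keep span names from drifting away from function names again is a decorator that derives the span name from `__name__`. This is a minimal sketch: `traced` and `RecordingTracer` are hypothetical helpers, and the only assumption about the real tracer is that it offers `start_as_current_span(name)` usable as a context manager, as OpenTelemetry tracers do.

```python
import functools
from typing import Any, Callable, List, TypeVar

F = TypeVar("F", bound=Callable[..., Any])


def traced(tracer: Any) -> Callable[[F], F]:
    """Wrap a function in a span whose name is the function's own __name__."""

    def decorator(func: F) -> F:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            with tracer.start_as_current_span(func.__name__):
                return func(*args, **kwargs)

        return wrapper  # type: ignore[return-value]

    return decorator


class RecordingTracer:
    """Minimal stand-in tracer that records span names (useful in unit tests)."""

    def __init__(self) -> None:
        self.span_names: List[str] = []

    def start_as_current_span(self, name: str) -> "RecordingTracer":
        self.span_names.append(name)
        return self

    def __enter__(self) -> "RecordingTracer":
        return self

    def __exit__(self, *exc_info: Any) -> bool:
        return False  # never swallow exceptions raised inside the span
```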

nammn · 2025-08-07T11:26:30Z

scripts/release/atomic_pipeline.py

+
+
+def build_operator_image_patch(build_configuration: BuildConfiguration):
+    if not build_operator_image_fast(build_configuration):


I don't remember using this

nammn · 2025-08-07T11:26:59Z

scripts/release/atomic_pipeline.py

@@ -0,0 +1,552 @@
+#!/usr/bin/env python3
+


how do we test this? Do we have a related unit test file integrated?

The previous tests in pipeline_test.py mostly tested internals that are no longer used, e.g. run_command_with_retries, is_version_in_range or test_is_release_step_executed (Sonar-related).

Apart from e2e tests that verify image creation I would also ask for:

moving some tests that still qualify for atomic_pipeline.py:

test_build_latest_agent_versions

test_get_versions_to_rebuild_same_version

possibly others

add tests for process_image and verify what commands are actually invoked using mocks?

The most complex build process is for the agent image, but it will change significantly once the non-matrix work is merged, so I would hold off on adding tests for it.
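A minimal sketch of the "verify which commands are invoked using mocks" suggestion. `build_and_push` is a hypothetical stand-in for process_image that shells out with `subprocess.run`; the real pipeline calls python_on_whales instead, in which case `docker.buildx.build` would be the thing to patch. The test asserts the exact argv without running Docker.

```python
import subprocess
import unittest
from typing import List
from unittest import mock


def build_and_push(image: str, tag: str, platforms: List[str], dockerfile: str = "Dockerfile") -> None:
    """Hypothetical stand-in for process_image: one multi-platform buildx call."""
    subprocess.run(
        ["docker", "buildx", "build", "--file", dockerfile,
         "--platform", ",".join(platforms), "--tag", f"{image}:{tag}", "--push", "."],
        check=True,
    )


class BuildAndPushTest(unittest.TestCase):
    @mock.patch("subprocess.run")
    def test_invokes_single_multi_platform_build(self, run: mock.MagicMock) -> None:
        build_and_push("registry.example.com/operator", "1.0.0", ["linux/amd64", "linux/arm64"])
        # Exactly one docker call, with every platform in a single --platform value.
        run.assert_called_once_with(
            ["docker", "buildx", "build", "--file", "Dockerfile",
             "--platform", "linux/amd64,linux/arm64",
             "--tag", "registry.example.com/operator:1.0.0", "--push", "."],
            check=True,
        )


if __name__ == "__main__":
    unittest.main()
```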

I agree with what Maciej said: all unit tests were for low-level internals. I don't think unit tests are very relevant for the pipeline.

lucian-tosa

How will this be merged with Maciej's changes (the release_info file)?

lucian-tosa · 2025-08-07T13:30:47Z

scripts/release/atomic_pipeline.py

+
+
+@TRACER.start_as_current_span("sign_image_in_repositories")
+def sign_image_in_repositories(args: Dict[str, str], arch: str = None):


is this used anywhere?

Good catch, not anymore

lucian-tosa · 2025-08-07T13:31:46Z

scripts/release/atomic_pipeline.py

+    logger.info(f"Building Operator args: {args}")
+
+    image_name = "mongodb-kubernetes"
+    build_image_generic(


There are a few too many methods that could easily be merged: pipeline_process_image, build_image_generic and process_image. They should be a single function.
I also noticed that both build_image_generic and process_image sign the image.

Very good point, with all the simplifications there's no need for so many abstraction layers anymore. I merged and renamed methods.
493d4d6
a21b254
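A minimal sketch of what a single entry point could look like once the wrapper layers are folded together: build every platform in one call, then sign at most once. `ImageBuildRequest`, `build_image` and the injected `build`/`sign` callables are hypothetical names; injecting them keeps the function independent of python_on_whales or any particular signing tool.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class ImageBuildRequest:
    image_name: str
    dockerfile: str
    registry: str
    tag: str
    platforms: Sequence[str] = ("linux/amd64",)


def build_image(
    request: ImageBuildRequest,
    build: Callable[[str, str, Sequence[str]], None],
    sign: Callable[[str], None],
    should_sign: bool,
) -> str:
    """Single entry point replacing several wrapper layers: build once, sign at most once.

    `build(dockerfile, image_ref, platforms)` and `sign(image_ref)` are injected so the
    same function works with python_on_whales, a subprocess call, or a test double.
    """
    image_ref = f"{request.registry}/{request.image_name}:{request.tag}"
    build(request.dockerfile, image_ref, request.platforms)
    if should_sign:
        # Signing happens only here, so no other layer can sign the same image again.
        sign(image_ref)
    return image_ref
```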

lucian-tosa · 2025-08-07T13:34:52Z

scripts/release/atomic_pipeline.py

+    """
+    agent_versions_to_build = list()
+    agent_versions_to_build.append(
+        (
+            release["supportedImages"]["mongodb-agent"]["opsManagerMapping"]["cloud_manager"],
+            release["supportedImages"]["mongodb-agent"]["opsManagerMapping"]["cloud_manager_tools"],
+        )
+    )
+


Can we reuse the method above for this?

I haven't really touched agent related stuff, once both Nam's PR (matrix removal) and mine are merged, we will make the necessary changes to the atomic pipeline.
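A minimal sketch of the reuse the reviewer asks about: one helper owns the `opsManagerMapping` lookup from the diff, and the list-building code calls it instead of repeating the nested indexing. The function names and the de-duplication of extra pairs are assumptions; the actual agent logic is expected to change after the matrix removal.

```python
from typing import Any, Dict, Iterable, List, Tuple

AgentToolsPair = Tuple[str, str]


def get_ops_manager_mapping(release: Dict[str, Any]) -> Dict[str, Any]:
    # Same nested path as in the diff; kept in one place so callers don't repeat it.
    return release["supportedImages"]["mongodb-agent"]["opsManagerMapping"]


def cloud_manager_agent_versions(release: Dict[str, Any]) -> AgentToolsPair:
    """(agent_version, tools_version) pair used for Cloud Manager."""
    mapping = get_ops_manager_mapping(release)
    return mapping["cloud_manager"], mapping["cloud_manager_tools"]


def agent_versions_to_build(
    release: Dict[str, Any], extra_pairs: Iterable[AgentToolsPair] = ()
) -> List[AgentToolsPair]:
    versions = [cloud_manager_agent_versions(release)]
    for pair in extra_pairs:
        if pair not in versions:  # avoid building the same agent/tools pair twice
            versions.append(pair)
    return versions
```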

Julien-Ben mentioned this pull request Jul 29, 2025

[Draft] CLOUDP-333692: Re-design images building #209

Closed

Julien-Ben force-pushed the julienben/redesign-pipeline branch from d7ae339 to 6649987 July 29, 2025 15:30

Julien-Ben self-assigned this Jul 29, 2025

Julien-Ben added 19 commits July 29, 2025 17:37

Remove file

675bee4

Put lib back in dependencies

833e25f

add todo

15e7f51

Fix

120c1af

Remove multi arch call, fix test image path

c9ceabf

Fix agent version for default case

fb87f4d

Lindt

c05e180

isort

747c4ba

Cleanup TODOs

03fd9b8

Rename arch -> platform

1fbb8d5

Don't rely on exception to check for builder existence

e9a524f

Remove unused variables

fa6b899

Pre commit

426e522

Cleanup

6890858

Correct build envs

aab9592

Lindt

33173bb

Update Makefile

74e867c

Add TODO

b13b054

Revert "Pre commit"

832ce61

This reverts commit 426e522.

MaciejKaras reviewed Aug 4, 2025

scripts/release/main.py Outdated

MaciejKaras reviewed Aug 4, 2025

.evergreen-functions.yml

MaciejKaras reviewed Aug 4, 2025

scripts/release/main.py Outdated

MaciejKaras reviewed Aug 4, 2025

scripts/release/main.py Outdated

MaciejKaras reviewed Aug 4, 2025

scripts/release/main.py Outdated

MaciejKaras reviewed Aug 4, 2025

Julien-Ben added 9 commits August 6, 2025 10:35

Cleanup

a7c63c9

Rename file

742e784

Remove cli sbom

1f0a21b

Renamed image building file

813d539

Freeze python on whales

c06061b

Lint

5f9d49a

Remove everything SBOM related

f390dc9

Lint

a47341d

Add TODO

972b23c

Julien-Ben added the skip-changelog label Aug 6, 2025

Remove --all-agents

4ae4034

Julien-Ben marked this pull request as ready for review August 6, 2025 12:15

Julien-Ben requested a review from a team as a code owner August 6, 2025 12:15

Julien-Ben requested review from anandsyncs, viveksinghggits, nammn and lucian-tosa August 6, 2025 12:15

Merge branch 'master' into julienben/redesign-pipeline

291f043

nammn reviewed Aug 7, 2025

Julien-Ben added 3 commits August 7, 2025 15:34

Rename trace

88c76bc

Remove operator build

0fd4db8

Doc and logs

ee86ebf

lucian-tosa reviewed Aug 7, 2025

Julien-Ben added 6 commits August 7, 2025 17:27

Use build_image_generic for test images too

5f5940f

Remove unused sign images in repositories

6dd208f

Remove pipeline_process_image

493d4d6

Remove process_image

a21b254

Rename function

a7db180

Lint

52b8662


