CLOUDP-331841 - remove the agent matrix work #267

Merged
merged 51 commits into from
Aug 8, 2025
ca0c007
baseline refactoring
nammn Jul 16, 2025
08c2c49
first refactoring
nammn Jul 16, 2025
35c5f53
first refactoring
nammn Jul 16, 2025
cb04b20
first refactoring
nammn Jul 16, 2025
43ed386
add agent container
nammn Jul 16, 2025
99a1ec8
remove operator suffix
nammn Jul 16, 2025
7de5edb
remove operator suffix from agents
nammn Jul 16, 2025
ac515aa
remove init container dep and fix agent yaml
nammn Jul 16, 2025
8a32581
make docker work locally and make agent building work locally
nammn Jul 16, 2025
ed9d21c
make docker work locally and make agent building work locally
nammn Jul 16, 2025
52a14f8
make docker work locally and make agent building work locally
nammn Jul 16, 2025
8f06d46
fix pipeline and fix stati cunit test
nammn Jul 16, 2025
c84e54d
fix pipeline for om
nammn Jul 16, 2025
e38b491
repush all ecr images
nammn Jul 16, 2025
1e48f2a
unify agent names
nammn Jul 16, 2025
c11bf32
add dummy probes
nammn Jul 16, 2025
d49c870
add dummy probes
nammn Jul 17, 2025
86f42d8
add dummy probes
nammn Jul 17, 2025
823cc59
add dummy probes
nammn Jul 17, 2025
62e45ef
add dummy probes
nammn Jul 17, 2025
ae23cfd
update launcher and use different paths and make things writable
nammn Jul 17, 2025
c5511be
linter
nammn Jul 17, 2025
4e97519
fix container index ordering
nammn Jul 21, 2025
6382c69
move to linking
nammn Jul 21, 2025
c01249b
move to linking and cleanup and make it work
nammn Jul 21, 2025
38ad957
cleanup some tests
nammn Jul 21, 2025
c7749e4
fix unit test
nammn Jul 22, 2025
bdc1087
add appdb support
nammn Jul 23, 2025
12066cb
make appdb work
nammn Jul 23, 2025
9d57519
make appdb work and ensure tmp mount
nammn Jul 23, 2025
d10f852
fix launcher path for setup script and fix docker unit tests
nammn Jul 23, 2025
6689182
fix e2e tests and fix linter
nammn Jul 23, 2025
57681ad
fix monitoring and tests
nammn Jul 24, 2025
41e9d78
add wip context handling
nammn Jul 24, 2025
2082354
fix appdb assert costs
nammn Jul 24, 2025
f8fd451
remove not used agent
nammn Jul 24, 2025
ba4755d
fix appdb spec
nammn Jul 25, 2025
a976ddb
fix appdb spec
nammn Jul 25, 2025
58a5ba7
remove rebuild logic
nammn Jul 25, 2025
6ab8ad7
Merge branch 'master' of github.com:mongodb/mongodb-kubernetes into r…
nammn Aug 5, 2025
ee2bbc8
fix merge
nammn Aug 5, 2025
46c4076
fix merge
nammn Aug 5, 2025
2d1d915
fix merge
nammn Aug 5, 2025
b926669
fix merge
nammn Aug 5, 2025
7c8f6e6
add release notes and add keyfile for all containers
nammn Aug 6, 2025
7ca0741
fix release notes
nammn Aug 6, 2025
580aea5
fix mongod pid handling by redirecting as shared processnamespace doe…
nammn Aug 7, 2025
c7e9fa5
add static and non static handling, change log file, re-add tests
nammn Aug 7, 2025
0cf761a
rename file to feature
nammn Aug 7, 2025
475b966
Merge branch 'master' into remove-agent-matrix
nammn Aug 8, 2025
89866b7
change rn
nammn Aug 8, 2025
3 changes: 0 additions & 3 deletions .evergreen.yml
@@ -462,9 +462,6 @@ tasks:
skip_tags: ubuntu,release

- name: build_agent_images_ubi
depends_on:
- name: build_init_database_image_ubi
variant: init_test_run
commands:
- func: clone
- func: setup_building_host
@@ -0,0 +1,8 @@
---
title: Changing container setup of static architecture
kind: fix
date: 2025-08-06
---

* This change fixes the complex, difficult-to-maintain architecture for StatefulSet containers, which relied on an "agent matrix" to map operator versions to agent versions and therefore required a very large number of images.
* We solve this by shifting to a 3-container setup. The new design eliminates the operator-version/agent-version matrix by adding one additional container that holds all required binaries. This mirrors what we already do with the mongodb-database container.
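
To make the image-count problem described in this release note concrete, here is a minimal Go sketch. It is not the operator's build code. It assumes the old tags joined agent and operator versions with an underscore, as the line removed from `getAgentVersion` in this PR suggests; the function names `matrixTags` and `flatTags` and all version strings are hypothetical.

```go
package main

import "fmt"

// matrixTags lists one image tag per (agent, operator) pair, modeling the
// old "agent matrix". The "<agent>_<operator>" format follows the line this
// PR removes from getAgentVersion; everything else here is illustrative.
func matrixTags(agentVersions, operatorVersions []string) []string {
	tags := make([]string, 0, len(agentVersions)*len(operatorVersions))
	for _, agent := range agentVersions {
		for _, operator := range operatorVersions {
			tags = append(tags, agent+"_"+operator)
		}
	}
	return tags
}

// flatTags models the setup after this change: one tag per agent version,
// independent of the operator version. It returns a copy of its input.
func flatTags(agentVersions []string) []string {
	return append([]string(nil), agentVersions...)
}

func main() {
	// Hypothetical version names, chosen only to show the growth.
	agents := []string{"agent-a", "agent-b"}
	operators := []string{"op-1", "op-2", "op-3"}

	fmt.Println(len(matrixTags(agents, operators)), "tags with the matrix")
	fmt.Println(len(flatTags(agents)), "tags without it")
}
```

With two agent versions and three operator versions, the matrix yields six tags and the flat scheme two. The matrix grows multiplicatively with every new operator release, which is the maintenance cost the note refers to.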
5 changes: 1 addition & 4 deletions controllers/operator/common_controller.go
@@ -47,7 +47,6 @@ import (
"github.com/mongodb/mongodb-kubernetes/pkg/util/architectures"
"github.com/mongodb/mongodb-kubernetes/pkg/util/env"
"github.com/mongodb/mongodb-kubernetes/pkg/util/stringutil"
"github.com/mongodb/mongodb-kubernetes/pkg/util/versionutil"
"github.com/mongodb/mongodb-kubernetes/pkg/vault"
)

@@ -684,9 +683,7 @@ func (r *ReconcileCommonController) getAgentVersion(conn om.Connection, omVersio
return "", err
} else {
log.Debugf("Using agent version %s", agentVersion)
currentOperatorVersion := versionutil.StaticContainersOperatorVersion()
log.Debugf("Using Operator version: %s", currentOperatorVersion)
return agentVersion + "_" + currentOperatorVersion, nil
return agentVersion, nil
}
}

27 changes: 15 additions & 12 deletions controllers/operator/construct/appdb_construction.go
@@ -122,6 +122,8 @@ func appDbPodSpec(initContainerImage string, om om.MongoDBOpsManager) podtemplat
construct.AgentName,
container.WithResourceRequirements(buildRequirementsFromPodSpec(*appdbPodSpec)),
)
scriptsVolumeMount := statefulset.CreateVolumeMount("agent-scripts", "/opt/scripts", statefulset.WithReadOnly(false))
hooksVolumeMount := statefulset.CreateVolumeMount("hooks", "/hooks", statefulset.WithReadOnly(false))

initUpdateFunc := podtemplatespec.NOOP()
if !architectures.IsRunningStaticArchitecture(om.Annotations) {
@@ -130,8 +132,6 @@ func appDbPodSpec(initContainerImage string, om om.MongoDBOpsManager) podtemplat
// volumes of different containers.
initUpdateFunc = func(templateSpec *corev1.PodTemplateSpec) {
templateSpec.Spec.InitContainers = []corev1.Container{}
scriptsVolumeMount := statefulset.CreateVolumeMount("agent-scripts", "/opt/scripts", statefulset.WithReadOnly(false))
hooksVolumeMount := statefulset.CreateVolumeMount("hooks", "/hooks", statefulset.WithReadOnly(false))
podtemplatespec.WithInitContainer(InitAppDbContainerName, buildAppDBInitContainer(initContainerImage, []corev1.VolumeMount{scriptsVolumeMount, hooksVolumeMount}))(templateSpec)
}
}
@@ -233,6 +233,12 @@ func CAConfigMapName(appDb om.AppDBSpec, log *zap.SugaredLogger) string {
// and volumemounts for TLS.
func tlsVolumes(appDb om.AppDBSpec, podVars *env.PodEnvVars, log *zap.SugaredLogger) podtemplatespec.Modification {
volumesToAdd, volumeMounts := getTLSVolumesAndVolumeMounts(appDb, podVars, log)

// Add agent API key volume mount if not using vault and monitoring is enabled
if !vault.IsVaultSecretBackend() && ShouldEnableMonitoring(podVars) {
volumeMounts = append(volumeMounts, statefulset.CreateVolumeMount(AgentAPIKeyVolumeName, AgentAPIKeySecretPath))
}

volumesFunc := func(spec *corev1.PodTemplateSpec) {
for _, v := range volumesToAdd {
podtemplatespec.WithVolume(v)(spec)
@@ -380,7 +386,7 @@ func AppDbStatefulSet(opsManager om.MongoDBOpsManager, podVars *env.PodEnvVars,
externalDomain := appDb.GetExternalDomainForMemberCluster(scaler.MemberClusterName())

if ShouldEnableMonitoring(podVars) {
monitoringModification = addMonitoringContainer(*appDb, *podVars, opts, externalDomain, log)
monitoringModification = addMonitoringContainer(*appDb, *podVars, opts, externalDomain, architectures.IsRunningStaticArchitecture(opsManager.Annotations), log)
} else {
// Otherwise, let's remove for now every podTemplateSpec related to monitoring
// We will apply them when enabling monitoring
@@ -390,7 +396,7 @@ func AppDbStatefulSet(opsManager om.MongoDBOpsManager, podVars *env.PodEnvVars,
}

// We copy the Automation Agent command from community and add the agent startup parameters
automationAgentCommand := construct.AutomationAgentCommand(true, opsManager.Spec.AppDB.GetAgentLogLevel(), opsManager.Spec.AppDB.GetAgentLogFile(), opsManager.Spec.AppDB.GetAgentMaxLogFileDurationHours())
automationAgentCommand := construct.AutomationAgentCommand(architectures.IsRunningStaticArchitecture(opsManager.Annotations), true, opsManager.Spec.AppDB.GetAgentLogLevel(), opsManager.Spec.AppDB.GetAgentLogFile(), opsManager.Spec.AppDB.GetAgentMaxLogFileDurationHours())
idx := len(automationAgentCommand) - 1
automationAgentCommand[idx] += appDb.AutomationAgent.StartupParameters.ToCommandLineArgs()

@@ -403,13 +409,10 @@ func AppDbStatefulSet(opsManager om.MongoDBOpsManager, podVars *env.PodEnvVars,
MountPath: "/var/lib/automation/config/acVersion",
}

// Here we ask to craete init containers which also creates required volumens.
// Here we ask to create init containers which also creates required volumes.
// Note that we provide empty images for init containers. They are not important
// at this stage beucase later we will define our own init containers for non-static architecture.
mod := construct.BuildMongoDBReplicaSetStatefulSetModificationFunction(&opsManager.Spec.AppDB, scaler, opts.MongodbImage, opts.AgentImage, "", "", true)
if architectures.IsRunningStaticArchitecture(opsManager.Annotations) {
mod = construct.BuildMongoDBReplicaSetStatefulSetModificationFunction(&opsManager.Spec.AppDB, scaler, opts.MongodbImage, opts.AgentImage, "", "", false)
}
// at this stage because later we will define our own init containers for non-static architecture.
mod := construct.BuildMongoDBReplicaSetStatefulSetModificationFunction(&opsManager.Spec.AppDB, scaler, opts.MongodbImage, opts.AgentImage, "", "", !architectures.IsRunningStaticArchitecture(opsManager.Annotations), opts.InitAppDBImage)

sts := statefulset.New(
mod,
@@ -493,7 +496,7 @@ func getVolumeMountIndexByName(mounts []corev1.VolumeMount, name string) int {
// addMonitoringContainer returns a podtemplatespec modification that adds the monitoring container to the AppDB Statefulset.
// Note that this replicates some code from the functions that do this for the base AppDB Statefulset. After many iterations
// this was deemed to be an acceptable compromise to make code clearer and more maintainable.
func addMonitoringContainer(appDB om.AppDBSpec, podVars env.PodEnvVars, opts AppDBStatefulSetOptions, externalDomain *string, log *zap.SugaredLogger) podtemplatespec.Modification {
func addMonitoringContainer(appDB om.AppDBSpec, podVars env.PodEnvVars, opts AppDBStatefulSetOptions, externalDomain *string, isStatic bool, log *zap.SugaredLogger) podtemplatespec.Modification {
var monitoringAcVolume corev1.Volume
var monitoringACFunc podtemplatespec.Modification

@@ -516,7 +519,7 @@ func addMonitoringContainer(appDB om.AppDBSpec, podVars env.PodEnvVars, opts App
}
// Construct the command by concatenating:
// 1. The base command - from community
command := construct.MongodbUserCommandWithAPIKeyExport
command := construct.GetMongodbUserCommandWithAPIKeyExport(isStatic)
command += "agent/mongodb-agent"
command += " -healthCheckFilePath=" + monitoringAgentHealthStatusFilePathValue
command += " -serveStatusPort=5001"
7 changes: 4 additions & 3 deletions controllers/operator/construct/construction_test.go
@@ -28,20 +28,20 @@ func TestBuildStatefulSet_PersistentFlagStatic(t *testing.T) {
mdb := mdbv1.NewReplicaSetBuilder().SetPersistent(nil).Build()
set := DatabaseStatefulSet(*mdb, ReplicaSetOptions(GetPodEnvOptions()), zap.S())
assert.Len(t, set.Spec.VolumeClaimTemplates, 1)
assert.Len(t, set.Spec.Template.Spec.Containers[0].VolumeMounts, 7)
assert.Len(t, set.Spec.Template.Spec.Containers[0].VolumeMounts, 8)
assert.Len(t, set.Spec.Template.Spec.Containers[1].VolumeMounts, 7)

mdb = mdbv1.NewReplicaSetBuilder().SetPersistent(util.BooleanRef(true)).Build()
set = DatabaseStatefulSet(*mdb, ReplicaSetOptions(GetPodEnvOptions()), zap.S())
assert.Len(t, set.Spec.VolumeClaimTemplates, 1)
assert.Len(t, set.Spec.Template.Spec.Containers[0].VolumeMounts, 7)
assert.Len(t, set.Spec.Template.Spec.Containers[0].VolumeMounts, 8)
assert.Len(t, set.Spec.Template.Spec.Containers[1].VolumeMounts, 7)

// If no persistence is set then we still mount init scripts
mdb = mdbv1.NewReplicaSetBuilder().SetPersistent(util.BooleanRef(false)).Build()
set = DatabaseStatefulSet(*mdb, ReplicaSetOptions(GetPodEnvOptions()), zap.S())
assert.Len(t, set.Spec.VolumeClaimTemplates, 0)
assert.Len(t, set.Spec.Template.Spec.Containers[0].VolumeMounts, 7)
assert.Len(t, set.Spec.Template.Spec.Containers[0].VolumeMounts, 8)
assert.Len(t, set.Spec.Template.Spec.Containers[1].VolumeMounts, 7)
}

@@ -111,6 +111,7 @@ func TestBuildStatefulSet_PersistentVolumeClaimSingleStatic(t *testing.T) {
{Name: util.PvcNameData, MountPath: util.PvcMountPathData, SubPath: util.PvcNameData},
{Name: util.PvcNameData, MountPath: util.PvcMountPathJournal, SubPath: util.PvcNameJournal},
{Name: util.PvcNameData, MountPath: util.PvcMountPathLogs, SubPath: util.PvcNameLogs},
{Name: PvcNameDatabaseScripts, MountPath: PvcMountPathScripts, ReadOnly: false},
})
}
