
Setup docker-compose like k8s deployment #1054

Merged: 56 commits merged from KubernetesExperimentation into opensearch-project:main on Jan 13, 2025

Conversation

@gregschohn (Collaborator) commented Oct 8, 2024

Description

Longer term, Kubernetes (K8s) will allow us

  1. To unify the single-node developer experience with the on-prem and cloud experiences.
  2. To simplify and speed up deployment, especially around testing.
  3. To enable auto-scaling.
  • Category: Enhancement / New feature
  • Why are these changes required? See above.
  • What is the old behavior before changes and the new behavior after changes? CDK deployment should still work as it did before.

For this first cut...

Installing

The k8s directory contains helm charts that can be installed with helm install charts/aggregates/migrationAssistant. Optionally, -n ma can be added to put the helm installation and MA resources into the "ma" namespace (or whatever namespace is specified). aggregates/mockCustomerClusters is a helm chart for testing purposes that brings up the resources, external to the Migration Assistant, that are required to exercise all of the Migration Assistant functionality.

Those aggregate charts pull other helm charts into them, either charts local to this repo (charts/components/(captureProxy|migrationConsole|...)) or 3rd-party charts (opensearch, elasticsearch, prometheus, etc.). Each chart defines default values in the values.yaml contained within the chart's directory, alongside the Chart.yaml manifest file. Those default values can be used to install a working version of the MA solution. Today's functionality is limited and buggy, but being able to do a reasonable demo install without providing any value overrides will remain an ongoing requirement.

Before running helm install, you'll likely need to run helm dependency build CHART_DIR, which downloads or copies charts into the CHART_DIR/charts directory. There's a script (linkSubChartsToDependencies.sh), which still has some bugs, that attempts to do some of this with symlinks to local directories when possible. Symlinks are preferable to tarballs because you don't need to keep rebuilding the dependencies, which can be time consuming.
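
For reference, a minimal end-to-end install could look like the following sketch. The release name and namespace mirror the `helm install -n ma ma charts/aggregates/migrationAssistant` command referenced later in this PR, and --create-namespace is only needed if the namespace doesn't already exist:

cd deployment/k8s
helm dependency build charts/aggregates/migrationAssistant     # download/copy sub-charts into the chart's charts/ directory
helm install ma charts/aggregates/migrationAssistant -n ma --create-namespace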

Configurations

Command line parameters are configured via helm through the values (such as values.yaml). Here are sample contents for the bulkLoad deployment:

parameters:
  initialLeaseDuration:
    value: PT10M
  documentsPerBulkRequest:
    value: 1000
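
Those defaults can be overridden at install time in the usual helm ways. The exact key path depends on how the bulkLoad chart is nested under the aggregate chart, so the paths and values below are illustrative only:

# override a single parameter from the command line (key path is illustrative)
helm install ma charts/aggregates/migrationAssistant -n ma \
  --set bulkLoad.parameters.documentsPerBulkRequest.value=500

# or collect overrides in a yaml file and pass it with -f
helm install ma charts/aggregates/migrationAssistant -n ma -f my-overrides.yaml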

All of Migration Assistant's custom processes use the same paradigm to translate those yaml parameters into command line parameters. Shared helm charts (helmCommon) provide helper macros that allow consistent handling across our fleet of helm packages. For command line parsing, the helm charts create deployments, mostly via common code, that do the following:

  1. Create config maps for each of the parameters specified in the values.yaml file.
  2. Load the values from those config maps as environment variables into an init container.
  3. Have that init container run shell commands to construct the arguments that will be passed to the main program; those arguments are written to a file (vars.sh) on a shared mount.
  4. Have the main container load those variables from the shared mount before running the program.

A rough sketch of steps 2-4 is shown below.
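
This is a minimal sketch only; the flags, environment variable names, shared-mount path, and program name are illustrative rather than the actual helper output:

# Init container: parameter values from the config maps arrive as environment variables, e.g.
#   INITIAL_LEASE_DURATION=PT10M
#   DOCUMENTS_PER_BULK_REQUEST=1000
ARGS=""
if [ -n "$INITIAL_LEASE_DURATION" ]; then ARGS="$ARGS --initialLeaseDuration $INITIAL_LEASE_DURATION"; fi
if [ -n "$DOCUMENTS_PER_BULK_REQUEST" ]; then ARGS="$ARGS --documentsPerBulkRequest $DOCUMENTS_PER_BULK_REQUEST"; fi
echo "export ARGS=\"$ARGS\"" > /shared/vars.sh

# Main container: load the prepared arguments, then start the real program
. /shared/vars.sh
exec some-migration-program $ARGS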

The migration console has an extra init container that constructs migration_services.yaml, which, like vars.sh, is written to a shared mount. However, since pods can't refresh environment variables when config maps update, and because we're bundling those values into a single services yaml file, we have a separate init container to maintain that file. That container uses a custom shell script that uses the k8s client to watch config maps and formats the configured values as yaml.
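
As a rough sketch of that watcher's behavior only (the actual implementation uses the k8s client rather than kubectl, and the output path and render_migration_services helper below are stand-ins):

# Re-render the bundled services file whenever any config map changes
kubectl get configmaps --watch --output name | while read -r changed; do
  echo "detected change to $changed; regenerating the services file"
  render_migration_services > /shared/migration_services.yaml
done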

Issues Resolved

https://opensearch.atlassian.net/browse/MIGRATIONS-2287

Testing

Manual testing only, with minikube on my Mac. See localTesting.sh for some rough examples.

Check List

  • New functionality includes testing
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

There's a testObservability chart that will deploy jaeger, prometheus, and grafana - testing values are within the helmValues/localTesting/testObservability.yaml file.

Signed-off-by: Greg Schohn <[email protected]>
… to be deployed hierarchically via helm packages

Signed-off-by: Greg Schohn <[email protected]>
…to work as expected.

Jaeger & Prometheus have been deployed, but I don't have a collector and haven't confirmed that they're behaving as expected.
I still need to 1) setup kafka, flip the proxy to write to kafka; 2) setup the replayer 3) setup otel-collectors; 4) setup RFS.

Signed-off-by: Greg Schohn <[email protected]>
Signed-off-by: Greg Schohn <[email protected]>
Signed-off-by: Greg Schohn <[email protected]>

codecov bot commented Oct 8, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 80.48%. Comparing base (38513f9) to head (a31e9dc).
Report is 57 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1054      +/-   ##
============================================
- Coverage     80.49%   80.48%   -0.02%     
+ Complexity     3080     3077       -3     
============================================
  Files           421      421              
  Lines         15682    15683       +1     
  Branches       1062     1062              
============================================
- Hits          12624    12623       -1     
+ Misses         2411     2410       -1     
- Partials        647      650       +3     
Flag        Coverage Δ
unittests   80.48% <ø> (-0.02%) ⬇️


lewijacn and others added 20 commits October 10, 2024 12:45
Signed-off-by: Tanner Lewis <[email protected]>
Signed-off-by: Greg Schohn <[email protected]>
Signed-off-by: Tanner Lewis <[email protected]>
…nsearch-migrations into KubernetesExperimentation
@gregschohn force-pushed the KubernetesExperimentation branch from 763de17 to ad8be18 on October 31, 2024 15:16
…ith the k8s config infrastructure

Signed-off-by: Greg Schohn <[email protected]>
…onAssistant and see the replayer and console come up.

I'm no longer setting the namespaces manually, but relying upon helm to do that w/ `helm install -n ma ...`.
The buildArgumentsBuilderScript can now handle positional arguments.  It takes in a list of keys that should be used positionally instead of via flag.  Those values can still be lists or scalar values.  It's expected that the positional mappings are specified by the chart templates rather than the values.  Environment variables are still set up and bound to snake-case variants of the parameter keys pulled in as values.
The MA chart now includes all of the charts that could be required, but installs them conditionally.
Kafka charts have been moved into shared (for the base kafka chart) and components (for the traffic capture one) - I'll have more to do on integrating the kafka chart and operator, but this is just enough to hold progress on kafka work before really taking it on.  The kafka topic chart is probably gone permanently - or for quite a while - since the proxy will auto-create the topic anyway.
Install the bulk loader w/ 0 replicas so that it won't run before a snapshot has been created.  This puts it roughly where we are w/ a CDK deployment.  The capture proxy also starts in a state w/ 0 replicas.  The replayer DOES have replicas, but that will likely change down to 0 eventually too (and kafka will be loaded on-demand too).  Some of those lazy features will require some more support in the migration console though, so we'll continue along w/ a simpler, if costlier, deployment until then.
I'm setting the replica count to 0 for a number of services

Signed-off-by: Greg Schohn <[email protected]>
…nsearch-migrations into KubernetesExperimentation

Signed-off-by: Greg Schohn <[email protected]>

# Conflicts:
#	deployment/k8s/charts/aggregates/migrationAssistant/Chart.yaml
#	deployment/k8s/charts/aggregates/migrationAssistant/values.yaml
…c kafka broker, and console install cleanly.

Run `helm install -n ma ma charts/aggregates/migrationAssistant` and wait for the kafka cluster to come up.  Once it does, the replayer logs show that the replayer has joined the group.
I still need to verify that the proxy can be instantiated and that e2e traffic can flow from a source to a target.

Signed-off-by: Greg Schohn <[email protected]>
Collaborator

I assume this testObservability directory will be deleted and its functionality merged into MA

Collaborator Author

Done. Thanks. I've moved the one dashboard for grafana that was in it into the MA chart (and made it conditional)

Collaborator

Sounds like we could remove this entire localTesting directory since we are using default values from values.yaml for the given charts

Collaborator Author

Thanks, good point. I've pulled the remaining ones into the MA chart. I've been less and less concerned with deploying just one pod as I've been developing, and conditional installs on the top-level chart to deploy specific things might be easier for users and us anyway.

@@ -105,19 +105,19 @@ public static class Parameters {

@Parameter(
required = false,
names = {REMOVE_AUTH_HEADER_VALUE_ARG },
names = {REMOVE_AUTH_HEADER_VALUE_ARG, "--removeAuthHeader" },
Collaborator

nit: A bit weird having static field variables as well as hardcoded values for the same pattern in these files

Collaborator Author

Yes - the static field is used to print out some error messages. It seemed like it would be even more complicated though to get the language agreements right if both aliases were included

Collaborator

Reminder from chat: Would be nice to have a README explaining the purpose/usage of each of these helper templates

Comment on lines 18 to 26
spec:
initContainers:
{{- include "generic.setupEnvLoadInitContainer" (merge . (dict
"MountName" $mountName
"include" .Template.Include)) | nindent 8 }}
- name: wait-for-kafka
image: bitnami/kubectl:latest # or any image with curl/kubectl
command: [ 'sh', '-c',
'until kubectl get configmap kafka-config; do echo waiting for kafka config; sleep 1; done' ]
Collaborator

Reminder to think through what happens if the loaded environment changes while Kafka is still spinning up and our wait condition hasn't been met

Collaborator

I assume we will remove this

Collaborator Author

Thanks - this was just to support my development (contents are also in the console_link/README.md).

Collaborator

What are your intentions with the RBAC setup here? I was gonna hold on this for a bit

Collaborator Author

This was required because the default is "deny", so those containers couldn't get access to the config maps.

Collaborator

I assume this k8 directory will get dropped from this PR when it's ready. The README.md is likely worth copying over with some minor modification. The minikube stuff I have is maybe a more useful reference as we figure out how we want to test with minikube.

Collaborator Author

I remember that I need to change something in the readme or build scripts, but I don't remember what. Please help remind me?

Collaborator

I believe I had at least one section in the README.md that was referencing the minikubeLocal.sh script that I had. Since we have kept that script for now, I think we are fine

…nd in some cases more consistent.

Signed-off-by: Greg Schohn <[email protected]>
Choose helm builtins for snake/kebab casing and put a condition on the grafana dashboard for the aggregate package.

Signed-off-by: Greg Schohn <[email protected]>
Fixed a couple of other issues along the way and found that helm's merge function mutates the left dictionary, which was surprising and will require re-evaluating every use of it.  See helm/helm#13308

Signed-off-by: Greg Schohn <[email protected]>
Signed-off-by: Greg Schohn <[email protected]>

# Conflicts:
#	deployment/k8/migration-assistant/Chart.yaml
@gregschohn changed the title from "Kubernetes experimentation" to "Setup docker-compose like k8s deployment" on Jan 9, 2025
Collaborator

We also need a Pipfile.lock in this directory

Collaborator Author

:-). Yeah, the tests just told me that!

Collaborator

We should probably revert changes to this file and have our own services.yaml we feed in for now as I'm not sure what impact this will cause

Collaborator Author

WOW - thank you. I clearly didn't look at a merge that I took in carefully enough. I didn't even realize that this had happened.

@gregschohn force-pushed the KubernetesExperimentation branch from 1653442 to 98d4fa3 on January 10, 2025 13:51
@gregschohn marked this pull request as ready for review on January 10, 2025 14:15
- name: wait-for-kafka
image: bitnami/kubectl:latest # or any image with curl/kubectl
command: [ 'sh', '-c',
'until kubectl wait --for=condition=Ready kafka/captured-traffic -n {{.Release.Namespace }} --timeout=10s; do echo waiting for kafka cluster is ready; sleep 1; done' ]
Collaborator

If kafka is deployed in a different context won't this wait check always fail?

Collaborator Author

define different context? Generally, yes - Kafka will need to be installed in the same namespace and it will need to have that specific name. We'll want to improve this to support a bring-your-own-kafka at some point, but I don't know how big of a concern that is. I'd like to have somebody ask for that.

Collaborator

My main point here was that we probably need some way to provide the namespace that kafka is in; otherwise it seems like this will fail if it continues to check its own namespace while kafka is in a different one.

"MountName" "merged-config"
"include" .Template.Include) .) | nindent 8 }}
containers:
- name: console
Collaborator

Can we rename to test-console

Collaborator Author

done!

Member
@peternied left a comment

Thanks for getting this out for review @gregschohn, great work. Still digesting the k8 directory.

Member
@peternied left a comment

While I've got outstanding comments, this is a great step forward thanks @gregschohn and @lewijacn

Member

Let's add tests for this tool.

Collaborator Author

config_watcher will at the very least change significantly once we start to extend the migration console to work directly with k8s. At this point, config_watcher isn't really being tested (because we don't update config values at runtime). Once we do - if we're still in need of services yaml, we'll want to put some integ tests together - and unit tests on the specific contract that we're converging to will also make sense. At this point, the whole k8sConfigMapUtilScripts image should be considered an experimental bridge.

Member
@peternied commented Jan 13, 2025

I'm OK if we don't have a good test story so long as we have a JIRA task that includes this testing as an AC. Can you point a link to one here?

Member

The project structure is throwing me a little bit here. I'm not sure why we need .../k8s/charts/...; it seems like we could get away with structuring it like the following. Is there something I'm missing about what we put where?

deployment/
-> charts/
   -> README.md
   -> {chart deployment scripts}
   -> aggregates/...
   -> components/...
   -> sharedResources/...

Collaborator Author

Charts are a helm construct. We'll also want to deploy EKS and AWS specific things for an EKS related deployment. We also might want tests, etc that are specific to k8s or helm. Everything w/in charts today is specific to helm. I can see it having siblings that are for k8s, but not specific to helm charts. Does that help?

Member

I certainly agree we might have AWS / non-AWS specific functionality. I'm not sure I see the difference between k8s and helm, or to be more specific, I don't see k8s without helm.

When talking about support for other platforms my preference would be a single declaration for the canonical Migration Assistant. Similarly there would be one component that represents the Migration Console. Does this align with what you are thinking?

deployment/k8s/aws/ack-resource-setup/.helmignore (outdated; resolved)
@lewijacn merged commit de5ef3a into opensearch-project:main on Jan 13, 2025
22 checks passed