
Setup docker-compose like k8s deployment #1054

Merged: 56 commits merged from KubernetesExperimentation into opensearch-project:main on Jan 13, 2025

Conversation

@gregschohn (Collaborator) commented Oct 8, 2024

Description

Longer term, Kubernetes (K8s) will allow us

  1. To unify the single-node developer experience with the on-prem and cloud experiences.
  2. To simplify and speed up deployment, especially around testing.
  3. To enable auto-scaling.
  • Category: Enhancement / New feature
  • Why are these changes required? See above.
  • What is the old behavior before changes and the new behavior after changes? CDK deployment should still work as it did before.

For this first cut...

Installing

The k8s directory contains helm charts that can be installed with helm install charts/aggregates/migrationAssistant. Optionally, -n ma can be added to put the helm installation and MA resources into the "ma" namespace (or whatever namespace is specified). aggregates/mockCustomerClusters is a helm chart for testing purposes that brings up the resources, external to the Migration Assistant, that are required to exercise all of the Migration Assistant functionality.

Those aggregate charts pull other helm charts into them, either charts local to this repo (charts/components/(captureProxy|migrationConsole|...)) or 3rd-party charts (opensearch, elasticsearch, prometheus, etc.). Each chart defines default values in the values.yaml contained within the chart's directory, alongside the Chart.yaml manifest file. Those default values can be used to install a working version of the MA solution. Today's functionality is limited and buggy, but being able to do a reasonable demo install without providing any value overrides will remain an ongoing requirement.

Before running helm install, you'll likely need to run helm dependency build CHART_DIR, which downloads or copies charts into the CHART_DIR/charts directory. There's a script (linkSubChartsToDependencies.sh), which still has some bugs, that attempts to do some of this with symlinks to local directories when possible. Symlinks are preferable to tarballs because you don't need to keep rebuilding the dependencies, which can be time consuming.
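
For reference, a minimal end-to-end install could look like the following sketch. The release name and namespace mirror the `helm install -n ma ma charts/aggregates/migrationAssistant` command referenced later in this PR, and --create-namespace is only needed if the namespace doesn't already exist:

cd deployment/k8s
helm dependency build charts/aggregates/migrationAssistant     # download/copy sub-charts into the chart's charts/ directory
helm install ma charts/aggregates/migrationAssistant -n ma --create-namespace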

Configurations

Command line parameters are configured via helm through the values (such as values.yaml). Here are sample contents for the bulkLoad deployment:

parameters:
  initialLeaseDuration:
    value: PT10M
  documentsPerBulkRequest:
    value: 1000
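
Those defaults can be overridden at install time in the usual helm ways. The exact key path depends on how the bulkLoad chart is nested under the aggregate chart, so the paths and values below are illustrative only:

# override a single parameter from the command line (key path is illustrative)
helm install ma charts/aggregates/migrationAssistant -n ma \
  --set bulkLoad.parameters.documentsPerBulkRequest.value=500

# or collect overrides in a yaml file and pass it with -f
helm install ma charts/aggregates/migrationAssistant -n ma -f my-overrides.yaml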

All of Migration Assistant's custom processes use the same paradigm to translate those yaml parameters into command line parameters. Shared helm charts (helmCommon) provide helper macros that allow consistent handling across our fleet of helm packages. For command line parsing, the helm charts create deployments, mostly via common code, that do the following:

  1. Create config maps for each of the parameters specified in the values.yaml file.
  2. Load the values from those config maps as environment variables into an init container.
  3. Have that init container run shell commands to construct the arguments that will be passed to the main program; those arguments are written to a file (vars.sh) on a shared mount.
  4. Have the main container load those variables from the shared mount before running the program.

A rough sketch of steps 2-4 is shown below.
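
This is a minimal sketch only; the flags, environment variable names, shared-mount path, and program name are illustrative rather than the actual helper output:

# Init container: parameter values from the config maps arrive as environment variables, e.g.
#   INITIAL_LEASE_DURATION=PT10M
#   DOCUMENTS_PER_BULK_REQUEST=1000
ARGS=""
if [ -n "$INITIAL_LEASE_DURATION" ]; then ARGS="$ARGS --initialLeaseDuration $INITIAL_LEASE_DURATION"; fi
if [ -n "$DOCUMENTS_PER_BULK_REQUEST" ]; then ARGS="$ARGS --documentsPerBulkRequest $DOCUMENTS_PER_BULK_REQUEST"; fi
echo "export ARGS=\"$ARGS\"" > /shared/vars.sh

# Main container: load the prepared arguments, then start the real program
. /shared/vars.sh
exec some-migration-program $ARGS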

The migration console has an extra init container that constructs migration_services.yaml, which, like vars.sh, is written to a shared mount. However, since pods can't refresh environment variables when config maps update, and because we're bundling those values into a single services yaml file, we have a separate init container to maintain that file. That container uses a custom shell script that uses the k8s client to watch config maps and formats the configured values as yaml.
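
As a rough sketch of that watcher's behavior only (the actual implementation uses the k8s client rather than kubectl, and the output path and render_migration_services helper below are stand-ins):

# Re-render the bundled services file whenever any config map changes
kubectl get configmaps --watch --output name | while read -r changed; do
  echo "detected change to $changed; regenerating the services file"
  render_migration_services > /shared/migration_services.yaml
done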

Issues Resolved

https://opensearch.atlassian.net/browse/MIGRATIONS-2287

Testing

Manual testing only, with minikube on my Mac. See localTesting.sh for some rough examples.

Check List

  • New functionality includes testing
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

There's a testObservability chart that will deploy jaeger, prometheus, and grafana - testing values are within the helmValues/localTesting/testObservability.yaml file.

Signed-off-by: Greg Schohn <[email protected]>
… to be deployed hierarchically via helm packages

Signed-off-by: Greg Schohn <[email protected]>
…to work as expected.

Jaeger & Prometheus have been deployed, but I don't have a collector and haven't confirmed that they're behaving as expected.
I still need to 1) setup kafka, flip the proxy to write to kafka; 2) setup the replayer 3) setup otel-collectors; 4) setup RFS.

Signed-off-by: Greg Schohn <[email protected]>
Signed-off-by: Greg Schohn <[email protected]>
Signed-off-by: Greg Schohn <[email protected]>

codecov bot commented Oct 8, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 80.48%. Comparing base (38513f9) to head (a31e9dc).
Report is 57 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1054      +/-   ##
============================================
- Coverage     80.49%   80.48%   -0.02%     
+ Complexity     3080     3077       -3     
============================================
  Files           421      421              
  Lines         15682    15683       +1     
  Branches       1062     1062              
============================================
- Hits          12624    12623       -1     
+ Misses         2411     2410       -1     
- Partials        647      650       +3     
Flag        Coverage Δ
unittests   80.48% <ø> (-0.02%) ⬇️


lewijacn and others added 20 commits October 10, 2024 12:45
Signed-off-by: Tanner Lewis <[email protected]>
Signed-off-by: Greg Schohn <[email protected]>
Signed-off-by: Tanner Lewis <[email protected]>
…nsearch-migrations into KubernetesExperimentation
@gregschohn force-pushed the KubernetesExperimentation branch from 763de17 to ad8be18 on October 31, 2024 15:16
…ith the k8s config infrastructure

Signed-off-by: Greg Schohn <[email protected]>
…onAssistant and see the replayer and console come up.

I'm no longer setting the namespaces manually, but relying upon helm to do that w/ `helm install -n ma ...`.
The buildArgumentsBuilderScript can now handle positional arguments.  It takes in a list of keys that should be used positionally instead of via flag.  Those values can still be lists or scalar values.  It's expected that the positional mappings are specified by the chart templates rather than the values.  Environment variables are still set up and bound to snake-case variants of the parameter keys pulled in as values.
The MA chart now includes all of the charts that could be required, but installs them conditionally.
Kafka charts have been moved into shared (for the base kafka chart) and components (for the traffic capture one) - I'll have more to do on integrating the kafka chart and operator, but this is just enough to hold progress on kafka work before really taking it on.  The kafka topic chart is probably gone permanently - or for quite a while - since the proxy will auto-create the topic anyway.
Install the bulk loader w/ 0 replicas so that it won't run before a snapshot has been created.  This puts it roughly where we are w/ a CDK deployment.  The capture proxy also starts in a state w/ 0 replicas.  The replayer DOES have replicas, but that will likely change down to 0 eventually too (and kafka will be loaded on-demand too).  Some of those lazy features will require some more support in the migration console though, so we'll continue along w/ a simpler, if costlier, deployment until then.
I'm setting the replica count to 0 for a number of services

Signed-off-by: Greg Schohn <[email protected]>
…nsearch-migrations into KubernetesExperimentation

Signed-off-by: Greg Schohn <[email protected]>

# Conflicts:
#	deployment/k8s/charts/aggregates/migrationAssistant/Chart.yaml
#	deployment/k8s/charts/aggregates/migrationAssistant/values.yaml
…c kafka broker, and console install cleanly.

Run `helm install -n ma ma charts/aggregates/migrationAssistant` and wait for the kafka cluster to come up.  Once it does, the replayer logs show that the replayer has joined the group.
I still need to verify that the proxy can be instantiated and that e2e traffic can flow from a source to a target.

Signed-off-by: Greg Schohn <[email protected]>
Collaborator

I assume this testObservability directory will be deleted and its functionality merged into MA

Collaborator Author

Done. Thanks. I've moved the one dashboard for grafana that was in it into the MA chart (and made it conditional)

Collaborator

Sounds like we could remove this entire localTesting directory since we are using default values from values.yaml for the given charts

Collaborator Author

Thanks, good point. I've pulled the remaining ones into the MA chart. I've been less and less concerned with deploying just one pod as I've been developing, and conditional installs on the top-level chart to deploy specific things might be easier for users and us anyway.

@@ -105,19 +105,19 @@ public static class Parameters {

@Parameter(
required = false,
names = {REMOVE_AUTH_HEADER_VALUE_ARG },
names = {REMOVE_AUTH_HEADER_VALUE_ARG, "--removeAuthHeader" },
Collaborator

nit: A bit weird having static field variables as well as hardcoded values for the same pattern in these files

Collaborator Author

Yes - the static field is used to print out some error messages. It seemed like it would be even more complicated though to get the language agreements right if both aliases were included

Collaborator

Reminder from chat: Would be nice to have a README explaining the purpose/usage of each of these helper templates

Comment on lines 18 to 26
spec:
initContainers:
{{- include "generic.setupEnvLoadInitContainer" (merge . (dict
"MountName" $mountName
"include" .Template.Include)) | nindent 8 }}
- name: wait-for-kafka
image: bitnami/kubectl:latest # or any image with curl/kubectl
command: [ 'sh', '-c',
'until kubectl get configmap kafka-config; do echo waiting for kafka config; sleep 1; done' ]
Collaborator

Reminder to think through what happens if the loaded environment changes while Kafka is still spinning up and our wait condition hasn't been met

Collaborator

I assume we will remove this

Collaborator Author

Thanks - this was just to support my development (contents are also in the console_link/README.md).

Collaborator

What are your intentions with the RBAC setup here? I was gonna hold on this for a bit

Collaborator Author

This was required because the default is "deny", so those containers couldn't get access to the config maps.

Collaborator

I assume this k8 directory will get dropped from this PR when it's ready. The README.md is likely worth copying over with some minor modification. The minikube stuff I have is maybe a more useful reference as we figure out how we want to test with minikube.

Collaborator Author

I remember that I need to change something in the readme or build scripts, but I don't remember what. Please help remind me?

Collaborator

I believe I had at least one section in the README.md that was referencing the minikubeLocal.sh script that I had. Since we have kept that script for now, I think we are fine

…nd in some cases more consistent.

Signed-off-by: Greg Schohn <[email protected]>
Choose helm builtins for snake/kebab casing and put a condition on the grafana dashboard for the aggregate package.

Signed-off-by: Greg Schohn <[email protected]>
Fixed a couple of other issues along the way and found that helm's merge function mutates the left dictionary, which was surprising and will require re-evaluating every use of it.  See helm/helm#13308

Signed-off-by: Greg Schohn <[email protected]>
Signed-off-by: Greg Schohn <[email protected]>

# Conflicts:
#	deployment/k8/migration-assistant/Chart.yaml
@gregschohn changed the title from "Kubernetes experimentation" to "Setup docker-compose like k8s deployment" on Jan 9, 2025
Collaborator

We also need a Pipfile.lock in this directory

Collaborator Author

:-). Yeah, the tests just told me that!

Collaborator

We should probably revert changes to this file and have our own services.yaml we feed in for now as I'm not sure what impact this will cause

Collaborator Author

WOW - thank you. I clearly didn't look at a merge that I took in carefully enough. I didn't even realize that this had happened.

@gregschohn force-pushed the KubernetesExperimentation branch from 1653442 to 98d4fa3 on January 10, 2025 13:51
@gregschohn marked this pull request as ready for review on January 10, 2025 14:15
- name: wait-for-kafka
image: bitnami/kubectl:latest # or any image with curl/kubectl
command: [ 'sh', '-c',
'until kubectl wait --for=condition=Ready kafka/captured-traffic -n {{.Release.Namespace }} --timeout=10s; do echo waiting for kafka cluster is ready; sleep 1; done' ]
Collaborator

If kafka is deployed in a different context won't this wait check always fail?

Collaborator Author

define different context? Generally, yes - Kafka will need to be installed in the same namespace and it will need to have that specific name. We'll want to improve this to support a bring-your-own-kafka at some point, but I don't know how big of a concern that is. I'd like to have somebody ask for that.

Collaborator

My main point here was that we probably need some way to provide the namespace that kafka is in; otherwise it seems like this will fail if it continues to check its own namespace while kafka is in a different one.

"MountName" "merged-config"
"include" .Template.Include) .) | nindent 8 }}
containers:
- name: console
Collaborator

Can we rename to test-console

Collaborator Author

done!

Member
@peternied left a comment

Thanks for getting this out for review @gregschohn, great work. Still digesting the k8 directory.

Member
@peternied left a comment

While I've got outstanding comments, this is a great step forward thanks @gregschohn and @lewijacn

Member

Let's add tests for this tool.

Collaborator Author

config_watcher will at the very least change significantly once we start to extend the migration console to work directly with k8s. At this point, config_watcher isn't really being tested (because we don't update config values at runtime). Once we do - if we're still in need of services yaml, we'll want to put some integ tests together - and unit tests on the specific contract that we're converging to will also make sense. At this point, the whole k8sConfigMapUtilScripts image should be considered an experimental bridge.

Member
@peternied commented Jan 13, 2025

I'm OK if we don't have a good test story so long as we have a JIRA task that includes this testing as an AC. Can you point a link to one here?

Member

The project structure is throwing me a little bit here. I'm not sure why we need .../k8s/charts/...; it seems like we could get away with structuring it like the following. Is there something I'm missing about what we put where?

deployment/
-> charts/
   -> README.md
   -> {chart deployment scripts}
   -> aggregates/...
   -> components/...
   -> sharedResources/...

Collaborator Author

Charts are a helm construct. We'll also want to deploy EKS and AWS specific things for an EKS related deployment. We also might want tests, etc that are specific to k8s or helm. Everything w/in charts today is specific to helm. I can see it having siblings that are for k8s, but not specific to helm charts. Does that help?

Member

I certainly agree we might have AWS / non-AWS specific functionality. I'm not sure I see the difference between k8s and helm, or to be more specific, I don't see k8s without helm.

When talking about support for other platforms my preference would be a single declaration for the canonical Migration Assistant. Similarly there would be one component that represents the Migration Console. Does this align with what you are thinking?

deployment/k8s/aws/ack-resource-setup/.helmignore (outdated; resolved)
@lewijacn merged commit de5ef3a into opensearch-project:main on Jan 13, 2025
22 checks passed