Controlling the Migration Assistant Control Plane #1060

peternied · 2024-10-10T21:49:12Z

Description

I've taken a stab at writing a design, making changes to get metadata migration pulling from the services.yaml, updating how MA console handles tool arguments, and looking for some edge cases such as RFS where we do considerable work with parameters before the java process.

This was done quickly in the spirit of creating a discussions around interesting areas.

Check List

New functionality includes testing
- All tests pass, including unit test, integration test and doctest
New functionality has been documented
Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Peter Nied <[email protected]>

codecov · 2024-10-10T21:50:03Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 87.67%. Comparing base (bee316d) to head (c2920ff).
Report is 19 commits behind head on main.

Additional details and impacted files

@@             Coverage Diff              @@
##               main    #1060      +/-   ##
============================================
+ Coverage     80.16%   87.67%   +7.51%     
============================================
  Files           367       11     -356     
  Lines         13743      284   -13459     
  Branches        949        0     -949     
============================================
- Hits          11017      249   -10768     
+ Misses         2149       35    -2114     
+ Partials        577        0     -577

Flag	Coverage Δ
gradle-test	`?`
python-test	`?`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

peternied · 2024-10-10T21:49:38Z

DocumentsFromSnapshotMigration/docker/entrypoint.sh

-}
-
-
+# Discussion needed: Container will have a minimum number of


🧵 For Discussion

peternied · 2024-10-10T21:49:50Z

DocumentsFromSnapshotMigration/docker/entrypoint.sh

+# Secrets from ARNs so this seems like it aligns well to do this
+# at the same time.
+
+# Discussion needed: Directly cleanup was being done by script based


🧵 For Discussion

peternied · 2024-10-10T21:50:07Z

MetadataMigration/src/main/java/org/opensearch/migrations/MetadataMigration.java

+            .parse(args);
+
+        // Then override with settings
+        var config = getConfig(args);


🧵 For Discussion

peternied · 2024-10-10T21:50:28Z

...erSolution/src/main/docker/migrationConsole/lib/console_link/console_link/models/metadata.py

-            else:
-                logger.warning(f"Ignoring extra value {arg}, there was no command name before it")
-                i += 1
+    # Discussion Needed: Verification of the config should be done by


🧵 For Discussion

peternied · 2024-10-10T21:53:32Z

...erSolution/src/main/docker/migrationConsole/lib/console_link/console_link/models/metadata.py

 SCHEMA = {
-    "from_snapshot": FROM_SNAPSHOT_SCHEMA,


I think we might need to reorganize the layout of the existing services.yaml, e.g. the snapshot definition seems like it should be separate from the snapshot tool

gregschohn

Please don't merge this.

I'm worried that this creates a strong dependency between the migration console's global (union'ed) schema and every single one of our projects, forcing people embedding just one program to need to know about all others. For example, a formatting change within the migration console would require a recompile for the application layers. As they are today, the command line parameters are self-documenting. That creates a contract that's easy to understand.

This tax of maintaining those makes the tools much more accessible (think about the governance proxies that have been forked). This project has more interesting tasks if we think about independent components rather than bundling/coupling them together.

Signed-off-by: Peter Nied <[email protected]>

gregschohn · 2024-10-16T04:33:00Z

coreUtilities/build.gradle

@@ -41,6 +41,7 @@ dependencies {

    // JCommander
    compileOnly group: 'org.jcommander', name: 'jcommander'
+    implementation group: 'org.yaml', name: 'snakeyaml'


why do you need snakeyaml over jackson? They both have a healthy stream of CVE's associated with them, but AFAIK currently, we don't have this as a dependency and we do have a dependency in pretty much every project on jackson. Can we keep out maintenance costs down & stick to only jackson for yaml needs too?

This is a prototype of what it would take to move in this direction where I quickly jumped to snakeyaml, make sense to use jackson when we implement the change.

gregschohn · 2024-10-16T04:39:18Z

coreUtilities/src/main/java/org/opensearch/migrations/config/Backfill.java

@@ -0,0 +1,7 @@
+package org.opensearch.migrations.config;


When I did the instrumentation work, I wanted to make sure that a change needed for one project wouldn't impact in any way any other project that didn't need to consume the change. At the every-day level, this includes builds and build graphs that are as tight as possible. It also self-documents the extent that datastructures and patterns are consumed. If I see Backfill and Cluster, I'm going to presume that every single project that uses parameters will need these. In fact though, some significant services don't - like the proxy or cluster have no use for many of these and never will.

We have a single repo for this project, but I would pretend that a different person were in the sole maintainer for each project. If that was the case, would this aggregation be worth the overhead?

I agree that any change in coreUtilities requires a rebuild. And as I understand it its purpose, we put shared functionality between all components in our migration ecosystem. If the services.yaml is used by all the projects in the solution including the java projects, the place that seems logical to put this shared parsing code would be in the coreUtilities project.

I believe the relationship is worthwhile to be sure that all components can read the same services.yaml definition in the project. How else would you have a shared definition, where else should it go?

Signed-off-by: Peter Nied <[email protected]>

gregschohn · 2024-10-17T14:16:33Z

I'm continuing to try to wrap my head around this. From the diagrams, it looks like there are two concerns. The first is the separation of deployment vs runtime parameters. The second is how applications are configured at runtime.

For the first case, as part of the K8s work, we're looking to redesign the deployment/runtime experience and split deployment parameters (vpcIds, service names) from runtime parameters (scale values, transforms to use, etc). We can also make it so that there are two tiers of deployment parameters so that a change in target endpoint, etc doesn't cause the rest of the resources (console, replayer, etc) to need to be deployed if not applicable. We can also determine what's best for the user and what should be deferred until runtime and is mutable vs what is set at deployment time and immutable and maybe what's set at deployment time but mutable.

I see the migration console (orchestrator from our architecture docs) managing all of this. Each application takes what it needs from that orchestrator that understands every interface and how to stitch them together.

The second issue here is how to manage the datamodel for the migration console and the consistency between the orchestrator and its worker nodes. The proposal here is unappealing to me because it doesn't scale immediately since every application needs to be aware of every other applications' interfaces. Today, we have a yaml database (services.yaml) keep the union of all the configurations and the orchestrator uses that. If we'd want to make it so that that file can be passed and unknown arguments were ignored, that's probably fine. But I wouldn't want to see that functionality within the application depend upon specific infrastructure, like requiring a shared fs or access to secrets manager, etc. That should go into the container definition for the application or outside of it.

If we want to make sure that the models are synchronized between the layers, we should build up model classes (like many web-services do) for each application and have the migration console take dependencies on all of those & concatenate them into one services.yaml spec. That eliminates the need for everybody knowing about everything and still gets the migration console a full-view.

I'd be happy to see us continue to use some shared libraries for things like transformation configuration like what we're doing here. We can continue to extend the functionality of our parameter passing logic be easier to share & to have more features - like being able to read from a file or command line parameters (there's a @file annotation w/ JCommander).

gregschohn · 2024-10-17T14:20:09Z

I think there's a good intention & I think there's a path forward, but I also doubt if this is a top priority over cleaning up the overall deployment story (K8s), scaling our services, or putting extensible transformations in place so that users can do more. I don't see how we can derive enough short term benefit to warrant an investment here right now for the first point. I haven't noticed a huge cost yet for the second point and it feels like it's mostly where it needs to be right now.

peternied · 2024-10-24T18:58:08Z

This prototype served its purpose, had discussion with the team, notes are captured on the JIRA https://opensearch.atlassian.net/browse/MIGRATIONS-2075

Will follow up with this when it is critical path to future efforts.

Controlling the Migration Assistant Control Plane

2c3393c

Signed-off-by: Peter Nied <[email protected]>

peternied commented Oct 10, 2024

View reviewed changes

gregschohn requested changes Oct 11, 2024

View reviewed changes

Add graph showing disconnect in extra cli args scenario

c2920ff

Signed-off-by: Peter Nied <[email protected]>

gregschohn reviewed Oct 16, 2024

View reviewed changes

Small notes around when parameters are read

8e06740

Signed-off-by: Peter Nied <[email protected]>

peternied closed this Oct 24, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Controlling the Migration Assistant Control Plane #1060

Controlling the Migration Assistant Control Plane #1060

peternied commented Oct 10, 2024

codecov bot commented Oct 10, 2024 •

edited

Loading

peternied Oct 10, 2024

peternied Oct 10, 2024

peternied Oct 10, 2024

peternied Oct 10, 2024

peternied Oct 10, 2024

gregschohn left a comment •

edited

Loading

gregschohn Oct 16, 2024

peternied Oct 16, 2024

gregschohn Oct 16, 2024

peternied Oct 16, 2024

gregschohn commented Oct 17, 2024 •

edited

Loading

gregschohn commented Oct 17, 2024

peternied commented Oct 24, 2024

		}


		# Discussion needed: Container will have a minimum number of

Controlling the Migration Assistant Control Plane #1060

Controlling the Migration Assistant Control Plane #1060

Conversation

peternied commented Oct 10, 2024

Description

Check List

codecov bot commented Oct 10, 2024 • edited Loading

Codecov Report

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gregschohn left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

gregschohn commented Oct 17, 2024 • edited Loading

gregschohn commented Oct 17, 2024

peternied commented Oct 24, 2024

codecov bot commented Oct 10, 2024 •

edited

Loading

gregschohn left a comment •

edited

Loading

gregschohn commented Oct 17, 2024 •

edited

Loading