Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Controlling the Migration Assistant Control Plane #1060

Conversation

peternied
Copy link
Member

Description

I've taken a stab at writing a design, making changes to get metadata migration pulling from the services.yaml, updating how MA console handles tool arguments, and looking for some edge cases such as RFS where we do considerable work with parameters before the java process.

This was done quickly in the spirit of creating a discussions around interesting areas.

Check List

  • New functionality includes testing
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Copy link

codecov bot commented Oct 10, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 87.67%. Comparing base (bee316d) to head (c2920ff).
Report is 19 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1060      +/-   ##
============================================
+ Coverage     80.16%   87.67%   +7.51%     
============================================
  Files           367       11     -356     
  Lines         13743      284   -13459     
  Branches        949        0     -949     
============================================
- Hits          11017      249   -10768     
+ Misses         2149       35    -2114     
+ Partials        577        0     -577     
Flag Coverage Δ
gradle-test ?
python-test ?

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

}


# Discussion needed: Container will have a minimum number of
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧵 For Discussion

# Secrets from ARNs so this seems like it aligns well to do this
# at the same time.

# Discussion needed: Directly cleanup was being done by script based
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧵 For Discussion

.parse(args);

// Then override with settings
var config = getConfig(args);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧵 For Discussion

else:
logger.warning(f"Ignoring extra value {arg}, there was no command name before it")
i += 1
# Discussion Needed: Verification of the config should be done by
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧵 For Discussion

SCHEMA = {
"from_snapshot": FROM_SNAPSHOT_SCHEMA,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we might need to reorganize the layout of the existing services.yaml, e.g. the snapshot definition seems like it should be separate from the snapshot tool

Copy link
Collaborator

@gregschohn gregschohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't merge this.

I'm worried that this creates a strong dependency between the migration console's global (union'ed) schema and every single one of our projects, forcing people embedding just one program to need to know about all others. For example, a formatting change within the migration console would require a recompile for the application layers. As they are today, the command line parameters are self-documenting. That creates a contract that's easy to understand.

This tax of maintaining those makes the tools much more accessible (think about the governance proxies that have been forked). This project has more interesting tasks if we think about independent components rather than bundling/coupling them together.

@@ -41,6 +41,7 @@ dependencies {

// JCommander
compileOnly group: 'org.jcommander', name: 'jcommander'
implementation group: 'org.yaml', name: 'snakeyaml'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do you need snakeyaml over jackson? They both have a healthy stream of CVE's associated with them, but AFAIK currently, we don't have this as a dependency and we do have a dependency in pretty much every project on jackson. Can we keep out maintenance costs down & stick to only jackson for yaml needs too?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a prototype of what it would take to move in this direction where I quickly jumped to snakeyaml, make sense to use jackson when we implement the change.

@@ -0,0 +1,7 @@
package org.opensearch.migrations.config;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I did the instrumentation work, I wanted to make sure that a change needed for one project wouldn't impact in any way any other project that didn't need to consume the change. At the every-day level, this includes builds and build graphs that are as tight as possible. It also self-documents the extent that datastructures and patterns are consumed. If I see Backfill and Cluster, I'm going to presume that every single project that uses parameters will need these. In fact though, some significant services don't - like the proxy or cluster have no use for many of these and never will.

We have a single repo for this project, but I would pretend that a different person were in the sole maintainer for each project. If that was the case, would this aggregation be worth the overhead?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that any change in coreUtilities requires a rebuild. And as I understand it its purpose, we put shared functionality between all components in our migration ecosystem. If the services.yaml is used by all the projects in the solution including the java projects, the place that seems logical to put this shared parsing code would be in the coreUtilities project.

I believe the relationship is worthwhile to be sure that all components can read the same services.yaml definition in the project. How else would you have a shared definition, where else should it go?

@gregschohn
Copy link
Collaborator

gregschohn commented Oct 17, 2024

I'm continuing to try to wrap my head around this. From the diagrams, it looks like there are two concerns. The first is the separation of deployment vs runtime parameters. The second is how applications are configured at runtime.

For the first case, as part of the K8s work, we're looking to redesign the deployment/runtime experience and split deployment parameters (vpcIds, service names) from runtime parameters (scale values, transforms to use, etc). We can also make it so that there are two tiers of deployment parameters so that a change in target endpoint, etc doesn't cause the rest of the resources (console, replayer, etc) to need to be deployed if not applicable. We can also determine what's best for the user and what should be deferred until runtime and is mutable vs what is set at deployment time and immutable and maybe what's set at deployment time but mutable.

I see the migration console (orchestrator from our architecture docs) managing all of this. Each application takes what it needs from that orchestrator that understands every interface and how to stitch them together.

The second issue here is how to manage the datamodel for the migration console and the consistency between the orchestrator and its worker nodes. The proposal here is unappealing to me because it doesn't scale immediately since every application needs to be aware of every other applications' interfaces. Today, we have a yaml database (services.yaml) keep the union of all the configurations and the orchestrator uses that. If we'd want to make it so that that file can be passed and unknown arguments were ignored, that's probably fine. But I wouldn't want to see that functionality within the application depend upon specific infrastructure, like requiring a shared fs or access to secrets manager, etc. That should go into the container definition for the application or outside of it.

If we want to make sure that the models are synchronized between the layers, we should build up model classes (like many web-services do) for each application and have the migration console take dependencies on all of those & concatenate them into one services.yaml spec. That eliminates the need for everybody knowing about everything and still gets the migration console a full-view.

I'd be happy to see us continue to use some shared libraries for things like transformation configuration like what we're doing here. We can continue to extend the functionality of our parameter passing logic be easier to share & to have more features - like being able to read from a file or command line parameters (there's a @file annotation w/ JCommander).

@gregschohn
Copy link
Collaborator

I think there's a good intention & I think there's a path forward, but I also doubt if this is a top priority over cleaning up the overall deployment story (K8s), scaling our services, or putting extensible transformations in place so that users can do more. I don't see how we can derive enough short term benefit to warrant an investment here right now for the first point. I haven't noticed a huge cost yet for the second point and it feels like it's mostly where it needs to be right now.

@peternied
Copy link
Member Author

This prototype served its purpose, had discussion with the team, notes are captured on the JIRA https://opensearch.atlassian.net/browse/MIGRATIONS-2075

Will follow up with this when it is critical path to future efforts.

@peternied peternied closed this Oct 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants