Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Metadata add 1:many custom transformations #1225

Merged
merged 2 commits into from
Jan 9, 2025

Conversation

AndreKurait
Copy link
Member

@AndreKurait AndreKurait commented Jan 8, 2025

Description

  1. Updates MetadataMigration to receive and handle List returned by IJsonTransformer, this adds support for 1:many custom transforms (e.g. multi type index split)
    • Note: A future change will add support for built in transform to handle the split case.
  2. Updates the TypeSanitization built in transformer to throw an error when encountering multi type index and behavior is not set to Union. (Previous behavior was to return Unsupported which would ignore the transformation which would result in an error during cluster put)

Issues Resolved

MIGRATIONS-2271

Testing

Existing code paths tested by existing tests.

1:Many transformations will be tested during a followup PR which adds multi type split as a provided javascript transform.

Check List

  • New functionality includes testing
  • Public documentation issue/PR created, if applicable.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Copy link

codecov bot commented Jan 8, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 80.50%. Comparing base (9781a9b) to head (3be669a).
Report is 1 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #1225      +/-   ##
============================================
- Coverage     80.59%   80.50%   -0.09%     
- Complexity     3079     3080       +1     
============================================
  Files           421      421              
  Lines         15658    15682      +24     
  Branches       1057     1062       +5     
============================================
+ Hits          12619    12625       +6     
- Misses         2395     2411      +16     
- Partials        644      646       +2     
Flag Coverage Δ
unittests 80.50% <ø> (-0.09%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@gregschohn gregschohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend trying a slightly deeper cut on changing the existing API contracts. It feels like we're jamming two things together that don't want to be - and that's going to create maintenance problems going forward.

return Stream.of(transformers)
.reduce(
Stream.of(indexData),
(stream, transformer) -> stream.flatMap(data -> transformer.transformIndexMetadata(data).stream()),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The spirit of this call - slicing the output into separate subsequent transforms is really weird. For the JSON Transformer composite transformer code, we'll do direct composition. In this case, there's an implicit fanout because the type for the per-index metadata transform calls is a scalar Index - not a collection of Indices.

My opinion here, from the limited code that I've seen, would be to abstain from allowing the RFS-MetaData CompositeTransformer to host transforms that can return lists. Keep two types of transforms. Merge the outputs of the list one with a specific class that does fanout across another sequence of 'simple' transforms. That might be a few more lines of code, but it might be a lot clearer.

@@ -16,5 +18,5 @@ public interface Transformer {
/**
* Takes the raw JSON representing the Index Metadata of one version and returns a new, transformed copy of the JSON
*/
public IndexMetadata transformIndexMetadata(IndexMetadata indexData);
public List<IndexMetadata> transformIndexMetadata(IndexMetadata indexData);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the comment above about the composability problem. It's hard to get one's head around this once you know that we need to chain some transforms together.

What if you keep all of the "simple transforms" that are IndexMetadata->IndexMetadata. For when you need to return a list or take a list in, use an adapter that does the wrapping and any fanout. It feels capricious/fragile in the code below when we're boxing within a list.


/**
* Adapts an {@link IJsonTransformer} to a {@link Transformer}.
*/
@Slf4j
public class TransformerToIJsonTransformerAdapter implements Transformer {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the name of this class backward? It feels like the IJsonTransformerAdapter is being transformed into a Metadata Transformer interface, not the other way around.

transformedItems.add(MAPPER.convertValue(item, MigrationItem.class));
}
} else {
transformedItems.add(MAPPER.convertValue(transformedResult, MigrationItem.class));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: List.of() would work too


List<MigrationItem> transformedItems = new ArrayList<>();
if (transformedResult instanceof List) {
for (Object item : (List<Object>) transformedResult) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: map/collect would work too - this and the next comment could return immutable lists. If you can't do that, you should add why (as I think of it, you might not be able to do that - so it is tempting).

for (IndexMetadata transformedMetadata : transformedMetadataList) {
try {
creationResults.add(indexCreator.create(transformedMetadata, mode, context));
} catch (Exception e) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did sonarqube not flag catching a raw Exception?

try {
indexMetadata = transformer.transformIndexMetadata(indexMetadata);
return indexCreator.create(indexMetadata, mode, context);
List<IndexMetadata> transformedMetadataList = transformer.transformIndexMetadata(indexMetadata);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree w/ the SonarQube warning - if this fails, just return a List.of(...).
Then, if it succeeds, go into the for loop w/ the separate try-catch blocks

Comment on lines +37 to +56
List<CreationResult> creationResults;
if (skipCreation.test(index.getName())) {
log.atInfo()
.setMessage("Index {} was not part of the allowlist and will not be migrated.")
.addArgument(index.getName())
.log();
creationResult = CreationResult.builder()
creationResults = List.of(CreationResult.builder()
.name(index.getName())
.failureType(CreationFailureType.SKIPPED_DUE_TO_FILTER)
.build();
.build());
} else {
creationResult = createIndex(index.getName(), mode, context);
creationResults = createIndex(index.getName(), mode, context);
}

results.index(creationResult);
creationResults.forEach(results::index);

var indexMetadata = metadataFactory.fromRepo(snapshotName, index.getName());
indexMetadata.getAliases().fieldNames().forEachRemaining(alias -> {
var aliasResult = CreationResult.builder().name(alias);
aliasResult.failureType(creationResult.getFailureType());
aliasResult.failureType(creationResults.get(0).getFailureType());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, this code might be quicker to read if it were tighter (it would be easy to compress w/out the log statement - which could be moved to outside the construction.

Copy link
Collaborator

@gregschohn gregschohn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Provisionally approved with the understanding that there's a task filed to have all of the global metadata items (including indices) go through a single transformation as an index.

Comment on lines +135 to +136
.flatMap(item -> transformMigrationItem(item).stream())
.collect(Collectors.toList());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's some risk here. If a custom transform maps some components to OS2 and leaves others behind. We'd really want/need to let version transforms run for some things and not others. As discussed w/ Andre, adding a version key to each item would help us identify the version in the transform and also let subsequent transforms know that it's already been updated to that schema.

@@ -122,7 +132,8 @@ public GlobalMetadata transformGlobalMetadata(GlobalMetadata globalData) {
indexTemplates.stream()),
componentTemplates.stream()
)
.map(this::transformMigrationItem).collect(Collectors.toList());
.flatMap(item -> transformMigrationItem(item).stream())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Converting this calling pattern to be a transform on the whole list would be safer (no collisions), give the user more flexibility, & probably also make it easier for genAI to create complete transforms.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AndreKurait AndreKurait merged commit ee3fcc5 into opensearch-project:main Jan 9, 2025
22 checks passed
@AndreKurait AndreKurait deleted the Metadata1Many branch January 9, 2025 18:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants