-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Metadata add 1:many custom transformations #1225
Metadata add 1:many custom transformations #1225
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #1225 +/- ##
============================================
- Coverage 80.59% 80.50% -0.09%
- Complexity 3079 3080 +1
============================================
Files 421 421
Lines 15658 15682 +24
Branches 1057 1062 +5
============================================
+ Hits 12619 12625 +6
- Misses 2395 2411 +16
- Partials 644 646 +2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
c8067f3
to
9e2899f
Compare
Signed-off-by: Andre Kurait <[email protected]>
9e2899f
to
d549041
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd recommend trying a slightly deeper cut on changing the existing API contracts. It feels like we're jamming two things together that don't want to be - and that's going to create maintenance problems going forward.
...ataMigration/src/test/java/org/opensearch/migrations/MultiTypeMappingTransformationTest.java
Show resolved
Hide resolved
RFS/src/main/java/org/opensearch/migrations/bulkload/transformers/CompositeTransformer.java
Outdated
Show resolved
Hide resolved
return Stream.of(transformers) | ||
.reduce( | ||
Stream.of(indexData), | ||
(stream, transformer) -> stream.flatMap(data -> transformer.transformIndexMetadata(data).stream()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The spirit of this call - slicing the output into separate subsequent transforms is really weird. For the JSON Transformer composite transformer code, we'll do direct composition. In this case, there's an implicit fanout because the type for the per-index metadata transform calls is a scalar Index - not a collection of Indices.
My opinion here, from the limited code that I've seen, would be to abstain from allowing the RFS-MetaData CompositeTransformer to host transforms that can return lists. Keep two types of transforms. Merge the outputs of the list one with a specific class that does fanout across another sequence of 'simple' transforms. That might be a few more lines of code, but it might be a lot clearer.
@@ -16,5 +18,5 @@ public interface Transformer { | |||
/** | |||
* Takes the raw JSON representing the Index Metadata of one version and returns a new, transformed copy of the JSON | |||
*/ | |||
public IndexMetadata transformIndexMetadata(IndexMetadata indexData); | |||
public List<IndexMetadata> transformIndexMetadata(IndexMetadata indexData); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the comment above about the composability problem. It's hard to get one's head around this once you know that we need to chain some transforms together.
What if you keep all of the "simple transforms" that are IndexMetadata->IndexMetadata. For when you need to return a list or take a list in, use an adapter that does the wrapping and any fanout. It feels capricious/fragile in the code below when we're boxing within a list.
|
||
/** | ||
* Adapts an {@link IJsonTransformer} to a {@link Transformer}. | ||
*/ | ||
@Slf4j | ||
public class TransformerToIJsonTransformerAdapter implements Transformer { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the name of this class backward? It feels like the IJsonTransformerAdapter is being transformed into a Metadata Transformer interface, not the other way around.
transformedItems.add(MAPPER.convertValue(item, MigrationItem.class)); | ||
} | ||
} else { | ||
transformedItems.add(MAPPER.convertValue(transformedResult, MigrationItem.class)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: List.of() would work too
|
||
List<MigrationItem> transformedItems = new ArrayList<>(); | ||
if (transformedResult instanceof List) { | ||
for (Object item : (List<Object>) transformedResult) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: map/collect would work too - this and the next comment could return immutable lists. If you can't do that, you should add why (as I think of it, you might not be able to do that - so it is tempting).
for (IndexMetadata transformedMetadata : transformedMetadataList) { | ||
try { | ||
creationResults.add(indexCreator.create(transformedMetadata, mode, context)); | ||
} catch (Exception e) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How did sonarqube not flag catching a raw Exception?
try { | ||
indexMetadata = transformer.transformIndexMetadata(indexMetadata); | ||
return indexCreator.create(indexMetadata, mode, context); | ||
List<IndexMetadata> transformedMetadataList = transformer.transformIndexMetadata(indexMetadata); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree w/ the SonarQube warning - if this fails, just return a List.of(...).
Then, if it succeeds, go into the for loop w/ the separate try-catch blocks
List<CreationResult> creationResults; | ||
if (skipCreation.test(index.getName())) { | ||
log.atInfo() | ||
.setMessage("Index {} was not part of the allowlist and will not be migrated.") | ||
.addArgument(index.getName()) | ||
.log(); | ||
creationResult = CreationResult.builder() | ||
creationResults = List.of(CreationResult.builder() | ||
.name(index.getName()) | ||
.failureType(CreationFailureType.SKIPPED_DUE_TO_FILTER) | ||
.build(); | ||
.build()); | ||
} else { | ||
creationResult = createIndex(index.getName(), mode, context); | ||
creationResults = createIndex(index.getName(), mode, context); | ||
} | ||
|
||
results.index(creationResult); | ||
creationResults.forEach(results::index); | ||
|
||
var indexMetadata = metadataFactory.fromRepo(snapshotName, index.getName()); | ||
indexMetadata.getAliases().fieldNames().forEachRemaining(alias -> { | ||
var aliasResult = CreationResult.builder().name(alias); | ||
aliasResult.failureType(creationResult.getFailureType()); | ||
aliasResult.failureType(creationResults.get(0).getFailureType()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit, this code might be quicker to read if it were tighter (it would be easy to compress w/out the log statement - which could be moved to outside the construction.
Signed-off-by: Andre Kurait <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Provisionally approved with the understanding that there's a task filed to have all of the global metadata items (including indices) go through a single transformation as an index.
.flatMap(item -> transformMigrationItem(item).stream()) | ||
.collect(Collectors.toList()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's some risk here. If a custom transform maps some components to OS2 and leaves others behind. We'd really want/need to let version transforms run for some things and not others. As discussed w/ Andre, adding a version key to each item would help us identify the version in the transform and also let subsequent transforms know that it's already been updated to that schema.
@@ -122,7 +132,8 @@ public GlobalMetadata transformGlobalMetadata(GlobalMetadata globalData) { | |||
indexTemplates.stream()), | |||
componentTemplates.stream() | |||
) | |||
.map(this::transformMigrationItem).collect(Collectors.toList()); | |||
.flatMap(item -> transformMigrationItem(item).stream()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Converting this calling pattern to be a transform on the whole list would be safer (no collisions), give the user more flexibility, & probably also make it easier for genAI to create complete transforms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, will address in https://opensearch.atlassian.net/browse/MIGRATIONS-2301
Description
Issues Resolved
MIGRATIONS-2271
Testing
Existing code paths tested by existing tests.
1:Many transformations will be tested during a followup PR which adds multi type split as a provided javascript transform.
Check List
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.