Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce Immutables library #282

Closed
wants to merge 11 commits into from
Closed

Introduce Immutables library #282

wants to merge 11 commits into from

Conversation

adutra
Copy link
Contributor

@adutra adutra commented Sep 11, 2024

This PR introduces the Immutables library:

https://immutables.github.io/

This library is widely used in Iceberg and many other projects. The ability to define immutable components in an easy way, without struggling with the nitty-gritty details (getters, builders, equals, hashCode, toString, etc.) is a nice productivity boost and improves code correctness by enforcing immutability by default.

This PR only adds the library and converts a few components that were good candidates for immutability. If the change is accepted, there are probably more components that could be progressively converted into immutables.

Copy link
Contributor

@dimas-b dimas-b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a nice improvement to me :)

@ashvina
Copy link

ashvina commented Sep 12, 2024

I agree, immutables and Lombok are valuable development tools, and I have used them in other projects. Thank you for the contribution.
IMO, the use of immutables is an internal detail, and immutables should be replaceable without breaking compatibility. It's not necessary for users to be aware of it. Perhaps it would be more appropriate to include the new annotations in the core module instead. WDYT?
Additionally, sharing high level guidelines with other developers on migrating existing code and documenting new annotations would be beneficial.

@adutra
Copy link
Contributor Author

adutra commented Sep 12, 2024

Perhaps it would be more appropriate to include the new annotations in the core module instead. WDYT?

Unfortunately, that's not feasible. The procedure for adding a custom annotation requires the annotation to be placed in a separate module, because it needs to be present on the annotation processor's classpath. It's explained here: https://immutables.github.io/style.html#custom-immutable-annotation.

Copy link
Member

@snazy snazy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is overall a very nice improvement!

The only thing that worries me a little bit is the change in the PolarisStorageConfigurationInfo hierarchy. I think we should keep the original test - it might work when overriding validate() in StorageConfigurationOverride. I suspect, this change didn't break anything, but to be on the safe side wrt polymorphism can you add a test to validate that the JSON-serialized representation didn't change? Some test that passes for the state before this change and with this change. I think, then we're good here.

this.externalId = externalId;
validateMaxAllowedLocations(MAX_ALLOWED_LOCATIONS);
@Override
public abstract List<String> getAllowedLocations();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think those overrides can go away.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added these overrides just to avoid warnings like this one:

> Task :polaris-core:compileJava
polaris-core/src/main/java/org/apache/polaris/core/storage/aws/AwsStorageConfigurationInfo.java:35: warning: (immutables:subtype) Constructor parameters should be better defined on the same level of inheritance hierarchy,  otherwise generated constructor API would be unstable:  parameter list can change the order of arguments. It is better redeclare (override) each inherited attribute parameter in this abstract value type to avoid this warning. Or better have constructor parameters defined by only single supertype.
public abstract class AwsStorageConfigurationInfo extends PolarisStorageConfigurationInfo {
                ^
1 warning

Is there a better way to handle this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it's a good idea to set allParameters = true by default in @PolarisImmutable, wdyt? As soon as the interface extends other interfaces, the warning pops up unless all methods are overridden.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea - it's just a warning. But warnings are annoying.
It's okay to remove allParameters = true.

}

T cast(Object value) {
return this.typ.cast(value);
@Value.Lazy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
@Value.Lazy
@Value.Derived

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Err, I meant - doesn't need to be stored.

Suggested change
@Value.Lazy
@Value.NonAttribute

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: this suggestion is NOT applied

Copy link
Contributor Author

@adutra adutra Sep 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I applied it to the wrong method 🤦‍♂️

@@ -94,26 +88,11 @@ public void testGetSubscopedCreds() {
.containsEntry(PolarisCredentialProperty.AWS_SECRET_KEY, "secretKey");
}

@ParameterizedTest
@ValueSource(strings = {AWS_PARTITION, "aws-cn", "aws-us-gov"})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just comment out the last two ones for now?
Or probably better: override validate() in StorageConfigurationOverride, which seems to be the only case where the locations are not validated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right that I need to override validate() in StorageConfigurationOverride – the current code looks really fishy as it mixes validation on the override instance vs on the delegate instance.

But unfortunately the problem with ARN domains cannot be solved like this: the validateArn() method belongs to the AWS subtype and the subtype is being constructed directly in the test, with the forbidden domains. I will find something else.

@JsonSubTypes.Type(value = AzureStorageConfigurationInfo.class),
@JsonSubTypes.Type(value = GcpStorageConfigurationInfo.class),
@JsonSubTypes.Type(value = FileStorageConfigurationInfo.class),
@JsonSubTypes.Type(value = ImmutableAwsStorageConfigurationInfo.class),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're changing the types in this hierarchy quite a bit.
Can you add a test class (or tests in the individual sub-type test classes) to ensure that the serialized JSON format is still the same?

@MethodSource
void serialize(PolarisStorageConfigurationInfo instance, String typeName) {
String json = instance.serialize();
assertThat(json).contains("\"@type\":\"" + typeName + "\"");
Copy link
Contributor Author

@adutra adutra Sep 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@snazy I manually ran this test against main and verified that the serialized json is similar: the @type attribute is the same and the rest is similar in the sense that the new json from this branch generally contains more properties than the json from main (you cannot just add @JsonIgnore and expect the immutables library to honor it: if the property is part of the object, it will be serialized – besides, the equals() method would fail after deserialization if the property was ignored).

But since the ObjectMapper is configured with FAIL_ON_UNKNOWN_PROPERTIES I think we are good.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But since the ObjectMapper is configured with FAIL_ON_UNKNOWN_PROPERTIES I think we are good.

Hm - I see DEFAULT_MAPPER.disable(DeserializationFeature.FAIL_ON_UNKNOWN_PROPERTIES); in PolarisStorageConfigurationInfo.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry that's what I meant: FAIL_ON_UNKNOWN_PROPERTIES is disabled.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also looked at the serialized JSONs from main and this branch - and the result's equal, except for AWS (see above).

public void setExternalId(String externalId) {
this.externalId = externalId;
@Value.Default
protected Set<String> getAllowedArnDomains() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be a new attribute - not sure whether it's good to add it in this PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I need to revert the ConfigurationInfo changes then. I don't see any other way to keep the current behavior of rejecting China/USGov while still allowing it in tests.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea - I guess that's fine. Those ARN domains "break" in prod code anyways.

snazy
snazy previously approved these changes Sep 16, 2024
Copy link
Member

@snazy snazy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change LGTM now!

I'll leave it unmerged for a bit for others to get a chance to look at it before merging.

snazy
snazy previously approved these changes Sep 16, 2024
@flyrain
Copy link
Contributor

flyrain commented Sep 16, 2024

Unfortunately, that's not feasible. The procedure for adding a custom annotation requires the annotation to be placed in a separate module, because it needs to be present on the annotation processor's classpath. It's explained here:

Hi @adutra, does that mean the third-party projects have to depend on two jar files(polaris-immutables.jar and polaris-cores) if they want to reuse module polaris-core?

Is it possible to upgrade polaris-core jdk from 11 to 17, so that we can use a native solution like Record?

cc @collado-mike @dennishuo @eric-maynard

@snazy
Copy link
Member

snazy commented Sep 16, 2024

@flyrain immutables does not have a runtime dependency. Java records are different - immutables and records are not exclusive. BTW: Java records are still "plain/ordinary" Java objects - rather syntactic sugar. In terms of runtime performance and footprint both are rather equal - just that immutables offers much richer functionalities.

(What we're all waiting for are value objects that can be "properly flattened" ;) )

@flyrain
Copy link
Contributor

flyrain commented Sep 16, 2024

@snazy sorry I wasn't clear. I was asking if we can use Java record instead of the immutables. This is only possible when we upgrade jdk, as record was introduced after jdk 11. I understand the immutables provides more functionalities, but if record is good enough, that'd be a cleaner solution.

@flyrain
Copy link
Contributor

flyrain commented Sep 16, 2024

Immutables does not have a runtime dependency.

The annotation jar is not needed at runtime.

Looking a bit more. The polaris-immutables.jar is not needed at the runtime. It's great that any other downstream project doesn't have to know the module polaris-immutables exists, the complexity is only added in the Polaris project.

Copy link
Contributor

@flyrain flyrain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall with minor comments and questions.


// meta store which is used to persist Polaris entity metadata
private final PolarisMetaStoreSession metaStore;
@Value.Parameter(order = 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the order needed here? Looks like the constructor isn't used anywhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's required to trigger the generation of ImmutablePolarisCallContext.of(...) factory methods.

}

T cast(Object value) {
return this.typ.cast(value);
@Value.Lazy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: this suggestion is NOT applied

Comment on lines 28 to 30
* {@code @Value.Immutable}</a> using {@code lazyhash=true, forceJacksonPropertyNames = false,
* clearBuilder = true, depluralize = true, and JavaBeans-style getters}.
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we don't need to repeat the properties in the Java doc. Maybe we could add a bit reason why we choose these properties in the doc.

@@ -82,8 +82,6 @@ public Catalog createCallContextCatalog(
taskExecutor,
fileIOFactory);

context.contextVariables().put(CallContext.REQUEST_PATH_CATALOG_INSTANCE_KEY, catalogInstance);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious how the catalogInstance is set in context if we remove this line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The call context is re-created once this method exits:

ImmutableMap.Builder<String, Object> contextVariablesBuilder = ImmutableMap.builder();
contextVariablesBuilder.putAll(callContext.contextVariables());
contextVariablesBuilder.put(CallContext.REQUEST_PATH_CATALOG_INSTANCE_KEY, baseCatalog);
this.callContext =
ImmutableCallContext.builder()
.from(callContext)
.contextVariables(contextVariablesBuilder.buildKeepingLast())
.build();

}
return new PolarisConfiguration<>(key, description, defaultValue, catalogConfig);
}
default T cast(Object value) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will also need the annotation @Value.NonAttribute here, as it is an utility method instead of an attribute.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so, this method is not a setter nor a getter. If you look at the generated class, there is no cast method nor field.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. There is no such field. In that case, we don't have to add the annotation to method default String catalogConfigOrThrow() either, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do, because given its signature, it would be considered as an accessor method.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI, this doesn't matter since you close it, but my test shows it doesn't generate the method in Immutable class without it.

Copy link
Contributor

@flyrain flyrain left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with one minor comment

T cast(Object value) {
return this.typ.cast(value);
@Value.NonAttribute
default String catalogConfigOrThrow() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR seems to include a lot of generic refactoring outside of just adding a dependency.

@@ -111,7 +76,7 @@ public static <T> Builder<T> builder() {
.defaultValue(false)
.build();

public static final PolarisConfiguration<Boolean> ALLOW_NAMESPACE_LOCATION_OVERLAP =
PolarisConfiguration<Boolean> ALLOW_NAMESPACE_LOCATION_OVERLAP =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, refactoring

@@ -219,7 +220,7 @@ public void setStorageIntegrationProvider(PolarisStorageIntegrationProvider stor
private void checkPolarisServiceBootstrappedForRealm(
RealmContext realmContext, PolarisMetaStoreManager metaStoreManager) {
PolarisCallContext polarisContext =
new PolarisCallContext(
PolarisCallContext.of(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This syntactic sugar seems to be the primary "advantage" we are gaining from adding this framework. Is this really better?

Comment on lines +34 to +35
@JsonSerialize(as = ImmutableFileStorageConfigurationInfo.class)
@JsonDeserialize(as = ImmutableFileStorageConfigurationInfo.class)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we actually need to add this to classes?

// Technically, it should be ^arn:(aws|aws-cn|aws-us-gov):iam::\d{12}:role/.+$,
@JsonIgnore public static final String ROLE_ARN_PATTERN = "^arn:aws:iam::\\d{12}:role/.+$";
private static final String ROLE_ARN_PATTERN =
"^arn:(aws|aws-cn|aws-us-gov):iam::\\d{12}:role/.+$";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a material change

@eric-maynard
Copy link
Contributor

This is a pretty large refactor that also includes a couple of nontrivial changes.

The ability to define immutable components in an easy way, without struggling with the nitty-gritty details (getters, builders, equals, hashCode, toString, etc.) is a nice productivity boost

In much of the code base, can't we just use record for this?

@adutra adutra closed this Sep 17, 2024
@adutra
Copy link
Contributor Author

adutra commented Sep 17, 2024

Ok at this point I don't see a huge appetite from the community for this change. Let's just wait until we can use records everywhere. Closing.

@flyrain
Copy link
Contributor

flyrain commented Sep 17, 2024

I guess a followup task is to figure out if we can compile the polaris-core with newer jdk, like jdk17 or jdk21.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants