Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Core: Assign fresh IDs to view schema #10253

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

nastra
Copy link
Contributor

@nastra nastra commented Apr 30, 2024

This addresses #9596 (comment) and assigns fresh IDs to a View's schema when creating/replacing a view

@github-actions github-actions bot added the core label Apr 30, 2024
@nastra nastra force-pushed the view-assign-fresh-ids-for-schema branch 2 times, most recently from 0a21640 to 6d64913 Compare April 30, 2024 12:51
@github-actions github-actions bot added the spark label Apr 30, 2024
@nastra nastra force-pushed the view-assign-fresh-ids-for-schema branch from 6d64913 to f378c67 Compare April 30, 2024 13:35
private static final Schema OTHER_SCHEMA =
new Schema(7, required(1, "some_id", Types.IntegerType.get()));
new Schema(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changing this to show that the new column get's a re-assigned column ID

@nastra nastra requested a review from rdblue April 30, 2024 14:58
@nastra nastra requested a review from amogh-jahagirdar June 6, 2024 09:52
Copy link
Contributor

@rdblue rdblue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main issue is that the highest ID isn't correct. There should also be a test for a table with an old schema that contains a higher ID than the current schema to show it is not reassigned.

@nastra nastra force-pushed the view-assign-fresh-ids-for-schema branch 2 times, most recently from 5a4e96c to 5c05680 Compare June 21, 2024 15:01
Copy link

github-actions bot commented Nov 2, 2024

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Nov 2, 2024
@nastra nastra added not-stale and removed stale labels Nov 4, 2024
@nastra nastra force-pushed the view-assign-fresh-ids-for-schema branch from 5c05680 to 0214536 Compare December 4, 2024 06:57
Copy link
Contributor

@amogh-jahagirdar amogh-jahagirdar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me (had a question that's worth double checking but I'm pretty sure there's no issue), thanks @nastra!

Comment on lines +1802 to +1803
Types.NestedField.optional(1, "new_id", Types.IntegerType.get(), "some ID"),
Types.NestedField.optional(2, "new_data", Types.StringType.get(), "some data"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the first field ID changed to 1 because highestFieldId is set to 0 and then incremented when assigning the new IDs? This should be fine and aligns with whatever we do for tables. It also does not impact existing views because we're assigning the fresh IDs from a higher field ID

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes correct, that's exactly how it is done

@nastra nastra requested a review from rdblue December 9, 2024 13:12
@nastra nastra force-pushed the view-assign-fresh-ids-for-schema branch from 0214536 to 8d71038 Compare December 9, 2024 13:30

public class ViewUtil {
private ViewUtil() {}

public static String fullViewName(String catalog, TableIdentifier ident) {
return catalog + "." + ident;
}

public static Schema assignFreshIds(ViewMetadata base, Schema newSchema) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that the current arguments to this method pass too much state. Utility methods should require a minimal amount of information so it is clear what they depend on when called. Rather than passing the entire ViewMetadata, this method should accept newSchema, the base schema, and the highest field ID.

I'm also not sure that it makes sense to have a util method that just calls another util method, TypeUtil.assignFreshIds. I think once you apply the principle of passing minimal state, you basically end up with a wrapper around the TypeUtil method that creates an atomic integer.

new AtomicInteger(highestFieldId(base.schemas()))::incrementAndGet);
}

public static int maxVersionId(ViewMetadata metadata) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method has the same issue. Instead of passing ViewMetadata, it should accept Collection<ViewVersion>.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed this refactoring around maxVersionId for now since assigning fresh ids moved into ViewMetadata and it didn't seem appropriate anymore to include this refactoring (I'll open a separate PR for this)

@@ -1341,6 +1343,8 @@ public View create() {
Preconditions.checkState(
null != defaultNamespace, "Cannot create view without specifying a default namespace");

this.schema = TypeUtil.assignFreshIds(schema, new AtomicInteger(0)::incrementAndGet);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is being done in the wrong place because we need to assign fresh IDs in both the REST catalog here and in BaseMetastoreViewCatalog. We don't want to create a situation where implementations need to know to assign fresh IDs.

For tables, the initial assignment is done in TableMetadata.newTableMetadata and the reassignment for replace happens in TableMetadata.buildReplacement. Views should have a similar approach, where creating a replacement ViewMetadata automatically handles the reassignment so callers don't need to know and do it manually.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes you're right. One of the issues was that there wasn't a straightforward way inside ViewMetadata of detecting whether ViewMetadata is created the first time or replaced. I think we can actually use the history field (or alternatively introduce a new field) to detect whether the view is created/replaced. I've updated the code so that no additional changes outside of ViewMetadata are required (and also moved the tests now into TestViewMetadata

schemaOne)
.build();

newSchema = ViewUtil.assignFreshIds(metadata, newSchema);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test demonstrates the problem that I pointed out above. It is necessary to construct a fake ViewMetadata to test the method.

@nastra nastra force-pushed the view-assign-fresh-ids-for-schema branch from 8d71038 to edffe00 Compare December 9, 2024 20:43
@nastra nastra requested a review from rdblue December 10, 2024 07:16
@nastra nastra force-pushed the view-assign-fresh-ids-for-schema branch from edffe00 to 00a3aa1 Compare December 11, 2024 09:41
@nastra nastra closed this Dec 11, 2024
@nastra nastra reopened this Dec 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants