-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Core: Assign fresh IDs to view schema #10253
base: main
Are you sure you want to change the base?
Conversation
0a21640
to
6d64913
Compare
6d64913
to
f378c67
Compare
private static final Schema OTHER_SCHEMA = | ||
new Schema(7, required(1, "some_id", Types.IntegerType.get())); | ||
new Schema( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changing this to show that the new column get's a re-assigned column ID
core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main issue is that the highest ID isn't correct. There should also be a test for a table with an old schema that contains a higher ID than the current schema to show it is not reassigned.
core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java
Outdated
Show resolved
Hide resolved
core/src/main/java/org/apache/iceberg/view/ViewVersionReplace.java
Outdated
Show resolved
Hide resolved
5a4e96c
to
5c05680
Compare
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions. |
5c05680
to
0214536
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me (had a question that's worth double checking but I'm pretty sure there's no issue), thanks @nastra!
Types.NestedField.optional(1, "new_id", Types.IntegerType.get(), "some ID"), | ||
Types.NestedField.optional(2, "new_data", Types.StringType.get(), "some data")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the first field ID changed to 1 because highestFieldId
is set to 0 and then incremented when assigning the new IDs? This should be fine and aligns with whatever we do for tables. It also does not impact existing views because we're assigning the fresh IDs from a higher field ID
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes correct, that's exactly how it is done
0214536
to
8d71038
Compare
|
||
public class ViewUtil { | ||
private ViewUtil() {} | ||
|
||
public static String fullViewName(String catalog, TableIdentifier ident) { | ||
return catalog + "." + ident; | ||
} | ||
|
||
public static Schema assignFreshIds(ViewMetadata base, Schema newSchema) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the current arguments to this method pass too much state. Utility methods should require a minimal amount of information so it is clear what they depend on when called. Rather than passing the entire ViewMetadata
, this method should accept newSchema
, the base schema, and the highest field ID.
I'm also not sure that it makes sense to have a util method that just calls another util method, TypeUtil.assignFreshIds
. I think once you apply the principle of passing minimal state, you basically end up with a wrapper around the TypeUtil
method that creates an atomic integer.
new AtomicInteger(highestFieldId(base.schemas()))::incrementAndGet); | ||
} | ||
|
||
public static int maxVersionId(ViewMetadata metadata) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method has the same issue. Instead of passing ViewMetadata
, it should accept Collection<ViewVersion>
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed this refactoring around maxVersionId
for now since assigning fresh ids moved into ViewMetadata
and it didn't seem appropriate anymore to include this refactoring (I'll open a separate PR for this)
@@ -1341,6 +1343,8 @@ public View create() { | |||
Preconditions.checkState( | |||
null != defaultNamespace, "Cannot create view without specifying a default namespace"); | |||
|
|||
this.schema = TypeUtil.assignFreshIds(schema, new AtomicInteger(0)::incrementAndGet); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this is being done in the wrong place because we need to assign fresh IDs in both the REST catalog here and in BaseMetastoreViewCatalog
. We don't want to create a situation where implementations need to know to assign fresh IDs.
For tables, the initial assignment is done in TableMetadata.newTableMetadata
and the reassignment for replace happens in TableMetadata.buildReplacement
. Views should have a similar approach, where creating a replacement ViewMetadata
automatically handles the reassignment so callers don't need to know and do it manually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes you're right. One of the issues was that there wasn't a straightforward way inside ViewMetadata
of detecting whether ViewMetadata
is created the first time or replaced. I think we can actually use the history
field (or alternatively introduce a new field) to detect whether the view is created/replaced. I've updated the code so that no additional changes outside of ViewMetadata
are required (and also moved the tests now into TestViewMetadata
schemaOne) | ||
.build(); | ||
|
||
newSchema = ViewUtil.assignFreshIds(metadata, newSchema); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test demonstrates the problem that I pointed out above. It is necessary to construct a fake ViewMetadata
to test the method.
8d71038
to
edffe00
Compare
edffe00
to
00a3aa1
Compare
This addresses #9596 (comment) and assigns fresh IDs to a View's schema when creating/replacing a view