[iceberg] Introduce feature to migrate table from iceberg to paimon #4639
Please write [WIP] in PR title.
7e6b49f
to
898eed1
Compare
this.paimonCatalog = paimonCatalog;
this.paimonFileIO = paimonCatalog.fileIO();
this.paimonDatabaseName = paimonDatabaseName;
this.paimonTableNameame = paimonTableNameame;
- this.paimonTableNameame = paimonTableNameame;
+ this.paimonTableName = paimonTableName;
Schema paimonSchema = icebergSchemaToPaimonSchema(icebergMetadata);
Identifier paimonIdentifier = Identifier.create(paimonDatabaseName, paimonTableNameame);

paimonCatalog.createDatabase(paimonDatabaseName, false);
Why false? Must the user migrate into a database that does not exist yet?
private List<IcebergManifestFileMeta> checkAndFilterManifestFiles(
        List<IcebergManifestFileMeta> icebergManifestFileMetas) {
    if (!ignoreDelete) {
Why do we need such an option? If the Iceberg table has deletion vectors and the user sets this option, the resulting data will be incorrect. Incorrect data is useless to users.
@@ -190,6 +199,70 @@ private static Object toTypeObject(DataType dataType, int fieldId, int depth) {
    }
}

public DataType getDataType() {
This class already has a dataType member. Check if dataType is null; if not, just return that object, otherwise calculate the data type from the type string.
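A minimal sketch of the lazy caching being requested here. Everything in it is a hypothetical stand-in rather than the PR's code: LazyTypedField replaces IcebergDataField, the derived type is a plain String instead of Paimon's DataType, and computeCount exists only to make the caching observable.

```java
/** Hypothetical stand-in for a field that holds a type string and a lazily derived data type. */
final class LazyTypedField {
    private final String type;   // raw type string, e.g. "boolean"
    private String dataType;     // cached derived type; null until first requested
    private int computeCount;    // illustration only: counts how often the type is derived

    LazyTypedField(String type) {
        this.type = type;
    }

    /** Returns the cached data type, deriving it from the type string on first use. */
    String getDataType() {
        if (dataType == null) {
            dataType = computeFromTypeString();
        }
        return dataType;
    }

    /** Placeholder derivation (assumption): the real code would build a DataType here. */
    private String computeFromTypeString() {
        computeCount++;
        return type.toUpperCase(java.util.Locale.ROOT);
    }

    int computeCount() {
        return computeCount;
    }
}
```

Calling getDataType() twice on the same instance derives the type only once; the second call returns the stored value.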
# Conflicts: # paimon-core/src/main/java/org/apache/paimon/iceberg/metadata/IcebergDataField.java
…ain metadata path
…tead from iceberg catalog # Conflicts: # paimon-core/src/main/resources/META-INF/services/org.apache.paimon.factories.Factory
cfa7ae2
to
a6fdc51
Compare
        !simpleType.contains(delimiter)
                ? simpleType
                : simpleType.substring(0, simpleType.indexOf(delimiter));
switch (typePrefix) {
Now that you have calculated dataType with this code, store the result in dataType so we can use it directly next time.
if (meta.content() == IcebergManifestFileMeta.Content.DELETES) {
    throw new RuntimeException(
            "IcebergMigrator don't support analyzing manifest file with 'DELETE' content.");
}
NIT: Preconditions.checkArgument(meta.content() != IcebergManifestFileMeta.Content.DELETES)
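A sketch of what the suggested precondition style could look like. The checkArgument helper below is a local stand-in written for this example (the PR would presumably use the project's own Preconditions utility), and the Content enum and the message text are hypothetical simplifications. Note that this form throws IllegalArgumentException, a subclass of the RuntimeException thrown in the original snippet.

```java
/** Hypothetical sketch: replacing a manual if/throw with a precondition check. */
final class ManifestContentCheck {
    /** Simplified stand-in for IcebergManifestFileMeta.Content. */
    enum Content { DATA, DELETES }

    /** Local stand-in for a Preconditions.checkArgument(condition, message) utility. */
    static void checkArgument(boolean condition, String message) {
        if (!condition) {
            throw new IllegalArgumentException(message);
        }
    }

    /** Rejects manifest files whose content is DELETES. */
    static void validate(Content content) {
        checkArgument(
                content != Content.DELETES,
                "IcebergMigrator doesn't support manifest files with 'DELETES' content.");
    }
}
```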
if (meta.content() != IcebergDataFileMeta.Content.DATA) {
    throw new RuntimeException(
            "IcebergMigrator don't support analyzing iceberg delete file.");
}
Same as above.
public void deleteOriginTable(boolean delete) throws Exception {}

@Override
public void renameTable(boolean ignoreIfNotExists) throws Exception {}
Why don't you implement these?
public static List<DataFileMeta> construct(
        List<IcebergDataFileMeta> icebergDataFileMetas,
        FileIO fileIO,
        Table paimonTable,
        Path newDir,
        Map<Path, Path> rollback) {
This method is only related to Iceberg, while other methods in this utility class are quite versatile. Move this method to IcebergMigrator.
import static org.assertj.core.api.AssertionsForClassTypes.assertThatThrownBy;

/** Tests for {@link IcebergMigrator}. */
public class IcebergMigrateTest {
This is quite a complex feature. I would add a random test.
import org.slf4j.LoggerFactory;

/** Get iceberg table latest snapshot metadata in hive. */
public class IcebergMigrateHiveMetadata implements IcebergMigrateMetadata {
Do you have tests for this class?
The test for Hive metadata is not included in this PR; I'll move it to the next PR.
…ead throwing exception manually
String simpleType = type.toString();
String delimiter = "(";
if (simpleType.contains("[")) {
    delimiter = "[";
}
String typePrefix =
        !simpleType.contains(delimiter)
                ? simpleType
                : simpleType.substring(0, simpleType.indexOf(delimiter));
switch (typePrefix) {
    case "boolean":
        dataType = new BooleanType(!required);
        break;
NIT: Extract this into a separate method. This removes all the break statements.
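A rough sketch of the extraction suggested above: once the switch lives in its own method, each case can return directly and no break is needed. The class name, the method names, and the String results are hypothetical stand-ins for illustration; the real method would return Paimon DataType instances and cover every Iceberg type.

```java
/** Hypothetical sketch: a switch extracted into a method that returns from each case. */
final class TypePrefixSketch {
    /** Strips a parameter suffix such as "(10, 2)" or "[16]" from an Iceberg type string. */
    static String typePrefix(String simpleType) {
        // Same rule as the snippet under review: "[" wins if present, otherwise "(".
        String delimiter = simpleType.contains("[") ? "[" : "(";
        int index = simpleType.indexOf(delimiter);
        return index < 0 ? simpleType : simpleType.substring(0, index);
    }

    /** Maps a type string to a type name; Strings stand in for DataType objects here. */
    static String toTypeName(String simpleType, boolean required) {
        String suffix = required ? " NOT NULL" : "";
        switch (typePrefix(simpleType)) {
            case "boolean":
                return "BOOLEAN" + suffix;
            case "int":
                return "INT" + suffix;
            case "long":
                return "BIGINT" + suffix;
            default:
                throw new UnsupportedOperationException("Unsupported type: " + simpleType);
        }
    }
}
```

The caller can then assign the result once, for example dataType = toTypeName(type, required), which also fits the caching suggestion made earlier in the review.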
public IcebergManifestEntry fromRow(InternalRow row, IcebergManifestFileMeta meta) {
    IcebergManifestEntry.Status status = IcebergManifestEntry.Status.fromId(row.getInt(0));
    long snapshotId = !row.isNullAt(1) ? row.getLong(1) : meta.addedSnapshotId();
- long snapshotId = !row.isNullAt(1) ? row.getLong(1) : meta.addedSnapshotId();
+ long snapshotId = row.isNullAt(1) ? meta.addedSnapshotId() : row.getLong(1);
LOG.warn(
        "exception occurred when deleting origin table, exception message:{}",
        e.getMessage());
- LOG.warn(
-         "exception occurred when deleting origin table, exception message:{}",
-         e.getMessage());
+ LOG.warn(
+         "exception occurred when deleting origin table", e);
In this way the stack trace is also logged.
} catch (IOException e) {
    throw new RuntimeException(
            "read iceberg version-hint.text failed. Iceberg metadata path: "
                    + icebergMetaPathFactory.metadataDirectory());
}
Why not add e as cause? It will be difficult to debug without stack trace.
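For illustration, a small self-contained sketch of passing the caught exception as the cause. VersionHintReader and readFromDisk are hypothetical names, and the read is simulated so that it always fails; only the two-argument RuntimeException constructor is the point.

```java
import java.io.IOException;

/** Hypothetical sketch: keeping the original exception as the cause when rethrowing. */
final class VersionHintReader {
    /** Wraps any IOException and keeps it as the cause so its stack trace is preserved. */
    static int readVersionHint(String metadataDirectory) {
        try {
            return readFromDisk(metadataDirectory);
        } catch (IOException e) {
            throw new RuntimeException(
                    "read iceberg version-hint.text failed. Iceberg metadata path: "
                            + metadataDirectory,
                    e); // second argument: the cause
        }
    }

    /** Simulated read (assumption for this example): always fails. */
    private static int readFromDisk(String metadataDirectory) throws IOException {
        throw new IOException("simulated failure reading " + metadataDirectory);
    }
}
```

With the cause attached, the printed stack trace includes a "Caused by: java.io.IOException" section pointing at the failing read.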
LOG.info("Last step: rename.");
LOG.info("Iceberg migrator do not rename table now.");
No need for this log. Also, why does the Iceberg migrator ignore this method? Add comments.
}

@JsonIgnore
public DataType getDataTypeFromType() {
- public DataType getDataTypeFromType() {
+ private DataType getDataTypeFromType() {
@@ -190,6 +199,80 @@ private static Object toTypeObject(DataType dataType, int fieldId, int depth) {
    }
}

@JsonIgnore
public DataType getDataType() {
Replace the old dataType method with this one.
Purpose
Linked issue: close #xxx
Paimon already supports generating Iceberg-compatible metadata, so that Paimon tables can be consumed directly by Iceberg readers. Paimon now aims to provide an action or a procedure for migrating an Iceberg table to a Paimon table.
The general implementation idea of this feature includes the following steps:
This PR supports the basic ability to migrate an Iceberg table to Paimon, including:
The procedure and action are not included in this PR.
Tests
org.apache.paimon.iceberg.IcebergMigrateTest
API and Format
Documentation