-
Notifications
You must be signed in to change notification settings - Fork 4.8k
HIVE-29028: Iceberg: Implement auto compaction #5886
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
7c50524
to
b75da71
Compare
b75da71
to
05f666a
Compare
05f666a
to
c6cbb86
Compare
c6cbb86
to
dc06a7c
Compare
|
plenty of reactions, but no reviews :) |
@@ -2381,4 +2384,15 @@ private static List<FieldSchema> schema(List<VirtualColumn> exprs) { | |||
private static List<FieldSchema> orderBy(VirtualColumn... exprs) { | |||
return schema(Arrays.asList(exprs)); | |||
} | |||
|
|||
@Override | |||
public CompactionType determineCompactionType(org.apache.hadoop.hive.metastore.api.Table table, CompactionInfo ci) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it doesn't belong here. maybe CompactionEvaluator
?
* @throws IOException If an I/O error occurs during the compaction evaluation process. | ||
* @throws UnsupportedOperationException if the implementing storage handler does not support this operation. | ||
*/ | ||
default CompactionType determineCompactionType(Table table, CompactionInfo ci) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that is not responsibility of StorageHandler
* A class to initiate Iceberg compactions. This will run in a separate thread. | ||
* It's critical that there exactly 1 of these in a given warehouse. | ||
*/ | ||
public class IcebergInitiator extends InitiatorBase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please move it under iceberg-handler, org.apache.iceberg.mr.hive.compaction.
*/ | ||
public abstract class InitiatorBase extends MetaStoreCompactorThread { | ||
|
||
protected ExecutorService compactionExecutor; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way to disable certain Initiators individually? I don't think we need 2 processes, but 1 running multiple tasks:
- AcidTableOptimizer
- IcebergTableOptimizer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
both should return a list of tasks the Initiator needs to schedule
What changes were proposed in this pull request?
Iceberg auto compaction. In this initial version,
IcebergInitiator
remembers the last checked snapshot id of all Iceberg tables and it compares the current snapshot id to the cached snapshot id to determine if further evaluation is needed.Why are the changes needed?
Currently, Iceberg compaction can only be triggered manually.
This PR introduces automatic Iceberg compaction which periodically checks all Iceberg tables and schedules compactions when necessary.
Does this PR introduce any user-facing change?
No
How was this patch tested?
New unit tests, precommit tests.