-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HBASE-28988 Enhance WALPlayer for restore of BulkLoad #6523
base: HBASE-28957
Are you sure you want to change the base?
Conversation
Need to update for newly suggested backup directory structure |
💔 -1 overall
This message was automatically generated. |
💔 -1 overall
This message was automatically generated. |
@ankitsol You need to run |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, I found the following issues:
- All the new log lines are at the INFO level, which may not be necessary. Consider reducing them to DEBUG or TRACE in some cases.
- Javadoc/comments have not been updated to reflect the latest changes.
- There are code style issues that need to be fixed so we can run the unit tests and update them if necessary.
- New unit tests need to be added.
@@ -156,7 +164,7 @@ protected static enum Counter { | |||
* A mapper that writes out {@link Mutation} to be directly applied to a running HBase instance. | |||
*/ | |||
protected static class WALMapper | |||
extends Mapper<WALKey, WALEdit, ImmutableBytesWritable, Mutation> { | |||
extends Mapper<WALKey, WALEdit, ImmutableBytesWritable, Pair<Mutation, List<String>>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of Pair here, can we use a Custom Class? So that The exclusivity between Mutation and BulkLoadFiles is enforced programmatically.
|
||
// Retrieve configuration and set up file systems for backup and staging locations | ||
Configuration conf = context.getConfiguration(); | ||
Path backupLocation = new Path(conf.get(BULKLOAD_BACKUP_LOCATION)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check for if backupLocation is not specified.
|
||
try { | ||
for (String file : bulkloadFilesWithFullPath) { | ||
// Full file path from S3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove the hardcoded S3 here
List<String> stagingPaths = new ArrayList<>(); | ||
|
||
try { | ||
for (String file : bulkloadFilesWithFullPath) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these are not full paths, but the relative paths from namespace
@@ -172,6 +180,52 @@ public void map(WALKey key, WALEdit value, Context context) throws IOException { | |||
ExtendedCell lastCell = null; | |||
for (ExtendedCell cell : WALEditInternalHelper.getExtendedCells(value)) { | |||
context.getCounter(Counter.CELLS_READ).increment(1); | |||
|
|||
if (CellUtil.matchingQualifier(cell, WALEdit.BULK_LOAD)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the processing of bulkloaded files can be simplified, and we could reduce the log level from INFO to DEBUG or TRACE in some cases.
@@ -293,6 +398,8 @@ public Job createSubmittableJob(String[] args) throws IOException { | |||
setupTime(conf, WALInputFormat.START_TIME_KEY); | |||
setupTime(conf, WALInputFormat.END_TIME_KEY); | |||
String inputDirs = args[0]; | |||
String walDir = new Path(inputDirs, "WALs").toString(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this is correct. We are hard-coding the directories here.
We could introduce a new optional parameter that the user can specify if they have bulkloaded files for us to process.
For example:
hbase org.apache.hadoop.hbase.mapreduce.WALPlayer /backuplogdir oldTable1,oldTable2 newTable1,newTable2 -Dwal.bulk.backup.location=/bulkload-files-dir
Enhance WALPlayer for restore of BulkLoad WAL entries
https://issues.apache.org/jira/browse/HBASE-28988