HBASE-28988 Enhance WALPlayer for restore of BulkLoad #6523

Draft: wants to merge 1 commit into base: HBASE-28957

@ankitsol ankitsol commented Dec 6, 2024

Enhance WALPlayer for restore of BulkLoad WAL entries

https://issues.apache.org/jira/browse/HBASE-28988

@ankitsol
Author

ankitsol commented Dec 6, 2024

This needs to be updated for the newly suggested backup directory structure.

@Apache-HBase

💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:--------|
| +0 🆗 | reexec | 0m 31s | | Docker mode activated. |
| | | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | | No case conflicting files found. |
| +0 🆗 | codespell | 0m 0s | | codespell was not available. |
| +0 🆗 | detsecrets | 0m 0s | | detect-secrets was not available. |
| +1 💚 | @author | 0m 0s | | The patch does not contain any @author tags. |
| +1 💚 | hbaseanti | 0m 0s | | Patch does not have any anti-patterns. |
| | | | | _ HBASE-28957 Compile Tests _ |
| +0 🆗 | mvndep | 0m 11s | | Maven dependency ordering for branch |
| +1 💚 | mvninstall | 3m 8s | | HBASE-28957 passed |
| +1 💚 | compile | 1m 4s | | HBASE-28957 passed |
| +1 💚 | checkstyle | 0m 20s | | HBASE-28957 passed |
| +1 💚 | spotbugs | 0m 54s | | HBASE-28957 passed |
| +1 💚 | spotless | 0m 45s | | branch has no errors when running spotless:check. |
| | | | | _ Patch Compile Tests _ |
| +0 🆗 | mvndep | 0m 11s | | Maven dependency ordering for patch |
| +1 💚 | mvninstall | 2m 59s | | the patch passed |
| +1 💚 | compile | 1m 4s | | the patch passed |
| -0 ⚠️ | javac | 0m 34s | /results-compile-javac-hbase-mapreduce.txt | hbase-mapreduce generated 1 new + 197 unchanged - 1 fixed = 198 total (was 198) |
| +1 💚 | blanks | 0m 0s | | The patch has no blanks issues. |
| -0 ⚠️ | checkstyle | 0m 10s | /results-checkstyle-hbase-mapreduce.txt | hbase-mapreduce: The patch generated 13 new + 1 unchanged - 0 fixed = 14 total (was 1) |
| -0 ⚠️ | checkstyle | 0m 9s | /results-checkstyle-hbase-it.txt | hbase-it: The patch generated 2 new + 20 unchanged - 0 fixed = 22 total (was 20) |
| +1 💚 | spotbugs | 1m 9s | | the patch passed |
| +1 💚 | hadoopcheck | 11m 25s | | Patch does not cause any errors with Hadoop 3.3.6 3.4.0. |
| -1 ❌ | spotless | 0m 38s | | patch has 65 errors when running spotless:check, run spotless:apply to fix. |
| | | | | _ Other Tests _ |
| +1 💚 | asflicense | 0m 17s | | The patch does not generate ASF License warnings. |
| | | 32m 15s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6523/1/artifact/yetus-general-check/output/Dockerfile |
| GITHUB PR | #6523 |
| Optional Tests | dupname asflicense javac spotbugs checkstyle codespell detsecrets compile hadoopcheck hbaseanti spotless |
| uname | Linux b3087804891f 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | HBASE-28957 / ff90ac9 |
| Default Java | Eclipse Adoptium-17.0.11+9 |
| spotless | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6523/1/artifact/yetus-general-check/output/patch-spotless.txt |
| Max. process+thread count | 83 (vs. ulimit of 30000) |
| modules | C: hbase-mapreduce hbase-it U: . |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6523/1/console |
| versions | git=2.34.1 maven=3.9.8 spotbugs=4.7.3 |
| Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |

This message was automatically generated.

@Apache-HBase

💔 -1 overall

| Vote | Subsystem | Runtime | Logfile | Comment |
|:----:|----------:|--------:|:-------:|:--------|
| +0 🆗 | reexec | 0m 27s | | Docker mode activated. |
| -0 ⚠️ | yetus | 0m 3s | | Unprocessed flag(s): --brief-report-file --spotbugs-strict-precheck --author-ignore-list --blanks-eol-ignore-file --blanks-tabs-ignore-file --quick-hadoopcheck |
| | | | | _ Prechecks _ |
| | | | | _ HBASE-28957 Compile Tests _ |
| +0 🆗 | mvndep | 0m 11s | | Maven dependency ordering for branch |
| +1 💚 | mvninstall | 3m 47s | | HBASE-28957 passed |
| +1 💚 | compile | 0m 50s | | HBASE-28957 passed |
| +1 💚 | javadoc | 0m 37s | | HBASE-28957 passed |
| +1 💚 | shadedjars | 6m 16s | | branch has no errors when building our shaded downstream artifacts. |
| | | | | _ Patch Compile Tests _ |
| +0 🆗 | mvndep | 0m 12s | | Maven dependency ordering for patch |
| +1 💚 | mvninstall | 3m 0s | | the patch passed |
| +1 💚 | compile | 0m 40s | | the patch passed |
| +1 💚 | javac | 0m 40s | | the patch passed |
| +1 💚 | javadoc | 0m 26s | | the patch passed |
| +1 💚 | shadedjars | 5m 43s | | patch has no errors when building our shaded downstream artifacts. |
| | | | | _ Other Tests _ |
| -1 ❌ | unit | 27m 48s | /patch-unit-hbase-mapreduce.txt | hbase-mapreduce in the patch failed. |
| +1 💚 | unit | 1m 1s | | hbase-it in the patch passed. |
| | | 52m 19s | | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6523/1/artifact/yetus-jdk17-hadoop3-check/output/Dockerfile |
| GITHUB PR | #6523 |
| Optional Tests | javac javadoc unit compile shadedjars |
| uname | Linux 1e184da7deae 5.4.0-1103-aws #111~18.04.1-Ubuntu SMP Tue May 23 20:04:10 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | dev-support/hbase-personality.sh |
| git revision | HBASE-28957 / ff90ac9 |
| Default Java | Eclipse Adoptium-17.0.11+9 |
| Test Results | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6523/1/testReport/ |
| Max. process+thread count | 3361 (vs. ulimit of 30000) |
| modules | C: hbase-mapreduce hbase-it U: . |
| Console output | https://ci-hbase.apache.org/job/HBase-PreCommit-GitHub-PR/job/PR-6523/1/console |
| versions | git=2.34.1 maven=3.9.8 |
| Powered by | Apache Yetus 0.15.0 https://yetus.apache.org |

This message was automatically generated.

@vinayakphegde
Contributor

@ankitsol You need to run mvn spotless:apply to fix code style issues.

@vinayakphegde vinayakphegde left a comment

In general, I found the following issues:

  • All the new log lines are at the INFO level, which may not be necessary. Consider reducing them to DEBUG or TRACE in some cases.
  • Javadoc/comments have not been updated to reflect the latest changes.
  • There are code style issues that need to be fixed so we can run the unit tests and update them if necessary.
  • New unit tests need to be added.

@@ -156,7 +164,7 @@ protected static enum Counter {
    * A mapper that writes out {@link Mutation} to be directly applied to a running HBase instance.
    */
   protected static class WALMapper
-    extends Mapper<WALKey, WALEdit, ImmutableBytesWritable, Mutation> {
+    extends Mapper<WALKey, WALEdit, ImmutableBytesWritable, Pair<Mutation, List<String>>> {

Instead of Pair here, can we use a custom class, so that the exclusivity between Mutation and BulkLoadFiles is enforced programmatically?
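A minimal sketch of what such a class could look like. The name `WalMapperOutput` and its methods are hypothetical, and the type parameter `M` stands in for HBase's `Mutation` so the snippet compiles on its own. It does not address how the value would be serialized between map and reduce stages; that part depends on how the job is configured and is left out.

```java
import java.util.List;
import java.util.Objects;

// Hypothetical replacement for Pair<Mutation, List<String>>: exactly one of the
// two fields is ever set. M is a stand-in for HBase's Mutation type.
final class WalMapperOutput<M> {
  private final M mutation;                 // non-null only in the mutation case
  private final List<String> bulkLoadFiles; // non-null only in the bulk-load case

  // Private constructor: instances can only be created through the factories
  // below, so "both set" and "neither set" cannot be constructed.
  private WalMapperOutput(M mutation, List<String> bulkLoadFiles) {
    this.mutation = mutation;
    this.bulkLoadFiles = bulkLoadFiles;
  }

  static <M> WalMapperOutput<M> ofMutation(M mutation) {
    return new WalMapperOutput<>(Objects.requireNonNull(mutation, "mutation"), null);
  }

  static <M> WalMapperOutput<M> ofBulkLoadFiles(List<String> files) {
    // List.copyOf rejects null and yields an unmodifiable copy.
    return new WalMapperOutput<>(null, List.copyOf(files));
  }

  boolean isBulkLoad() {
    return bulkLoadFiles != null;
  }

  M getMutation() {
    if (mutation == null) {
      throw new IllegalStateException("This output holds bulk-load files, not a mutation");
    }
    return mutation;
  }

  List<String> getBulkLoadFiles() {
    if (bulkLoadFiles == null) {
      throw new IllegalStateException("This output holds a mutation, not bulk-load files");
    }
    return bulkLoadFiles;
  }
}
```

With this shape, a consumer branches on `isBulkLoad()` and a wrong accessor call fails immediately instead of silently returning null, which is the guarantee a plain Pair cannot give.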


// Retrieve configuration and set up file systems for backup and staging locations
Configuration conf = context.getConfiguration();
Path backupLocation = new Path(conf.get(BULKLOAD_BACKUP_LOCATION));

Add a check for the case where backupLocation is not specified.
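As a rough illustration of the kind of guard meant here: `conf.get(...)` returns null for an unset key, and constructing a Hadoop `Path` from a null string fails with an unhelpful IllegalArgumentException, so an explicit check gives a clearer error. The sketch below uses `java.util.Properties` as a stand-in for Hadoop's `Configuration` so it runs without dependencies. The key value `wal.bulk.backup.location` is taken from the example command later in this review and is an assumption about what `BULKLOAD_BACKUP_LOCATION` resolves to; the helper name is hypothetical.

```java
import java.io.IOException;
import java.util.Properties;

final class BackupLocationCheck {
  // Assumed key; the actual value of BULKLOAD_BACKUP_LOCATION is not shown in the diff.
  static final String BULKLOAD_BACKUP_LOCATION = "wal.bulk.backup.location";

  // Returns the configured location, or fails with a descriptive message when
  // the key is missing or blank. Properties stands in for Configuration here.
  static String requireBackupLocation(Properties conf) throws IOException {
    String location = conf.getProperty(BULKLOAD_BACKUP_LOCATION);
    if (location == null || location.trim().isEmpty()) {
      throw new IOException("Bulk-load entry found in WAL but '" + BULKLOAD_BACKUP_LOCATION
        + "' is not set; cannot locate the backed-up bulk-load files");
    }
    return location.trim();
  }
}
```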


try {
for (String file : bulkloadFilesWithFullPath) {
// Full file path from S3

Remove the hardcoded reference to S3 here.

List<String> stagingPaths = new ArrayList<>();

try {
for (String file : bulkloadFilesWithFullPath) {

These are not full paths; they are paths relative to the namespace, so the variable name is misleading.
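To make the distinction concrete, here is a small sketch of turning such relative paths into full ones by joining them to a backup root. It uses plain strings so it runs standalone; in the real code Hadoop's `Path(parent, child)` constructor would be the natural choice. The class and method names are hypothetical, and the `namespace/table/region/family/file` layout in the usage note is only an illustrative assumption.

```java
import java.util.ArrayList;
import java.util.List;

final class BulkLoadPaths {
  // Joins each namespace-relative path onto the backup root. Works for any
  // file system URI or plain directory, since it only manipulates the string.
  static List<String> resolve(String backupRoot, List<String> relativePaths) {
    String root = backupRoot.replaceAll("/+$", ""); // drop trailing slashes
    List<String> resolved = new ArrayList<>(relativePaths.size());
    for (String rel : relativePaths) {
      resolved.add(root + "/" + rel.replaceAll("^/+", "")); // drop leading slashes
    }
    return resolved;
  }
}
```

For example, resolving `default/t1/r1/cf/hfile1` against `hdfs://nn:8020/backup/` gives `hdfs://nn:8020/backup/default/t1/r1/cf/hfile1`. Naming the input something like `bulkloadFilesRelativePaths` and the output `fullPaths` would keep the two from being confused.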

@@ -172,6 +180,52 @@ public void map(WALKey key, WALEdit value, Context context) throws IOException {
ExtendedCell lastCell = null;
for (ExtendedCell cell : WALEditInternalHelper.getExtendedCells(value)) {
context.getCounter(Counter.CELLS_READ).increment(1);

if (CellUtil.matchingQualifier(cell, WALEdit.BULK_LOAD)) {

I believe the processing of bulkloaded files can be simplified, and we could reduce the log level from INFO to DEBUG or TRACE in some cases.

@@ -293,6 +398,8 @@ public Job createSubmittableJob(String[] args) throws IOException {
setupTime(conf, WALInputFormat.START_TIME_KEY);
setupTime(conf, WALInputFormat.END_TIME_KEY);
String inputDirs = args[0];
String walDir = new Path(inputDirs, "WALs").toString();

I don't think this is correct. We are hard-coding the directories here.
We could introduce a new optional parameter that the user can specify if they have bulkloaded files for us to process.
For example:
hbase org.apache.hadoop.hbase.mapreduce.WALPlayer /backuplogdir oldTable1,oldTable2 newTable1,newTable2 -Dwal.bulk.backup.location=/bulkload-files-dir
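
A sketch of how job setup could treat that parameter as optional, under the assumption that the key is `wal.bulk.backup.location` as in the command above. `java.util.Properties` stands in for Hadoop's `Configuration`, and `WalPlayerInputs` is a hypothetical holder, not part of WALPlayer. The WAL input directory is used exactly as the user passed it, with no subdirectory name appended.

```java
import java.util.Optional;
import java.util.Properties;

final class WalPlayerInputs {
  // Assumed key name, taken from the example command in this comment.
  static final String BULKLOAD_LOCATION_KEY = "wal.bulk.backup.location";

  final String walInputDirs;          // first positional argument, unchanged
  final Optional<String> bulkLoadDir; // present only if the user supplied it

  private WalPlayerInputs(String walInputDirs, Optional<String> bulkLoadDir) {
    this.walInputDirs = walInputDirs;
    this.bulkLoadDir = bulkLoadDir;
  }

  static WalPlayerInputs from(String[] args, Properties conf) {
    if (args.length < 1) {
      throw new IllegalArgumentException("Missing WAL input directory argument");
    }
    // A missing or blank value means "no bulk-load files to process".
    Optional<String> bulkLoadDir = Optional.ofNullable(conf.getProperty(BULKLOAD_LOCATION_KEY))
      .map(String::trim)
      .filter(s -> !s.isEmpty());
    return new WalPlayerInputs(args[0], bulkLoadDir);
  }
}
```

Callers that do not set the property keep the existing behavior, and bulk-load handling is switched on only when `bulkLoadDir.isPresent()`.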
