TEZ-4540: Reading proto data more than 2GB from multiple splits fails #334

Merged
merged 3 commits into from
Jun 20, 2024

Conversation

Aggarwal-Raghav
Contributor

Refer to this: HIVE-28026 and apache/hive#5033

Comment on lines 99 to 101
if (din.in != in) {
cin.resetSizeCounter();
}
Contributor

The javadoc of CodedInputStream#setSizeLimit says the following:

If you want to read several messages from a single CodedInputStream, you could call resetSizeCounter() after each one to avoid hitting the size limit.

Based on that I would be inclined to reset the counter after every single message otherwise it still seems feasible to hit the same error if the DataInput is sufficiently large.
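To make the concern concrete, here is a minimal toy model. It is not protobuf's implementation: the hypothetical `BoundedReader` class keeps a running byte counter and rejects reads once the counter would pass a configured limit, loosely mirroring the behaviour the quoted javadoc describes. The class names, the fixed-size "messages", and the tiny limit are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Toy stand-in for a size-limited reader; NOT protobuf's CodedInputStream. */
final class BoundedReader {
    private final InputStream in;
    private final int sizeLimit;   // hypothetical limit, analogous to a configured size limit
    private int totalBytesRead;    // running counter since the last reset

    BoundedReader(InputStream in, int sizeLimit) {
        this.in = in;
        this.sizeLimit = sizeLimit;
    }

    /** Reads one fixed-size "message"; fails if the running counter would pass the limit. */
    byte[] readMessage(int length) throws IOException {
        if (totalBytesRead + length > sizeLimit) {
            throw new IOException("size limit exceeded");
        }
        byte[] buf = new byte[length];
        int off = 0;
        while (off < length) {
            int n = in.read(buf, off, length - off);
            if (n < 0) {
                throw new IOException("unexpected end of stream");
            }
            off += n;
        }
        totalBytesRead += length;
        return buf;
    }

    void resetSizeCounter() {
        totalBytesRead = 0;
    }
}

final class SizeCounterDemo {
    /** Tries to read `count` messages of `length` bytes; returns how many succeeded. */
    static int readAll(int count, int length, int limit, boolean resetPerMessage) {
        InputStream data = new ByteArrayInputStream(new byte[count * length]);
        BoundedReader reader = new BoundedReader(data, limit);
        int read = 0;
        try {
            for (int i = 0; i < count; i++) {
                reader.readMessage(length);
                read++;
                if (resetPerMessage) {
                    reader.resetSizeCounter();
                }
            }
        } catch (IOException e) {
            // The limit was hit; report how far we got.
        }
        return read;
    }

    public static void main(String[] args) {
        System.out.println(readAll(10, 40, 100, false)); // counter accumulates across messages
        System.out.println(readAll(10, 40, 100, true));  // counter is reset after each message
    }
}
```

In this sketch the run without a per-message reset stops as soon as the accumulated count would exceed the limit, even though every individual message is small, while the run with a per-message reset reads all ten messages. The same logic suggests that resetting only when the underlying input changes leaves the failure reachable whenever a single input is large enough.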

Contributor Author

Thanks for the review @zabetak.
I missed this Javadoc statement. I suspected that resetting totalBytesRetired after every message read might have an unexpected impact, so I reset it after every HDFS split read instead. But based on the Javadoc, I think we can reset the counter after every message read. I will modify the patch.

Thanks.

Contributor

@zabetak zabetak left a comment

LGTM, thanks for pushing this forward @Aggarwal-Raghav! My approval is not binding, so you will have to ping a Tez committer to merge this.

@Aggarwal-Raghav
Contributor Author

@abstractdog @harishjp, can you please help get this into Tez 0.10.3?

@abstractdog
Contributor

> @abstractdog @harishjp, can you please help get this into Tez 0.10.3?

Thanks @Aggarwal-Raghav for the patch, let me check soon.
I'm really sorry, but Tez 0.10.3 RC1 is currently being released, so we cannot add this.

@abstractdog
Contributor

CodedInputStream.totalBytesRetired can be easily checked by CodedInputStream.getTotalBytesRead(), so can you include a unit test that reads at least twice with ProtoMessageWritable and validates that cin.resetSizeCounter() was indeed called?

@Aggarwal-Raghav
Contributor Author

> CodedInputStream.totalBytesRetired can be easily checked by CodedInputStream.getTotalBytesRead(), so can you include a unit test that reads at least twice with ProtoMessageWritable and validates that cin.resetSizeCounter() was indeed called?

I have added a basic unit test that checks that cin.resetSizeCounter() is called.

@tez-yetus

🎊 +1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|----------:|--------:|:--------|
| +0 🆗 | reexec | 14m 22s | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | No case conflicting files found. |
| +1 💚 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| | | | _ master Compile Tests _ |
| +1 💚 | mvninstall | 15m 10s | master passed |
| +1 💚 | compile | 0m 20s | master passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1 |
| +1 💚 | compile | 0m 20s | master passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06 |
| +1 💚 | checkstyle | 1m 8s | master passed |
| +1 💚 | javadoc | 0m 30s | master passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1 |
| +1 💚 | javadoc | 0m 15s | master passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06 |
| +0 🆗 | spotbugs | 1m 4s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 💚 | findbugs | 1m 2s | master passed |
| | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 0m 11s | the patch passed |
| +1 💚 | compile | 0m 12s | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1 |
| +1 💚 | javac | 0m 12s | the patch passed |
| +1 💚 | compile | 0m 10s | the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06 |
| +1 💚 | javac | 0m 10s | the patch passed |
| -0 ⚠️ | checkstyle | 0m 5s | tez-plugins/tez-protobuf-history-plugin: The patch generated 1 new + 7 unchanged - 0 fixed = 8 total (was 7) |
| +1 💚 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 💚 | javadoc | 0m 7s | the patch passed with JDK Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1 |
| +1 💚 | javadoc | 0m 7s | the patch passed with JDK Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06 |
| +1 💚 | findbugs | 0m 27s | the patch passed |
| | | | _ Other Tests _ |
| +1 💚 | unit | 0m 27s | tez-protobuf-history-plugin in the patch passed. |
| +1 💚 | asflicense | 0m 14s | The patch does not generate ASF License warnings. |
| | | 35m 49s | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.45 ServerAPI=1.45 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-334/3/artifact/out/Dockerfile |
| GITHUB PR | #334 |
| Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile |
| uname | Linux 012dcf99c519 5.15.0-94-generic #104-Ubuntu SMP Tue Jan 9 15:25:40 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/tez.sh |
| git revision | master / b5b6226 |
| Default Java | Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.22+7-post-Ubuntu-0ubuntu222.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_402-8u402-ga-2ubuntu1~22.04-b06 |
| checkstyle | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-334/3/artifact/out/diff-checkstyle-tez-plugins_tez-protobuf-history-plugin.txt |
| Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-334/3/testReport/ |
| Max. process+thread count | 107 (vs. ulimit of 5500) |
| modules | C: tez-plugins/tez-protobuf-history-plugin U: tez-plugins/tez-protobuf-history-plugin |
| Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-334/3/console |
| versions | git=2.34.1 maven=3.6.3 findbugs=3.0.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |

This message was automatically generated.

@Aggarwal-Raghav
Contributor Author

@abstractdog, can you please help with the review?

@abstractdog
Contributor

abstractdog commented Jun 20, 2024

@Aggarwal-Raghav : can you fix this minor checkstyle warning?
https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-334/3/artifact/out/diff-checkstyle-tez-plugins_tez-protobuf-history-plugin.txt

  • 1 more minor comment

other than that this LGTM

CodedInputStream cin = (CodedInputStream) c.get(writable);

// Goal is to get value of: reader.writable.cin.getTotalBytesRead()
int totalBytesRead = cin.getTotalBytesRead();
Contributor

you can return without declaring a new variable

return cin.getTotalBytesRead();
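
For context, the pattern being discussed (a test helper that reads a private field reflectively and returns the getter's value directly) can be sketched without any Tez or protobuf classes. `Counter`, `Holder`, and `ReflectionProbe` below are hypothetical stand-ins, not the classes touched by this patch; only the field name `cin` is borrowed from the snippet above.

```java
import java.lang.reflect.Field;

/** Hypothetical stand-in for the object whose counter the test inspects. */
final class Counter {
    private int total;

    void add(int n) {
        total += n;
    }

    void reset() {
        total = 0;
    }

    int getTotal() {
        return total;
    }
}

/** Hypothetical stand-in for a class that keeps the counter in a private field. */
final class Holder {
    private final Counter cin = new Counter();

    void readMessage(int size) {
        cin.add(size);
        cin.reset(); // reset after every message, as proposed in the review
    }
}

final class ReflectionProbe {
    /** Reads the private `cin` field of `holder` and returns its counter value directly. */
    static int totalBytesRead(Holder holder) {
        try {
            Field c = Holder.class.getDeclaredField("cin");
            c.setAccessible(true);
            // No intermediate local variable for the result: return the value as-is.
            return ((Counter) c.get(holder)).getTotal();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("could not read private field", e);
        }
    }

    public static void main(String[] args) {
        Holder holder = new Holder();
        holder.readMessage(128);
        holder.readMessage(256);
        // Two reads happened, yet the counter is back at zero after each one.
        System.out.println(totalBytesRead(holder));
    }
}
```

A check of this shape only demonstrates that the counter was reset after the last read; it says nothing about how the real reader behaves on large inputs.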

Contributor Author

fixed

@tez-yetus

🎊 +1 overall

| Vote | Subsystem | Runtime | Comment |
|:----:|----------:|--------:|:--------|
| +0 🆗 | reexec | 0m 13s | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | No case conflicting files found. |
| +1 💚 | @author | 0m 0s | The patch does not contain any @author tags. |
| +1 💚 | test4tests | 0m 0s | The patch appears to include 1 new or modified test files. |
| | | | _ master Compile Tests _ |
| +1 💚 | mvninstall | 19m 27s | master passed |
| +1 💚 | compile | 0m 31s | master passed with JDK Ubuntu-11.0.23+9-post-Ubuntu-1ubuntu122.04.1 |
| +1 💚 | compile | 0m 31s | master passed with JDK Private Build-1.8.0_412-8u412-ga-1~22.04.1-b08 |
| +1 💚 | checkstyle | 1m 27s | master passed |
| +1 💚 | javadoc | 0m 41s | master passed with JDK Ubuntu-11.0.23+9-post-Ubuntu-1ubuntu122.04.1 |
| +1 💚 | javadoc | 0m 24s | master passed with JDK Private Build-1.8.0_412-8u412-ga-1~22.04.1-b08 |
| +0 🆗 | spotbugs | 1m 29s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 💚 | findbugs | 1m 27s | master passed |
| | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 0m 17s | the patch passed |
| +1 💚 | compile | 0m 17s | the patch passed with JDK Ubuntu-11.0.23+9-post-Ubuntu-1ubuntu122.04.1 |
| +1 💚 | javac | 0m 17s | the patch passed |
| +1 💚 | compile | 0m 17s | the patch passed with JDK Private Build-1.8.0_412-8u412-ga-1~22.04.1-b08 |
| +1 💚 | javac | 0m 17s | the patch passed |
| +1 💚 | checkstyle | 0m 8s | the patch passed |
| +1 💚 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 💚 | javadoc | 0m 9s | the patch passed with JDK Ubuntu-11.0.23+9-post-Ubuntu-1ubuntu122.04.1 |
| +1 💚 | javadoc | 0m 9s | the patch passed with JDK Private Build-1.8.0_412-8u412-ga-1~22.04.1-b08 |
| +1 💚 | findbugs | 0m 38s | the patch passed |
| | | | _ Other Tests _ |
| +1 💚 | unit | 0m 34s | tez-protobuf-history-plugin in the patch passed. |
| +1 💚 | asflicense | 0m 17s | The patch does not generate ASF License warnings. |
| | | 28m 31s | |

| Subsystem | Report/Notes |
|----------:|:-------------|
| Docker | ClientAPI=1.45 ServerAPI=1.45 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-334/4/artifact/out/Dockerfile |
| GITHUB PR | #334 |
| Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile |
| uname | Linux c5902e3a71a8 5.15.0-106-generic #116-Ubuntu SMP Wed Apr 17 09:17:56 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/tez.sh |
| git revision | master / e08d027 |
| Default Java | Private Build-1.8.0_412-8u412-ga-1~22.04.1-b08 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.23+9-post-Ubuntu-1ubuntu122.04.1 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_412-8u412-ga-1~22.04.1-b08 |
| Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-334/4/testReport/ |
| Max. process+thread count | 102 (vs. ulimit of 5500) |
| modules | C: tez-plugins/tez-protobuf-history-plugin U: tez-plugins/tez-protobuf-history-plugin |
| Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-334/4/console |
| versions | git=2.34.1 maven=3.6.3 findbugs=3.0.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |

This message was automatically generated.

@abstractdog abstractdog self-requested a review June 20, 2024 14:25
@abstractdog abstractdog merged commit 0ac505b into apache:master Jun 20, 2024
4 checks passed