-
Notifications
You must be signed in to change notification settings - Fork 424
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TEZ-4540: Reading proto data more than 2GB from multiple splits fails #334
Conversation
This comment was marked as outdated.
This comment was marked as outdated.
if (din.in != in) { | ||
cin.resetSizeCounter(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The javadoc of CodedInputStream#setSizeLimit
says the following:
If you want to read several messages from a single CodedInputStream, you could call resetSizeCounter() after each one to avoid hitting the size limit.
Based on that I would be inclined to reset the counter after every single message otherwise it still seems feasible to hit the same error if the DataInput
is sufficiently large.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review @zabetak.
I missed this Java doc statement. I was suspecting that resetting the totalBytesRetired after every message read might have unexpected impact therefore, I resetted it after every hdfs split read. But based on the Javadoc, I think we can reset the counter after every mesage read. Will modify the patch.
Thanks.
4090cbc
to
e3a247c
Compare
This comment was marked as outdated.
This comment was marked as outdated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks for pushing this forward @Aggarwal-Raghav ! My approval is not binding so you will have to ping a Tez committer to merge this.
@abstractdog @harishjp. Can you please help get this in tez 0.10.3 |
thanks @Aggarwal-Raghav for the patch, let me check soon |
CodedInputStream.totalBytesRetired can be easily checked by CodedInputStream.getTotalBytesRead(), so can you include a unit test that reads at least twice with ProtoMessageWritable and validates that cin.resetSizeCounter() was indeed called? |
Have added a basic UT for checking cin.resetSizeCounter() is called. |
🎊 +1 overall
This message was automatically generated. |
@abstractdog, can you please help with the review. |
@Aggarwal-Raghav : can you fix this minor checkstyle warning?
other than that this LGTM |
CodedInputStream cin = (CodedInputStream) c.get(writable); | ||
|
||
// Goal is to get value of: reader.writable.cin.getTotalBytesRead() | ||
int totalBytesRead = cin.getTotalBytesRead(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can return without declaring a new variable
return cin.getTotalBytesRead();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
🎊 +1 overall
This message was automatically generated. |
Refer to this: HIVE-28026 and apache/hive#5033