TEZ-4521: Partition stats should be always uncompressed size #317

Merged (1 commit) on Nov 28, 2023

Contributor

@okumin okumin commented Nov 27, 2023

Currently, we need to configure the compressed size for ordered outputs and the decompressed size for unordered outputs. It would make sense to have consistent semantics.
https://issues.apache.org/jira/browse/TEZ-4521

@okumin okumin marked this pull request as ready for review November 27, 2023 15:31
@okumin okumin changed the title [WIP] TEZ-4521: Partition stats should be always uncompressed size TEZ-4521: Partition stats should be always uncompressed size Nov 27, 2023
@tez-yetus

🎊 +1 overall

Vote Subsystem Runtime Comment
+0 🆗 reexec 13m 19s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
+1 💚 test4tests 0m 0s The patch appears to include 1 new or modified test files.
_ master Compile Tests _
+1 💚 mvninstall 15m 15s master passed
+1 💚 compile 0m 22s master passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu122.04
+1 💚 compile 0m 19s master passed with JDK Private Build-1.8.0_382-8u382-ga-1~22.04.1-b05
+1 💚 checkstyle 1m 12s master passed
+1 💚 javadoc 0m 32s master passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu122.04
+1 💚 javadoc 0m 19s master passed with JDK Private Build-1.8.0_382-8u382-ga-1~22.04.1-b05
+0 🆗 spotbugs 1m 7s Used deprecated FindBugs config; considering switching to SpotBugs.
+1 💚 findbugs 1m 6s master passed
_ Patch Compile Tests _
+1 💚 mvninstall 0m 13s the patch passed
+1 💚 compile 0m 13s the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu122.04
+1 💚 javac 0m 13s the patch passed
+1 💚 compile 0m 12s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~22.04.1-b05
+1 💚 javac 0m 12s the patch passed
+1 💚 checkstyle 0m 11s the patch passed
+1 💚 whitespace 0m 0s The patch has no whitespace issues.
+1 💚 javadoc 0m 12s the patch passed with JDK Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu122.04
+1 💚 javadoc 0m 11s the patch passed with JDK Private Build-1.8.0_382-8u382-ga-1~22.04.1-b05
+1 💚 findbugs 0m 35s the patch passed
_ Other Tests _
+1 💚 unit 4m 23s tez-runtime-library in the patch passed.
+1 💚 asflicense 0m 15s The patch does not generate ASF License warnings.
39m 25s
Subsystem Report/Notes
Docker ClientAPI=1.43 ServerAPI=1.43 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-317/1/artifact/out/Dockerfile
GITHUB PR #317
JIRA Issue TEZ-4521
Optional Tests dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile
uname Linux 45f7623bdf4c 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality personality/tez.sh
git revision master / 51d6f53
Default Java Private Build-1.8.0_382-8u382-ga-1~22.04.1-b05
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.20.1+1-post-Ubuntu-0ubuntu122.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_382-8u382-ga-1~22.04.1-b05
Test Results https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-317/1/testReport/
Max. process+thread count 2100 (vs. ulimit of 5500)
modules C: tez-runtime-library U: tez-runtime-library
Console output https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-317/1/console
versions git=2.34.1 maven=3.6.3 findbugs=3.0.1
Powered by Apache Yetus 0.12.0 https://yetus.apache.org

This message was automatically generated.

// Use partition sizes to compute the total size.
if (partitionSizes != null) {
totalSize = estimatedUncompressedSum(partitionSizes);
totalSize = Arrays.stream(partitionSizes).sum();
Contributor

Does totalSize change with this patch? If it doesn't, why? If it does, can we validate it with this unit test or in any way that makes sense to you, @okumin?

Contributor Author

This patch doesn't change the total size. That's because the total size is stored in a different Protobuf field from the partition stats. The design is valid since users have the option not to collect partition stats at all (tez.runtime.report.partition.stats=none).
TEZ-4521 would remove the possibility of partition stats containing the compressed size. That's why I revised this file, to prevent future users from being confused.
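
To illustrate the point about the two separate fields, here is a minimal standalone sketch, not the actual Tez code. The class name `TotalSizeSketch`, the method `totalSize`, and the parameter `reportedTotalSize` are hypothetical; only the `Arrays.stream(partitionSizes).sum()` expression is taken from the snippet under review. It assumes partition stats, when present, already hold uncompressed sizes, and that they are null when partition stats are turned off.

```java
import java.util.Arrays;

// Hypothetical illustration only; none of these names come from Tez.
class TotalSizeSketch {

    /**
     * Picks the total output size.
     *
     * @param reportedTotalSize stand-in for the total size carried in its own field
     * @param partitionSizes    stand-in for per-partition stats, or null when
     *                          partition stats are not reported at all
     */
    static long totalSize(long reportedTotalSize, long[] partitionSizes) {
        if (partitionSizes != null) {
            // Assumption: partition stats are uncompressed sizes, so a plain
            // sum is enough and no compressed-to-uncompressed estimate is needed.
            return Arrays.stream(partitionSizes).sum();
        }
        // Without partition stats, fall back to the separately reported total.
        return reportedTotalSize;
    }

    public static void main(String[] args) {
        // Partition stats available: the sum of the partitions is used.
        System.out.println(totalSize(1000L, new long[] {100L, 200L, 300L}));
        // Partition stats disabled: the separately reported total is used.
        System.out.println(totalSize(1000L, null));
    }
}
```

In this sketch the fallback branch is what keeps a total available even when partition stats are disabled, which is the reason given above for storing the two values separately.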

@abstractdog
Contributor

thanks @okumin for this patch, I've put a minor comment

@okumin
Contributor Author

okumin commented Nov 28, 2023

@abstractdog Thanks! This patch is related to #306 and I'd be glad if you could take a look at it.

@abstractdog abstractdog self-requested a review November 28, 2023 15:30
@abstractdog abstractdog merged commit 43562ad into apache:master Nov 28, 2023
4 checks passed
@abstractdog
Contributor

merged, thanks @okumin
I'll check the rest of this fair scheduling work as soon as I can, I promise :)

@okumin okumin deleted the TEZ-4521-uncompressed branch November 28, 2023 15:35
3 participants