TEZ-4451: ThreadLevel IO Stats Support for TEZ. #331

Merged: 3 commits into apache:master on Feb 6, 2024

ayushtkn
Member

Put IOStatistics in TaskRunner
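
As a rough illustration of what the description means, the sketch below keeps IO counters per thread, resets them when a task starts (assuming runner threads may be reused across tasks), and renders them when the task is cleaned up. `ThreadIoStats`, `TaskRunnerSketch`, and the counter names are hypothetical stand-ins written against the plain JDK; the PR itself relies on Hadoop's `IOStatisticsContext` and `IOStatisticsLogging`, which this sketch does not reproduce.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for a per-thread IO statistics context (not a Hadoop class).
final class ThreadIoStats {
  // One independent stats object per thread.
  private static final ThreadLocal<ThreadIoStats> CURRENT =
      ThreadLocal.withInitial(ThreadIoStats::new);

  // Sorted so that the rendered output is deterministic.
  private final Map<String, Long> counters = new TreeMap<>();

  static ThreadIoStats current() {
    return CURRENT.get();
  }

  void increment(String name, long delta) {
    counters.merge(name, delta, Long::sum);
  }

  void reset() {
    counters.clear();
  }

  // Returns a copy, so later increments do not change the result.
  Map<String, Long> snapshot() {
    return new TreeMap<>(counters);
  }
}

// Hypothetical task runner: runs the task body on the calling thread and
// returns the line that would be logged during cleanup.
final class TaskRunnerSketch {
  static String runAndReport(String taskId, Runnable body) {
    ThreadIoStats stats = ThreadIoStats.current();
    stats.reset(); // drop counters left behind by an earlier task on this thread
    body.run();
    return "IOStatistics for task " + taskId + ": " + stats.snapshot();
  }
}
```

Any IO code that runs inside `body` would call `ThreadIoStats.current().increment(...)`, so the counters end up attributed to the thread that executed the task rather than to the whole process.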

@abstractdog
Contributor

abstractdog commented Jan 25, 2024

thanks for this PR @ayushtkn

  1. can you please attach an example here and/or Jira of what a snippet of statistics looks like?
  2. does this stat object contain info that can be accumulated for a whole DAG? I think DAG-level counters would be useful here, maybe in a follow-up ticket
  3. if this applies to tez container mode's task runner, let's consider adding the same to LLAP's task runner as well (or where it's applicable)

@ayushtkn
Member Author

Thanx @abstractdog for the review!!!

can you please attach an example here and/or Jira of what a snippet of statistics looks like?

I have attached a snippet here. I ran a small query to check that it works, and I changed the previous approach to use IOStatisticsLogging:
https://issues.apache.org/jira/browse/TEZ-4451?focusedCommentId=17811682&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17811682

does this stat object contain info that can be accumulated for a whole DAG? I think DAG-level counters would be useful here, maybe in a follow-up ticket

I believe yes, it should be possible. I need to propagate the results back to the DAGAppMaster from each task and build an aggregate. Propagating them back might be challenging, so I will create a follow-up and explore.

if this applies to tez container mode's task runner, let's consider adding the same to LLAP's task runner as well (or where it's applicable)

You mean to say in the Hive code here? TaskRunnerCallable.java

@abstractdog
Contributor

abstractdog commented Jan 29, 2024

I believe yes, it should be possible. I need to propagate the results back to the DAGAppMaster from each task and build an aggregate. Propagating them back might be challenging, so I will create a follow-up and explore.

fortunately, it's not that hard: task-level counters are collected automatically by the Tez framework. If you navigate through some occurrences of this enum, you'll figure it out. It would be awesome as a follow-up task

You mean to say in the Hive code here? TaskRunnerCallable.java

yes, exactly, however, I just realized that due to the independent LLAP IO Elevator threads, these IO stats can easily become useless noise there...anyway, we can investigate in the scope of a Hive ticket there

@ayushtkn
Member Author

Thanx, I have created TEZ-4539 for the DAG Level stuff
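
The DAG-level aggregation discussed above boils down to summing same-named counters across tasks. A minimal sketch of that step, assuming each task reports its statistics as a name-to-value map; `DagStatsAggregator` and the map representation are hypothetical and are not taken from Tez or from TEZ-4539:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: folds per-task counter maps into one DAG-level total.
final class DagStatsAggregator {
  static Map<String, Long> aggregate(List<Map<String, Long>> perTaskCounters) {
    // TreeMap keeps the keys sorted, which makes the result stable to print.
    Map<String, Long> total = new TreeMap<>();
    for (Map<String, Long> task : perTaskCounters) {
      // A counter missing from one task simply contributes nothing for that task.
      task.forEach((name, value) -> total.merge(name, value, Long::sum));
    }
    return total;
  }
}
```

Summation only makes sense for counter-style values; statistics such as minimums, maximums, or means would need their own merge rule.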

@@ -28,6 +30,8 @@
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import static org.apache.hadoop.fs.statistics.IOStatisticsContext.getCurrentIOStatisticsContext;
Contributor

@abstractdog abstractdog Feb 5, 2024

nit: if we import some classes from org.apache.hadoop.fs.statistics, couldn't we import them in the same way? is there a specific reason for static importing this method?
I guess these would look better together, like:

import org.apache.hadoop.fs.statistics.IOStatisticsLogging;
import org.apache.hadoop.fs.statistics.IOStatisticsContext;

Member Author

I changed it, somehow my IDE did that automatically...

@tez-yetus

💔 -1 overall

| Vote | Subsystem | Runtime | Comment |
|------|-----------|---------|---------|
| +0 🆗 | reexec | 22m 54s | Docker mode activated. |
| | | | _ Prechecks _ |
| +1 💚 | dupname | 0m 0s | No case conflicting files found. |
| +1 💚 | @author | 0m 0s | The patch does not contain any @author tags. |
| -1 ❌ | test4tests | 0m 0s | The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. |
| | | | _ master Compile Tests _ |
| +1 💚 | mvninstall | 14m 2s | master passed |
| +1 💚 | compile | 0m 28s | master passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu122.04 |
| +1 💚 | compile | 0m 27s | master passed with JDK Private Build-1.8.0_392-8u392-ga-1~22.04-b08 |
| +1 💚 | checkstyle | 1m 17s | master passed |
| +1 💚 | javadoc | 0m 35s | master passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu122.04 |
| +1 💚 | javadoc | 0m 21s | master passed with JDK Private Build-1.8.0_392-8u392-ga-1~22.04-b08 |
| +0 🆗 | spotbugs | 1m 16s | Used deprecated FindBugs config; considering switching to SpotBugs. |
| +1 💚 | findbugs | 1m 13s | master passed |
| | | | _ Patch Compile Tests _ |
| +1 💚 | mvninstall | 0m 16s | the patch passed |
| +1 💚 | compile | 0m 16s | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu122.04 |
| +1 💚 | javac | 0m 16s | the patch passed |
| +1 💚 | compile | 0m 15s | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~22.04-b08 |
| +1 💚 | javac | 0m 15s | the patch passed |
| +1 💚 | checkstyle | 0m 8s | the patch passed |
| +1 💚 | whitespace | 0m 0s | The patch has no whitespace issues. |
| +1 💚 | javadoc | 0m 9s | the patch passed with JDK Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu122.04 |
| +1 💚 | javadoc | 0m 9s | the patch passed with JDK Private Build-1.8.0_392-8u392-ga-1~22.04-b08 |
| +1 💚 | findbugs | 0m 40s | the patch passed |
| | | | _ Other Tests _ |
| +1 💚 | unit | 0m 46s | tez-runtime-internals in the patch passed. |
| +1 💚 | asflicense | 0m 16s | The patch does not generate ASF License warnings. |
| | | 45m 11s | |

| Subsystem | Report/Notes |
|-----------|--------------|
| Docker | ClientAPI=1.44 ServerAPI=1.44 base: https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-331/3/artifact/out/Dockerfile |
| GITHUB PR | #331 |
| JIRA Issue | TEZ-4451 |
| Optional Tests | dupname asflicense javac javadoc unit spotbugs findbugs checkstyle compile |
| uname | Linux e6ed98431096 5.15.0-88-generic #98-Ubuntu SMP Mon Oct 2 15:18:56 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | maven |
| Personality | personality/tez.sh |
| git revision | master / 5e1cdee |
| Default Java | Private Build-1.8.0_392-8u392-ga-1~22.04-b08 |
| Multi-JDK versions | /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.21+9-post-Ubuntu-0ubuntu122.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_392-8u392-ga-1~22.04-b08 |
| Test Results | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-331/3/testReport/ |
| Max. process+thread count | 103 (vs. ulimit of 5500) |
| modules | C: tez-runtime-internals U: tez-runtime-internals |
| Console output | https://ci-hadoop.apache.org/job/tez-multibranch/job/PR-331/3/console |
| versions | git=2.34.1 maven=3.6.3 findbugs=3.0.1 |
| Powered by | Apache Yetus 0.12.0 https://yetus.apache.org |

This message was automatically generated.

@@ -116,6 +120,11 @@ public TaskRunner2CallableResult run() throws Exception {
// For a successful task, however, this should be almost no delay since close has already happened.
maybeFixInterruptStatus();
LOG.info("Cleaning up task {}, stopRequested={}", task.getTaskAttemptID(), stopRequested.get());
String ioStats = IOStatisticsLogging.ioStatisticsToPrettyString(

looks good. Note that you can also take a snapshot of it and serialize it as a Java Serializable or as JSON; there has been some discussion about making it a Writable too.
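
To make the snapshot-then-serialize suggestion concrete, here is a small sketch using only JDK serialization. `StatsSnapshot` is a hypothetical class invented for this example and is not Hadoop's snapshot type; it copies the live counters at construction time and round-trips through a byte array.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical immutable, serializable copy of a set of live counters.
final class StatsSnapshot implements Serializable {
  private static final long serialVersionUID = 1L;

  private final TreeMap<String, Long> counters;

  StatsSnapshot(Map<String, Long> live) {
    // Copy now, so later updates to the live map are not reflected here.
    this.counters = new TreeMap<>(live);
  }

  Map<String, Long> counters() {
    return Collections.unmodifiableMap(counters);
  }

  byte[] toBytes() {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
      out.writeObject(this);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    // The stream is closed (and therefore flushed) before the bytes are read.
    return buffer.toByteArray();
  }

  static StatsSnapshot fromBytes(byte[] bytes) {
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
      return (StatsSnapshot) in.readObject();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }
}
```

A serialized snapshot like this is one possible way to ship per-task statistics to another process; whether JDK serialization, JSON, or a Writable is the better wire format is left open in the review.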

Contributor

@abstractdog abstractdog left a comment

LGTM

@abstractdog abstractdog merged commit 8b40858 into apache:master Feb 6, 2024
4 checks passed
4 participants