
Add sparkRuntime property to capture runtime type in application_information #1414

Open

parthosa wants to merge 1 commit into base: dev

Conversation

@parthosa (Collaborator) commented Nov 7, 2024

Fixes #1413

This PR adds a new sparkRuntime property that captures the Spark runtime type (SPARK, PHOTON, SPARK_RAPIDS) and stores it in application_information.csv.

Changes

Profiling Enhancements:

  • Added a sparkRuntime property to AppInfoProfileResults to capture the runtime environment, and updated the outputHeaders and convertToSeq methods to include the new column (see the sketch after this list).
  • Updated AppInformationViewTrait to map the new sparkRuntime property when creating AppInfoProfileResults instances.
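
For reference, a minimal sketch of the intended shape. The field list and helper bodies below are inferred from the application_information.csv columns shown in the Output section, so everything apart from the new sparkRuntime column is an assumption rather than the exact source:

```scala
// Sketch only: field names mirror the CSV columns below; exact types and the
// surrounding ProfileResult plumbing in the real code may differ.
case class AppInfoProfileResults(
    appIndex: Int,
    appName: String,
    appId: Option[String],
    sparkUser: String,
    startTime: Long,
    endTime: Option[Long],
    duration: Option[Long],
    durationStr: String,
    sparkRuntime: String,   // new column: SPARK, PHOTON, or SPARK_RAPIDS
    sparkVersion: String,
    pluginEnabled: Boolean) {

  def outputHeaders: Seq[String] = Seq("appIndex", "appName", "appId",
    "sparkUser", "startTime", "endTime", "duration", "durationStr",
    "sparkRuntime", "sparkVersion", "pluginEnabled")

  def convertToSeq: Seq[String] = Seq(appIndex.toString, appName,
    appId.getOrElse(""), sparkUser, startTime.toString,
    endTime.map(_.toString).getOrElse(""),
    duration.map(_.toString).getOrElse(""), durationStr,
    sparkRuntime, sparkVersion, pluginEnabled.toString)
}
```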

Runtime Handling:

  • Introduced a SparkRuntime enumeration to represent the supported runtimes (SPARK, PHOTON, SPARK_RAPIDS).
  • Added a sparkRuntime property to CacheablePropsHandler and set its default value.
  • Implemented a setSparkRuntime method in AppBase that determines and sets the runtime from the application properties (see the sketch after this list).
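
A rough sketch of the runtime-handling pieces. The detection heuristics here (PHOTON inferred from a "photon" marker in the Spark/Databricks version string, SPARK_RAPIDS from the RAPIDS plugin being enabled) and the SPARK default are assumptions about the approach, not the exact implementation:

```scala
// Sketch only: member names other than sparkRuntime/SparkRuntime and the
// detection heuristics are assumptions for illustration.
object SparkRuntime extends Enumeration {
  val SPARK, SPARK_RAPIDS, PHOTON = Value
}

trait CacheablePropsHandler {
  // assumed default: plain Spark until the event log says otherwise
  var sparkRuntime: SparkRuntime.Value = SparkRuntime.SPARK
  var sparkVersion: String = ""
  var gpuMode: Boolean = false
}

abstract class AppBase extends CacheablePropsHandler {
  protected def setSparkRuntime(): Unit = {
    sparkRuntime = if (sparkVersion.toLowerCase.contains("photon")) {
      SparkRuntime.PHOTON
    } else if (gpuMode) {
      SparkRuntime.SPARK_RAPIDS
    } else {
      SparkRuntime.SPARK
    }
  }
}
```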

Testing:

  • Added test cases in ApplicationInfoSuite to validate the sparkRuntime property for different event logs (a sketch follows this list).
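
A sketch of how the new cases might be driven. The test table mirrors the one quoted in the review comment further down; readEventLog is a hypothetical stand-in for the suite's existing event-log loading utilities:

```scala
// Sketch only: readEventLog is a hypothetical placeholder for the suite's
// existing ApplicationInfo construction helpers.
val sparkRuntimeTestCases: Seq[(SparkRuntime.Value, String)] = Seq(
  SparkRuntime.SPARK -> s"$qualLogDir/nds_q86_test",
  SparkRuntime.SPARK_RAPIDS -> s"$logDir/nds_q66_gpu.zstd",
  SparkRuntime.PHOTON -> s"$qualLogDir/nds_q88_photon_db_13_3.zstd")

sparkRuntimeTestCases.foreach { case (expectedRuntime, eventLogPath) =>
  test(s"test sparkRuntime property for $expectedRuntime eventlog") {
    val app = readEventLog(eventLogPath)
    assert(app.sparkRuntime == expectedRuntime)
  }
}
```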

Output

File: application_information.csv

SPARK Runtime:

appIndex,appName,appId,sparkUser,startTime,endTime,duration,durationStr,sparkRuntime,sparkVersion,pluginEnabled
1,"Databricks Shell","app-20240827220408-0000","root",1724796242014,1724799713682,3471668,"58 min","SPARK","13.3.x-aarch64-scala2.12",false

SPARK_RAPIDS Runtime:

appIndex,appName,appId,sparkUser,startTime,endTime,duration,durationStr,sparkRuntime,sparkVersion,pluginEnabled
1,"Databricks Shell","app-20240827233829-0000","root",1724801903175,1724802355703,452528,"7.5 min","SPARK_RAPIDS","13.3.x-gpu-ml-scala2.12",true

PHOTON Runtime:

appIndex,appName,appId,sparkUser,startTime,endTime,duration,durationStr,sparkRuntime,sparkVersion,pluginEnabled
1,"Databricks Shell","app-20240818062343-0000","root",1723962217320,1723962595796,378476,"6.3 min","PHOTON","13.3.x-aarch64-photon-scala2.12",false

cc: @leewyang

@parthosa parthosa added the core_tools Scope the core module (scala) label Nov 7, 2024
@parthosa parthosa self-assigned this Nov 7, 2024
@parthosa parthosa marked this pull request as ready for review November 7, 2024 05:24
val sparkRuntimeTestCases: Seq[(SparkRuntime.Value, String)] = Seq(
  SparkRuntime.SPARK -> s"$qualLogDir/nds_q86_test",
  SparkRuntime.SPARK_RAPIDS -> s"$logDir/nds_q66_gpu.zstd",
  SparkRuntime.PHOTON-> s"$qualLogDir/nds_q88_photon_db_13_3.zstd")

nit: space before ->

@cindyyuanjiang (Collaborator) left a comment

Thanks @parthosa! A minor nit.

@nartal1 (Collaborator) left a comment

LGTM. Thanks @parthosa !

Labels
core_tools Scope the core module (scala)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] Store spark runtime for different application type
3 participants