[FEA] Add qualification support for Databricks Photon event logs #251

mattahrens · 2023-04-14T11:40:59Z

I would like to see the estimated speedup on GPU compared against Databricks Photon. This work will include parsing Databricks Photon event logs and then generating speedup factors from Photon operators to Spark RAPIDS operators.

Tasks

- Generate operator mapping between Photon execs and CPU execs to be used in plan parser #449 (labels: core_tools)
- Add support for processing Photon event logs in Scala #1338 (labels: core_tools, feature request)
- [BUG] Missing Metrics in Photon Event Logs Affecting QualX Predictions #1388 (labels: bug, core_tools)
- [FEA] Store spark runtime for different application type #1413 (labels: core_tools)
- Add qualification support for Photon jobs in the Python Tool #1409 (labels: affect-output, feature request, user_tools)
- Train qualx for Photon platforms (DB AWS, DB Azure)

amahussein · 2024-05-01T14:34:50Z

@mattahrens do we still need this issue?
Currently we skip Photon jobs in the Qualification tool.

mattahrens · 2024-05-01T14:53:39Z

This still might be prioritized in the future so we can keep it open

parthosa · 2024-10-28T21:32:33Z

Discussed the next steps for Photon integration into QualX with @leewyang and @eordentlich.

Assumptions:

Users do not provide heterogeneous event logs (i.e., a mix of Photon and Spark event logs).

Solution:

QualX
- Reads spark_properties.csv and, based on the type of the first app, loads the relevant model (Photon or Spark).
- A check can be added to ensure that heterogeneous logs are not provided.
Python Tools CLI
- Reads spark_properties.csv and, based on the type of the first app, selects the strategy used for speedup categorization of all event logs:
  - For Spark apps, recommend apps having speedup > 1.3x
  - For Photon apps, recommend apps having speedup > 1x
- With this strategy approach, we can also have separate categories (i.e. Small/Medium/Large) for Photon, if needed.
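
A minimal sketch of the strategy selection described above, in plain Python. Only the two thresholds (1.3x for Spark, 1x for Photon) come from this comment; `SpeedupStrategy`, `STRATEGIES`, `select_strategy`, `is_recommended`, and the `"spark"`/`"photon"` type strings are hypothetical names, and how the app type is actually derived from spark_properties.csv is not shown here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeedupStrategy:
    """Hypothetical container for a per-app-type recommendation rule."""
    name: str
    min_speedup: float  # an app is recommended only if its speedup exceeds this


# Thresholds from the proposal; names and keys are illustrative assumptions.
STRATEGIES = {
    "spark": SpeedupStrategy("spark", 1.3),
    "photon": SpeedupStrategy("photon", 1.0),
}


def select_strategy(app_types):
    """Pick one strategy for all event logs from the type of the FIRST app."""
    if not app_types:
        raise ValueError("no apps found")
    return STRATEGIES[app_types[0]]


def is_recommended(speedup, strategy):
    """Strictly greater than the threshold, matching '> 1.3x' and '> 1x'."""
    return speedup > strategy.min_speedup
```

Keeping the threshold inside a strategy object is what would let Photon get its own Small/Medium/Large boundaries later without touching the Spark rule.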

Alternatives:

An alternative solution would be for the CLI to read spark_properties.csv and pass the Photon/Spark type as an additional column to QualX.
However, this could introduce complexity by adding unnecessary or potentially invalid columns (for other CSPs).
The proposed solution avoids this issue, keeping the design simpler and allowing QualX’s model to be updated independently of changes in the Python CLI.

cc: @amahussein @tgravescs

mattahrens · 2024-10-28T21:46:42Z

Agreed that heterogeneous support makes sense, but can that be done in a follow-up PR? I don't think it's needed in this first iteration.

parthosa · 2024-10-28T22:17:22Z

Sure Matt. This would make QualX simpler. Updated the approach. We can add heterogeneous support later if needed.

tgravescs · 2024-10-29T13:54:31Z

> Users do not provide heterogeneous event logs

Are we going to fail or warn if we recognize this happening?
I think a lot of companies will have mixed eventlogs.

parthosa · 2024-10-29T17:53:18Z

Eventually we would want to add support for a mixed set. This approach is mainly meant to simplify development and let us proceed iteratively.

Both approaches have pros and cons.

Approach 1: If users provide a mixed set of event logs --> Fail

Pros: Users do not get an incorrect recommendation.
Cons: User experience may be compromised.

Approach 2: If users provide a mixed set of event logs --> Warn and fall back to the Spark CPU strategy

Pros: User experience is better. There are no failures.
Cons: Users will get unexpected recommendations. It can cause silent errors/warnings.

IMO, Approach 1 makes more sense. Although the user experience is compromised, any unexpected or silent errors will be avoided.

tgravescs · 2024-10-29T19:33:52Z

What is the expected time frame for adding heterogeneous support? If we are going to add it soon, then it might not matter too much.

We could always choose whatever type the first event log has and log it. Then, if we come to one of the opposite type, we skip running on that event log but mark it as skipped because of this condition, so that it is obvious to the user. The question is whether we make it obvious enough when skipping.
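
The skip-and-mark idea above could look roughly like the following sketch. It assumes each app has already been reduced to an `(app_id, app_type)` pair; `partition_by_first_type` and the wording of the skip reason are hypothetical and not part of the tools.

```python
def partition_by_first_type(apps):
    """Split apps into kept and skipped, based on the first app's type.

    apps: list of (app_id, app_type) tuples in processing order.
    Returns (expected_type, kept_ids, skipped), where skipped maps
    app_id -> human-readable reason, so the skip can be surfaced to the user.
    """
    if not apps:
        return None, [], {}
    expected = apps[0][1]  # the first event log decides the type for the run
    kept, skipped = [], {}
    for app_id, app_type in apps:
        if app_type == expected:
            kept.append(app_id)
        else:
            skipped[app_id] = (
                f"skipped: app type '{app_type}' differs from "
                f"first app type '{expected}'"
            )
    return expected, kept, skipped
```

Returning the reasons alongside the kept IDs leaves it to the caller to decide how visible the skip is, for example a status column in the report rather than only a log line.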

parthosa · 2024-10-29T21:31:39Z

From a development perspective, adding support for heterogeneous logs would be a small change on the Python tools side.

@leewyang Would it be feasible for QualX to support heterogeneous event logs (Photon + Spark) easily? If yes, then we can add heterogeneous support directly.

leewyang · 2024-10-29T21:45:29Z

@parthosa We'd just need something that we could parse that identifies each uniquely. As you mentioned earlier, I think we could just parse the spark_properties.csv and add an indicator to our profile/features dataframe, then we'd group/filter by that indicator before loading the associated qualx model and running prediction. The trickiest part would be reconstructing the correct order (if required) by stitching the two results (for photon and spark) back together, but I think it's doable.
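
A rough sketch of the group-then-predict flow described above, written in plain Python rather than against the real QualX dataframe. The `appId` and `app_type` keys, the `predict_by_app_type` helper, and the model callables are all assumptions made for illustration; the actual indicator column and model-loading code are not shown in this thread.

```python
from collections import defaultdict


def predict_by_app_type(rows, models):
    """Group feature rows by app type and run the matching model on each group.

    rows: list of dicts, each with 'appId' and 'app_type' (hypothetical keys).
    models: dict mapping app_type -> callable(list_of_rows) -> list of speedups.
    Returns a dict of appId -> predicted speedup.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row["app_type"]].append(row)

    predictions = {}
    for app_type, group in groups.items():
        model = models[app_type]  # stands in for loading the Photon or Spark model
        for row, speedup in zip(group, model(group)):
            predictions[row["appId"]] = speedup
    return predictions
```

Because the result is keyed by app ID, the original row order does not need to be reconstructed at this stage.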

parthosa · 2024-10-29T22:10:20Z

That's great then.

> trickiest part would be reconstructing the correct order

Ordering should not be a problem since we do a left join, based on App Id, between the output DF from the JAR and the resulting DF from QualX.
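
To illustrate why the left join keeps the ordering, here is a small plain-Python sketch. The `appId` and `speedup` keys and the `left_join_on_app_id` helper are hypothetical; with pandas the equivalent operation would be `DataFrame.merge(..., how="left")`, which preserves the key order of the left frame.

```python
def left_join_on_app_id(jar_rows, qualx_predictions):
    """Attach QualX predictions to the JAR output rows, keeping the JAR row order.

    jar_rows: list of dicts, each with an 'appId' key (hypothetical name).
    qualx_predictions: dict of appId -> predicted speedup, in any order.
    Apps without a prediction get None, as in a left join.
    """
    return [
        {**row, "speedup": qualx_predictions.get(row["appId"])}
        for row in jar_rows
    ]
```

The output order is driven entirely by the left side, so Photon and Spark predictions can be produced in any order before being joined back.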

@mattahrens: Since adding heterogeneous support is quite feasible in both QualX and the Python Tools, we should proceed to it directly instead of building an intermediate stage that will eventually be modified.

mattahrens added the feature request, ? - Needs Triage, and core_tools labels and removed the ? - Needs Triage label Apr 14, 2023

mattahrens assigned amahussein Jun 28, 2023

mattahrens assigned mattahrens and unassigned amahussein and mattahrens Jul 12, 2023

nartal1 mentioned this issue Feb 23, 2024

[FEA] Qualification tool: Disqualify Databricks Photon jobs at app level #804

Closed

parthosa self-assigned this Aug 23, 2024

parthosa mentioned this issue Sep 10, 2024

Add support for processing Photon event logs in Scala #1338

Merged

parthosa linked a pull request Sep 11, 2024 that will close this issue

Add support for processing Photon event logs in Scala #1338

Merged

parthosa closed this as completed in #1338 Oct 21, 2024

parthosa reopened this Oct 21, 2024

amahussein added the epic label Oct 23, 2024

parthosa mentioned this issue Oct 29, 2024

Add qualification support for Photon jobs in the Python Tool #1403

Closed

parthosa mentioned this issue Oct 29, 2024

Add support for disk spill metrics in Photon jobs #1405

Closed

parthosa mentioned this issue Nov 2, 2024

Add qualification support for Photon jobs in the Python Tool #1409

Draft
