Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEA] Add qualification support for Databricks Photon event logs #251

Open
3 of 6 tasks
mattahrens opened this issue Apr 14, 2023 · 11 comments · Fixed by #1338
Open
3 of 6 tasks

[FEA] Add qualification support for Databricks Photon event logs #251

mattahrens opened this issue Apr 14, 2023 · 11 comments · Fixed by #1338
Assignees
Labels
core_tools Scope the core module (scala) epic feature request New feature or request

Comments

@mattahrens
Copy link
Collaborator

mattahrens commented Apr 14, 2023

I would like to see estimated speedup on GPU compared against Databricks Photon. This work will include parsing Databricks Photon event logs and then generating speedup factors for Photon operators to Spark RAPIDS operators.

Tasks

  1. core_tools
    mattahrens
  2. core_tools feature request
    parthosa
  3. bug core_tools
    parthosa
  4. core_tools
    parthosa
  5. affect-output feature request user_tools
    parthosa
@amahussein
Copy link
Collaborator

@mattahrens do we still need this issue?
Currently we skip Photon jobs in the Qualification tool.

@mattahrens
Copy link
Collaborator Author

This still might be prioritized in the future so we can keep it open

@parthosa
Copy link
Collaborator

parthosa commented Oct 28, 2024

Discussed the next steps for Photon integration into QualX with @leewyang and @eordentlich.

Assumptions:

  • Users do not provide heterogenous event logs (i.e. mix of photon and spark event logs).

Solution:

  • QualX

    • Reads the spark_properties.csv and based on the type for FIRST app, load the relevant model (photon or spark).
    • There can be a check to ensure that heterogenous logs are not provided.
  • Python Tools CLI

    • Reads spark_properties.csv and based on the type for FIRST app, select the strategy to use for speed up categorization for all event logs:
      • For Spark apps, recommend apps having speed up > 1.3x
      • For Photon apps, recommend apps having speed up > 1x
    • Using this strategy approach, we can also have separate (i.e. Small/Medium/Large) categories for Photon, if needed.

Alternatives:

  • An alternative solution would be for the CLI to read spark_properties.csv and pass the Photon/Spark type as an additional column to QualX.
  • However, this could introduce complexity by adding unnecessary or potentially invalid columns (for other CSPs).
  • The proposed solution avoids this issue, keeping the design simpler and allowing QualX’s model to be updated independently of changes in the Python CLI.

cc: @amahussein @tgravescs

@mattahrens
Copy link
Collaborator Author

mattahrens commented Oct 28, 2024

Agreed that heterogenous support makes sense, but can that be done in a follow-up PR? I don't think it's needed in this first iteration.

@parthosa
Copy link
Collaborator

Sure Matt. This would make QualX simpler. Updated the approach. We can add heterogenous support if needed later

@tgravescs
Copy link
Collaborator

Users do not provide heterogenous event logs

Are we going to fail or warn if we recognize this happening?
I think a lot of companies will have mixed eventlogs.

@parthosa
Copy link
Collaborator

Eventually we would want to add support for mixed set. This approach is mainly to simplify the development process and proceed iteratively.

Both approaches have pros and cons.

Approach 1: If users provide mixed set of event logs --> Fail

Pros: Users do not get incorrect recommendation
Cons: User experience may be compromised

Approach 2: If users provide mixed set of event logs --> Warn and fallback to use Spark CPU strategy

Pros: User experience is better. There are no failures
Cons: Users will get unexpected recommendation. It can cause silent errors/warnings.

IMO, Approach 1 makes more sense. Although, the user experience is compromised, any unexpected or silent errors will be avoided.

@tgravescs
Copy link
Collaborator

What is the expected time frame to add the heterogenous, if we are going to add soon then it might not matter to much.

We could always choose whatever the first eventlog has and log it, then if we come to one that is of the opposite type, we skip running on that eventlog but make sure we mark it as skipped because of this condition so that we try to make it obvious to the user. The question is do we make it obvious enough if skipping it.

@parthosa
Copy link
Collaborator

From development perspective, adding support for heterogenous would be a small change in the Python tools side.

@leewyang Would it be feasible for QualX to support heterogenous event logs (photon + spark) easily? If yes, then we can directly add heterogenous support.

@leewyang
Copy link
Collaborator

@parthosa We'd just need something that we could parse that identifies each uniquely. As you mentioned earlier, I think we could just parse the spark_properties.csv and add an indicator to our profile/features dataframe, then we'd group/filter by that indicator before loading the associated qualx model and running prediction. The trickiest part would be reconstructing the correct order (if required) by stitching the two results (for photon and spark) back together, but I think it's doable.

@parthosa
Copy link
Collaborator

That's great then.

trickiest part would be reconstructing the correct order

Ordering should not be a problem since we do a left join between output DF from JAR and resulting DF from QualX based on App Id

@mattahrens: Since it is quite feasible from both QualX and Python Tools to add support for heterogenous support, we should directly proceed to this instead of an intermediate stage that will be eventually modified.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core_tools Scope the core module (scala) epic feature request New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants