[FEA] Add qualification support for Databricks Photon event logs #251
Comments
@mattahrens do we still need this issue? |
This still might be prioritized in the future so we can keep it open |
Discussed the next steps for Photon integration into QualX with @leewyang and @eordentlich. Assumptions:
Solution:
Alternatives:
|
Agreed that heterogeneous support makes sense, but can that be done in a follow-up PR? I don't think it's needed in this first iteration. |
Sure, Matt. This would make QualX simpler. Updated the approach. We can add heterogeneous support later if needed. |
Are we going to fail or warn if we recognize this happening? |
Eventually we would want to add support for a mixed set. This approach is mainly to simplify the development process and proceed iteratively. Both approaches have pros and cons.
Approach 1: If users provide a mixed set of event logs --> Fail. Pros: Users do not get incorrect recommendations.
Approach 2: If users provide a mixed set of event logs --> Warn and fall back to the Spark CPU strategy. Pros: User experience is better. There are no failures.
IMO, Approach 1 makes more sense. Although the user experience is compromised, any unexpected or silent errors will be avoided. |
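The two approaches above can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: the function `choose_strategy`, the `on_mixed` parameter, and the runtime labels `"PHOTON"` and `"SPARK"` are all hypothetical names introduced here.

```python
import warnings


def choose_strategy(runtimes, on_mixed="fail"):
    """Pick one estimation strategy for a batch of event logs.

    `runtimes` holds one hypothetical label per event log
    ("PHOTON" or "SPARK"); these labels are assumptions for illustration.
    """
    distinct = set(runtimes)
    if len(distinct) <= 1:
        # Homogeneous batch: use its single runtime (SPARK if the batch is empty).
        return distinct.pop() if distinct else "SPARK"
    if on_mixed == "fail":
        # Approach 1: refuse to run rather than risk incorrect recommendations.
        raise ValueError(f"Mixed event logs are not supported: {sorted(distinct)}")
    # Approach 2: warn and fall back to the Spark CPU strategy.
    warnings.warn("Mixed event logs detected; falling back to the Spark CPU strategy")
    return "SPARK"
```

With `on_mixed="fail"` a mixed batch stops immediately, which matches the preference for avoiding silent errors; any other value trades that safety for an uninterrupted run.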
What is the expected time frame to add heterogeneous support? If we are going to add it soon, then it might not matter too much. We could always choose whatever type the first event log has and log it; then, if we come to one of the opposite type, we skip running on that event log but make sure we mark it as skipped because of this condition, so that it is obvious to the user. The question is whether skipping makes it obvious enough. |
From a development perspective, adding heterogeneous support would be a small change on the Python tools side. @leewyang Would it be feasible for QualX to support heterogeneous event logs (Photon + Spark) easily? If yes, then we can directly add heterogeneous support. |
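The skip-and-mark idea above could look something like this. It is only a sketch under assumptions: `partition_by_first_runtime`, the `(path, runtime)` input pairs, and the wording of the skip reason are hypothetical and do not come from the tool.

```python
def partition_by_first_runtime(eventlogs):
    """Split (path, runtime) pairs into processed paths and skipped entries.

    The runtime of the first event log becomes the expected runtime;
    any later event log with a different runtime is skipped with a reason.
    """
    processed, skipped = [], []
    expected = None
    for path, runtime in eventlogs:
        if expected is None:
            # The first event log decides which runtime this run handles.
            expected = runtime
        if runtime == expected:
            processed.append(path)
        else:
            # Record why the log was skipped so it can be surfaced to the user.
            reason = f"skipped: runtime {runtime} differs from first event log ({expected})"
            skipped.append((path, reason))
    return expected, processed, skipped
```

Returning the skipped entries with an explicit reason, rather than dropping them silently, is what would let a report make the condition visible.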
@parthosa We'd just need something that we could parse that identifies each uniquely. As you mentioned earlier, I think we could just parse the |
That's great then.
Ordering should not be a problem since we do a left join between the output DF from the JAR and the resulting DF from QualX based on
@mattahrens: Since adding heterogeneous support is quite feasible on both the QualX and Python Tools sides, we should proceed directly to it instead of an intermediate stage that will eventually be modified. |
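To illustrate why row ordering does not matter for a left join, here is a small pure-Python sketch that mimics a DataFrame left join. The join key used in the tool is not shown in this thread, so `app_key` is a hypothetical placeholder, as are `left_join`, the column names, and the made-up values.

```python
def left_join(left_rows, right_rows, key):
    """Left join two lists of dicts on `key`.

    Every left row is kept, in left order; columns from a matching
    right row are merged in, and unmatched left rows are left as-is.
    """
    right_by_key = {row[key]: row for row in right_rows}
    return [{**row, **right_by_key.get(row[key], {})} for row in left_rows]


# Illustrative rows only; the right side is deliberately in a different order.
jar_output = [
    {"app_key": "app-1", "runtime": "PHOTON"},
    {"app_key": "app-2", "runtime": "SPARK"},
]
qualx_output = [
    {"app_key": "app-2", "speedup": 1.8},
    {"app_key": "app-1", "speedup": 2.4},
]
joined = left_join(jar_output, qualx_output, "app_key")
```

Because rows are matched by key rather than by position, `joined` follows the order of `jar_output` regardless of how `qualx_output` is ordered.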
Closing this issue as all subtasks for adding support for Photon event logs have been completed.
Usage
To run the tool with Photon event logs, use the following command:
spark_rapids qualification --platform databricks-aws --eventlogs <photon-event-log>
Note:
|
I would like to see estimated speedup on GPU compared against Databricks Photon. This work will include parsing Databricks Photon event logs and then generating speedup factors for Photon operators to Spark RAPIDS operators.
Tasks