Add a TPC-DS SF 10 Notebook #448
@gerashegalov Could you help add:
@viadea the notebook in this PR runs the same workload on CPU and then on GPU, then visualizes the metrics of those runs as a chart. If the notebooks were separate for CPU and GPU, we would have to exchange metrics via a file instead of dataframes. So not sure what is the ask in
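For illustration, a minimal sketch of what exchanging metrics via a file could look like if the CPU and GPU runs lived in separate notebooks. The helper names, the JSON layout, and the numbers are all hypothetical and not taken from this PR.

```python
import json
import tempfile
from pathlib import Path


def save_metrics(path, metrics):
    """Write {query_name: elapsed_seconds} from one notebook to a JSON file."""
    Path(path).write_text(json.dumps(metrics, indent=2))


def load_metrics(path):
    """Read the metrics written by the other notebook."""
    return json.loads(Path(path).read_text())


def speedups(cpu, gpu):
    """CPU/GPU time ratio for every query present in both runs."""
    return {q: cpu[q] / gpu[q] for q in cpu if q in gpu and gpu[q] > 0}


# Illustrative numbers only, not measurements from the PR.
cpu_metrics = {"q_a": 12.0, "q_b": 8.0}
gpu_metrics = {"q_a": 3.0, "q_b": 4.0}

with tempfile.TemporaryDirectory() as tmp:
    cpu_file = Path(tmp) / "cpu_metrics.json"
    save_metrics(cpu_file, cpu_metrics)   # last cell of a hypothetical CPU notebook
    restored = load_metrics(cpu_file)     # first cell of a hypothetical GPU notebook

print(speedups(restored, gpu_metrics))
```

Keeping both runs in one notebook avoids this round trip, since the metrics stay in memory as dataframes.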
Hi @gerashegalov, thanks for the contribution. Could you help add more background? Normally, the example repo showcases the plugin's capability and performance. For example, example/MIG-Support showcases support for GPU scalability, ML+DL-Example showcases the plugin's integration with machine learning and deep learning, and SQL+DF-Examples showcases the acceleration of data processing, especially the strong performance we can observe in the microbenchmark. The Intersection demo is modeled on TPC-DS Query14a, so it seems we already have a similar case, which has about a 5x speedup.
Hi @nvliyuan
Value proposition 1:
This notebook is an example of a self-contained notebook that, unlike most of the notebooks in this repository, requires none of the extra installation steps on most IPython kernels. This allows it to be opened and run directly in Google Colab, Jupyter, or VSCode without any modifications.
Value proposition 2:
It runs TPC-DS queries both on CPU and GPU and plots metrics of these very runs (via pandas dataframe) in this notebook as charts, side by side.
It also shows how to see both the initial CPU Plan and the final GPU Plan.
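A rough sketch of the CPU-then-GPU pattern described above, assuming a Spark session started with the RAPIDS Accelerator plugin, where the `spark.rapids.sql.enabled` setting switches GPU execution on and off at runtime. The function name `time_cpu_and_gpu` and the row layout are hypothetical; the notebook in this PR may collect its metrics differently.

```python
import time


def time_cpu_and_gpu(spark, queries):
    """Run each SQL query with GPU acceleration off, then on, and time both.

    `spark` is assumed to be a SparkSession with the RAPIDS plugin loaded;
    `queries` maps a query name to its SQL text.
    Returns one dict per query: {"query", "cpu_s", "gpu_s"}.
    """
    rows = []
    for name, sql in queries.items():
        row = {"query": name}
        for column, enabled in (("cpu_s", "false"), ("gpu_s", "true")):
            # Toggle GPU execution for the next query run.
            spark.conf.set("spark.rapids.sql.enabled", enabled)
            start = time.perf_counter()
            spark.sql(sql).collect()  # force full execution of the query
            row[column] = time.perf_counter() - start
        rows.append(row)
    return rows
```

In a notebook, the returned rows could be turned into a side-by-side chart with `pandas.DataFrame(rows).plot.bar(x="query")`, and `spark.sql(sql).explain()` prints the plan under whichever setting is currently active. A single timed run per mode includes warm-up effects, so this is only a starting point for a real comparison.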
@gerashegalov could you add a chart comparing CPU vs GPU perf in a README for this example? We normally want to keep updating the perf chart each release for each example.
@viadea thanks for the review. Added a README with a chart: https://github.com/NVIDIA/spark-rapids-examples/blob/e89e50a9c333b4b37196702f6d54eeb8c9ce4cdc/examples/SQL%2BDF-Examples/tpcds/README.md
@nvliyuan is it ok to merge this PR?
Hi @gerashegalov, I tested it and it works well. Just one nit: could you update the Spark version from 3.5.1 to 3.5.0 (to keep it in sync with the notebook)? Thanks.
@nvliyuan Thanks for the review. Fixed the Spark version in the README.
Can you retarget this to branch 24.12? @gerashegalov
@nvliyuan done! I missed that because this repo does not follow the same convention as the others that switch the default branch every release.
LGTM
* Add a TPC-DS SF 10 Notebook for local Jupyter or Google Colab
Signed-off-by: Gera Shegalov <[email protected]>
* Update link to the current blob
Signed-off-by: Gera Shegalov <[email protected]>
---------
Signed-off-by: Gera Shegalov <[email protected]>
This notebook demonstrates GPU acceleration of TPC-DS queries. It is portable across: