* Add a TPC-DS SF 10 Notebook for local Jupyter or Google Colab

  Signed-off-by: Gera Shegalov <[email protected]>

* Update link to the current blob

  Signed-off-by: Gera Shegalov <[email protected]>

---------

Signed-off-by: Gera Shegalov <[email protected]>
1 parent 2121693, commit ca1555a

Showing 3 changed files with 647 additions and 0 deletions.
@@ -0,0 +1,28 @@

# TPC-DS Scale Factor 10 (GiB) - CPU Spark vs GPU Spark

[TPC-DS](https://www.tpc.org/tpcds/) is a decision support benchmark often used to evaluate
the performance of OLAP databases and big data systems.

The notebook in this folder runs a user-specified subset of the TPC-DS queries on the
Scale Factor 10 (GiB) dataset. It uses [TPCDS PySpark](https://github.com/cerndb/SparkTraining/blob/master/notebooks/TPCDS_PySpark_CERN_SWAN_getstarted.ipynb)
to execute the TPC-DS queries with Spark SQL on both GPU and CPU, capturing the metrics
as a pandas DataFrame. It then plots a bar chart comparing the two runs, showing
the GPU acceleration that RAPIDS Spark achieves for the queries run in the
notebook.

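As a rough illustration of the comparison described above, the sketch below computes a per-query speedup as CPU elapsed time divided by GPU elapsed time. It is not the notebook's code: the helper name `speedup_by_query`, the input shape (query name mapped to elapsed seconds), and the timings in the usage lines are all hypothetical, and plain dictionaries stand in for the pandas DataFrame so the snippet runs without Spark or pandas.

```python
def speedup_by_query(cpu_seconds, gpu_seconds):
    """Return {query: cpu_time / gpu_time} for queries timed on both engines.

    Hypothetical helper: both arguments map a query name to elapsed seconds.
    """
    speedups = {}
    for query, cpu_time in cpu_seconds.items():
        gpu_time = gpu_seconds.get(query)
        # Skip queries that were not run on the GPU or have no usable timing.
        if gpu_time is None or gpu_time <= 0:
            continue
        speedups[query] = cpu_time / gpu_time
    return speedups


# Made-up timings for illustration only, not measured results.
cpu = {"q1": 12.0, "q5": 30.0, "q9": 8.0}
gpu = {"q1": 4.0, "q5": 6.0}

for query, ratio in sorted(speedup_by_query(cpu, gpu).items()):
    print(f"{query}: {ratio:.1f}x")
```

A ratio above 1 means the GPU run finished faster than the CPU run for that query. Queries timed on only one engine are left out, so they would not appear as bars in a comparison chart built from this data.
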
This notebook can be opened and executed locally using standard tools:

- Jupyter or JupyterLab
- VS Code with the [Jupyter extension](https://marketplace.visualstudio.com/items?itemName=ms-toolsai.jupyter)

It can also be opened and evaluated in hosted notebook environments. Use the link below to launch it on
Google Colab and connect it to a [GPU instance](https://research.google.com/colaboratory/faq.html).

<a target="_blank" href="https://github.com/NVIDIA/spark-rapids-examples/blob/b3a7016d1f608804fce8a2aa16e16fab8c819427/examples/SQL%2BDF-Examples/tpcds/notebooks/TPCDS-SF10.ipynb">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>

Here is the bar chart from a recent execution on Google Colab's T4 High RAM instance using
RAPIDS Spark 24.10.0 with Apache Spark 3.5.0:

![tpcds-speedup](/docs/img/guides/tpcds.png)