Add a TPC-DS SF 10 Notebook (#448)
* Add a TPC-DS SF 10 Notebook for local Jupyter or Google Colab

Signed-off-by: Gera Shegalov <[email protected]>

* Update link to the current blob

Signed-off-by: Gera Shegalov <[email protected]>

---------

Signed-off-by: Gera Shegalov <[email protected]>
gerashegalov authored Oct 29, 2024
1 parent 2121693 commit ca1555a
Showing 3 changed files with 647 additions and 0 deletions.
Binary file added docs/img/guides/tpcds.png
examples/SQL+DF-Examples/tpcds/README.md (28 additions, 0 deletions)
@@ -0,0 +1,28 @@
# TPC-DS Scale Factor 10 (GiB) - CPU Spark vs GPU Spark

[TPC-DS](https://www.tpc.org/tpcds/) is a decision support benchmark often used to evaluate
the performance of OLAP databases and big data systems.

The notebook in this folder runs a user-specified subset of the TPC-DS queries on the
Scale Factor 10 (GiB) dataset. It uses [TPCDS PySpark](https://github.com/cerndb/SparkTraining/blob/master/notebooks/TPCDS_PySpark_CERN_SWAN_getstarted.ipynb)
to execute the TPC-DS queries with Spark SQL on both GPU and CPU, capturing the metrics
as a pandas DataFrame. It then plots a bar chart comparing the two runs, visualizing
the GPU acceleration that RAPIDS Spark achieved for each query run within the same
notebook.
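One way to picture the CPU and GPU runs is as two Spark sessions that differ only in configuration. The sketch below assumes the RAPIDS Accelerator jar is fetched from Maven Central with Scala 2.12 coordinates. The helper name `build_conf` and the exact set of settings are illustrative assumptions, not the notebook's code. `spark.plugins=com.nvidia.spark.SQLPlugin` and `spark.rapids.sql.enabled` are the standard RAPIDS Accelerator switches.

```python
from __future__ import annotations

# Assumed Maven coordinate: Scala 2.12 build of RAPIDS Spark 24.10.0.
RAPIDS_PACKAGE = "com.nvidia:rapids-4-spark_2.12:24.10.0"


def build_conf(use_gpu: bool) -> dict[str, str]:
    """Hypothetical helper: return Spark settings for a CPU-only or GPU session."""
    conf = {"spark.app.name": "tpcds-gpu" if use_gpu else "tpcds-cpu"}
    if use_gpu:
        conf.update({
            # Download the RAPIDS Accelerator jar when the session starts.
            "spark.jars.packages": RAPIDS_PACKAGE,
            # Load the RAPIDS SQL plugin so supported operators run on the GPU.
            "spark.plugins": "com.nvidia.spark.SQLPlugin",
            "spark.rapids.sql.enabled": "true",
        })
    # The CPU session keeps Spark's defaults apart from the app name.
    return conf
```

With PySpark installed, each key/value pair can be passed to `SparkSession.builder.config(key, value)` before calling `getOrCreate()`. Keeping every other setting identical between the two sessions is what makes the per-query timings comparable.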

This notebook can be opened and executed locally using:

- Jupyter or JupyterLab
- VS Code with the Jupyter [extension](https://marketplace.visualstudio.com/items?itemName=ms-toolsai.jupyter)

It can also be opened and run in hosted notebook environments. Use the link below to launch it in
Google Colab, and connect it to a [GPU instance](https://research.google.com/colaboratory/faq.html).

<a target="_blank" href="https://github.com/NVIDIA/spark-rapids-examples/blob/b3a7016d1f608804fce8a2aa16e16fab8c819427/examples/SQL%2BDF-Examples/tpcds/notebooks/TPCDS-SF10.ipynb">
<img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/>
</a>
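Before running the GPU queries in Colab, it can help to confirm that a GPU runtime is actually attached. A minimal sketch, assuming the NVIDIA driver's `nvidia-smi` tool is on the `PATH` whenever a GPU is present; the function name `gpu_summary` is hypothetical.

```python
from __future__ import annotations

import shutil
import subprocess


def gpu_summary() -> str | None:
    """Return GPU name and memory as reported by nvidia-smi, or None if unavailable."""
    if shutil.which("nvidia-smi") is None:
        # No NVIDIA driver tools found: most likely a CPU-only runtime.
        return None
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None


summary = gpu_summary()
print(summary or "No GPU detected: switch the Colab runtime to a GPU instance.")
```

On a CPU-only runtime this prints the fallback message instead of raising, so the check can sit at the top of the notebook without interrupting a CPU-only run.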

Here is the bar chart from a recent run on a Google Colab T4 High-RAM instance using
RAPIDS Spark 24.10.0 with Apache Spark 3.5.0:

![tpcds-speedup](/docs/img/guides/tpcds.png)
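The README does not state exactly how the plotted acceleration is computed. A common convention, assumed here, is a per-query speedup of CPU elapsed time divided by GPU elapsed time, summarized across queries with a geometric mean. The sketch below uses plain dictionaries instead of the notebook's pandas DataFrame and prints a text bar chart in place of a plotting library. All function names are hypothetical.

```python
from __future__ import annotations

from statistics import geometric_mean


def speedups(cpu_secs: dict[str, float], gpu_secs: dict[str, float]) -> dict[str, float]:
    """Per-query speedup (CPU time / GPU time) for queries timed on both engines."""
    common = sorted(cpu_secs.keys() & gpu_secs.keys())
    return {q: cpu_secs[q] / gpu_secs[q] for q in common}


def overall_speedup(ratios: dict[str, float]) -> float:
    """Geometric mean of the per-query speedups (all ratios must be positive)."""
    return geometric_mean(list(ratios.values()))


def text_bars(ratios: dict[str, float], width: int = 40) -> str:
    """Render one '#' bar per query, scaled so the largest speedup fills `width`."""
    if not ratios:
        return ""
    peak = max(ratios.values())
    lines = []
    for query, ratio in ratios.items():
        bar = "#" * max(1, round(width * ratio / peak))
        lines.append(f"{query:>6} {bar} {ratio:.2f}x")
    return "\n".join(lines)
```

A geometric mean is often preferred over an arithmetic mean for ratios because a few queries with very large speedups do not dominate the summary.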