RAPIDS Accelerator For Apache Spark

NOTE: For the latest stable README.md ensure you are on the main branch.

The RAPIDS Accelerator for Apache Spark provides a set of plugins for Apache Spark that leverage GPUs to accelerate processing via the RAPIDS libraries.

Documentation for the current release is available on the project's documentation site.

To get started and try out the plugin, use the getting started guide.

Compatibility

The SQL plugin tries to produce results that are bit-for-bit identical with Apache Spark. Operator compatibility is documented in the compatibility guide.

Tuning

To start tuning your job and get the most performance out of it, begin with the tuning guide.

Configuration

The plugin has a set of Spark configs that control its behavior; they are documented in the configuration reference.
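The plugin is enabled through ordinary Spark configs, so one way to picture it is as a set of `--conf` arguments passed to `spark-submit`. The sketch below is an illustration only: `spark.plugins=com.nvidia.spark.SQLPlugin` and `spark.rapids.sql.enabled` are plugin settings, but the helper `to_conf_args` and the application name `app.py` are hypothetical, and a real deployment usually also needs GPU resource settings. Check the configuration reference for your version.

```python
import shlex

# Assumed minimal settings for illustration; a real job typically needs more
# (for example GPU resource amounts). Verify keys against the config docs.
RAPIDS_CONFS = {
    "spark.plugins": "com.nvidia.spark.SQLPlugin",
    "spark.rapids.sql.enabled": "true",
}


def to_conf_args(confs):
    """Render a dict of Spark configs as spark-submit --conf arguments (hypothetical helper)."""
    args = []
    for key, value in sorted(confs.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # "app.py" is a placeholder application name.
    print(shlex.join(["spark-submit", *to_conf_args(RAPIDS_CONFS), "app.py"]))
```

Sorting the keys keeps the generated command stable, which makes it easy to diff between jobs.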

Issues & Questions

We use GitHub to track bugs and feature requests and to answer questions. File an issue for a bug or feature request. Ask or answer a question on the discussion board.

Download

The jar files for the most recent release can be retrieved from the download page.

Building From Source

See the build instructions in the contributing guide.

Testing

Tests are described in the testing documentation.

Integration

The RAPIDS Accelerator For Apache Spark provides some APIs for zero-copy data transfer into other GPU-enabled applications. They are described in the project documentation.

Currently, we are working with XGBoost to try to provide this integration out of the box.

You may need to disable RMM caching when exporting data to an ML library. That library will likely want to use all of the GPU's memory, and if it is not aware of RMM, it cannot access any of the memory that RMM is holding.
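As a minimal sketch of how this might look in practice, the hypothetical helper below copies a job's Spark configs and turns RMM pooling off. It assumes the setting is `spark.rapids.memory.gpu.pool` with the value `NONE`. Older releases used a different key, so confirm the exact name and value in the configuration reference for your plugin version before relying on it.

```python
# Assumption: "spark.rapids.memory.gpu.pool" accepts "NONE" to disable RMM pooling
# in your plugin version. Verify against the configuration reference.
RMM_POOL_KEY = "spark.rapids.memory.gpu.pool"


def without_rmm_pool(confs):
    """Return a copy of `confs` with RMM pooling disabled (hypothetical helper)."""
    updated = dict(confs)  # copy so the caller's settings are left untouched
    updated[RMM_POOL_KEY] = "NONE"
    return updated


base = {"spark.plugins": "com.nvidia.spark.SQLPlugin"}
ml_confs = without_rmm_pool(base)
print(ml_confs)
```

Returning a copy lets the same base settings serve both a regular ETL job and a job that hands data to an ML library.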

Qualification and Profiling tools

The Qualification and Profiling tools have been moved to the NVIDIA/spark-rapids-tools repository.

Please refer to Qualification tool documentation and Profiling tool documentation for more details on how to use the tools.

Dependency for External Projects

If you need to develop functionality on top of the RAPIDS Accelerator For Apache Spark (we currently limit support to GPU-accelerated UDFs), we recommend that you declare our distribution artifact as a provided dependency.

<dependency>
    <groupId>com.nvidia</groupId>
    <artifactId>rapids-4-spark_2.12</artifactId>
    <version>24.12.0-SNAPSHOT</version>
    <scope>provided</scope>
</dependency>
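The artifact ID carries a Scala binary-version suffix. The sketch below renders the provided-scope dependency shown above for a chosen suffix. It is illustrative only: `rapids_dependency` is a hypothetical helper, and the `rapids-4-spark_2.13` name is an assumption based on the repository's Scala 2.13 build, so confirm the published coordinates on the download page.

```python
# Assumption: Scala 2.13 builds are published as rapids-4-spark_2.13.
# Only the _2.12 coordinates appear in this README; verify before use.
SUPPORTED_SCALA = ("2.12", "2.13")


def rapids_dependency(scala_version="2.12", version="24.12.0-SNAPSHOT"):
    """Render the provided-scope Maven dependency (hypothetical helper)."""
    if scala_version not in SUPPORTED_SCALA:
        raise ValueError(f"unsupported Scala binary version: {scala_version}")
    return (
        "<dependency>\n"
        "    <groupId>com.nvidia</groupId>\n"
        f"    <artifactId>rapids-4-spark_{scala_version}</artifactId>\n"
        f"    <version>{version}</version>\n"
        "    <scope>provided</scope>\n"
        "</dependency>"
    )


print(rapids_dependency())
```

The `provided` scope keeps the plugin jar out of your application's packaged artifact, since the cluster supplies it at runtime.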
