These example notebooks showcase cuxfilter with cuDF. If you want to distribute your workflow across multiple GPUs, have more data than fits in memory on a single GPU, or want to visualize data spread across many files at once, use Dask-cuDF with cuxfilter. The example notebooks can be found here.
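As a rough illustration of the Dask-cuDF route, the sketch below reads a set of Parquet files into a distributed GPU dataframe and hands it to cuxfilter. The function name `load_for_cuxfilter` and the path pattern are hypothetical, and the sketch assumes that `cuxfilter.DataFrame.from_dataframe` accepts a Dask-cuDF dataframe in the same way it accepts a cuDF one; check the cuxfilter documentation for your version before relying on it.

```python
def load_for_cuxfilter(parquet_glob: str):
    """Read many Parquet files with Dask-cuDF and wrap them for cuxfilter.

    `parquet_glob` is a hypothetical path pattern, e.g. "data/part-*.parquet".
    """
    # Imported lazily so this function can be defined on a machine without a GPU.
    import cuxfilter
    import dask_cudf

    # Each file becomes one or more partitions of a distributed GPU dataframe.
    ddf = dask_cudf.read_parquet(parquet_glob)

    # Assumption: from_dataframe accepts a Dask-cuDF dataframe as it does cuDF.
    return cuxfilter.DataFrame.from_dataframe(ddf)
```

On a multi-GPU machine, one would typically start a Dask-CUDA cluster (for example `dask_cuda.LocalCUDACluster` together with a `dask.distributed.Client`) before calling this function, so that the partitions are spread across the available GPUs.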
Amazon SageMaker Studio Lab is a free ML development environment that provides compute, storage (up to 15 GB), and security, all at no cost (currently). This includes GPU notebook instances.
Once you have registered with your email address, sign in to your account, start a CPU or GPU runtime, and open your project, all in your browser.
To set up a RAPIDS environment in Studio Lab (you only need to do this the first time, since Studio Lab has 15 GB of persistent storage across sessions), open a new terminal:
# make the conda environment available as a notebook kernel
conda install ipykernel
# for the stable RAPIDS version
conda install -c rapidsai -c numba -c conda-forge -c nvidia \
    cuxfilter=23.02 python=3.10 cudatoolkit=11.8
# for the nightly RAPIDS version
conda install -c rapidsai-nightly -c numba -c conda-forge -c nvidia \
    cuxfilter python=3.10 cudatoolkit=11.8
The snippets above are sample install commands for cuxfilter; see the Get RAPIDS version picker to install the latest cuxfilter version.
Once installed, you should see a card in the launcher for that environment and kernel after about a minute.
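To confirm that the new kernel can actually see the packages, a quick check from a notebook cell can help. The following is a minimal sketch that uses only the Python standard library; the helper name `installed_version` is hypothetical and is not part of cuxfilter.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# `installed_version` is a hypothetical helper, not a cuxfilter API.
for name in ("cuxfilter", "cudf"):
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed in this kernel'}")
```

If either package is reported as not installed, the notebook is probably running on a different kernel than the environment created above.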
Google Colab, or "Colaboratory", allows you to write and execute Python in your browser, with
- Zero configuration required
- Free access to GPUs
- Easy sharing
To launch cuxfilter notebooks in the Colab environment, first follow the RAPIDS installation instructions guide. Once the RAPIDS libraries are installed, you can run the cuxfilter notebooks.
Note: Unlike Studio Lab, environment storage is not persistent and each notebook needs a separate RAPIDS installation every time you start a new session.
Copy the installation notebook cells to the top of the cuxfilter notebooks and install RAPIDS before executing the cuxfilter code.
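Because the installation has to be repeated in every session, it can be worth checking that a GPU runtime is attached before running the installation cells. The cell below is a minimal sketch, assuming the `nvidia-smi` utility is on the path of GPU runtimes; the helper name `gpu_runtime_available` is hypothetical.

```python
import shutil
import subprocess


def gpu_runtime_available() -> bool:
    """Return True if `nvidia-smi` exists and lists at least one GPU."""
    if shutil.which("nvidia-smi") is None:
        return False
    try:
        # `nvidia-smi -L` prints one line per GPU visible to the driver.
        result = subprocess.run(
            ["nvidia-smi", "-L"], capture_output=True, text=True, timeout=30
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


if gpu_runtime_available():
    print("GPU runtime detected; continue with the RAPIDS installation cells.")
else:
    print("No GPU detected; switch the notebook to a GPU runtime first.")
```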
Note: The Auto Accidents dataset has corrupted coordinate data for the years 2012-2014.