LLM tutorial materials covering, but not limited to, NVIDIA NeMo, TensorRT-LLM, Triton Inference Server, and NeMo Guardrails.
This material is used in the NCHC LLM Bootcamp.
Running on TWCC
Please follow this TWCC README to run the tutorials on TWCC.
Install Docker and the NVIDIA Container Toolkit, add your user to the docker group, and then log out and back in (or restart) for the group change to take effect.
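For reference, a minimal sketch of these post-install steps, assuming a Linux host (the CUDA image tag below is only an example for verifying GPU access):
# add the current user to the docker group; re-login afterwards so it takes effect
sudo usermod -aG docker $USER
# verify that containers can access the GPU
docker run --rm --gpus=all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi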
git clone https://github.com/j3soon/LLM-Tutorial.git
cd LLM-Tutorial
# (a) NeMo
docker run --rm -it --gpus=all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 -v $(pwd)/workspace:/workspace --network=host nvcr.io/nvidia/nemo:24.05
# in the container
jupyter lab
# open the notebook URL in your browser
# (b) TensorRT-LLM
docker run --rm -it --gpus=all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 -v $(pwd)/workspace:/workspace --network=host nvcr.io/nvidia/tritonserver:24.05-trtllm-python-py3
# in the container
jupyter lab
# open the notebook URL in your browser
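If the container runs on a remote machine and the printed notebook URL is not directly reachable, you may need to bind Jupyter to all interfaces. A hedged sketch that applies to either container (the port is an arbitrary choice; since the containers use --network=host, it is exposed directly on the host):
# inside the container, instead of plain `jupyter lab`
jupyter lab --ip=0.0.0.0 --port=8888 --no-browser --allow-root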
Before committing changes to the notebooks, make sure to run the following to strip their outputs and metadata:
pip install nb-clean
nb-clean clean workspace/NeMo_Training_TinyLlama.ipynb
nb-clean clean workspace/TensorRT-LLM.ipynb
nb-clean clean workspace/NeMo_Guardrails.ipynb
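Alternatively, nb-clean can be registered as a git filter so the notebooks are cleaned automatically on each commit; a minimal sketch, assuming nb-clean is already installed:
# run once inside the repository; registers the clean filter in the git config
nb-clean add-filter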
The code was primarily written by Cliff, with assistance from others listed in the contributor list.
We would like to thank NVIDIA, OpenACC, and NCHC (National Center for High-performance Computing) for making this bootcamp happen.