Add default-nvidia-tensorrtllm variant
Atinoda committed Jul 26, 2024
1 parent 4676bf2 commit f51fab4
Showing 2 changed files with 14 additions and 0 deletions.
13 changes: 13 additions & 0 deletions Dockerfile
@@ -183,6 +183,19 @@ RUN echo "Nvidia Extended (No AVX2)" > /variant.txt
ENV EXTRA_LAUNCH_ARGS=""
CMD ["python3", "/app/server.py"]

# Extended with TensorRT-LLM
FROM run_base AS default-nvidia-tensorrtllm
# Copy venv
COPY --from=app_nvidia_x $VIRTUAL_ENV $VIRTUAL_ENV
# Install TensorRT-LLM
RUN apt-get update && apt-get install -y openmpi-bin libopenmpi-dev
RUN pip3 install tensorrt_llm==0.10.0 -U --pre --extra-index-url https://pypi.nvidia.com
ENV LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH
# Variant parameters
RUN echo "Nvidia Extended (TensorRT-LLM)" > /variant.txt
ENV EXTRA_LAUNCH_ARGS=""
CMD ["python3", "/app/server.py"]


# ROCM
# Base
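For local testing, the new stage can be exercised by pointing a compose service's build target at it; a minimal sketch, assuming the build context is the repository root and using an illustrative service name:

```yaml
services:
  text-generation-webui:          # illustrative service name
    build:
      context: .                  # assumes the repo root containing this Dockerfile
      target: default-nvidia-tensorrtllm
```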
1 change: 1 addition & 0 deletions README.md
@@ -39,6 +39,7 @@ Choose the desired variant by setting the image `:tag` in `docker-compose.yml` u
|---|---|
| `*-nvidia` | CUDA 12.1 inference acceleration. |
| `*-nvidia-noavx2` | CUDA 12.1 inference acceleration with no AVX2 CPU instructions. *Typical use-case is legacy CPU with modern GPU.* |
| `*-nvidia-tensorrtllm` | CUDA 12.1 inference acceleration with the TensorRT-LLM library pre-installed. |
| `*-cpu` | CPU-only inference. *Has become surprisingly fast since the early days!* |
| `*-rocm` | ROCM 5.6 inference acceleration. *Experimental and unstable.* |
| `*-arc` | Intel Arc XPU and oneAPI inference acceleration. **Not compatible with Intel integrated GPU (iGPU).** *Experimental and unstable.* |
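Selecting the new variant follows the same pattern as the other tags; a sketch of the relevant `docker-compose.yml` entry, assuming the image is published as `atinoda/text-generation-webui` (the published image name may differ):

```yaml
services:
  text-generation-webui:
    # Illustrative tag selection for the new variant; adjust the image name as needed
    image: atinoda/text-generation-webui:default-nvidia-tensorrtllm
```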
