vLLM adapter for a TGIS-compatible gRPC server.
vllm-tgis-adapter is available on PyPI:
pip install vllm-tgis-adapter
python -m vllm_tgis_adapter
Installing the adapter also installs a gRPC healthcheck CLI that can be used to monitor the status of the gRPC server:
$ grpc_healthcheck
health check...status: SERVING
See usage with
grpc_healthcheck --help
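In scripted setups it can be useful to block until the server is ready. The loop below is a minimal sketch, not part of the adapter: `wait_for_serving` is a hypothetical helper name, and it assumes that the check command exits with a non-zero status while the server is not yet serving. That exit-code behaviour is an assumption about grpc_healthcheck; confirm it with `grpc_healthcheck --help` before relying on it.

```shell
# Hypothetical helper: retry a health-check command until it succeeds.
# Usage: wait_for_serving <max_attempts> <delay_seconds> <command> [args...]
wait_for_serving() {
  max_attempts=$1
  delay=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # Assumption: the command exits 0 only once the server is serving.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Wait between attempts, but not after the final one.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}
```

For example, `wait_for_serving 30 2 grpc_healthcheck` would retry up to 30 times, two seconds apart, and return a non-zero status if the check never succeeds.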
python -m build
pip install dist/*whl
python -m vllm_tgis_adapter
This starts a gRPC server on port 8033, which can be queried with grpcurl:
bash examples/inference.sh
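For a sense of what such a query can look like, here is a hedged sketch of a grpcurl wrapper. It is not a copy of examples/inference.sh. The function name `tgis_generate` and the `GRPCURL` and `TGIS_ADDR` variables are hypothetical, and the sketch assumes the TGIS API exposes `fmaas.GenerationService/Generate` taking a batched request with a list of `text` entries. The path to the generation proto file is left as an argument because it depends on your checkout, and a real request may need further fields such as a model id or generation parameters.

```shell
# Hypothetical wrapper around grpcurl; not taken from examples/inference.sh.
# Usage: tgis_generate <path-to-generation-proto> <prompt>
tgis_generate() {
  proto_file=$1
  prompt=$2
  # Assumption: the request is a batch of {"text": ...} entries.
  # The prompt is inserted verbatim, so it must not contain double quotes.
  payload=$(printf '{"requests": [{"text": "%s"}]}' "$prompt")
  # GRPCURL and TGIS_ADDR are illustrative overrides, defaulting to the
  # grpcurl binary and the port mentioned above.
  "${GRPCURL:-grpcurl}" -plaintext \
    -proto "$proto_file" \
    -d "$payload" \
    "${TGIS_ADDR:-localhost:8033}" fmaas.GenerationService/Generate
}
```

Keeping the address and binary overridable makes the same function usable against a remote host or in a dry run that only prints the arguments.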
Image available at quay.io/opendatahub/vllm, built from opendatahub-io/vllm's Dockerfile.ubi
docker pull quay.io/opendatahub/vllm
See examples
Set up pre-commit for linting/style/misc fixes:
pip install pre-commit
pre-commit install
# to run on all files
pre-commit run --all-files
This project uses nox to manage test automation:
pip install nox
nox --list # list available sessions
nox -s tests-3.10 # run tests session for a specific python version
nox -s build-3.11 # build the wheel package
nox -s lint-3.11 -- --mypy # run linting with type checks
The standard vllm build requires an NVIDIA GPU. When one is not available, vllm can be compiled
from source with CPU support:
env \
VLLM_CPU_DISABLE_AVX512=true VLLM_TARGET_DEVICE=cpu \
PIP_EXTRA_INDEX_URL=https://download.pytorch.org/whl/cpu \
pip install git+https://github.com/vllm-project/vllm
This makes it possible to run the tests on most hardware. Note that the pip
extra index URL is required to install the CPU version of torch.