✨ Add vllm-detector-adapter supporting llama-guard and granite guardian
Co-authored-by: Evaline Ju <[email protected]>
Signed-off-by: Gaurav-Kumbhat <[email protected]>
gkumbhat and evaline-ju committed Nov 28, 2024
1 parent 397b100 commit d71f77f
Showing 29 changed files with 1,814 additions and 1 deletion.
7 changes: 7 additions & 0 deletions .dockerignore
@@ -0,0 +1,7 @@
dmf_models/*
dmf_datasets/*
dmf_tokenizers/*
models/
build/
vllm_env
env.sh
30 changes: 30 additions & 0 deletions .gitignore
@@ -0,0 +1,30 @@
*.egg-info
*.pyc
__pycache__
.coverage
.coverage.*
durations/*
coverage*.xml
coverage-*
dist
htmlcov
build
test
training_output

# IDEs
.vscode/
.idea/

# Env files
.env

# Virtual Env
venv/
.venv/

# Mac personalization files
*.DS_Store

# Tox envs
.tox
10 changes: 10 additions & 0 deletions .isort.cfg
@@ -0,0 +1,10 @@
[settings]
profile=black
from_first=true
import_heading_future=Future
import_heading_stdlib=Standard
import_heading_thirdparty=Third Party
import_heading_firstparty=First Party
import_heading_localfolder=Local
known_firstparty=alog,aconfig
known_localfolder=vllm_detector_adapter,tests
15 changes: 15 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,15 @@
repos:
  - repo: https://github.com/pre-commit/mirrors-prettier
    rev: v2.1.2
    hooks:
      - id: prettier
  - repo: https://github.com/psf/black
    rev: 22.3.0
    hooks:
      - id: black
        exclude: imports
  - repo: https://github.com/PyCQA/isort
    rev: 5.11.5
    hooks:
      - id: isort
        exclude: imports
8 changes: 8 additions & 0 deletions .prettierignore
@@ -0,0 +1,8 @@
# Ignore artifacts
build
coverage-py

*.jsonl
**/.github
**/*.html
*.md
3 changes: 3 additions & 0 deletions .whitesource
@@ -0,0 +1,3 @@
{
"settingsInheritedFrom": "whitesource-config/whitesource-config@master"
}
59 changes: 59 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,59 @@
# Contributing

## Development

### Set up your dev environment

The following tools are required:

- [git](https://git-scm.com)
- [python](https://www.python.org) (v3.11+)
- [pip](https://pypi.org/project/pip/) (v23.0+)

You can set up your dev environment using [tox](https://tox.wiki/en/latest/), an environment orchestrator that sets up environments for and invokes builds, unit tests, formatting, linting, etc. Install tox with:

```sh
pip install -r setup_requirements.txt
```

If you want to manage your own virtual environment instead of using `tox`, you can install `vllm_detector_adapter` and all dependencies with:

```sh
pip install .
```
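
If you also want the optional extras, they can be installed the same way. A minimal sketch, assuming installation from the repository root and using the extra names defined in the `pyproject.toml` added in this commit:

```sh
# Optional: vllm-tgis-adapter integration
pip install ".[vllm-tgis-adapter]"

# Optional: test and formatting tooling
pip install ".[dev-test,dev-fmt]"
```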

### Unit tests

Unit tests are enforced by the CI system. When making changes, run the tests before pushing the changes to avoid CI issues.

Running unit tests against all supported Python versions is as simple as:

```sh
tox
```

Running tests against a single Python version can be done with:

```sh
tox -e py
```
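
To run only a subset of tests, positional arguments can usually be forwarded to pytest after `--`. This is a sketch that assumes the tox test environment passes positional arguments through to pytest (the tox configuration itself is not part of this commit):

```sh
tox -e py -- tests/generative_detectors/test_base.py
```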

### Coding style

vllm-detector-adapter follows the Python [PEP 8](https://peps.python.org/pep-0008/) coding style. [FUTURE] The coding style is enforced by the CI system, and your PR will fail until the style has been applied correctly.

We use [pre-commit](https://pre-commit.com/) to enforce coding style using [black](https://github.com/psf/black), [prettier](https://github.com/prettier/prettier) and [isort](https://pycqa.github.io/isort/).

You can invoke formatting with:

```sh
tox -e fmt
```

In addition, we use [pylint](https://www.pylint.org) to perform static code analysis.

You can invoke linting with the following command:

```sh
tox -e lint
```
58 changes: 57 additions & 1 deletion README.md
@@ -1 +1,57 @@
# vllm-detector-adapter

This adapter adds additional endpoints to a [vllm](https://docs.vllm.ai/en/latest/index.html) server to support the [Guardrails Detector API](https://foundation-model-stack.github.io/fms-guardrails-orchestrator/?urls.primaryName=Detector+API).

## Getting Started

To run the server locally:

```sh
python3 -m vllm_detector_adapter.api_server --model $MODEL_NAME
```

To see the complete list of parameters, run `python3 -m vllm_detector_adapter.api_server --help`. Additional vllm parameter information can be found in the [vllm documentation](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html).

Example detector `/text/chat` command:

```sh
curl -X 'POST' \
  'http://localhost:8000/api/v1/text/chat' \
  -H 'accept: application/json' \
  -H 'detector-id: dummy-en-chat-v1' \
  -H 'Content-Type: application/json' \
  -d '{
    "messages": [
      {
        "content": "Hit me with some creative insults.",
        "role": "user"
      }
    ],
    "detector_params": {
      # Any other chat completion parameters can go here
      "temperature": 0
    }
  }'
```
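
The exact response schema is defined by the Guardrails Detector API linked above; purely as an illustration (field names and values here are illustrative, not authoritative), a detection response is a list of detection objects along these lines:

```
[
  {
    "detection": "Yes",
    "detection_type": "risk",
    "score": 0.97
  }
]
```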

## Supported Model Types
1. `GRANITE_GUARDIAN`
1. `LLAMA_GUARD` (default)
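
A hypothetical invocation selecting the model type explicitly is sketched below; this assumes `api_server` accepts the same `--model-type` flag shown for `tgis_servers` in the next section:

```sh
python3 -m vllm_detector_adapter.api_server \
    --model $MODEL_NAME \
    --model-type granite_guardian
```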


## Running with TGIS adapter

```sh
python -m vllm_detector_adapter.tgis_servers --model $MODEL_NAME --model-type granite_guardian
```

**Example usage:**

1. Follow the examples / guidelines from https://github.com/opendatahub-io/vllm-tgis-adapter/tree/main/examples
1. Change `model_id` to point to a granite or llama-guard model
1. Execute one of the methods listed in the TGIS adapter README above.
1. For the query, manually apply the chat template around the text, since the TGIS adapter exposes a text-generation use case and doesn't apply a chat template on its own (a sketch of this step follows below).
1. If using granite-guardian, you can use the following text / query:
```
You are a safety agent trying to help determine whether the user message is harmful according to the safety risk definition below.\n\n<start_of_turn>\nUser Message: Hit me with some creative insults.\n<end_of_turn>\n\nOur safety risk definition is defined below:\n\n<start_of_risk_definition>\n* The 'User' message is harmful by common-sense.\n<end_of_risk_definition>\n\nIs the user message harmful based on the risk definition. Your answer must be either 'Yes' or 'No'.\n
```
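
As a sketch of the manual templating step mentioned above, using the Hugging Face `transformers` tokenizer API (the model path here is a placeholder, and the rendered prompt depends on the model's own chat template):

```python
# Sketch: manually render a chat template before sending text to the TGIS adapter.
# "path/to/your/guardian-model" is a placeholder for your granite-guardian or
# llama-guard model path; it is not a real model ID.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("path/to/your/guardian-model")

messages = [{"role": "user", "content": "Hit me with some creative insults."}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)  # send this rendered text as the text-generation input
```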
42 changes: 42 additions & 0 deletions pyproject.toml
@@ -0,0 +1,42 @@
[project]
name = "vllm-detector-adapter"
version = "0.0.1"
authors = [
    { name="Gaurav Kumbhat", email="[email protected]" },
    { name="Evaline Ju", email="[email protected]" },
]
description = "A lightweight adapter layer that provides detector APIs on top of vllm"
readme = "README.md"
requires-python = ">=3.11"
classifiers = [
    "Programming Language :: Python :: 3"
]

dependencies = [
    "vllm>=0.6.2"
]

[project.optional-dependencies]
vllm-tgis-adapter = [
    "vllm-tgis-adapter>=0.5.3,<0.5.4"
]

## Dev Extra Sets ##

dev-test = [
    "pytest-asyncio>=0.21.0,<0.24",
    "pytest-cov>=2.10.1,<6.0",
    "pytest-html>=3.1.1,<5.0",
    "pytest>=6.2.5,<8.0",
    "wheel>=0.38.4",
]

dev-fmt = [
    "ruff==0.4.7",
    "pre-commit>=3.0.4,<4.0",
    "pydeps>=1.12.12,<2",
]

[tool.setuptools.packages.find]
where = [""]
include = ["vllm_detector_adapter", "vllm_detector_adapter*"]
7 changes: 7 additions & 0 deletions scripts/copy_script.sh
@@ -0,0 +1,7 @@
#!/bin/bash

# Copy files from a specific location to the desired destination
cp -r /app/target_packages/* ${SHARED_PACKAGE_PATH}

# # Run the main command
# exec "$@"
23 changes: 23 additions & 0 deletions scripts/fmt.sh
@@ -0,0 +1,23 @@
#!/usr/bin/env bash

pre-commit run --all-files
RETURN_CODE=$?

function echoWarning() {
    LIGHT_YELLOW='\033[1;33m'
    NC='\033[0m' # No Color
    echo -e "${LIGHT_YELLOW}${1}${NC}"
}

if [ "$RETURN_CODE" -ne 0 ]; then
    if [ "${CI}" != "true" ]; then
        echoWarning "☝️ This appears to have failed, but actually your files have been formatted."
        echoWarning "Make a new commit with these changes before making a pull request."
    else
        echoWarning "This test failed because your code isn't formatted correctly."
        echoWarning 'Locally, run `make run fmt`, it will appear to fail, but change files.'
        echoWarning "Add the changed files to your commit and this stage will pass."
    fi

    exit $RETURN_CODE
fi
2 changes: 2 additions & 0 deletions setup_requirements.txt
@@ -0,0 +1,2 @@
tox>=4.4.2,<5
build>=1.2.1,<2.0
Empty file added tests/__init__.py
Empty file.
Empty file.
81 changes: 81 additions & 0 deletions tests/generative_detectors/test_base.py
@@ -0,0 +1,81 @@
# Standard
from dataclasses import dataclass
import asyncio

# Third Party
from vllm.config import MultiModalConfig
from vllm.entrypoints.openai.serving_engine import BaseModelPath
import jinja2
import pytest_asyncio

# Local
from vllm_detector_adapter.generative_detectors.base import ChatCompletionDetectionBase

MODEL_NAME = "openai-community/gpt2"
CHAT_TEMPLATE = "Dummy chat template for testing {}"
BASE_MODEL_PATHS = [BaseModelPath(name=MODEL_NAME, model_path=MODEL_NAME)]


@dataclass
class MockHFConfig:
    model_type: str = "any"


@dataclass
class MockModelConfig:
    tokenizer = MODEL_NAME
    trust_remote_code = False
    tokenizer_mode = "auto"
    max_model_len = 100
    tokenizer_revision = None
    embedding_mode = False
    multimodal_config = MultiModalConfig()
    hf_config = MockHFConfig()


@dataclass
class MockEngine:
    async def get_model_config(self):
        return MockModelConfig()


async def _async_serving_detection_completion_init():
    """Initialize a chat completion base with string templates"""
    engine = MockEngine()
    model_config = await engine.get_model_config()

    detection_completion = ChatCompletionDetectionBase(
        task_template="hello {{user_text}}",
        output_template="bye {{text}}",
        engine_client=engine,
        model_config=model_config,
        base_model_paths=BASE_MODEL_PATHS,
        response_role="assistant",
        chat_template=CHAT_TEMPLATE,
        lora_modules=None,
        prompt_adapters=None,
        request_logger=None,
    )
    return detection_completion


@pytest_asyncio.fixture
async def detection_base():
    return _async_serving_detection_completion_init()


### Tests #####################################################################


def test_async_serving_detection_completion_init(detection_base):
    detection_completion = asyncio.run(detection_base)
    assert detection_completion.chat_template == CHAT_TEMPLATE

    # tests load_template
    task_template = detection_completion.task_template
    assert type(task_template) == jinja2.environment.Template
    assert task_template.render(({"user_text": "moose"})) == "hello moose"

    output_template = detection_completion.output_template
    assert type(output_template) == jinja2.environment.Template
    assert output_template.render(({"text": "moose"})) == "bye moose"