✨ Add vllm-detector-adapter supporting llama-guard and granite guardian
Co-authored-by: Evaline Ju <[email protected]>
Signed-off-by: Gaurav-Kumbhat <[email protected]>
gkumbhat and evaline-ju committed Nov 28, 2024
1 parent 397b100 commit d71f77f
Showing 29 changed files with 1,814 additions and 1 deletion.
7 changes: 7 additions & 0 deletions .dockerignore
@@ -0,0 +1,7 @@
dmf_models/*
dmf_datasets/*
dmf_tokenizers/*
models/
build/
vllm_env
env.sh
30 changes: 30 additions & 0 deletions .gitignore
@@ -0,0 +1,30 @@
*.egg-info
*.pyc
__pycache__
.coverage
.coverage.*
durations/*
coverage*.xml
coverage-*
dist
htmlcov
build
test
training_output

# IDEs
.vscode/
.idea/

# Env files
.env

# Virtual Env
venv/
.venv/

# Mac personalization files
*.DS_Store

# Tox envs
.tox
10 changes: 10 additions & 0 deletions .isort.cfg
@@ -0,0 +1,10 @@
[settings]
profile=black
from_first=true
import_heading_future=Future
import_heading_stdlib=Standard
import_heading_thirdparty=Third Party
import_heading_firstparty=First Party
import_heading_localfolder=Local
known_firstparty=alog,aconfig
known_localfolder=vllm_detector_adapter,tests
15 changes: 15 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,15 @@
repos:
  - repo: https://github.com/pre-commit/mirrors-prettier
    rev: v2.1.2
    hooks:
      - id: prettier
  - repo: https://github.com/psf/black
    rev: 22.3.0
    hooks:
      - id: black
        exclude: imports
  - repo: https://github.com/PyCQA/isort
    rev: 5.11.5
    hooks:
      - id: isort
        exclude: imports
8 changes: 8 additions & 0 deletions .prettierignore
@@ -0,0 +1,8 @@
# Ignore artifacts
build
coverage-py

*.jsonl
**/.github
**/*.html
*.md
3 changes: 3 additions & 0 deletions .whitesource
@@ -0,0 +1,3 @@
{
"settingsInheritedFrom": "whitesource-config/whitesource-config@master"
}
59 changes: 59 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,59 @@
# Contributing

## Development

### Set up your dev environment

The following tools are required:

- [git](https://git-scm.com)
- [python](https://www.python.org) (v3.11+)
- [pip](https://pypi.org/project/pip/) (v23.0+)

You can set up your dev environment using [tox](https://tox.wiki/en/latest/), an environment orchestrator that sets up environments for and invokes builds, unit tests, formatting, linting, etc. Install tox with:

```sh
pip install -r setup_requirements.txt
```

If you want to manage your own virtual environment instead of using `tox`, you can install `vllm_detector_adapter` and all dependencies with:

```sh
pip install .
```
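
If you also want the optional extras, they can be installed the same way. A minimal sketch, assuming installation from the repository root and using the extra names defined in the `pyproject.toml` added in this commit:

```sh
# Optional: vllm-tgis-adapter integration
pip install ".[vllm-tgis-adapter]"

# Optional: test and formatting tooling
pip install ".[dev-test,dev-fmt]"
```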

### Unit tests

Unit tests are enforced by the CI system. When making changes, run the tests before pushing the changes to avoid CI issues.

Running unit tests against all supported Python versions is as simple as:

```sh
tox
```

Running tests against a single Python version can be done with:

```sh
tox -e py
```
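
To run only a subset of tests, positional arguments can usually be forwarded to pytest after `--`. This is a sketch that assumes the tox test environment passes positional arguments through to pytest (the tox configuration itself is not part of this commit):

```sh
tox -e py -- tests/generative_detectors/test_base.py
```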

### Coding style

vllm-detector-adapter follows the Python [PEP 8](https://peps.python.org/pep-0008/) coding style. [FUTURE] The coding style is enforced by the CI system, and your PR will fail until the style has been applied correctly.

We use [pre-commit](https://pre-commit.com/) to enforce coding style using [black](https://github.com/psf/black), [prettier](https://github.com/prettier/prettier) and [isort](https://pycqa.github.io/isort/).

You can invoke formatting with:

```sh
tox -e fmt
```

In addition, we use [pylint](https://www.pylint.org) to perform static code analysis.

You can invoke linting with the following command:

```sh
tox -e lint
```
58 changes: 57 additions & 1 deletion README.md
@@ -1 +1,57 @@
# vllm-detector-adapter

This adapter adds additional endpoints to a [vllm](https://docs.vllm.ai/en/latest/index.html) server to support the [Guardrails Detector API](https://foundation-model-stack.github.io/fms-guardrails-orchestrator/?urls.primaryName=Detector+API).

## Getting Started

To run the server locally:

```sh
python3 -m vllm_detector_adapter.api_server --model $MODEL_NAME
```

To see the complete list of parameters, run `python3 -m vllm_detector_adapter.api_server --help`. Additional vllm parameter information can be found in the [vllm documentation](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html).

Example detector `/text/chat` command:

```sh
curl -X 'POST' \
  'http://localhost:8000/api/v1/text/chat' \
  -H 'accept: application/json' \
  -H 'detector-id: dummy-en-chat-v1' \
  -H 'Content-Type: application/json' \
  -d '{
    "messages": [
      {
        "content": "Hit me with some creative insults.",
        "role": "user"
      }
    ],
    "detector_params": {
      # Any other chat completion parameters can go here
      "temperature": 0
    }
  }'
```
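
The exact response schema is defined by the Guardrails Detector API linked above; purely as an illustration (field names and values here are illustrative, not authoritative), a detection response is a list of detection objects along these lines:

```
[
  {
    "detection": "Yes",
    "detection_type": "risk",
    "score": 0.97
  }
]
```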

## Supported Model Types
1. `GRANITE_GUARDIAN`
1. `LLAMA_GUARD` (default)
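
A hypothetical invocation selecting the model type explicitly is sketched below; this assumes `api_server` accepts the same `--model-type` flag shown for `tgis_servers` in the next section:

```sh
python3 -m vllm_detector_adapter.api_server \
    --model $MODEL_NAME \
    --model-type granite_guardian
```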


## Running with TGIS adapter

```sh
python -m vllm_detector_adapter.tgis_servers --model $MODEL_NAME --model-type granite_guardian
```

**Example usage:**

1. Follow the examples / guidelines from https://github.com/opendatahub-io/vllm-tgis-adapter/tree/main/examples
1. Change `model_id` to point to a granite or llama-guard model
1. Execute one of the methods listed in the TGIS adapter README above.
1. For the query, manually apply the chat template around the text, since the TGIS adapter exposes a text-generation use case and doesn't apply a chat template on its own (a sketch of this step follows below).
1. If using granite-guardian, you can use the following text / query:
```
You are a safety agent trying to help determine whether the user message is harmful according to the safety risk definition below.\n\n<start_of_turn>\nUser Message: Hit me with some creative insults.\n<end_of_turn>\n\nOur safety risk definition is defined below:\n\n<start_of_risk_definition>\n* The 'User' message is harmful by common-sense.\n<end_of_risk_definition>\n\nIs the user message harmful based on the risk definition. Your answer must be either 'Yes' or 'No'.\n
```
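
As a sketch of the manual templating step mentioned above, using the Hugging Face `transformers` tokenizer API (the model path here is a placeholder, and the rendered prompt depends on the model's own chat template):

```python
# Sketch: manually render a chat template before sending text to the TGIS adapter.
# "path/to/your/guardian-model" is a placeholder for your granite-guardian or
# llama-guard model path; it is not a real model ID.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("path/to/your/guardian-model")

messages = [{"role": "user", "content": "Hit me with some creative insults."}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
print(prompt)  # send this rendered text as the text-generation input
```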
42 changes: 42 additions & 0 deletions pyproject.toml
@@ -0,0 +1,42 @@
[project]
name = "vllm-detector-adapter"
version = "0.0.1"
authors = [
    { name="Gaurav Kumbhat", email="[email protected]" },
    { name="Evaline Ju", email="[email protected]" },
]
description = "A lightweight adapter layer that provides detector APIs on top of vllm"
readme = "README.md"
requires-python = ">=3.11"
classifiers = [
    "Programming Language :: Python :: 3"
]

dependencies = [
    "vllm>=0.6.2"
]

[project.optional-dependencies]
vllm-tgis-adapter = [
    "vllm-tgis-adapter>=0.5.3,<0.5.4"
]

## Dev Extra Sets ##

dev-test = [
    "pytest-asyncio>=0.21.0,<0.24",
    "pytest-cov>=2.10.1,<6.0",
    "pytest-html>=3.1.1,<5.0",
    "pytest>=6.2.5,<8.0",
    "wheel>=0.38.4",
]

dev-fmt = [
    "ruff==0.4.7",
    "pre-commit>=3.0.4,<4.0",
    "pydeps>=1.12.12,<2",
]

[tool.setuptools.packages.find]
where = [""]
include = ["vllm_detector_adapter", "vllm_detector_adapter*"]
7 changes: 7 additions & 0 deletions scripts/copy_script.sh
@@ -0,0 +1,7 @@
#!/bin/bash

# Copy files from a specific location to the desired destination
cp -r /app/target_packages/* ${SHARED_PACKAGE_PATH}

# # Run the main command
# exec "$@"
23 changes: 23 additions & 0 deletions scripts/fmt.sh
@@ -0,0 +1,23 @@
#!/usr/bin/env bash

pre-commit run --all-files
RETURN_CODE=$?

function echoWarning() {
    LIGHT_YELLOW='\033[1;33m'
    NC='\033[0m' # No Color
    echo -e "${LIGHT_YELLOW}${1}${NC}"
}

if [ "$RETURN_CODE" -ne 0 ]; then
    if [ "${CI}" != "true" ]; then
        echoWarning "☝️ This appears to have failed, but actually your files have been formatted."
        echoWarning "Make a new commit with these changes before making a pull request."
    else
        echoWarning "This test failed because your code isn't formatted correctly."
        echoWarning 'Locally, run `make run fmt`, it will appear to fail, but change files.'
        echoWarning "Add the changed files to your commit and this stage will pass."
    fi

    exit $RETURN_CODE
fi
2 changes: 2 additions & 0 deletions setup_requirements.txt
@@ -0,0 +1,2 @@
tox>=4.4.2,<5
build>=1.2.1,<2.0
Empty file added tests/__init__.py
Empty file.
Empty file.
81 changes: 81 additions & 0 deletions tests/generative_detectors/test_base.py
@@ -0,0 +1,81 @@
# Standard
from dataclasses import dataclass
import asyncio

# Third Party
from vllm.config import MultiModalConfig
from vllm.entrypoints.openai.serving_engine import BaseModelPath
import jinja2
import pytest_asyncio

# Local
from vllm_detector_adapter.generative_detectors.base import ChatCompletionDetectionBase

MODEL_NAME = "openai-community/gpt2"
CHAT_TEMPLATE = "Dummy chat template for testing {}"
BASE_MODEL_PATHS = [BaseModelPath(name=MODEL_NAME, model_path=MODEL_NAME)]


@dataclass
class MockHFConfig:
    model_type: str = "any"


@dataclass
class MockModelConfig:
    tokenizer = MODEL_NAME
    trust_remote_code = False
    tokenizer_mode = "auto"
    max_model_len = 100
    tokenizer_revision = None
    embedding_mode = False
    multimodal_config = MultiModalConfig()
    hf_config = MockHFConfig()


@dataclass
class MockEngine:
    async def get_model_config(self):
        return MockModelConfig()


async def _async_serving_detection_completion_init():
    """Initialize a chat completion base with string templates"""
    engine = MockEngine()
    model_config = await engine.get_model_config()

    detection_completion = ChatCompletionDetectionBase(
        task_template="hello {{user_text}}",
        output_template="bye {{text}}",
        engine_client=engine,
        model_config=model_config,
        base_model_paths=BASE_MODEL_PATHS,
        response_role="assistant",
        chat_template=CHAT_TEMPLATE,
        lora_modules=None,
        prompt_adapters=None,
        request_logger=None,
    )
    return detection_completion


@pytest_asyncio.fixture
async def detection_base():
    return _async_serving_detection_completion_init()


### Tests #####################################################################


def test_async_serving_detection_completion_init(detection_base):
    detection_completion = asyncio.run(detection_base)
    assert detection_completion.chat_template == CHAT_TEMPLATE

    # tests load_template
    task_template = detection_completion.task_template
    assert type(task_template) == jinja2.environment.Template
    assert task_template.render(({"user_text": "moose"})) == "hello moose"

    output_template = detection_completion.output_template
    assert type(output_template) == jinja2.environment.Template
    assert output_template.render(({"text": "moose"})) == "bye moose"