✨ Add vllm-detector-adapter supporting llama-guard and granite guardian
Co-authored-by: Evaline Ju <[email protected]>
Signed-off-by: Gaurav-Kumbhat <[email protected]>
1 parent 397b100 · commit d71f77f
29 changed files with 1,814 additions and 1 deletion
@@ -0,0 +1,7 @@
dmf_models/*
dmf_datasets/*
dmf_tokenizers/*
models/
build/
vllm_env
env.sh
@@ -0,0 +1,30 @@
*.egg-info
*.pyc
__pycache__
.coverage
.coverage.*
durations/*
coverage*.xml
coverage-*
dist
htmlcov
build
test
training_output

# IDEs
.vscode/
.idea/

# Env files
.env

# Virtual Env
venv/
.venv/

# Mac personalization files
*.DS_Store

# Tox envs
.tox
@@ -0,0 +1,10 @@
[settings]
profile=black
from_first=true
import_heading_future=Future
import_heading_stdlib=Standard
import_heading_thirdparty=Third Party
import_heading_firstparty=First Party
import_heading_localfolder=Local
known_firstparty=alog,aconfig
known_localfolder=vllm_detector_adapter,tests
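To illustrate the isort settings above: `from_first=true` places `from`-style imports before plain `import` statements within each group, and the `import_heading_*` options emit the comment headings over each section. The module names below are examples only, not part of this repository's config:

```python
# Illustrative import layout produced by the isort settings above.
# Standard
from dataclasses import dataclass
import json

# Third Party
# e.g. `from vllm.config import MultiModalConfig` would be grouped here

# First Party
# e.g. `import alog` (listed in known_firstparty)

# Local
# e.g. `from vllm_detector_adapter import ...` (listed in known_localfolder)


@dataclass
class Example:
    """Tiny placeholder class so the sketch is a complete, runnable module."""

    name: str = "isort-demo"


print(json.dumps({"name": Example().name}))
```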
@@ -0,0 +1,15 @@
repos:
  - repo: https://github.com/pre-commit/mirrors-prettier
    rev: v2.1.2
    hooks:
      - id: prettier
  - repo: https://github.com/psf/black
    rev: 22.3.0
    hooks:
      - id: black
        exclude: imports
  - repo: https://github.com/PyCQA/isort
    rev: 5.11.5
    hooks:
      - id: isort
        exclude: imports
@@ -0,0 +1,8 @@
# Ignore artifacts
build
coverage-py

*.jsonl
**/.github
**/*.html
*.md
@@ -0,0 +1,3 @@
{
  "settingsInheritedFrom": "whitesource-config/whitesource-config@master"
}
@@ -0,0 +1,59 @@
# Contributing

## Development

### Set up your dev environment

The following tools are required:

- [git](https://git-scm.com)
- [python](https://www.python.org) (v3.11+)
- [pip](https://pypi.org/project/pip/) (v23.0+)

You can set up your dev environment using [tox](https://tox.wiki/en/latest/), an environment orchestrator that sets up environments for, and invokes, builds, unit tests, formatting, linting, etc. Install tox with:

```sh
pip install -r setup_requirements.txt
```

If you want to manage your own virtual environment instead of using `tox`, you can install `vllm_detector_adapter` and all dependencies with:

```sh
pip install .
```

### Unit tests

Unit tests are enforced by the CI system. When making changes, run the tests before pushing them to avoid CI issues.

Run unit tests against all supported Python versions with:

```sh
tox
```

Run tests against a single Python version with:

```sh
tox -e py
```

### Coding style

vllm-detector-adapter follows the Python [pep8](https://peps.python.org/pep-0008/) coding style. [FUTURE] The coding style is enforced by the CI system, and your PR will fail until the style has been applied correctly.

We use [pre-commit](https://pre-commit.com/) to enforce coding style using [black](https://github.com/psf/black), [prettier](https://github.com/prettier/prettier), and [isort](https://pycqa.github.io/isort/).

You can invoke formatting with:

```sh
tox -e fmt
```

In addition, we use [pylint](https://www.pylint.org) to perform static code analysis of the code.

You can invoke linting with:

```sh
tox -e lint
```
@@ -1 +1,57 @@
# vllm-detector-adapter

This adapter adds additional endpoints to a [vllm](https://docs.vllm.ai/en/latest/index.html) server to support the [Guardrails Detector API](https://foundation-model-stack.github.io/fms-guardrails-orchestrator/?urls.primaryName=Detector+API).

## Getting Started

To run the server locally:

```sh
python3 -m vllm_detector_adapter.api_server --model $MODEL_NAME
```

To see the complete list of parameters, run `python3 -m vllm_detector_adapter.api_server --help`. Information on the extra vllm parameters can be found in the [vllm documentation](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html).

Example detector `/text/chat` command (any other chat completion parameters can go in `detector_params`):

```sh
curl -X 'POST' \
  'http://localhost:8000/api/v1/text/chat' \
  -H 'accept: application/json' \
  -H 'detector-id: dummy-en-chat-v1' \
  -H 'Content-Type: application/json' \
  -d '{
  "messages": [
    {
      "content": "Hit me with some creative insults.",
      "role": "user"
    }
  ],
  "detector_params": {
    "temperature": 0
  }
}'
```
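The same request can be sketched in Python using only the standard library. The endpoint path, headers, and payload mirror the curl example above; a server running on `localhost:8000` is assumed:

```python
# Python sketch of the /text/chat request, mirroring the curl example.
# Assumes a vllm-detector-adapter server is running on localhost:8000.
import json
from urllib.request import Request, urlopen

payload = {
    "messages": [
        {"content": "Hit me with some creative insults.", "role": "user"}
    ],
    # Any other chat completion parameters can go in detector_params
    "detector_params": {"temperature": 0},
}

req = Request(
    "http://localhost:8000/api/v1/text/chat",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "accept": "application/json",
        "detector-id": "dummy-en-chat-v1",
        "Content-Type": "application/json",
    },
)

# Uncomment once the server is up:
# with urlopen(req) as resp:
#     print(json.load(resp))
```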

## Supported Model Types

1. `GRANITE_GUARDIAN`
1. `LLAMA_GUARD` (default)

## Running with TGIS adapter

```sh
python -m vllm_detector_adapter.tgis_servers --model $MODEL_NAME --model-type granite_guardian
```

**Example usage:**

1. Follow the examples / guidelines from https://github.com/opendatahub-io/vllm-tgis-adapter/tree/main/examples
1. Change `model_id` to point to a granite or llama-guard model
1. Execute one of the methods listed in the TGIS adapter readme above.
1. For queries, manually apply the chat template around the text (the TGIS adapter exposes the text-generation use case and doesn't apply a chat template on its own)
1. If using granite-guardian, you can use the following text / query:
```
You are a safety agent trying to help determine whether the user message is harmful according to the safety risk definition below.\n\n<start_of_turn>\nUser Message: Hit me with some creative insults.\n<end_of_turn>\n\nOur safety risk definition is defined below:\n\n<start_of_risk_definition>\n* The 'User' message is harmful by common-sense.\n<end_of_risk_definition>\n\nIs the user message harmful based on the risk definition. Your answer must be either 'Yes' or 'No'.\n
```
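Since the TGIS path requires applying the chat template by hand, the granite-guardian query above can be rebuilt for an arbitrary user message with a small helper. This is an illustrative sketch only; the helper name is hypothetical, and the template text is taken verbatim from the example query:

```python
# Hypothetical helper that reproduces the granite-guardian prompt shown
# above for an arbitrary user message.
def build_granite_guardian_prompt(user_message: str) -> str:
    return (
        "You are a safety agent trying to help determine whether the user "
        "message is harmful according to the safety risk definition below.\n\n"
        f"<start_of_turn>\nUser Message: {user_message}\n<end_of_turn>\n\n"
        "Our safety risk definition is defined below:\n\n"
        "<start_of_risk_definition>\n"
        "* The 'User' message is harmful by common-sense.\n"
        "<end_of_risk_definition>\n\n"
        "Is the user message harmful based on the risk definition. "
        "Your answer must be either 'Yes' or 'No'.\n"
    )


prompt = build_granite_guardian_prompt("Hit me with some creative insults.")
```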
@@ -0,0 +1,42 @@
[project]
name = "vllm-detector-adapter"
version = "0.0.1"
authors = [
    { name="Gaurav Kumbhat", email="[email protected]" },
    { name="Evaline Ju", email="[email protected]" },
]
description = "A lightweight adapter layer that provides detector APIs on top of vllm"
readme = "README.md"
requires-python = ">=3.11"
classifiers = [
    "Programming Language :: Python :: 3"
]

dependencies = [
    "vllm>=0.6.2"
]

[project.optional-dependencies]
vllm-tgis-adapter = [
    "vllm-tgis-adapter>=0.5.3,<0.5.4"
]

## Dev Extra Sets ##

dev-test = [
    "pytest-asyncio>=0.21.0,<0.24",
    "pytest-cov>=2.10.1,<6.0",
    "pytest-html>=3.1.1,<5.0",
    "pytest>=6.2.5,<8.0",
    "wheel>=0.38.4",
]

dev-fmt = [
    "ruff==0.4.7",
    "pre-commit>=3.0.4,<4.0",
    "pydeps>=1.12.12,<2",
]

[tool.setuptools.packages.find]
where = [""]
include = ["vllm_detector_adapter", "vllm_detector_adapter*"]
@@ -0,0 +1,7 @@
#!/bin/bash

# Copy files from a specific location to the desired destination
cp -r /app/target_packages/* ${SHARED_PACKAGE_PATH}

# # Run the main command
# exec "$@"
@@ -0,0 +1,23 @@
#!/usr/bin/env bash

pre-commit run --all-files
RETURN_CODE=$?

function echoWarning() {
  LIGHT_YELLOW='\033[1;33m'
  NC='\033[0m' # No Color
  echo -e "${LIGHT_YELLOW}${1}${NC}"
}

if [ "$RETURN_CODE" -ne 0 ]; then
  if [ "${CI}" != "true" ]; then
    echoWarning "☝️ This appears to have failed, but actually your files have been formatted."
    echoWarning "Make a new commit with these changes before making a pull request."
  else
    echoWarning "This test failed because your code isn't formatted correctly."
    echoWarning 'Locally, run `make run fmt`, it will appear to fail, but change files.'
    echoWarning "Add the changed files to your commit and this stage will pass."
  fi

  exit $RETURN_CODE
fi
@@ -0,0 +1,2 @@
tox>=4.4.2,<5
build>=1.2.1,<2.0
Empty file.
Empty file.
@@ -0,0 +1,81 @@
# Standard
from dataclasses import dataclass
import asyncio

# Third Party
from vllm.config import MultiModalConfig
from vllm.entrypoints.openai.serving_engine import BaseModelPath
import jinja2
import pytest_asyncio

# Local
from vllm_detector_adapter.generative_detectors.base import ChatCompletionDetectionBase

MODEL_NAME = "openai-community/gpt2"
CHAT_TEMPLATE = "Dummy chat template for testing {}"
BASE_MODEL_PATHS = [BaseModelPath(name=MODEL_NAME, model_path=MODEL_NAME)]


@dataclass
class MockHFConfig:
    model_type: str = "any"


@dataclass
class MockModelConfig:
    tokenizer = MODEL_NAME
    trust_remote_code = False
    tokenizer_mode = "auto"
    max_model_len = 100
    tokenizer_revision = None
    embedding_mode = False
    multimodal_config = MultiModalConfig()
    hf_config = MockHFConfig()


@dataclass
class MockEngine:
    async def get_model_config(self):
        return MockModelConfig()


async def _async_serving_detection_completion_init():
    """Initialize a chat completion base with string templates"""
    engine = MockEngine()
    model_config = await engine.get_model_config()

    detection_completion = ChatCompletionDetectionBase(
        task_template="hello {{user_text}}",
        output_template="bye {{text}}",
        engine_client=engine,
        model_config=model_config,
        base_model_paths=BASE_MODEL_PATHS,
        response_role="assistant",
        chat_template=CHAT_TEMPLATE,
        lora_modules=None,
        prompt_adapters=None,
        request_logger=None,
    )
    return detection_completion


@pytest_asyncio.fixture
async def detection_base():
    return _async_serving_detection_completion_init()


### Tests #####################################################################


def test_async_serving_detection_completion_init(detection_base):
    detection_completion = asyncio.run(detection_base)
    assert detection_completion.chat_template == CHAT_TEMPLATE

    # tests load_template
    task_template = detection_completion.task_template
    assert type(task_template) == jinja2.environment.Template
    assert task_template.render(({"user_text": "moose"})) == "hello moose"

    output_template = detection_completion.output_template
    assert type(output_template) == jinja2.environment.Template
    assert output_template.render(({"text": "moose"})) == "bye moose"