This is a simple FastAPI-based server mock that implements the OpenAI API.
Available endpoints:
- /v1/chat/completion
Instead of running an LLM to generate completions, it simply returns responses generated by surrogate models. Available surrogate models are:
- "yes_no": returns random "Yes" or "No" response
- "ja_nein": returns random "Ja" or "Nein" response
- "lorem_ipsum": returns random "lorem ipsum" text
Pull and run the container image:

docker pull ghcr.io/hummerichsander/llm_api_server_mock:latest
docker run -p 8000:8000 ghcr.io/hummerichsander/llm_api_server_mock:latest
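Once the container is running, a request might look like the minimal sketch below. It assumes the request body follows the standard OpenAI chat-completions format, that the surrogate model is selected via the "model" field, and that the mock does not validate the API key; the endpoint path is the one listed above.

```shell
# Ask the "yes_no" surrogate for a completion (sketch; body follows the OpenAI chat format)
curl http://localhost:8000/v1/chat/completion \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer dummy-key" \
  -d '{
        "model": "yes_no",
        "messages": [{"role": "user", "content": "Is this a mock?"}]
      }'
```

The reply should mirror the OpenAI chat-completion response schema, so existing OpenAI client code can in principle be pointed at http://localhost:8000/v1 without changes.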
Environment variables:
CONTEXT_SIZE
: context size for the model (default: 4096)

SLEEP_TIME
: sleep time in seconds before returning the response (default: 0)

MAX_CONCURRENT_REQUESTS
: maximum number of concurrent requests (default: 10^9)
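For example, these can be passed to the container at startup with Docker's `-e` flag (the values below are arbitrary):

```shell
# Values below are arbitrary examples; the defaults are listed above.
docker run -p 8000:8000 \
  -e CONTEXT_SIZE=2048 \
  -e SLEEP_TIME=1 \
  -e MAX_CONCURRENT_REQUESTS=100 \
  ghcr.io/hummerichsander/llm_api_server_mock:latest
```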