This is a simple FastAPI-based server mock that implements the OpenAI API.
Available endpoints:
- /v1/chat/completion
Instead of running an LLM to generate completions, it simply returns responses generated by surrogate models. Available surrogate models are:
- "yes_no": returns random "Yes" or "No" response
- "ja_nein": returns random "Ja" or "Nein" response
- "lorem_ipsum": returns random "lorem ipsum" text
Pull and run the container image:

docker pull ghcr.io/hummerichsander/llm_api_server_mock:latest
docker run -p 8000:8000 ghcr.io/hummerichsander/llm_api_server_mock:latest
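Once the container is running, a request might look like the minimal sketch below. It assumes the request body follows the standard OpenAI chat-completions format, that the surrogate model is selected via the "model" field, and that the mock does not validate the API key; the endpoint path is the one listed above.

```shell
# Ask the "yes_no" surrogate for a completion (sketch; body follows the OpenAI chat format)
curl http://localhost:8000/v1/chat/completion \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer dummy-key" \
  -d '{
        "model": "yes_no",
        "messages": [{"role": "user", "content": "Is this a mock?"}]
      }'
```

The reply should mirror the OpenAI chat-completion response schema, so existing OpenAI client code can in principle be pointed at http://localhost:8000/v1 without changes.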
Environment variables:
CONTEXT_SIZE
: context size for the model (default: 4096)

SLEEP_TIME
: sleep time in seconds before returning the response (default: 0)

MAX_CONCURRENT_REQUESTS
: maximum number of concurrent requests (default: 10^9)
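For example, these can be passed to the container at startup with Docker's `-e` flag (the values below are arbitrary):

```shell
# Values below are arbitrary examples; the defaults are listed above.
docker run -p 8000:8000 \
  -e CONTEXT_SIZE=2048 \
  -e SLEEP_TIME=1 \
  -e MAX_CONCURRENT_REQUESTS=100 \
  ghcr.io/hummerichsander/llm_api_server_mock:latest
```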