diff --git a/README.md b/README.md
index c335298..c3ceb84 100644
--- a/README.md
+++ b/README.md
@@ -2,9 +2,11 @@
 This is a simple FastAPI-based server mock that implements the OpenAI API.
 Available endpoints:
+
 - /v1/chat/completion
 
 Instead of running an LLM to generate completions, it simply returns a response generated by surrogate models.
 Available surrogate models are:
+
 - "yes_no": returns a random "Yes" or "No" response
 - "ja_nein": returns a random "Ja" or "Nein" response
 - "lorem_ipsum": returns random "lorem ipsum" text
@@ -15,6 +17,7 @@ docker pull ghcr.io/hummerichsander/openai_api_server_mock:v ... # replace ... w
 ```
 
 Environment variables:
+
 - `CONTEXT_SIZE`: context size for the model (default: 4096)
 - `SLEEP_TIME`: sleep time in seconds before returning the response (default: 0)
 - `MAX_CONCURRENT_REQUESTS`: maximum number of concurrent requests (default: 10^9)
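
For context, a minimal usage sketch of the mock described in the README above. The host port mapping (`8000:8000`), the placeholder image tag `v0.0.0`, and the exact JSON payload are assumptions modeled on the OpenAI chat-completions request format, not taken from this diff; the endpoint path, model name, and environment variables come from the README itself.

```sh
# Start the mock; CONTEXT_SIZE and SLEEP_TIME are the variables documented above.
# The image tag and the 8000:8000 port mapping are placeholders/assumptions.
docker run -d -p 8000:8000 \
  -e CONTEXT_SIZE=2048 \
  -e SLEEP_TIME=1 \
  ghcr.io/hummerichsander/openai_api_server_mock:v0.0.0

# Query the documented endpoint with the "yes_no" surrogate model;
# the mock should answer with a random "Yes" or "No" instead of a real completion.
curl http://localhost:8000/v1/chat/completion \
  -H "Content-Type: application/json" \
  -d '{"model": "yes_no", "messages": [{"role": "user", "content": "Is this a mock?"}]}'
```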