docs(models-http-api): add vllm and deepseek completion kind (#3354)
* docs(models-http-api): add vllm and deepseek completion kind

* docs: add prompt_template intro

* docs: fix prompt template format
zwpaper authored Nov 1, 2024
1 parent 9304869 commit f8f4e32
Showing 3 changed files with 71 additions and 10 deletions.
24 changes: 24 additions & 0 deletions website/docs/references/models-http-api/deepseek.md
@@ -0,0 +1,24 @@
# DeepSeek

[DeepSeek](https://www.deepseek.com/) is a platform that offers a suite of AI models. Tabby supports DeepSeek's models for both code completion and chat.

DeepSeek provides OpenAI-compatible APIs, so the `openai/chat` kind can be used directly for chat.
For completion, however, the implementation differs from OpenAI's, so the `deepseek/completion` kind should be used.

Below is an example:

```toml title="~/.tabby/config.toml"
# Chat model
[model.chat.http]
kind = "openai/chat"
model_name = "your_model"
api_endpoint = "https://api.deepseek.com/chat"
api_key = "secret-api-key"

# Completion model
[model.completion.http]
kind = "deepseek/completion"
model_name = "your_model"
api_endpoint = "https://api.deepseek.com/beta"
api_key = "secret-api-key"
```
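
To sanity-check the completion endpoint before wiring it into Tabby, a request along the following lines can be used. This is a hedged sketch, not part of the commit: the path joins the `api_endpoint` above with the OpenAI-spec `/completions` route, and the `deepseek-chat` model name, prompt, and token limit are illustrative assumptions.

```bash
# Illustrative check of the beta completion endpoint
# (model name, prompt, and max_tokens are placeholders).
curl https://api.deepseek.com/beta/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer secret-api-key" \
  -d '{"model": "deepseek-chat", "prompt": "def fib(n):", "max_tokens": 32}'
```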
25 changes: 15 additions & 10 deletions website/docs/references/models-http-api/openai.md
@@ -1,19 +1,17 @@
# OpenAI

-OpenAI is a leading AI company that has developed a range of language models. Tabby supports OpenAI's models for chat and embedding tasks.
+OpenAI is a leading AI company that has developed an extensive range of language models.
+Tabby supports OpenAI's API specifications for chat, completion, and embedding tasks.

-Tabby also supports its legacy `/v1/completions` API for code completion, although **OpenAI itself no longer supports it**; it is still the API offered by some other vendors, such as (vLLM, Nvidia NIM, LocalAI, ...).
+The OpenAI API is widely used and is also provided by other vendors,
+such as vLLM, Nvidia NIM, and LocalAI.

-Below is an example configuration:
+OpenAI has designated its `/v1/completions` API for code completion as legacy,
+and **OpenAI itself no longer supports it**.

-```toml title="~/.tabby/config.toml"
-# Completion model
-[model.completion.http]
-kind = "openai/completion"
-model_name = "your_model"
-api_endpoint = "https://url_to_your_backend_or_service"
-api_key = "secret-api-key"
+Tabby continues to support the OpenAI Completion API specifications due to its widespread usage.

+```toml title="~/.tabby/config.toml"
# Chat model
[model.chat.http]
kind = "openai/chat"
@@ -27,4 +25,11 @@ kind = "openai/embedding"
model_name = "text-embedding-3-small"
api_endpoint = "https://api.openai.com/v1"
api_key = "secret-api-key"
+
+# Completion model
+[model.completion.http]
+kind = "openai/completion"
+model_name = "your_model"
+api_endpoint = "https://url_to_your_backend_or_service"
+api_key = "secret-api-key"
```
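
For reference, the legacy completion call that the `openai/completion` kind targets has this general shape under the OpenAI specification. A hedged sketch: the URL is the placeholder backend from the config above, and the prompt and token limit are illustrative.

```bash
# OpenAI-spec legacy completion request
# (endpoint, model, and prompt are placeholders).
curl https://url_to_your_backend_or_service/v1/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer secret-api-key" \
  -d '{"model": "your_model", "prompt": "def hello():", "max_tokens": 16}'
```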
32 changes: 32 additions & 0 deletions website/docs/references/models-http-api/vllm.md
@@ -0,0 +1,32 @@
# vLLM

[vLLM](https://docs.vllm.ai/en/stable/) is a fast and user-friendly library for LLM inference and serving.

vLLM offers an `OpenAI Compatible Server`, enabling us to use the OpenAI kinds directly for chat and embedding.
For completion, however, the implementation differs, so the `vllm/completion` kind should be used, together with a `prompt_template` suited to the specific model being served. For context, such a server is commonly launched as shown in the sketch below.
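
This launch command is a hedged sketch, not part of the commit: the CodeLlama model name and port are assumptions, so substitute whatever you actually serve. With defaults like these, the `api_endpoint` values in the config below would typically be `http://localhost:8000/v1`.

```bash
# Start vLLM's OpenAI-compatible server (model name and port are placeholders).
vllm serve codellama/CodeLlama-7b-hf --port 8000
# Older vLLM releases expose the same server via the module entry point:
#   python -m vllm.entrypoints.openai.api_server --model codellama/CodeLlama-7b-hf --port 8000
```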

Below is an example:

```toml title="~/.tabby/config.toml"
# Chat model
[model.chat.http]
kind = "openai/chat"
model_name = "your_model"
api_endpoint = "https://url_to_your_backend_or_service"
api_key = "secret-api-key"

# Embedding model
[model.embedding.http]
kind = "openai/embedding"
model_name = "your_model"
api_endpoint = "https://url_to_your_backend_or_service"
api_key = "secret-api-key"

# Completion model
[model.completion.http]
kind = "vllm/completion"
model_name = "your_model"
api_endpoint = "https://url_to_your_backend_or_service"
api_key = "secret-api-key"
prompt_template = "<PRE> {prefix} <SUF>{suffix} <MID>" # Example prompt template for the CodeLlama model series.
```
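
The `prompt_template` must match the fill-in-the-middle tokens of the model actually being served; the CodeLlama template above will not work for other families. The alternatives below are illustrations only, not from this commit; verify the special tokens against your model's tokenizer before relying on them.

```toml
# Illustrative alternatives; pick the one matching your model family.

# StarCoder family:
# prompt_template = "<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

# DeepSeek-Coder base models:
prompt_template = "<｜fim▁begin｜>{prefix}<｜fim▁hole｜>{suffix}<｜fim▁end｜>"
```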
