
Commit

Update
infwinston committed Oct 20, 2023
1 parent ecceff5 commit 589526a
Showing 17 changed files with 61 additions and 23 deletions.
2 changes: 1 addition & 1 deletion docker/docker-compose.yml
@@ -23,7 +23,7 @@ services:
- driver: nvidia
count: 1
capabilities: [gpu]
entrypoint: ["python3.9", "-m", "fastchat.serve.model_worker", "--model-names", "${FASTCHAT_WORKER_MODEL_NAMES:-vicuna-7b-v1.3}", "--model-path", "${FASTCHAT_WORKER_MODEL_PATH:-lmsys/vicuna-7b-v1.3}", "--worker-address", "http://fastchat-model-worker:21002", "--controller-address", "http://fastchat-controller:21001", "--host", "0.0.0.0", "--port", "21002"]
entrypoint: ["python3.9", "-m", "fastchat.serve.model_worker", "--model-names", "${FASTCHAT_WORKER_MODEL_NAMES:-vicuna-7b-v1.5}", "--model-path", "${FASTCHAT_WORKER_MODEL_PATH:-lmsys/vicuna-7b-v1.5}", "--worker-address", "http://fastchat-model-worker:21002", "--controller-address", "http://fastchat-controller:21001", "--host", "0.0.0.0", "--port", "21002"]
fastchat-api-server:
build:
context: .
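To run the stack with a different Vicuna variant, the defaults in the entrypoint above can be overridden through the `FASTCHAT_WORKER_MODEL_NAMES` and `FASTCHAT_WORKER_MODEL_PATH` variables. A minimal sketch, assuming Docker Compose v2 and using the 13B weights purely as an illustration:

```bash
# Hedged example: override the default worker model at launch time.
# The variable names come from the entrypoint above; the 13B path is illustrative.
FASTCHAT_WORKER_MODEL_NAMES=vicuna-13b-v1.5 \
FASTCHAT_WORKER_MODEL_PATH=lmsys/vicuna-13b-v1.5 \
docker compose -f docker/docker-compose.yml up
```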
2 changes: 1 addition & 1 deletion docs/langchain_integration.md
@@ -19,7 +19,7 @@ Here, we use Vicuna as an example and use it for three endpoints: chat completio
See a full list of supported models [here](../README.md#supported-models).

```bash
-python3 -m fastchat.serve.model_worker --model-names "gpt-3.5-turbo,text-davinci-003,text-embedding-ada-002" --model-path lmsys/vicuna-7b-v1.3
+python3 -m fastchat.serve.model_worker --model-names "gpt-3.5-turbo,text-davinci-003,text-embedding-ada-002" --model-path lmsys/vicuna-7b-v1.5
```

Finally, launch the RESTful API server
4 changes: 2 additions & 2 deletions docs/model_support.md
@@ -5,7 +5,7 @@
- [meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf)
- example: `python3 -m fastchat.serve.cli --model-path meta-llama/Llama-2-7b-chat-hf`
- Vicuna, Alpaca, LLaMA, Koala
-- example: `python3 -m fastchat.serve.cli --model-path lmsys/vicuna-7b-v1.3`
+- example: `python3 -m fastchat.serve.cli --model-path lmsys/vicuna-7b-v1.5`
- [BAAI/AquilaChat-7B](https://huggingface.co/BAAI/AquilaChat-7B)
- [BAAI/bge-large-en](https://huggingface.co/BAAI/bge-large-en#using-huggingface-transformers)
- [baichuan-inc/baichuan-7B](https://huggingface.co/baichuan-inc/baichuan-7B)
@@ -67,7 +67,7 @@ python3 -m fastchat.serve.cli --model [YOUR_MODEL_PATH]
You can run this example command to learn the code logic.

```
-python3 -m fastchat.serve.cli --model lmsys/vicuna-7b-v1.3
+python3 -m fastchat.serve.cli --model lmsys/vicuna-7b-v1.5
```

You can add `--debug` to see the actual prompt sent to the model.
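For example, a minimal sketch reusing the command above:

```
python3 -m fastchat.serve.cli --model lmsys/vicuna-7b-v1.5 --debug
```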
14 changes: 7 additions & 7 deletions docs/openai_api.md
@@ -18,7 +18,7 @@ python3 -m fastchat.serve.controller
Then, launch the model worker(s)

```bash
-python3 -m fastchat.serve.model_worker --model-path lmsys/vicuna-7b-v1.3
+python3 -m fastchat.serve.model_worker --model-path lmsys/vicuna-7b-v1.5
```

Finally, launch the RESTful API server
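A minimal sketch of that step, assuming the localhost:8000 address that the curl examples below target:

```bash
python3 -m fastchat.serve.openai_api_server --host localhost --port 8000
```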
@@ -45,7 +45,7 @@ import openai
openai.api_key = "EMPTY"
openai.api_base = "http://localhost:8000/v1"

model = "vicuna-7b-v1.3"
model = "vicuna-7b-v1.5"
prompt = "Once upon a time"

# create a completion
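# Hedged sketch of the call that follows in the full snippet, assuming the
# legacy openai<1.0 client these docs use; max_tokens is illustrative.
completion = openai.Completion.create(model=model, prompt=prompt, max_tokens=64)
print(prompt + completion.choices[0].text)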
@@ -77,7 +77,7 @@ Chat Completions:
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "vicuna-7b-v1.3",
"model": "vicuna-7b-v1.5",
"messages": [{"role": "user", "content": "Hello! What is your name?"}]
}'
```
@@ -87,7 +87,7 @@ Text Completions:
curl http://localhost:8000/v1/completions \
-H "Content-Type: application/json" \
-d '{
"model": "vicuna-7b-v1.3",
"model": "vicuna-7b-v1.5",
"prompt": "Once upon a time",
"max_tokens": 41,
"temperature": 0.5
@@ -99,7 +99,7 @@ Embeddings:
curl http://localhost:8000/v1/embeddings \
-H "Content-Type: application/json" \
-d '{
"model": "vicuna-7b-v1.3",
"model": "vicuna-7b-v1.5",
"input": "Hello world!"
}'
```
@@ -111,8 +111,8 @@ you can replace the `model_worker` step above with a multi model variant:

```bash
python3 -m fastchat.serve.multi_model_worker \
---model-path lmsys/vicuna-7b-v1.3 \
---model-names vicuna-7b-v1.3 \
+--model-path lmsys/vicuna-7b-v1.5 \
+--model-names vicuna-7b-v1.5 \
--model-path lmsys/longchat-7b-16k \
--model-names longchat-7b-16k
```
4 changes: 2 additions & 2 deletions docs/vllm_integration.md
@@ -11,12 +11,12 @@ See the supported models [here](https://vllm.readthedocs.io/en/latest/models/sup
2. When you launch a model worker, replace the normal worker (`fastchat.serve.model_worker`) with the vLLM worker (`fastchat.serve.vllm_worker`). All other commands such as controller, gradio web server, and OpenAI API server are kept the same.
```
-python3 -m fastchat.serve.vllm_worker --model-path lmsys/vicuna-7b-v1.3
+python3 -m fastchat.serve.vllm_worker --model-path lmsys/vicuna-7b-v1.5
```
If you see tokenizer errors, try
```
-python3 -m fastchat.serve.vllm_worker --model-path lmsys/vicuna-7b-v1.3 --tokenizer hf-internal-testing/llama-tokenizer
+python3 -m fastchat.serve.vllm_worker --model-path lmsys/vicuna-7b-v1.5 --tokenizer hf-internal-testing/llama-tokenizer
```
If you use an AWQ quantized model, try
2 changes: 1 addition & 1 deletion fastchat/llm_judge/README.md
@@ -49,7 +49,7 @@ Arguments:

e.g.,
```
-python gen_model_answer.py --model-path lmsys/vicuna-7b-v1.3 --model-id vicuna-7b-v1.3
+python gen_model_answer.py --model-path lmsys/vicuna-7b-v1.5 --model-id vicuna-7b-v1.5
```
The answers will be saved to `data/mt_bench/model_answer/[MODEL-ID].jsonl`.

4 changes: 2 additions & 2 deletions fastchat/model/model_adapter.py
@@ -384,7 +384,7 @@ def add_model_args(parser):
parser.add_argument(
"--model-path",
type=str,
default="lmsys/vicuna-7b-v1.3",
default="lmsys/vicuna-7b-v1.5",
help="The path to the weights. This can be a local folder or a Hugging Face repo ID.",
)
parser.add_argument(
@@ -572,7 +572,7 @@ def get_default_conv_template(self, model_path: str) -> Conversation:


class VicunaAdapter(BaseModelAdapter):
"Model adapater for Vicuna models (e.g., lmsys/vicuna-7b-v1.3)" ""
"Model adapater for Vicuna models (e.g., lmsys/vicuna-7b-v1.5)" ""

use_fast_tokenizer = False

2 changes: 1 addition & 1 deletion fastchat/serve/cli.py
@@ -2,7 +2,7 @@
Chat with a model with command line interface.
Usage:
-python3 -m fastchat.serve.cli --model lmsys/vicuna-7b-v1.3
+python3 -m fastchat.serve.cli --model lmsys/vicuna-7b-v1.5
python3 -m fastchat.serve.cli --model lmsys/fastchat-t5-3b-v1.0
Other commands:
2 changes: 1 addition & 1 deletion fastchat/serve/huggingface_api.py
@@ -2,7 +2,7 @@
Use FastChat with Hugging Face generation APIs.
Usage:
-python3 -m fastchat.serve.huggingface_api --model lmsys/vicuna-7b-v1.3
+python3 -m fastchat.serve.huggingface_api --model lmsys/vicuna-7b-v1.5
python3 -m fastchat.serve.huggingface_api --model lmsys/fastchat-t5-3b-v1.0
"""
import argparse
2 changes: 1 addition & 1 deletion fastchat/serve/launch_all_serve.py
@@ -54,7 +54,7 @@
parser.add_argument(
"--model-path",
type=str,
default="lmsys/vicuna-7b-v1.3",
default="lmsys/vicuna-7b-v1.5",
help="The path to the weights. This can be a local folder or a Hugging Face repo ID.",
)
parser.add_argument(
2 changes: 1 addition & 1 deletion fastchat/serve/vllm_worker.py
@@ -205,7 +205,7 @@ async def api_model_details(request: Request):
parser.add_argument(
"--controller-address", type=str, default="http://localhost:21001"
)
parser.add_argument("--model-path", type=str, default="lmsys/vicuna-7b-v1.3")
parser.add_argument("--model-path", type=str, default="lmsys/vicuna-7b-v1.5")
parser.add_argument(
"--model-names",
type=lambda s: s.split(","),
3 changes: 3 additions & 0 deletions scripts/serving/gradio.yml
@@ -0,0 +1,3 @@
run: |
  conda activate chatbot
  python3 -m fastchat.serve.gradio_web_server --share --model-list-mode reload
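The task above can also be run on an existing controller cluster with the SkyPilot CLI. A rough sketch, assuming the default cluster name used by `launch.py` below and a cluster that already has the `chatbot` environment:

```bash
# Hedged example: execute the gradio task on the already-running controller cluster.
sky exec fastchat-controller scripts/serving/gradio.yml
```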
34 changes: 34 additions & 0 deletions scripts/serving/launch.py
@@ -0,0 +1,34 @@
import argparse
import sky

if __name__ == "__main__":
parser = argparse.ArgumentParser()
parser.add_argument(
"--model-path",
type=str,
required=True,
help="The path to the weights. This can be a local folder or a Hugging Face repo ID.",
)
parser.add_argument("--num-gpus", type=int, default=1)
parser.add_argument("--spot", action="store_true")
parser.add_argument("--controller-name", type=str, default="fastchat-controller")
parser.add_argument("--worker-name", type=str, default="gpu-worker")

args = parser.parse_args()
if len(sky.status(args.controller_name)) == 0:
task = sky.Task.from_yaml("controller.yaml")
sky.launch(task, cluster_name=args.controller_name)
task = sky.Task.from_yaml("gradio.yaml")
sky.exec(task, cluster_name=args.controller_name)

task = sky.Task.from_yaml("model_worker.yaml")
head_ip = sky.status(args.controller_name)[0]['handle'].head_ip
envs = {"CONTROLLER_IP": head_ip}
task.update_envs(envs)

for i in range(args.num_gpus):
worker_name = f"{args.worker_name}-{i}"
if args.spot:
sky.spot_launch(task, name=worker_name)
else:
sky.launch(task, cluster_name=worker_name, detach_setup=True)
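A minimal sketch of invoking this script, based on the arguments it defines (the model path and worker count are illustrative):

```bash
# Hedged example: launch a controller plus two spot GPU workers serving Vicuna v1.5.
python3 scripts/serving/launch.py \
  --model-path lmsys/vicuna-7b-v1.5 \
  --num-gpus 2 \
  --spot
```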
1 change: 1 addition & 0 deletions scripts/skyserve
Submodule skyserve added at 38cf57
2 changes: 1 addition & 1 deletion scripts/train_lora.sh
@@ -1,5 +1,5 @@
deepspeed fastchat/train/train_lora.py \
---model_name_or_path lmsys/vicuna-7b-v1.3 \
+--model_name_or_path lmsys/vicuna-7b-v1.5 \
--lora_r 8 \
--lora_alpha 16 \
--lora_dropout 0.05 \
2 changes: 1 addition & 1 deletion tests/test_cli.py
@@ -69,7 +69,7 @@ def test_8bit():

def test_hf_api():
models = [
"lmsys/vicuna-7b-v1.3",
"lmsys/vicuna-7b-v1.5",
"lmsys/fastchat-t5-3b-v1.0",
]

2 changes: 1 addition & 1 deletion tests/test_openai_langchain.py
@@ -1,5 +1,5 @@
# Usage:
-# python3 -m fastchat.serve.model_worker --model-path lmsys/vicuna-7b-v1.3 --model-names gpt-3.5-turbo,text-davinci-003,text-embedding-ada-002
+# python3 -m fastchat.serve.model_worker --model-path lmsys/vicuna-7b-v1.5 --model-names gpt-3.5-turbo,text-davinci-003,text-embedding-ada-002
# export OPENAI_API_BASE=http://localhost:8000/v1
# export OPENAI_API_KEY=EMPTY
# wget https://raw.githubusercontent.com/hwchase17/langchain/v0.0.200/docs/modules/state_of_the_union.txt
