
# TGI LLM Microservice

Text Generation Inference (TGI) is a toolkit for deploying and serving Large Language Models (LLMs). TGI enables high-performance text generation for the most popular open-source LLMs, including Llama, Falcon, StarCoder, BLOOM, GPT-NeoX, and more.

## Start TGI with Docker Compose

Set up the environment:

```bash
export LLM_ENDPOINT_PORT=8008
# host_ip and HF_TOKEN must already be set in your shell
export host_ip=${host_ip}
export HF_TOKEN=${HF_TOKEN}
export LLM_MODEL_ID="Intel/neural-chat-7b-v3-3"
export MAX_INPUT_TOKENS=1024
export MAX_TOTAL_TOKENS=2048
```

Run TGI on Xeon:

```bash
cd deployment/docker_compose
docker compose -f compose.yaml up -d tgi-server
```

Run TGI on Gaudi:

```bash
cd deployment/docker_compose
docker compose -f compose.yaml up -d tgi-gaudi-server
```
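Once a container is running, you can sanity-check it with TGI's `/generate` endpoint. This sketch assumes `host_ip` and `LLM_ENDPOINT_PORT` are set as in the environment step above; the prompt and `max_new_tokens` value are illustrative.

```shell
# Send a test prompt to the running TGI server.
# Assumes host_ip and LLM_ENDPOINT_PORT are exported in the current shell.
curl http://${host_ip}:${LLM_ENDPOINT_PORT}/generate \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs":"What is Deep Learning?","parameters":{"max_new_tokens":32}}'
```

A JSON response with a `generated_text` field indicates the service is up and serving the configured model.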