# Introduction
This project dockerises the deployment of [oobabooga/text-generation-webui](https://github.com/oobabooga/text-generation-webui) and its variants. It provides a default configuration corresponding to a standard deployment of the application with all extensions enabled, and a base version without extensions. Versions are offered for Nvidia GPU (`nvidia`), AMD GPU (`rocm`, unstable), Intel Arc (`arc`, unstable), and CPU-only (`cpu`) platforms. For convenience, pre-built images are available on Docker Hub: [https://hub.docker.com/r/atinoda/text-generation-webui](https://hub.docker.com/r/atinoda/text-generation-webui).

*The goal of this project is to be to [oobabooga/text-generation-webui](https://github.com/oobabooga/text-generation-webui), what [AbdBarho/stable-diffusion-webui-docker](https://github.com/AbdBarho/stable-diffusion-webui-docker) is to [AUTOMATIC1111/stable-diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui).*

# Quick-Start
- Deploy the service with `docker compose up` (see *Usage* below for details)
- Navigate to `127.0.0.1:7860` and enjoy your local instance of oobabooga's text-generation-webui!

# Usage
This repo provides a template `docker-compose.yml` and a structured `config` folder to store the application files. The project officially targets Linux as the deployment platform; however, the images will also work on Docker Desktop for Windows. There is no plan to support Apple Silicon because that platform runs Docker inside a VM, and Apple has chosen not to support virtualised hardware access. Intel Macs should be able to run the variant suitable for their GPU.

*Check the issues for hints and tips for your platform (and remember to search closed issues too!)*

## Pre-Requisites
- docker
- docker compose
- CUDA docker runtime *(optional, for Nvidia GPU-powered inferencing)*

*Ask your favourite LLM how to install and configure `docker`, `docker-compose`, and the Nvidia CUDA docker runtime for your platform!*
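
As an optional sanity check, the Nvidia runtime can be verified by running `nvidia-smi` inside a CUDA base container - a hedged sketch, where the CUDA image tag is only an example (use any current one):

```sh
# If the Nvidia container runtime is configured correctly, this prints the GPU table
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
```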

## Docker Compose
This is the recommended deployment method (it is the easiest and quickest way to manage folders and settings through updates and reinstalls). The recommended variant is `default-nvidia` (it is the full version of the vanilla application with all default extensions installed).

### Select variant
The available variants are described in the table below. Tagged release versions are published on a regular basis - check [hub.docker.com/r/atinoda/text-generation-webui](https://hub.docker.com/r/atinoda/text-generation-webui) for available tags. Pulling an untagged variant will pull the latest stable release; bleeding-edge is available via nightly builds of each variant. Choose the desired variant by setting the image `:tag` in `docker-compose.yml` using the pattern `{VARIANT}-{PLATFORM}-{VERSION}`, as follows (an example compose excerpt is shown after the table):

| Variant | Description |
|---|---|
| `default` | Standard deployment with all extensions, configured for Nvidia GPU accelerated inferencing. *This version is recommended for most users.* |
| `default-{PLATFORM}` | Standard deployment with all extensions, configured for each supported platform: `nvidia`, `rocm`, `arc`, or `cpu`. *e.g., `default-cpu`* |
| `base-{PLATFORM}` | Basic deployment with no extensions included, configured for each supported platform: `nvidia`, `rocm`, `arc`, or `cpu`. *e.g., `base-rocm`* |
| `{VARIANT}-{PLATFORM}-{VERSION}` | Build of each `{VARIANT}-{PLATFORM}` tagged with the release `{VERSION}` of the text-generation-webui (e.g., `default-nvidia-snapshot-2024-02-04`). *Visit [oobabooga/text-generation-webui/releases](https://github.com/oobabooga/text-generation-webui/releases) for release notes. Go to [hub.docker.com/r/atinoda/text-generation-webui](https://hub.docker.com/r/atinoda/text-generation-webui) to see the available milestones.*|
| `{VARIANT}-{PLATFORM}-nightly` | Automated nightly build of the variant. These images are built and pushed automatically - they are untested and may be unstable. *Suitable when more frequent updates are required and instability is not an issue.* |
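
For example, a minimal `docker-compose.yml` excerpt pinning a specific image tag might look like the following sketch (the service name and port mapping are illustrative assumptions - adapt them to the template provided in this repo):

```yml
services:
  text-generation-webui:  # service name is illustrative
    image: atinoda/text-generation-webui:default-nvidia  # set {VARIANT}-{PLATFORM}[-{VERSION}] here
    ports:
      - 7860:7860
```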

### Deploy
Deploy the service:
`docker compose up`

### Extra launch arguments
Extra launch arguments can be defined in the environment variable `EXTRA_LAUNCH_ARGS`.

*Launch arguments should be defined as a space-separated list, just like writing them on the command line. These arguments are passed to the `server.py` module.*
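
When using Docker Compose, the same setting can be passed through the `environment` key - a hedged sketch, where the argument values are examples only:

```yml
services:
  text-generation-webui:
    image: atinoda/text-generation-webui:default-nvidia
    environment:
      # Space-separated, exactly as they would be written on the command line
      EXTRA_LAUNCH_ARGS: "--listen --verbose"
```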


### Runtime extension build
Extensions which should be built during startup can be defined in the environment variable `BUILD_EXTENSIONS_LIVE` (e.g., `"coqui_tts whisper_stt"` will rebuild those extensions at launch). This feature may be useful if you are developing a third-party extension and need its dependencies to refresh at launch.

**Startup times will be much slower** if you use this feature, because it will rebuild the named extensions every time the container is started (i.e., don't use this feature unless you are certain that you need it.)

*Extension names for runtime build should be defined as a space-separated list.*
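
As an illustrative one-off run (the extension names and variant tag are examples), the variable can also be set with `docker run`:

```sh
# Rebuild the named extensions' dependencies at container start (slower startup)
docker run -it --rm \
  -e BUILD_EXTENSIONS_LIVE="coqui_tts whisper_stt" \
  --gpus all -p 7860:7860 \
  atinoda/text-generation-webui:default-nvidia
```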

## Updates
These projects are moving quickly! To update to the most recent version on Docker Hub, pull the latest image:

`docker compose pull`
Then recreate the container:

`docker compose up`

*When the container is launched, it will print out how many commits behind origin the current build is, so you can decide if you want to update it. Docker Hub images will be periodically updated. The nightly images (e.g., `default-nvidia-nightly`) are built every day but are not manually tested. If you need bleeding-edge versions you must build locally.*

## Build (optional)
The provided `docker-compose.yml.build` shows how to build the image locally. You can use it as a reference to modify the original `docker-compose.yml`, or you can rename it and use it as-is. Choose the desired variant to build by setting the build `target` and then run:

`docker compose build`
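
For reference, a hedged sketch of the relevant compose stanza (the context path and service name are assumptions based on the template):

```yml
services:
  text-generation-webui:
    build:
      context: .              # path to this repo's Dockerfile
      target: default-nvidia  # choose the variant to build here
```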
## Standalone Container
NOT recommended, instructions are included for completeness.
### Run
Run a network accessible container (and destroy it upon completion):

`docker run -it --rm -e EXTRA_LAUNCH_ARGS="--listen --verbose" --gpus all -p 7860:7860 atinoda/text-generation-webui:default-nvidia`

### Build and run (optional)
Build the image for the `default-nvidia` target and tag it as `local`:

`docker build --target default-nvidia -t text-generation-webui:local .`

Run the local image with local network access (and destroy it upon completion):

`docker run -it --rm -e EXTRA_LAUNCH_ARGS="--listen --verbose" --gpus all -p 7860:7860 text-generation-webui:local`

# Known Issues
## AMD GPU ROCM
The `rocm` variant is blind-built and cannot be tested due to a lack of hardware. User reports and insights are welcomed.

## Intel Arc GPU
The `arc` variant is blind-built and cannot be tested due to a lack of hardware. User reports and insights are welcomed.

## Extensions
The following extensions have known issues, and fixes are planned. Testing and insights are welcomed!
- multimodal: Crashes because model is not loaded at start
- ngrok: Requires an account, causes a crash
- silero_tts: Does not work due to pydantic dependency problem
- superbooga/superboogav2: Crashes on startup

## Kubernetes
Please see [EXTRA_LAUNCH_ARGS are not honored #25](https://github.com/Atinoda/text-generation-webui-docker/issues/25) for fixing deployments. *Thanks to @jrsperry for reporting, and @accountForIssues for sharing a workaround (TL;DR: escape space characters as `\ ` instead of ` `.)*
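
Based on that workaround, a hypothetical manifest snippet might look like the following untested sketch (see the linked issue for authoritative details):

```yml
# Hypothetical Kubernetes env entry: spaces escaped with backslashes
env:
  - name: EXTRA_LAUNCH_ARGS
    value: '--listen\ --verbose'  # single quotes keep the backslash literal
```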

# Contributions
Contributions are welcomed - please feel free to submit a PR. More variants (e.g., AMD ROCm support) and Windows support can help lower the barrier to entry, make this technology accessible to as many people as possible, and push towards democratising the profound impact that AI is having on our society.

