From 29615576fbb07465265a9f2297d624979868eed7 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Serta=C3=A7=20=C3=96zercan?= <852750+sozercan@users.noreply.github.com> Date: Sat, 25 May 2024 00:33:50 -0700 Subject: [PATCH 01/10] ci: fix sd release (#2400) Signed-off-by: Sertac Ozercan --- .github/workflows/release.yaml | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/.github/workflows/release.yaml b/.github/workflows/release.yaml index 330b25594985..7c7f77424435 100644 --- a/.github/workflows/release.yaml +++ b/.github/workflows/release.yaml @@ -100,6 +100,12 @@ jobs: with: name: stablediffusion path: release/ + - name: Release + uses: softprops/action-gh-release@v2 + if: startsWith(github.ref, 'refs/tags/') + with: + files: | + release/* build-macOS-arm64: runs-on: macos-14 From e1d6b706f4b8e4499f13c6dcfbdf9ccfbbe20718 Mon Sep 17 00:00:00 2001 From: Ettore Di Giacinto Date: Sat, 25 May 2024 10:08:23 +0200 Subject: [PATCH 02/10] Update quickstart.md (#2404) Signed-off-by: Ettore Di Giacinto --- docs/content/docs/getting-started/quickstart.md | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/docs/content/docs/getting-started/quickstart.md b/docs/content/docs/getting-started/quickstart.md index 0c964eb0d936..1bba42fb23b0 100644 --- a/docs/content/docs/getting-started/quickstart.md +++ b/docs/content/docs/getting-started/quickstart.md @@ -123,9 +123,7 @@ You can check out the releases in https://github.com/mudler/LocalAI/releases. | OS | Link | | --- | --- | -| Linux (CUDA 11) | [Download](https://github.com/mudler/LocalAI/releases/download/{{< version >}}/local-ai-cuda11-Linux-x86_64) | -| Linux (CUDA 12) | [Download](https://github.com/mudler/LocalAI/releases/download/{{< version >}}/local-ai-cuda12-Linux-x86_64) | -| Linux (No GPU) | [Download](https://github.com/mudler/LocalAI/releases/download/{{< version >}}/local-ai-Linux-x86_64) | +| Linux | [Download](https://github.com/mudler/LocalAI/releases/download/{{< version >}}/local-ai-Linux-x86_64) | | MacOS | [Download](https://github.com/mudler/LocalAI/releases/download/{{< version >}}/local-ai-Darwin-arm64) | From 663488b6bd3f2086dafbcbef9843019a36d1d7b1 Mon Sep 17 00:00:00 2001 From: "LocalAI [bot]" <139863280+localai-bot@users.noreply.github.com> Date: Sat, 25 May 2024 10:08:35 +0200 Subject: [PATCH 03/10] :arrow_up: Update docs version mudler/LocalAI (#2398) Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com> Co-authored-by: mudler <2420543+mudler@users.noreply.github.com> --- docs/data/version.json | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/data/version.json b/docs/data/version.json index 6991ef2f4ffa..d4af2be33b07 100644 --- a/docs/data/version.json +++ b/docs/data/version.json @@ -1,3 +1,3 @@ { - "version": "v2.15.0" + "version": "v2.16.0" } From 003b43f6fc4844cbf495d22438c85e742d130fdc Mon Sep 17 00:00:00 2001 From: Ettore Di Giacinto Date: Sat, 25 May 2024 10:18:20 +0200 Subject: [PATCH 04/10] Update quickstart.md Signed-off-by: Ettore Di Giacinto --- docs/content/docs/getting-started/quickstart.md | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/docs/content/docs/getting-started/quickstart.md b/docs/content/docs/getting-started/quickstart.md index 1bba42fb23b0..f92303e07da7 100644 --- a/docs/content/docs/getting-started/quickstart.md +++ b/docs/content/docs/getting-started/quickstart.md @@ -114,12 +114,17 @@ docker run -p 8080:8080 --name local-ai -ti -v localai-models:/build/models loca {{% /alert %}} -## From binary 
+## Running LocalAI from Binaries -LocalAI is available as a standalone binary as well. Binaries are compiled for Linux and MacOS and automatically uploaded in the Github releases. Windows is known to work with WSL. +LocalAI binaries are available for both Linux and MacOS platforms and can be executed directly from your command line. These binaries are continuously updated and hosted on [our GitHub Releases page](https://github.com/mudler/LocalAI/releases). This method also supports Windows users via the Windows Subsystem for Linux (WSL). -You can check out the releases in https://github.com/mudler/LocalAI/releases. +Use the following one-liner command in your terminal to download and run LocalAI on Linux or MacOS: +```bash +curl -Lo local-ai "https://github.com/mudler/LocalAI/releases/download/{{< version >}}/local-ai-$(uname -s)-$(uname -m)" && chmod +x local-ai && ./local-ai +``` + +Otherwise, here are the links to the binaries: | OS | Link | | --- | --- | From 785c54e7b0c7824762ac4f025f2da0cfdd1eacf1 Mon Sep 17 00:00:00 2001 From: Ettore Di Giacinto Date: Sat, 25 May 2024 16:11:01 +0200 Subject: [PATCH 05/10] models(gallery): add Mirai Nova (#2405) Signed-off-by: Ettore Di Giacinto --- gallery/index.yaml | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/gallery/index.yaml b/gallery/index.yaml index a38a78e1ee2d..b43aced1c3e2 100644 --- a/gallery/index.yaml +++ b/gallery/index.yaml @@ -57,6 +57,25 @@ - filename: LocalAI-Llama3-8b-Function-Call-v0.2-q4_k_m.bin sha256: 7e46405ce043cbc8d30f83f26a5655dc8edf5e947b748d7ba2745bd0af057a41 uri: huggingface://mudler/LocalAI-Llama3-8b-Function-Call-v0.2-GGUF/LocalAI-Llama3-8b-Function-Call-v0.2-q4_k_m.bin +- !!merge <<: *mudler + icon: "https://cdn-uploads.huggingface.co/production/uploads/647374aa7ff32a81ac6d35d4/SKuXcvmZ_6oD4NCMkvyGo.png" + name: "mirai-nova-llama3-LocalAI-8b-v0.1" + urls: + - https://huggingface.co/mudler/Mirai-Nova-Llama3-LocalAI-8B-v0.1-GGUF + - https://huggingface.co/mudler/Mirai-Nova-Llama3-LocalAI-8B-v0.1 + description: | + Mirai Nova: "Mirai" means future in Japanese, and "Nova" references a star showing a sudden large increase in brightness. + + A set of models oriented in function calling, but generalist and with enhanced reasoning capability. This is fine tuned with Llama3. + + Mirai Nova works particularly well with LocalAI, leveraging the function call with grammars feature out of the box. 
+ overrides: + parameters: + model: Mirai-Nova-Llama3-LocalAI-8B-v0.1-q4_k_m.bin + files: + - filename: Mirai-Nova-Llama3-LocalAI-8B-v0.1-q4_k_m.bin + sha256: 579cbb229f9c11d0330759ff4733102d2491615a4c61289e26c09d1b3a583fec + uri: huggingface://mudler/Mirai-Nova-Llama3-LocalAI-8B-v0.1-GGUF/Mirai-Nova-Llama3-LocalAI-8B-v0.1-q4_k_m.bin - &parler-tts ### START parler-tts url: "github:mudler/LocalAI/gallery/parler-tts.yaml@master" From bb3ec56de3231354ec6a3e9b368f7fe4017385a2 Mon Sep 17 00:00:00 2001 From: Ettore Di Giacinto Date: Sat, 25 May 2024 16:11:59 +0200 Subject: [PATCH 06/10] docs: add distributed inferencing docs Signed-off-by: Ettore Di Giacinto --- README.md | 5 +- docs/content/docs/advanced/advanced-usage.md | 2 + .../docs/features/distributed_inferencing.md | 101 ++++++++++++++++++ docs/content/docs/features/reranker.md | 2 +- docs/content/docs/overview.md | 3 +- 5 files changed, 109 insertions(+), 4 deletions(-) create mode 100644 docs/content/docs/features/distributed_inferencing.md diff --git a/README.md b/README.md index 377df0d28c76..a4479258eb95 100644 --- a/README.md +++ b/README.md @@ -65,7 +65,7 @@ docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-cpu [Roadmap](https://github.com/mudler/LocalAI/issues?q=is%3Aissue+is%3Aopen+label%3Aroadmap) -- 🔥🔥 Decentralized llama.cpp: https://github.com/mudler/LocalAI/pull/2343 (peer2peer llama.cpp!) +- 🔥🔥 Decentralized llama.cpp: https://github.com/mudler/LocalAI/pull/2343 (peer2peer llama.cpp!) 👉 Docs https://localai.io/features/distribute/ - 🔥🔥 Openvoice: https://github.com/mudler/LocalAI/pull/2334 - 🆕 Function calls without grammars and mixed mode: https://github.com/mudler/LocalAI/pull/2328 - 🔥🔥 Distributed inferencing: https://github.com/mudler/LocalAI/pull/2324 @@ -94,7 +94,8 @@ If you want to help and contribute, issues up for grabs: https://github.com/mudl - ✍️ [Constrained grammars](https://localai.io/features/constrained_grammars/) - 🖼️ [Download Models directly from Huggingface ](https://localai.io/models/) - 🥽 [Vision API](https://localai.io/features/gpt-vision/) -- 🆕 [Reranker API](https://localai.io/features/reranker/) +- 📈 [Reranker API](https://localai.io/features/reranker/) +- 🆕🖧 [P2P Inferencing](https://localai.io/features/distribute/) ## 💻 Usage diff --git a/docs/content/docs/advanced/advanced-usage.md b/docs/content/docs/advanced/advanced-usage.md index 085606e52170..40d7d0fcf1e9 100644 --- a/docs/content/docs/advanced/advanced-usage.md +++ b/docs/content/docs/advanced/advanced-usage.md @@ -370,6 +370,8 @@ there are additional environment variables available that modify the behavior of | `GO_TAGS` | | Go tags. Available: `stablediffusion` | | `HUGGINGFACEHUB_API_TOKEN` | | Special token for interacting with HuggingFace Inference API, required only when using the `langchain-huggingface` backend | | `EXTRA_BACKENDS` | | A space separated list of backends to prepare. For example `EXTRA_BACKENDS="backend/python/diffusers backend/python/transformers"` prepares the conda environment on start | +| `DISABLE_AUTODETECT` | `false` | Disable autodetect of CPU flagset on start | +| `LLAMACPP_GRPC_SERVERS` | | A list of llama.cpp workers to distribute the workload. 
For example `LLAMACPP_GRPC_SERVERS="address1:port,address2:port"` | Here is how to configure these variables: diff --git a/docs/content/docs/features/distributed_inferencing.md b/docs/content/docs/features/distributed_inferencing.md new file mode 100644 index 000000000000..746616f90a12 --- /dev/null +++ b/docs/content/docs/features/distributed_inferencing.md @@ -0,0 +1,101 @@ ++++ +disableToc = false +title = "✍️ Distributed inferencing" +weight = 15 +url = "/features/distribute/" ++++ + +{{% alert note %}} +This feature is available only with llama-cpp compatible models. + +This feature has landed with https://github.com/mudler/LocalAI/pull/2324 and is based on the upstream work in https://github.com/ggerganov/llama.cpp/pull/6829. +{{% /alert %}} + +This feature allows LocalAI to manage the requests while the workload is distributed among workers. + +## Usage + +### Start workers + +To start workers to offload the computation you can run: + +``` +local-ai llamacpp-worker +``` + +However, you can also follow the llama.cpp README and building the rpc-server (https://github.com/ggerganov/llama.cpp/blob/master/examples/rpc/README.md), which is still compatible with LocalAI. + +### Start LocalAI + +When starting the LocalAI server, which is going to accept the API requests, you can set a list of workers IP/address by specifying the addresses with the `LLAMACPP_GRPC_SERVERS` environment variable, for example: + +```bash +LLAMACPP_GRPC_SERVERS="address1:port,address2:port" local-ai run +``` + +At this point the workload hitting in the LocalAI server should be distributed across the nodes! + +## Peer to peer + +![output](https://github.com/mudler/LocalAI/assets/2420543/8ca277cf-c208-4562-8929-808b2324b584) + +The workers can also be connected to each other, creating a peer to peer network, where the workload is distributed among the workers, in a private, decentralized network. + +A shared token between the server and the workers is needed to let the communication happen via the p2p network. This feature supports both local network (with mdns discovery) and dht for communicating also behind different networks. + +The token is generated automatically when starting the server with the `--p2p` flag, and can be used by starting the workers with `local-ai worker p2p-llama-cpp-rpc` by passing the token via environment variable (TOKEN) or with args (--token). + +A network is established between the server and the workers with dht and mdns discovery protocols, the llama.cpp rpc server is automatically started and exposed to the underlying p2p network so the API server can connect on. + +When the HTTP server is started, it will discover the workers in the network and automatically create the port-forwards to the service locally. Then llama.cpp is configured to use the services. If you are interested in how it works behind the scenes, see the PR: https://github.com/mudler/LocalAI/pull/2343. + + +### Usage + +1. Start the server with `--p2p`: + +```bash +./local-ai run --p2p +# 1:02AM INF loading environment variables from file envFile=.env +# 1:02AM INF Setting logging to info +# 1:02AM INF P2P mode enabled +# 1:02AM INF No token provided, generating one +# 1:02AM INF Generated Token: +# XXXXXXXXXXX +# 1:02AM INF Press a button to proceed +``` + +A token is displayed, copy it and press enter. + +You can re-use the same token later restarting the server with `--p2ptoken` (or `P2P_TOKEN`). + +2. Start the workers. 
Now you can copy the local-ai binary in other hosts, and run as many workers with that token: + +```bash +TOKEN=XXX ./local-ai p2p-llama-cpp-rpc +# 1:06AM INF loading environment variables from file envFile=.env +# 1:06AM INF Setting logging to info +# {"level":"INFO","time":"2024-05-19T01:06:01.794+0200","caller":"config/config.go:288","message":"connmanager disabled\n"} +# {"level":"INFO","time":"2024-05-19T01:06:01.794+0200","caller":"config/config.go:295","message":" go-libp2p resource manager protection enabled"} +# {"level":"INFO","time":"2024-05-19T01:06:01.794+0200","caller":"config/config.go:409","message":"max connections: 100\n"} +# 1:06AM INF Starting llama-cpp-rpc-server on '127.0.0.1:34371' +# {"level":"INFO","time":"2024-05-19T01:06:01.794+0200","caller":"node/node.go:118","message":" Starting EdgeVPN network"} +# create_backend: using CPU backend +# Starting RPC server on 127.0.0.1:34371, backend memory: 31913 MB +# 2024/05/19 01:06:01 failed to sufficiently increase receive buffer size (was: 208 kiB, wanted: 2048 kiB, got: 416 kiB). # See https://github.com/quic-go/quic-go/wiki/UDP-Buffer-Sizes for details. +# {"level":"INFO","time":"2024-05-19T01:06:01.805+0200","caller":"node/node.go:172","message":" Node ID: 12D3KooWJ7WQAbCWKfJgjw2oMMGGss9diw3Sov5hVWi8t4DMgx92"} +# {"level":"INFO","time":"2024-05-19T01:06:01.806+0200","caller":"node/node.go:173","message":" Node Addresses: [/ip4/127.0.0.1/tcp/44931 /ip4/127.0.0.1/udp/33251/quic-v1/webtransport/certhash/uEiAWAhZ-W9yx2ZHnKQm3BE_ft5jjoc468z5-Rgr9XdfjeQ/certhash/uEiB8Uwn0M2TQBELaV2m4lqypIAY2S-2ZMf7lt_N5LS6ojw /ip4/127.0.0.1/udp/35660/quic-v1 /ip4/192.168.68.110/tcp/44931 /ip4/192.168.68.110/udp/33251/quic-v1/webtransport/certhash/uEiAWAhZ-W9yx2ZHnKQm3BE_ft5jjoc468z5-Rgr9XdfjeQ/certhash/uEiB8Uwn0M2TQBELaV2m4lqypIAY2S-2ZMf7lt_N5LS6ojw /ip4/192.168.68.110/udp/35660/quic-v1 /ip6/::1/tcp/41289 /ip6/::1/udp/33160/quic-v1/webtransport/certhash/uEiAWAhZ-W9yx2ZHnKQm3BE_ft5jjoc468z5-Rgr9XdfjeQ/certhash/uEiB8Uwn0M2TQBELaV2m4lqypIAY2S-2ZMf7lt_N5LS6ojw /ip6/::1/udp/35701/quic-v1]"} +# {"level":"INFO","time":"2024-05-19T01:06:01.806+0200","caller":"discovery/dht.go:104","message":" Bootstrapping DHT"} +``` + +(Note you can also supply the token via args) + +At this point, you should see in the server logs messages stating that new workers are found + +3. Now you can start doing inference as usual on the server (the node used on step 1) + + +## Notes + +- Only single model is supported for now +- Make sure that the server sees new workers after usage starts - currently, if you start the inference you can't add other workers later on. \ No newline at end of file diff --git a/docs/content/docs/features/reranker.md b/docs/content/docs/features/reranker.md index 92c406df53da..4bc01a7f2ddd 100644 --- a/docs/content/docs/features/reranker.md +++ b/docs/content/docs/features/reranker.md @@ -1,7 +1,7 @@ +++ disableToc = false -title = " Reranker" +title = "📈 Reranker" weight = 11 url = "/features/reranker/" +++ diff --git a/docs/content/docs/overview.md b/docs/content/docs/overview.md index 15086f6f4cb9..beadfbd3bfa6 100644 --- a/docs/content/docs/overview.md +++ b/docs/content/docs/overview.md @@ -101,7 +101,8 @@ Note that this started just as a fun weekend project by [mudler](https://github. 
- 🖼️ [Download Models directly from Huggingface ](https://localai.io/models/) - 🥽 [Vision API](https://localai.io/features/gpt-vision/) - 💾 [Stores](https://localai.io/stores) -- 🆕 [Reranker](https://localai.io/features/reranker/) +- 📈 [Reranker](https://localai.io/features/reranker/) +- 🆕🖧 [P2P Inferencing](https://localai.io/features/distribute/) ## Contribute and help From e25fc656c97e4f63cecfc81c35cfb2c9891ef62f Mon Sep 17 00:00:00 2001 From: Ettore Di Giacinto Date: Sat, 25 May 2024 16:13:04 +0200 Subject: [PATCH 07/10] Update README.md Signed-off-by: Ettore Di Giacinto --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index a4479258eb95..dc0ba70e55ff 100644 --- a/README.md +++ b/README.md @@ -89,7 +89,7 @@ If you want to help and contribute, issues up for grabs: https://github.com/mudl - 🗣 [Text to Audio](https://localai.io/features/text-to-audio/) - 🔈 [Audio to Text](https://localai.io/features/audio-to-text/) (Audio transcription with `whisper.cpp`) - 🎨 [Image generation with stable diffusion](https://localai.io/features/image-generation) -- 🔥 [OpenAI functions](https://localai.io/features/openai-functions/) 🆕 +- 🔥 [OpenAI-alike tools API](https://localai.io/features/openai-functions/) - 🧠 [Embeddings generation for vector databases](https://localai.io/features/embeddings/) - ✍️ [Constrained grammars](https://localai.io/features/constrained_grammars/) - 🖼️ [Download Models directly from Huggingface ](https://localai.io/models/) From 785adc1ed5cb623dc9d1dde07061c4e2ddaf0fad Mon Sep 17 00:00:00 2001 From: Ettore Di Giacinto Date: Sat, 25 May 2024 16:13:44 +0200 Subject: [PATCH 08/10] docs: updaet title Signed-off-by: Ettore Di Giacinto --- docs/content/docs/features/distributed_inferencing.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/content/docs/features/distributed_inferencing.md b/docs/content/docs/features/distributed_inferencing.md index 746616f90a12..8a4cc54583cc 100644 --- a/docs/content/docs/features/distributed_inferencing.md +++ b/docs/content/docs/features/distributed_inferencing.md @@ -1,6 +1,6 @@ +++ disableToc = false -title = "✍️ Distributed inferencing" +title = "🆕🖧 Distributed inferencing" weight = 15 url = "/features/distribute/" +++ From fc3502b56f0d69be7e514a32ec22814d95c66915 Mon Sep 17 00:00:00 2001 From: Ettore Di Giacinto Date: Sat, 25 May 2024 20:17:04 +0200 Subject: [PATCH 09/10] docs: rewording Signed-off-by: Ettore Di Giacinto --- .../docs/features/distributed_inferencing.md | 56 +++++++++---------- 1 file changed, 27 insertions(+), 29 deletions(-) diff --git a/docs/content/docs/features/distributed_inferencing.md b/docs/content/docs/features/distributed_inferencing.md index 8a4cc54583cc..b3b84528c0f8 100644 --- a/docs/content/docs/features/distributed_inferencing.md +++ b/docs/content/docs/features/distributed_inferencing.md @@ -1,54 +1,53 @@ +++ disableToc = false -title = "🆕🖧 Distributed inferencing" +title = "🆕🖧 Distributed Inference" weight = 15 url = "/features/distribute/" +++ {{% alert note %}} -This feature is available only with llama-cpp compatible models. +This feature is available exclusively with llama-cpp compatible models. -This feature has landed with https://github.com/mudler/LocalAI/pull/2324 and is based on the upstream work in https://github.com/ggerganov/llama.cpp/pull/6829. 
+This feature was introduced in [LocalAI pull request #2324](https://github.com/mudler/LocalAI/pull/2324) and is based on the upstream work in [llama.cpp pull request #6829](https://github.com/ggerganov/llama.cpp/pull/6829). {{% /alert %}} -This feature allows LocalAI to manage the requests while the workload is distributed among workers. +This functionality enables LocalAI to distribute inference requests across multiple worker nodes, improving efficiency and performance. ## Usage -### Start workers +### Starting Workers -To start workers to offload the computation you can run: +To start workers for distributing the computational load, run: -``` +```bash local-ai llamacpp-worker ``` -However, you can also follow the llama.cpp README and building the rpc-server (https://github.com/ggerganov/llama.cpp/blob/master/examples/rpc/README.md), which is still compatible with LocalAI. +Alternatively, you can build the RPC server following the llama.cpp [README](https://github.com/ggerganov/llama.cpp/blob/master/examples/rpc/README.md), which is compatible with LocalAI. -### Start LocalAI +### Starting LocalAI -When starting the LocalAI server, which is going to accept the API requests, you can set a list of workers IP/address by specifying the addresses with the `LLAMACPP_GRPC_SERVERS` environment variable, for example: +To start the LocalAI server, which handles API requests, specify the worker addresses using the `LLAMACPP_GRPC_SERVERS` environment variable: ```bash LLAMACPP_GRPC_SERVERS="address1:port,address2:port" local-ai run ``` -At this point the workload hitting in the LocalAI server should be distributed across the nodes! +The workload on the LocalAI server will then be distributed across the specified nodes. -## Peer to peer +## Peer-to-Peer Networking ![output](https://github.com/mudler/LocalAI/assets/2420543/8ca277cf-c208-4562-8929-808b2324b584) -The workers can also be connected to each other, creating a peer to peer network, where the workload is distributed among the workers, in a private, decentralized network. +Workers can also connect to each other in a peer-to-peer network, distributing the workload in a decentralized manner. -A shared token between the server and the workers is needed to let the communication happen via the p2p network. This feature supports both local network (with mdns discovery) and dht for communicating also behind different networks. +A shared token between the server and the workers is required for communication within the peer-to-peer network. This feature supports both local network (using mDNS discovery) and DHT for communication across different networks. -The token is generated automatically when starting the server with the `--p2p` flag, and can be used by starting the workers with `local-ai worker p2p-llama-cpp-rpc` by passing the token via environment variable (TOKEN) or with args (--token). +The token is automatically generated when starting the server with the `--p2p` flag. Workers can be started with the token using `local-ai worker p2p-llama-cpp-rpc` and specifying the token via the environment variable `TOKEN` or with the `--token` argument. -A network is established between the server and the workers with dht and mdns discovery protocols, the llama.cpp rpc server is automatically started and exposed to the underlying p2p network so the API server can connect on. - -When the HTTP server is started, it will discover the workers in the network and automatically create the port-forwards to the service locally. 
Then llama.cpp is configured to use the services. If you are interested in how it works behind the scenes, see the PR: https://github.com/mudler/LocalAI/pull/2343. +A network is established between the server and workers using DHT and mDNS discovery protocols. The llama.cpp RPC server is automatically started and exposed to the peer-to-peer network, allowing the API server to connect. +When the HTTP server starts, it discovers workers in the network and creates port forwards to the local service. Llama.cpp is configured to use these services. For more details on the implementation, refer to [LocalAI pull request #2343](https://github.com/mudler/LocalAI/pull/2343). ### Usage @@ -65,14 +64,14 @@ When the HTTP server is started, it will discover the workers in the network and # 1:02AM INF Press a button to proceed ``` -A token is displayed, copy it and press enter. +Copy the displayed token and press Enter. -You can re-use the same token later restarting the server with `--p2ptoken` (or `P2P_TOKEN`). +To reuse the same token later, restart the server with `--p2ptoken` or `P2P_TOKEN`. -2. Start the workers. Now you can copy the local-ai binary in other hosts, and run as many workers with that token: +2. Start the workers. Copy the `local-ai` binary to other hosts and run as many workers as needed using the token: ```bash -TOKEN=XXX ./local-ai p2p-llama-cpp-rpc +TOKEN=XXX ./local-ai p2p-llama-cpp-rpc # 1:06AM INF loading environment variables from file envFile=.env # 1:06AM INF Setting logging to info # {"level":"INFO","time":"2024-05-19T01:06:01.794+0200","caller":"config/config.go:288","message":"connmanager disabled\n"} @@ -88,14 +87,13 @@ TOKEN=XXX ./local-ai p2p-llama-cpp-rpc # {"level":"INFO","time":"2024-05-19T01:06:01.806+0200","caller":"discovery/dht.go:104","message":" Bootstrapping DHT"} ``` -(Note you can also supply the token via args) - -At this point, you should see in the server logs messages stating that new workers are found +(Note: You can also supply the token via command-line arguments) -3. Now you can start doing inference as usual on the server (the node used on step 1) +The server logs should indicate that new workers are being discovered. +3. Start inference as usual on the server initiated in step 1. -## Notes +## Notes -- Only single model is supported for now -- Make sure that the server sees new workers after usage starts - currently, if you start the inference you can't add other workers later on. \ No newline at end of file +- Only a single model is supported currently. +- Ensure the server detects new workers before starting inference. Currently, additional workers cannot be added once inference has begun. 
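Once the server (step 1) and the workers (step 2) are connected, a quick way to confirm that requests are actually being served is to hit the usual OpenAI-compatible endpoint on the server node. The snippet below is only a minimal smoke-test sketch: it assumes LocalAI is listening on the default `localhost:8080` and that a model aliased `gpt-4` is installed — substitute the model name you actually use.

```bash
# Minimal smoke test (illustrative): send a chat completion to the server node.
# With workers attached, the llama.cpp workload should be distributed across them.
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
     "model": "gpt-4",
     "messages": [{"role": "user", "content": "Say hello from a distributed setup."}]
}'
```

If the response comes back and the worker logs show activity, the distributed setup is working end to end.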
\ No newline at end of file From b90cdced5934fe85f48f7f9942cfbd6f781174e6 Mon Sep 17 00:00:00 2001 From: Ettore Di Giacinto Date: Sat, 25 May 2024 20:18:25 +0200 Subject: [PATCH 10/10] docs: rewording Signed-off-by: Ettore Di Giacinto --- .../docs/features/constrained_grammars.md | 19 +++++++++++-------- 1 file changed, 11 insertions(+), 8 deletions(-) diff --git a/docs/content/docs/features/constrained_grammars.md b/docs/content/docs/features/constrained_grammars.md index 9aa9279ee860..5ffa3a231cb2 100644 --- a/docs/content/docs/features/constrained_grammars.md +++ b/docs/content/docs/features/constrained_grammars.md @@ -1,26 +1,27 @@ - +++ disableToc = false -title = "✍️ Constrained grammars" +title = "✍️ Constrained Grammars" weight = 15 url = "/features/constrained_grammars/" +++ -The chat endpoint accepts an additional `grammar` parameter which takes a [BNF defined grammar](https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form). +## Overview -This allows the LLM to constrain the output to a user-defined schema, allowing to generate `JSON`, `YAML`, and everything that can be defined with a BNF grammar. +The `chat` endpoint supports the `grammar` parameter, which allows users to specify a grammar in Backus-Naur Form (BNF). This feature enables the Large Language Model (LLM) to generate outputs adhering to a user-defined schema, such as `JSON`, `YAML`, or any other format that can be defined using BNF. For more details about BNF, see [Backus-Naur Form on Wikipedia](https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form). {{% alert note %}} -This feature works only with models compatible with the [llama.cpp](https://github.com/ggerganov/llama.cpp) backend (see also [Model compatibility]({{%relref "docs/reference/compatibility-table" %}})). For details on how it works, see the upstream PRs: https://github.com/ggerganov/llama.cpp/pull/1773, https://github.com/ggerganov/llama.cpp/pull/1887 +**Compatibility Notice:** This feature is only supported by models that use the [llama.cpp](https://github.com/ggerganov/llama.cpp) backend. For a complete list of compatible models, refer to the [Model Compatibility](docs/reference/compatibility-table) page. For technical details, see the related pull requests: [PR #1773](https://github.com/ggerganov/llama.cpp/pull/1773) and [PR #1887](https://github.com/ggerganov/llama.cpp/pull/1887). {{% /alert %}} ## Setup -Follow the setup instructions from the [LocalAI functions]({{%relref "docs/features/openai-functions" %}}) page. +To use this feature, follow the installation and setup instructions on the [LocalAI Functions](docs/features/openai-functions) page. Ensure that your local setup meets all the prerequisites specified for the llama.cpp backend. + +## 💡 Usage Example -## 💡 Usage example +The following example demonstrates how to use the `grammar` parameter to constrain the model's output to either "yes" or "no". This can be particularly useful in scenarios where the response format needs to be strictly controlled. -For example, to constrain the output to either `yes`, `no`: +### Example: Binary Response Constraint ```bash curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{ @@ -29,3 +30,5 @@ curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/jso "grammar": "root ::= (\"yes\" | \"no\")" }' ``` + +In this example, the `grammar` parameter is set to a simple choice between "yes" and "no", ensuring that the model's response adheres strictly to one of these options regardless of the context. 
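The grammar does not have to be a fixed list of words. As a further illustrative sketch (again assuming a model aliased `gpt-4` on the default `localhost:8080`), the same parameter can constrain the reply to a single digit between 1 and 5, which is convenient for rating-style prompts:

```bash
# Illustrative example: constrain the answer to one digit in the range 1-5.
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
     "model": "gpt-4",
     "messages": [{"role": "user", "content": "On a scale of 1 to 5, how difficult is Rust to learn?"}],
     "grammar": "root ::= [1-5]"
}'
```

As with the yes/no example, the constraint is applied to the generated tokens themselves, so the reply stays within the allowed range regardless of the prompt.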
\ No newline at end of file