From d440f038a6448e2ee0eeba0c1b2a038a73513486 Mon Sep 17 00:00:00 2001
From: vb
Date: Wed, 2 Oct 2024 18:56:49 +0200
Subject: [PATCH 1/7] Update gguf-llamacpp.md

---
 docs/hub/gguf-llamacpp.md | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/docs/hub/gguf-llamacpp.md b/docs/hub/gguf-llamacpp.md
index daafc6538..439fb180f 100644
--- a/docs/hub/gguf-llamacpp.md
+++ b/docs/hub/gguf-llamacpp.md
@@ -1,6 +1,30 @@
 # GGUF usage with llama.cpp
 
-Llama.cpp allows you to download and run inference on a GGUF simply by providing a path to the Hugging Face repo path and the file name. llama.cpp download the model checkpoint and automatically caches it. The location of the cache is defined by `LLAMA_CACHE` environment variable, read more about it [here](https://github.com/ggerganov/llama.cpp/pull/7826):
+NEW: You can now deploy any llama.cpp compatible GGUF on Hugging Face Endpoints, read more about it [here](https://huggingface.co/docs/inference-endpoints/en/others/llamacpp_container)
+
+Llama.cpp allows you to download and run inference on a GGUF simply by providing a path to the Hugging Face repo path and the file name. llama.cpp downloads the model checkpoint and automatically caches it. The location of the cache is defined by `LLAMA_CACHE` environment variable, read more about it [here](https://github.com/ggerganov/llama.cpp/pull/7826).
+
+Install llama.cpp through brew (works on Mac and Linux)
+
+```bash
+brew install llama.cpp
+```
+
+You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the Llama.cpp repo as well.
+
+Step 1: Clone llama.cpp from GitHub.
+
+```bash
+git clone https://github.com/ggerganov/llama.cpp
+```
+
+Step 2: Move into the llama.cpp folder and build it with the `LLAMA_CURL=1` flag along with other hardware-specific flags (e.g., `LLAMA_CUDA=1` for Nvidia GPUs on Linux).
+
+```bash
+cd llama.cpp && LLAMA_CURL=1 make
+```
+
+Once installed, you can use the `llama-cli` or `llama-server` as follows:
 
 ```bash
 ./llama-cli

From 78f71e8071dc9b4f6dfd60f34f7ea4aafda056d6 Mon Sep 17 00:00:00 2001
From: Vaibhav Srivastav
Date: Thu, 3 Oct 2024 12:41:41 +0200
Subject: [PATCH 2/7] up.

---
 docs/hub/gguf-llamacpp.md | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/docs/hub/gguf-llamacpp.md b/docs/hub/gguf-llamacpp.md
index 439fb180f..59d356969 100644
--- a/docs/hub/gguf-llamacpp.md
+++ b/docs/hub/gguf-llamacpp.md
@@ -1,6 +1,7 @@
 # GGUF usage with llama.cpp
 
-NEW: You can now deploy any llama.cpp compatible GGUF on Hugging Face Endpoints, read more about it [here](https://huggingface.co/docs/inference-endpoints/en/others/llamacpp_container)
+> [!TIP]
+> You can now deploy any llama.cpp compatible GGUF on Hugging Face Endpoints; read more about it [here](https://huggingface.co/docs/inference-endpoints/en/others/llamacpp_container).
 
 Llama.cpp allows you to download and run inference on a GGUF simply by providing a path to the Hugging Face repo path and the file name. llama.cpp downloads the model checkpoint and automatically caches it. The location of the cache is defined by `LLAMA_CACHE` environment variable, read more about it [here](https://github.com/ggerganov/llama.cpp/pull/7826).
@@ -27,7 +28,7 @@ cd llama.cpp && LLAMA_CURL=1 make
 Once installed, you can use the `llama-cli` or `llama-server` as follows:
 
 ```bash
-./llama-cli
+llama-cli \
   --hf-repo lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF \
   --hf-file Meta-Llama-3-8B-Instruct-Q8_0.gguf \
   -p "You are a helpful assistant" -cnv
@@ -38,7 +39,7 @@ Note: You can remove `-cnv` to run the CLI in chat completion mode.
 Additionally, you can invoke an OpenAI spec chat completions endpoint directly using the llama.cpp server:
 
 ```bash
-./llama-server \
+llama-server \
   --hf-repo lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF \
   --hf-file Meta-Llama-3-8B-Instruct-Q8_0.gguf
 ```

From 275d3275eb951d5d79c942824d8cfcb2e4c08195 Mon Sep 17 00:00:00 2001
From: vb
Date: Thu, 3 Oct 2024 16:54:40 +0200
Subject: [PATCH 3/7] Apply suggestions from code review

Co-authored-by: Omar Sanseviero
Co-authored-by: Pedro Cuenca
---
 docs/hub/gguf-llamacpp.md | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/docs/hub/gguf-llamacpp.md b/docs/hub/gguf-llamacpp.md
index 59d356969..90b1972b8 100644
--- a/docs/hub/gguf-llamacpp.md
+++ b/docs/hub/gguf-llamacpp.md
@@ -3,15 +3,17 @@
 > [!TIP]
 > You can now deploy any llama.cpp compatible GGUF on Hugging Face Endpoints; read more about it [here](https://huggingface.co/docs/inference-endpoints/en/others/llamacpp_container).
 
-Llama.cpp allows you to download and run inference on a GGUF simply by providing a path to the Hugging Face repo path and the file name. llama.cpp downloads the model checkpoint and automatically caches it. The location of the cache is defined by `LLAMA_CACHE` environment variable, read more about it [here](https://github.com/ggerganov/llama.cpp/pull/7826).
+Llama.cpp allows you to download and run inference on a GGUF simply by providing the Hugging Face repo path and the file name. llama.cpp downloads the model checkpoint and automatically caches it. The location of the cache is defined by the `LLAMA_CACHE` environment variable; read more about it [here](https://github.com/ggerganov/llama.cpp/pull/7826).
 
-Install llama.cpp through brew (works on Mac and Linux)
+You can install llama.cpp through brew (works on Mac and Linux), or you can build it from source. There are also pre-built binaries and Docker images that you can [check in the official documentation](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage).
+
+### Option 1: Install with brew
 
 ```bash
 brew install llama.cpp
 ```
 
-You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the Llama.cpp repo as well.
+### Option 2: Build from source
 
 Step 1: Clone llama.cpp from GitHub.
 
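Taken together, the docs above describe a download-and-cache flow: `llama-cli` pulls the GGUF from the given Hub repo on first use and reuses the cached copy afterwards. A minimal end-to-end sketch, assuming llama.cpp is installed as described (the `LLAMA_CACHE` path shown is purely illustrative, not a default):

```bash
# Optional: redirect llama.cpp's download cache (illustrative path);
# when unset, llama.cpp falls back to its default cache location.
export LLAMA_CACHE="$HOME/.cache/llama.cpp"

# The first invocation downloads Meta-Llama-3-8B-Instruct-Q8_0.gguf from the
# Hub and caches it; subsequent invocations load straight from the cache.
llama-cli \
  --hf-repo lmstudio-community/Meta-Llama-3-8B-Instruct-GGUF \
  --hf-file Meta-Llama-3-8B-Instruct-Q8_0.gguf \
  -p "You are a helpful assistant" -cnv
```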
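The server docs mention the OpenAI-spec chat completions endpoint but stop short of showing a request against it. A hedged sketch of querying the running server with `curl`, assuming llama-server's usual defaults (port 8080, the `/v1/chat/completions` route); check `llama-server --help` if your build differs:

```bash
# Send an OpenAI-style chat completion request to the local llama-server.
# Host, port, and route are assumed defaults, not values from the docs above.
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "What is GGUF?"}
    ]
  }'
```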
From 36c77a3b6feece7dbb5b7ad50bdfb8a37f9edf20 Mon Sep 17 00:00:00 2001
From: vb
Date: Mon, 25 Nov 2024 18:28:36 +0100
Subject: [PATCH 4/7] Add Intel Research Agreement

---
 docs/hub/repositories-licenses.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/hub/repositories-licenses.md b/docs/hub/repositories-licenses.md
index 59ff4fd5a..5de1001b8 100644
--- a/docs/hub/repositories-licenses.md
+++ b/docs/hub/repositories-licenses.md
@@ -60,6 +60,7 @@ A full list of the available licenses is available here:
 | GNU Lesser General Public License v2.1 | `lgpl-2.1` |
 | GNU Lesser General Public License v3.0 | `lgpl-3.0` |
 | ISC | `isc` |
+| Intel Research Use License Agreeme | `intel-research` |
 | LaTeX Project Public License v1.3c | `lppl-1.3c` |
 | Microsoft Public License | `ms-pl` |
 | Apple Sample Code license | `apple-ascl` |

From 860628685dc400255ae41f70224dfe72720dbe68 Mon Sep 17 00:00:00 2001
From: vb
Date: Tue, 26 Nov 2024 15:59:53 +0100
Subject: [PATCH 5/7] Update docs/hub/repositories-licenses.md

Co-authored-by: Julien Chaumond
---
 docs/hub/repositories-licenses.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/hub/repositories-licenses.md b/docs/hub/repositories-licenses.md
index 5de1001b8..8160be820 100644
--- a/docs/hub/repositories-licenses.md
+++ b/docs/hub/repositories-licenses.md
@@ -60,7 +60,7 @@ A full list of the available licenses is available here:
 | GNU Lesser General Public License v2.1 | `lgpl-2.1` |
 | GNU Lesser General Public License v3.0 | `lgpl-3.0` |
 | ISC | `isc` |
-| Intel Research Use License Agreeme | `intel-research` |
+| Intel Research Use License Agreement | `intel-research` |
 | LaTeX Project Public License v1.3c | `lppl-1.3c` |
 | Microsoft Public License | `ms-pl` |
 | Apple Sample Code license | `apple-ascl` |

From a25f324993b1a3590f34b29dd8c426e731963cdc Mon Sep 17 00:00:00 2001
From: Julien Chaumond
Date: Tue, 26 Nov 2024 19:05:04 +0100
Subject: [PATCH 6/7] Update docs/hub/repositories-licenses.md

---
 docs/hub/repositories-licenses.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/hub/repositories-licenses.md b/docs/hub/repositories-licenses.md
index 8160be820..648f0738c 100644
--- a/docs/hub/repositories-licenses.md
+++ b/docs/hub/repositories-licenses.md
@@ -60,7 +60,7 @@ A full list of the available licenses is available here:
 | GNU Lesser General Public License v2.1 | `lgpl-2.1` |
 | GNU Lesser General Public License v3.0 | `lgpl-3.0` |
 | ISC | `isc` |
-| Intel Research Use License Agreement | `intel-research` |
+| Intel Research Use License Agreement | `intel-research` |
 | LaTeX Project Public License v1.3c | `lppl-1.3c` |
 | Microsoft Public License | `ms-pl` |
 | Apple Sample Code license | `apple-ascl` |

From 6fc3b4f17c984cf7eb2de030087dbd5c179f1290 Mon Sep 17 00:00:00 2001
From: Vaibhav Srivastav
Date: Fri, 29 Nov 2024 17:12:06 +0100
Subject: [PATCH 7/7] up.
---
 docs/hub/repositories-licenses.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/hub/repositories-licenses.md b/docs/hub/repositories-licenses.md
index 648f0738c..35764532c 100644
--- a/docs/hub/repositories-licenses.md
+++ b/docs/hub/repositories-licenses.md
@@ -60,7 +60,7 @@ A full list of the available licenses is available here:
 | GNU Lesser General Public License v2.1 | `lgpl-2.1` |
 | GNU Lesser General Public License v3.0 | `lgpl-3.0` |
 | ISC | `isc` |
-| Intel Research Use License Agreement | `intel-research` |
+| Intel Research Use License Agreement | `intel-research` |
 | LaTeX Project Public License v1.3c | `lppl-1.3c` |
 | Microsoft Public License | `ms-pl` |
 | Apple Sample Code license | `apple-ascl` |
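The license patches above only touch the lookup table; on the Hub, the short identifier in the second column is what a repository's README metadata consumes. A minimal sketch of model-card front matter using the new code (the `license` field name follows the Hub's standard metadata layout; the surrounding README content is omitted):

```yaml
---
license: intel-research  # short identifier from the licenses table above
---
```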