Add Intel Research Agreement #1502

Merged: 8 commits, Nov 29, 2024
Changes from 1 commit
Apply suggestions from code review
Co-authored-by: Omar Sanseviero <osanseviero@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
3 people authored Oct 3, 2024
commit 275d3275eb951d5d79c942824d8cfcb2e4c08195
8 changes: 5 additions & 3 deletions docs/hub/gguf-llamacpp.md
@@ -3,15 +3,17 @@
> [!TIP]
> You can now deploy any llama.cpp compatible GGUF on Hugging Face Endpoints; read more about it [here](https://huggingface.co/docs/inference-endpoints/en/others/llamacpp_container)

-Llama.cpp allows you to download and run inference on a GGUF simply by providing a path to the Hugging Face repo path and the file name. llama.cpp downloads the model checkpoint and automatically caches it. The location of the cache is defined by `LLAMA_CACHE` environment variable, read more about it [here](https://github.com/ggerganov/llama.cpp/pull/7826).
+Llama.cpp allows you to download and run inference on a GGUF simply by providing the Hugging Face repo path and the file name. llama.cpp downloads the model checkpoint and automatically caches it. The location of the cache is defined by the `LLAMA_CACHE` environment variable; read more about it [here](https://github.com/ggerganov/llama.cpp/pull/7826).

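To make that concrete, here is a minimal sketch of the flow described above. It assumes a recent llama.cpp build whose `llama-cli` binary exposes the `--hf-repo` and `--hf-file` flags; the model repo and file names are placeholders, not part of the original doc.

```bash
# Sketch: download a GGUF from the Hugging Face Hub and run a prompt on it.
# The repo and file names are placeholders; point them at any GGUF repo you like.
llama-cli \
  --hf-repo bartowski/Meta-Llama-3.1-8B-Instruct-GGUF \
  --hf-file Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf \
  -p "Why is the sky blue?"

# Subsequent runs reuse the cached checkpoint; set LLAMA_CACHE to move the cache:
# LLAMA_CACHE=/path/to/cache llama-cli --hf-repo <repo> --hf-file <file> -p "<prompt>"
```
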
-Install llama.cpp through brew (works on Mac and Linux)
+You can install llama.cpp through brew (works on Mac and Linux), or you can build it from source. There are also pre-built binaries and Docker images that you can [find in the official documentation](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage).

+### Option 1: Install with brew

```bash
brew install llama.cpp
```
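
As a quick sanity check after the brew install, something like the following can confirm the binaries are available (this assumes the formula ships `llama-cli` and `llama-server`, and that your build supports `--version`):

```bash
# Confirm the brew-installed llama.cpp binaries are on the PATH and report their build.
which llama-cli llama-server
llama-cli --version
```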

-You can also use this checkpoint directly through the [usage steps](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#usage) listed in the Llama.cpp repo as well.
+### Option 2: Build from source

Step 1: Clone llama.cpp from GitHub.
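
A sketch of that step, using the upstream repository already linked above:

```bash
# Fetch the llama.cpp sources and enter the project directory.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
```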