
LeapfrogAI llama-cpp-python Backend

A LeapfrogAI API-compatible llama-cpp-python wrapper for quantized and un-quantized model inferencing across CPU infrastructures.

See the Instructions section to get the backend up and running. Then, use the LeapfrogAI API server to interact with the backend.

Instructions

The instructions in this section assume the following:

Python 3.11.x is properly installed and configured, including its development tools
The LeapfrogAI API server is deployed and running

The following are additional assumptions for GPU inferencing:

You have properly installed one or more NVIDIA GPUs and GPU drivers
You have properly installed and configured the cuda-toolkit and nvidia-container-toolkit
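
The assumptions above can be checked before going further. The following is a minimal sketch, not part of this repository: the function name `check_prerequisites` is hypothetical, and it assumes that `nvidia-smi`, `nvcc`, and `nvidia-ctk` being on `PATH` is a reasonable proxy for the GPU driver, cuda-toolkit, and nvidia-container-toolkit being installed. It does not verify that they are configured correctly.

```python
import shutil
import sys


def check_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Report which assumed prerequisites appear to be present.

    Hypothetical helper: it only looks for executables on PATH, so a True
    value means "found", not "correctly configured".
    """
    return {
        # Required for every setup described in this README
        "python-3.11": tuple(version_info[:2]) == (3, 11),
        # Only relevant for GPU inferencing
        "nvidia-smi": which("nvidia-smi") is not None,  # installed with the GPU driver
        "nvcc": which("nvcc") is not None,  # compiler shipped with cuda-toolkit
        "nvidia-ctk": which("nvidia-ctk") is not None,  # nvidia-container-toolkit CLI
    }


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

A missing GPU entry only matters if GPU inferencing is intended; CPU-only use needs just the Python check to pass.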

Model Selection

The officially released images of this backend ship with a 4-bit quantization of the Synthia-7b model as the default model.

Models are pulled from HuggingFace Hub via the scripts/model_download.py script. To change which model ships with the llama-cpp-python backend, set the following environment variables:

REPO_ID   # eg: "TheBloke/SynthIA-7B-v2.0-GGUF"
FILENAME  # eg: "synthia-7b-v2.0.Q4_K_M.gguf"
REVISION  # eg: "3f65d882253d1f15a113dabf473a7c02a004d2b5"
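
To illustrate how these three variables could drive a download, here is a minimal sketch. It is not the contents of `model_download.py`: the names `ModelSource`, `model_source_from_env`, and `download` are hypothetical, the fallback values are simply the examples listed above (whether the real script uses the same defaults is an assumption), and the `.model` target directory is taken from the Run Locally steps. The only third-party call used is `hf_hub_download` from `huggingface_hub`.

```python
import os
from dataclasses import dataclass

# Fallbacks copied from the examples in this README (assumed, not confirmed defaults)
EXAMPLE_REPO_ID = "TheBloke/SynthIA-7B-v2.0-GGUF"
EXAMPLE_FILENAME = "synthia-7b-v2.0.Q4_K_M.gguf"
EXAMPLE_REVISION = "3f65d882253d1f15a113dabf473a7c02a004d2b5"


@dataclass(frozen=True)
class ModelSource:
    repo_id: str
    filename: str
    revision: str


def model_source_from_env(env=None) -> ModelSource:
    """Read REPO_ID, FILENAME and REVISION, falling back to the README examples."""
    env = os.environ if env is None else env
    return ModelSource(
        repo_id=env.get("REPO_ID", EXAMPLE_REPO_ID),
        filename=env.get("FILENAME", EXAMPLE_FILENAME),
        revision=env.get("REVISION", EXAMPLE_REVISION),
    )


def download(source: ModelSource, local_dir: str = ".model") -> str:
    """Fetch a single file from HuggingFace Hub at a pinned revision."""
    # Imported lazily so the settings logic works without huggingface_hub installed
    from huggingface_hub import hf_hub_download

    return hf_hub_download(
        repo_id=source.repo_id,
        filename=source.filename,
        revision=source.revision,
        local_dir=local_dir,
    )


if __name__ == "__main__":
    print(model_source_from_env())
```

Pinning `REVISION` to a commit hash, as in the example above, keeps the downloaded weights reproducible even if the upstream repository changes.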

Zarf Package Deployment

To build and deploy just the llama-cpp-python Zarf package (from the root of the repository):

Deploy a UDS cluster if one isn't deployed already

pip install 'huggingface_hub[cli,hf_transfer]'  # Used to download the model weights from huggingface
make build-llama-cpp-python LOCAL_VERSION=dev
uds zarf package deploy packages/llama-cpp-python/zarf-package-llama-cpp-python-*-dev.tar.zst --confirm

Run Locally

To run the llama-cpp-python backend locally, start from the root directory of the repository:

# Setup Virtual Environment
python -m venv .venv
source .venv/bin/activate

# Install dependencies
python -m pip install src/leapfrogai_sdk
cd packages/llama-cpp-python
python -m pip install ".[dev]"

# Clone Model
# Supply a REPO_ID, FILENAME and REVISION if a different model is desired
python scripts/model_download.py

mv .model/*.gguf .model/model.gguf

# Start Model Backend
lfai-cli --app-dir=. main:Model
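
The `mv .model/*.gguf .model/model.gguf` step above only works when the glob matches exactly one file. As a hedged alternative, the sketch below performs the same rename but fails with a clear message when zero or several `.gguf` files are present. The function name `normalize_model_file` is hypothetical and is not part of this repository.

```python
from pathlib import Path


def normalize_model_file(model_dir: str = ".model", target: str = "model.gguf") -> Path:
    """Rename the single downloaded .gguf file in model_dir to the target name."""
    directory = Path(model_dir)
    destination = directory / target

    # Every .gguf file except one that already carries the target name
    candidates = sorted(p for p in directory.glob("*.gguf") if p != destination)

    # Nothing left to rename and the target already exists: treat as done
    if not candidates and destination.exists():
        return destination

    if len(candidates) != 1:
        names = ", ".join(p.name for p in candidates) or "none"
        raise RuntimeError(f"expected exactly one .gguf file in {directory}, found: {names}")

    # Path.replace overwrites an existing destination and returns the new path
    return candidates[0].replace(destination)


if __name__ == "__main__":
    print(normalize_model_file())
```

Running it a second time is harmless, since an existing `model.gguf` with no other candidates is returned unchanged.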
