Add a model_server example podman-llm

This is a tool that was written to be as simple as ollama; in its simplest form it's: podman-llm run granite

Signed-off-by: Eric Curtin <[email protected]>
1 parent 5875d90, commit a39f1ee
Showing 1 changed file with 89 additions and 0 deletions.
@@ -0,0 +1,89 @@
# podman-llm

The goal of podman-llm is to make AI even more boring.

## Install

Install podman-llm by running this one-liner:

```
curl -fsSL https://raw.githubusercontent.com/ericcurtin/podman-llm/s/install.sh | sudo bash
```
## Usage

### Running Models

You can run a model using the `run` command. This will start an interactive session where you can query the model.

```
$ podman-llm run granite
> Tell me about podman in less than ten words
A fast, secure, and private container engine for modern applications.
>
```
### Serving Models

To serve a model via HTTP, use the `serve` command. This will start an HTTP server that listens for incoming requests to interact with the model.

```
$ podman-llm serve granite
...
{"tid":"140477699799168","timestamp":1719579518,"level":"INFO","function":"main","line":3793,"msg":"HTTP server listening","n_threads_http":"11","port":"8080","hostname":"127.0.0.1"}
...
```
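Once the server reports that it is listening, it can be queried from another terminal. The following is a minimal sketch, not part of podman-llm itself. It assumes the server is the llama.cpp HTTP server (llama.cpp is the runtime named in the diagram), which accepts `POST /completion` with a JSON body containing `prompt` and `n_predict`. The function names `completion_payload` and `query_model` and the `SERVER_URL` variable are hypothetical helpers introduced for illustration.

```shell
#!/bin/sh
# Sketch: query the server started by `podman-llm serve granite`.
# Assumption: the server is llama.cpp's HTTP server, which accepts
# POST /completion with a JSON body containing "prompt" and "n_predict".

# Host and port taken from the "HTTP server listening" log line.
SERVER_URL="${SERVER_URL:-http://127.0.0.1:8080}"

# Build the JSON request body for a single-line prompt.
# Only backslashes and double quotes are escaped; use a JSON tool
# for prompts containing newlines or control characters.
completion_payload() {
  escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"prompt": "%s", "n_predict": %d}' "$escaped" "${2:-64}"
}

# Send the prompt to the server and print the raw JSON response.
query_model() {
  curl -fsS -H 'Content-Type: application/json' \
    -d "$(completion_payload "$1" "${2:-64}")" \
    "$SERVER_URL/completion"
}

# Usage (with the server running):
#   query_model "Tell me about podman in less than ten words"
```

Keeping payload construction separate from the `curl` call makes the request body easy to inspect without a running server.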
## Model library

| Model              | Parameters | Run                            |
| ------------------ | ---------- | ------------------------------ |
| granite            | 3B         | `podman-llm run granite`       |
| mistral            | 7B         | `podman-llm run mistral`       |
| merlinite          | 7B         | `podman-llm run merlinite`     |
## Containerfile Example

Here is an example Containerfile:

```
FROM quay.io/podman-llm/podman-llm:41
RUN llama-main --hf-repo ibm-granite/granite-3b-code-instruct-GGUF -m granite-3b-code-instruct.Q4_K_M.gguf
LABEL MODEL=/granite-3b-code-instruct.Q4_K_M.gguf
```

`LABEL MODEL` is important because it tells podman-llm where to find the .gguf file inside the image.

Build the image with:

```
podman-llm build granite
```
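To check that the `MODEL` label was recorded, it can be read back from the built image. This is a hedged sketch using `podman image inspect` with a Go template. The image name passed to the function is hypothetical, since the name that `podman-llm build` assigns is not stated here, and `model_path_from_image` and the `PODMAN` override are illustrative helpers rather than part of podman-llm.

```shell
#!/bin/sh
# Sketch: read the MODEL label back from a built image.
# PODMAN can be overridden, for example with a stub when testing.
PODMAN="${PODMAN:-podman}"

# Print the value of the MODEL label for the image named in $1.
# .Config.Labels is the label map in `podman image inspect` output.
model_path_from_image() {
  "$PODMAN" image inspect --format '{{ index .Config.Labels "MODEL" }}' "$1"
}

# Usage, where "granite" is a hypothetical image name:
#   model_path_from_image granite
```

For the Containerfile shown earlier, the label value is the path `/granite-3b-code-instruct.Q4_K_M.gguf`.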
## Diagram

```
+------------------------+    +--------------------+    +------------------+
|                        |    | Pull runtime layer |    | Pull model layer |
| podman-llm run         | -> | with llama.cpp     | -> | with granite     |
|                        |    |                    |    |                  |
+------------------------+    +--------------------+    |------------------|
                                                        | Repo options:    |
                                                        +------------------+
                                                            |             |
                                                            v             v
                                                    +--------------+ +---------+
                                                    | Hugging Face | | quay.io |
                                                    +--------------+ +---------+
                                                             \           /
                                                              \         /
                                                               \       /
                                                                v     v
                                                          +-----------------+
                                                          | Start container |
                                                          | with llama.cpp  |
                                                          | and granite     |
                                                          | model           |
                                                          +-----------------+
```