[WIP] Add a model_server example podman-llm #649
base: main
Conversation
Does this tool have an upstream? Where is the repo? Not sure I love the name.
Yeah, happy to rename, I just needed to call it something: https://github.com/ericcurtin/podman-llm Could be llmc, llm-container, llm-oci, podllm? I really don't mind. It requires a couple of small patches to llama.cpp too, but nothing major.
I was working with Ollama, but I worry about the long-term future there as regards external contributions: https://github.com/ollama/ollama/pulls/ericcurtin

I fixed a lot of issues around OSTree-based OSes, podman support, and Fedora support in general, but I just don't think the Ollama folks are genuinely interested in external contributions (they weren't complex reviews). So I removed the middle component, Ollama itself, since Ollama is a llama.cpp wrapper. This uses llama.cpp pretty much directly, and it kinda shows that the Ollama layer actually isn't doing a whole pile. What I really liked about Ollama is that it simplified running LLMs to:
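Presumably a single command along these lines (the model name here is illustrative, not from the original comment):

ollama run granite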
so that's what I was going for here. I think an Ollama clone built directly against the llama.cpp library could do very well. And this is daemon-less (unlike Ollama): no clients, no servers, etc., unless you want to serve, so it's zippier as a result.
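When you do want to serve, the invocation would presumably mirror the run case; a sketch based on the "podman-llm run/serve" wording later in this thread (the exact command shape is an assumption, not verified against the repo):

podman-llm serve granite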
This review in particular was super easy and would give rpm-ostree/bootc OS support.
There's obvious overlap with InstructLab... This is like a containerized, daemonless, simplified InstructLab for dummies, kinda like Ollama.
If we felt this idea was worth pursuing, there would probably be plenty of breaking changes to come. Some ideas we were thinking about:

GGUFs would be delivered as single-file "FROM scratch" images (in their own gguf container store, to be used with the podman-llm:41, podman-llm-amd:41, or podman-llm-nvidia:41 container images). So every "podman-llm run/serve" invocation is made up of some container image runtime (AMD, Nvidia, CPU, etc.) plus a .gguf file, which is delivered as a separate container image or downloaded from Hugging Face; a sketch of such a model image is below.

It's like Ollama, but with no custom "Modelfile" syntax (I think standard Containerfiles are better) and no special OCI format: a .gguf is just a .gguf, whether it comes from a container image or straight from Hugging Face.

Some name change, like @rhatdan is proposing, to whatever people think sounds cool :)

But this is just a 20% project for me, so I would like to get people's opinions on whether something like this is worthwhile, etc.
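A minimal sketch of what one of those single-file "FROM scratch" model images could look like; the file name and destination path are illustrative, not taken from the PR:

# Containerfile: the image carries nothing but the model weights.
FROM scratch
COPY granite.gguf /granite.gguf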
Force-pushed e6ed9c5 to a39f1ee
Normally, I would not be in favor of including contributions that solely document integration with external software that is not essential to, or used in, the recipes. However, there does seem to be some alignment here around bootc.
bootc is pretty useful for AI use cases, even just for having the Nvidia dependencies pre-installed, which is not always trivial to do in a deployment. podman-llm (to be renamed) would work within a bootc image, or a non-bootc image for that matter. The only real dependency it has is that podman (or docker) is installed.
Let's go with this for now.
@tumido's feedback could also be interesting; looking at the upcoming DevConf.US talks, he is speaking about "Store AI/ML models efficiently with OCI Artifacts", which is one of the things I am trying to do here, so maybe we can combine efforts :)

I played around with a couple of ideas with different pros/cons: podman volumes, "FROM scratch" images, and just simple container image inheritance. Right now it's a bind-mounted directory ($HOME/.cache/huggingface/) used to share .gguf files between multiple images; a sketch of the idea is below. I bet @tumido has some interesting ideas :)
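A rough sketch of that bind-mount approach; the image tag, entrypoint, and model path are illustrative assumptions, not taken from the repo:

# Share the host's Hugging Face cache with whichever runtime image is used,
# so the CPU, AMD, and Nvidia images all see the same .gguf files:
podman run --rm -it \
  -v "$HOME/.cache/huggingface:/root/.cache/huggingface" \
  podman-llm:41 \
  llama-cli -m /root/.cache/huggingface/granite.gguf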
Force-pushed 997c8e0 to 432e91b
Updated README.md diagram to highlight the value of pulling different runtimes. [diagram omitted]
This is a tool that was written to be as simple as Ollama; in its simplest form it's:

podman-llm run granite

Signed-off-by: Eric Curtin <[email protected]>