Ollama API implementation for Spin
⚠️ Proof of concept: This project is not production-ready
- Install Spin
- Log in to Fermyon Cloud
spin login
- clone this repository
git clone https://github.com/BLaZeKiLL/Spin-O-Llama.git
cd Spin-O-Llama
- build
spin build
- deploy
spin deploy
Routes implemented
- POST /api/generate
Supported request body:
{
  "model": "<supported-model>",
  "prompt": "<input prompt>",
  "system": "<system prompt>", // optional, system prompt
  "stream": false, // streaming not supported, has no impact
  "options": { // optional, llm options (default values shown)
    "num_predict": 128,
    "temperature": 0.8,
    "top_p": 0.9,
    "repeat_penalty": 1.1
  }
}
Response body:
{
  "model": "<model-id>",
  "response": "<output>",
  "done": true
}
- POST /api/embeddings
Supported request body:
{
  "model": "<model-id>", // ignored for now, all-minilm-l6-v2 is always used
  "prompt": "<input>"
}
Response body:
{
  "embedding": [<float array>]
}
Model compatibility
- generate - llama2-chat, codellama-instruct
- embeddings - all-minilm-l6-v2
Contributions are welcome for implementing more of the Ollama API, to the extent it can be supported on the Spin runtime.