
Can't load a local model with llama.cpp "repo id must be a string" "model path does not exist" #834

Closed
julian-passebecq opened this issue Apr 24, 2024 · 3 comments
Labels: bug, llama.cpp (Related to the `llama.cpp` integration)

@julian-passebecq
Describe the issue as clearly as possible:

I can't load a local model with llama.cpp by following the Outlines documentation. The model itself loads fine (llama-cpp-python reads my local GGUF file), but the script fails on the line that creates `model`.

I tried several variations and none of them work. I also tried placing the model file in the same directory as the script and using llm = Llama("./mistral-7b-instruct-v0.2.Q5_K_M.gguf") followed by model = models.llamacpp(llm); even though the file is in the same directory, I get this error:
Message=Model path does not exist: ./mistral-7b-instruct-v0.2.Q5_K_M.gguf
Source=D:\LLM\outlines\outlines2.py
StackTrace:
File "D:\LLM\outlines\outlines2.py", line 5, in (Current frame)
llm = Llama("./mistral-7b-instruct-v0.2.Q5_K_M.gguf")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: Model path does not exist: ./mistral-7b-instruct-v0.2.Q5_K_M.gguf
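For what it's worth, a relative path such as "./mistral-7b-instruct-v0.2.Q5_K_M.gguf" is resolved against the current working directory, not the script's folder, so this particular error can appear even when the file sits next to the script. A minimal sanity check, assuming the GGUF file is meant to live alongside the script:

import os

# Build the path relative to the script itself rather than the current working directory.
script_dir = os.path.dirname(os.path.abspath(__file__))
model_path = os.path.join(script_dir, "mistral-7b-instruct-v0.2.Q5_K_M.gguf")
print(os.path.exists(model_path))  # should print True before passing model_path to Llama(...)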

Steps/code to reproduce the bug:

from outlines import models
from llama_cpp import Llama
import os
import json


MODEL_DIR = r"D:\LLM\models\mistral"
model_file = "mistral-7b-instruct-v0.2.Q5_K_M.gguf"
model_path = os.path.join(MODEL_DIR, model_file)

llm = Llama(model_path, n_gpu_layers=33, n_ctx=3584, n_batch=521, verbose=True)  # loads without error
model = models.llamacpp(llm)  # this line raises "Repo id must be a string"

Expected result:

The local model is wrapped without errors and `model` can be used for generation.

Error message:

Repo id must be a string, not <class 'llama_cpp.llama.Llama'>: '<llama_cpp.llama.Llama object at 0x000001A779383470>'.

Outlines/Python version information:

latest version

Context for the issue:

[Screenshots attached]

@rlouf added the llama.cpp label on Apr 24, 2024
@rlouf (Member) commented on Apr 24, 2024

Thank you for opening an issue. I am sorry, but the documentation was incorrect. The following code should work:

from outlines import models
from llama_cpp import Llama
import os
import json


MODEL_DIR = r"D:\LLM\models\mistral"
model_file = "mistral-7b-instruct-v0.2.Q5_K_M.gguf"
model_path = os.path.join(MODEL_DIR, model_file)

llm = Llama(model_path, n_gpu_layers=33, n_ctx=3584, n_batch=521, verbose=True)
model = models.LlamaCpp(llm)  # note the class: models.LlamaCpp wraps an existing Llama instance

I just updated the documentation in #835
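For completeness: the lowercase `models.llamacpp(...)` loader expects a Hugging Face repo id (plus a GGUF filename) and downloads the weights itself, which is why passing a `Llama` instance raised "Repo id must be a string". A minimal sketch of that alternative path, assuming the TheBloke GGUF repo name and that extra keyword arguments are forwarded to llama-cpp-python:

from outlines import models, generate

model = models.llamacpp(
    "TheBloke/Mistral-7B-Instruct-v0.2-GGUF",   # repo id on the Hugging Face Hub (assumed)
    "mistral-7b-instruct-v0.2.Q5_K_M.gguf",     # GGUF file inside that repo
    n_gpu_layers=33,
    n_ctx=3584,
)

generator = generate.text(model)
print(generator("Question: What is structured generation?\nAnswer:", max_tokens=64))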

@julian-passebecq (Author)

Thanks so much for the quick answer; I confirm that your code works well :) All the best.

@erlebach

The following code crashes on my Mac M1:

from outlines import models
from llama_cpp import Llama
import os
import json


model_url = "/Users/erlebach/data/llm_models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf"

llm = Llama(model_url, n_gpu_layers=0, n_ctx=4096, verbose=False)

I get the error:
llama_new_context_with_model: n_ctx_per_seq (4096) < n_ctx_train (131072) -- the full capacity of the model will not be utilized
ggml_metal_init: skipping kernel_get_rows_bf16 (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_f32_1row (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_f32_l4 (not supported)
ggml_metal_init: skipping kernel_mul_mv_bf16_bf16 (not supported)
ggml_metal_init: skipping kernel_mul_mv_id_bf16_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mm_bf16_f32 (not supported)
ggml_metal_init: skipping kernel_mul_mm_id_bf16_f32 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h64 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h80 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h96 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h112 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h128 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_bf16_h256 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_vec_bf16_h128 (not supported)
ggml_metal_init: skipping kernel_flash_attn_ext_vec_bf16_h256 (not supported)
ggml_metal_init: skipping kernel_cpy_f32_bf16 (not supported)
ggml_metal_init: skipping kernel_cpy_bf16_f32 (not supported)
ggml_metal_init: skipping kernel_cpy_bf16_bf16 (not supported)
Exception ignored in: <function Llama.__del__ at 0x142c40160>
Traceback (most recent call last):
File "/Users/erlebach/src/2024/my_llama_cpp-python/.venv/lib/python3.10/site-packages/llama_cpp/llama.py", line 2201, in __del__
File "/Users/erlebach/src/2024/my_llama_cpp-python/.venv/lib/python3.10/site-packages/llama_cpp/llama.py", line 2198, in close
File "/Users/erlebach/opt/miniconda3/lib/python3.10/contextlib.py", line 584, in close
File "/Users/erlebach/opt/miniconda3/lib/python3.10/contextlib.py", line 576, in __exit__
File "/Users/erlebach/opt/miniconda3/lib/python3.10/contextlib.py", line 561, in __exit__
File "/Users/erlebach/opt/miniconda3/lib/python3.10/contextlib.py", line 340, in __exit__
File "/Users/erlebach/src/2024/my_llama_cpp-python/.venv/lib/python3.10/site-packages/llama_cpp/_internals.py", line 69, in close
File "/Users/erlebach/opt/miniconda3/lib/python3.10/contextlib.py", line 584, in close
File "/Users/erlebach/opt/miniconda3/lib/python3.10/contextlib.py", line 576, in __exit__
File "/Users/erlebach/opt/miniconda3/lib/python3.10/contextlib.py", line 561, in __exit__
File "/Users/erlebach/opt/miniconda3/lib/python3.10/contextlib.py", line 449, in _exit_wrapper
File "/Users/erlebach/src/2024/my_llama_cpp-python/.venv/lib/python3.10/site-packages/llama_cpp/_internals.py", line 63, in free_model
TypeError: 'NoneType' object is not callable

My pyproject.toml (with Poetry) is: 

[tool.poetry]
name = "my-llama-cpp-python"
version = "0.1.0"
description = ""
authors = ["erlebach [email protected]"]
readme = "README.md"

[tool.poetry.dependencies]
python = ">=3.10, <3.13"
outlines = "^0.1.3"
transformers = "^4.46.2"
ipykernel = "^6.29.5"
jupyter = "^1.1.1"
pillow = "^11.0.0"
loadenv = "^0.1.1"
python-dotenv = "^1.0.1"
pydantic = "^2.9.2"
pdf2image = "^1.17.0"
rich = "^13.9.4"
torch = "^2.5.1"
accelerate = "^1.1.1"
huggingface-hub = "^0.26.2"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"

llama_cpp_python was installed via pip into the environment. From `pip list`: 

llama_cpp_python 0.3.2
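One thing that may be relevant: the "Exception ignored in Llama.__del__" block above looks like it fires during interpreter shutdown, after module globals have already been cleared (hence `'NoneType' object is not callable`). Closing the model explicitly before the process exits might avoid it; a minimal sketch, assuming llama-cpp-python's `Llama.close()` method (which appears in the traceback above):

from llama_cpp import Llama

llm = Llama(
    "/Users/erlebach/data/llm_models/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf",
    n_gpu_layers=0,
    n_ctx=4096,
    verbose=False,
)
try:
    # ... run generation here ...
    pass
finally:
    llm.close()  # release the native model/context before the interpreter tears down modules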
