
Error on 13B Model #8

Open

DARDORKE opened this issue Apr 9, 2023 · 4 comments

DARDORKE commented Apr 9, 2023

Hi!

I quantized the 13B model and got a 15.15 GB file, but I get an error when I try to load it with ./main:

main: seed = 1681070012
llama_model_load: loading model from '/content/vigogne/llama.cpp/models/13B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx = 2048
llama_model_load: n_embd = 5120
llama_model_load: n_mult = 256
llama_model_load: n_head = 40
llama_model_load: n_layer = 40
llama_model_load: n_rot = 128
llama_model_load: f16 = 2
llama_model_load: n_ff = 13824
llama_model_load: n_parts = 2
llama_model_load: type = 2
llama_model_load: ggml map size = 15517.64 MB
llama_model_load: ggml ctx size = 101.25 KB
llama_model_load: mem required = 17565.74 MB (+ 1608.00 MB per state)
llama_model_load: loading tensors from '/content/vigogne/llama.cpp/models/13B/ggml-model-q4_0.bin'
llama_model_load: tensor 'tok_embeddings.weight' has wrong size in model file
llama_init_from_file: failed to load model
main: error: failed to load model '/content/vigogne/llama.cpp/models/13B/ggml-model-q4_0.bin'
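For context, the "wrong size" error means the number of bytes stored for a tensor in the file doesn't match what the loader computes from the hyperparameters printed above. Here is a minimal sketch of that expectation for tok_embeddings.weight (my own illustration, not code from llama.cpp, assuming the old ggml q4_0 layout of a 4-byte scale plus 16 bytes of packed nibbles per 32 weights):

```python
# Minimal sketch (not from llama.cpp): expected byte size of tok_embeddings.weight,
# assuming old ggml q4_0 blocks: 32 weights, 4-byte scale + 16 bytes of 4-bit quants.
n_vocab, n_embd = 32000, 5120                  # values printed by llama_model_load above
block_weights, block_bytes = 32, 4 + 16

n_elements = n_vocab * n_embd                  # 163,840,000 weights
expected_bytes = n_elements // block_weights * block_bytes
print(f"expected q4_0 size: {expected_bytes / 1e6:.1f} MB")   # ~102.4 MB
print(f"f16 size would be:  {n_elements * 2 / 1e6:.1f} MB")   # ~327.7 MB
# If the tensor actually stored in the file has a different size (for example because
# it was written in another format), the loader reports "has wrong size in model file".
```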

I think the problem comes from the tokenizer.model file. Where can I find the files corresponding to the 13B model?

Thank you!

@cmhamiche

Same here, I was searching for the 13B tokenizer.model file to quantize with GPTQ-for-llama.

@bofenghuang (Owner)

Hi @DARDORKE @cmhamiche,

The 7B and 13B models should have the same tokenizer.model file. You could check huggyllama/llama-7b and huggyllama/llama-13b.
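To double-check that they match, here is a minimal sketch (my own example, assuming huggingface_hub is installed) that downloads tokenizer.model from both repos and compares checksums:

```python
# Minimal sketch: verify that the 7B and 13B repos ship the same tokenizer.model
# (assumes `pip install huggingface_hub`; repo IDs are the ones mentioned above).
import hashlib
from huggingface_hub import hf_hub_download

def sha256_of(path: str) -> str:
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

paths = {
    repo: hf_hub_download(repo_id=repo, filename="tokenizer.model")
    for repo in ("huggyllama/llama-7b", "huggyllama/llama-13b")
}
for repo, path in paths.items():
    print(repo, sha256_of(path))
```

If the two hashes are identical, the same file can sit next to both the 7B and 13B folders, as in the layout below.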

I have the following quantized files working on my PC.

models
├── 13B
│   ├── consolidated.00.pth
│   ├── ggml-model-f16.bin
│   ├── ggml-model-q4_0.bin
│   └── params.json
├── 7B
│   ├── consolidated.00.pth
│   ├── ggml-model-f16.bin
│   ├── ggml-model-q4_0.bin
│   └── params.json
└── tokenizer.model

PS: Your 15.15 GB model file is a little strange. According to this section, the quantized 4-bit file of the 13B model should be around 7.x GB.
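As a rough cross-check, a back-of-the-envelope estimate of the full q4_0 file size (my own sketch, same 20-bytes-per-32-weights assumption as above):

```python
# Back-of-the-envelope size of a q4_0 13B file (~13.0e9 parameters assumed)
n_params = 13.0e9
q4_0_bytes_per_weight = 20 / 32        # assumed: 4-byte scale + 16 bytes per 32 weights
print(f"~{n_params * q4_0_bytes_per_weight / 1e9:.1f} GB")   # ~8.1 GB, i.e. ~7.6 GiB
```

A 15.15 GB file is roughly twice that, which points at something going wrong during conversion or quantization.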

@DARDORKE (Author)

Nice, thanks mate!

@cmhamiche

I merged and quantized the 13B model: cmh/vigogne-13b-4bit-32g-triton
Thanks a lot!
