llama3.1 quantization #687

Open · sunjianxide opened this issue Dec 31, 2024 · 0 comments

@sunjianxide
I ran into a problem. When I convert the llama 3.1 model, the `merges` entries in its tokenizer.json change from the left-hand form to the right-hand form:

`"éĶ ¦"` -> `["éĶ", "¦"]`

Why does this happen? The Rust side of our engine can no longer parse the resulting file.

```
thread '<unnamed>' panicked at src/lib.rs:22:50:
called `Result::unwrap()` on an `Err` value: Error("data did not match any variant of untagged enum ModelWrapper", line: 1251004, column: 1)
stack backtrace:
   0: rust_begin_unwind
             at /rustc/7cf61ebde7b22796c69757901dd346d0fe70bd97/library/std/src/panicking.rs:647:5
   1: core::panicking::panic_fmt
             at /rustc/7cf61ebde7b22796c69757901dd346d0fe70bd97/library/core/src/panicking.rs:72:14
   2: core::result::unwrap_failed
             at /rustc/7cf61ebde7b22796c69757901dd346d0fe70bd97/library/core/src/result.rs:1649:5
   3: tokenizers_new_from_str
   4: _ZN10tokenizers9Tokenizer12FromBlobJSONERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
   5: _ZN9xcore_llm12BatchManager13initTokenizerEv
             at ./src/engine/batch_manager.cpp:270:53
   6: _ZN9xcore_llm12BatchManager4initESt10shared_ptrINS_7SamplerEE
             at ./src/engine/batch_manager.cpp:285:49
   7: _ZN9xcore_llm9LLMEngine4initEv
             at ./src/engine/engine.cpp:117:47
   8: _ZN9xcore_llm12EnginePyBind4initEv
             at ./src/pybind/engine_bind.cpp:16:23
   9: _ZZN8pybind1112cpp_functionC4IbN9xcore_llm12EnginePyBindEJEJNS_4nameENS_9is_methodENS_7siblingEEEEMT0_FT_DpT1_EDpRKT2_ENKUlPS3_E_clESH
             at /root/.conan2/p/pybind0fa8fad1e2b0/p/include/pybind11/pybind11.h:111:66
  10: _ZNO8pybind116detail15argument_loaderIJPN9xcore_llm12EnginePyBindEEE9call_implIbRZNS_12cpp_functionC4IbS3_JEJNS_4nameENS_9is_methodENS_7siblingEEEEMT0_FT_DpT1_EDpRKT2_EUlS4_E_JLm0EENS0_9void_typeEEESD_OSC_St16integer_sequenceImJXspT1_EEEOT2
             at /root/.conan2/p/pybind0fa8fad1e2b0/p/include/pybind11/cast.h:1480:37
  11: _ZNO8pybind116detail15argument_loaderIJPN9xcore_llm12EnginePyBindEEE4callIbNS0_9void_typeERZNS_12cpp_functionC4IbS3_JEJNS_4nameENS_9is_methodENS_7siblingEEEEMT0_FT_DpT1_EDpRKT2_EUlS4_E_EENSt9enable_ifIXntsrSt7is_voidISE_E5valueESE_E4typeEOT1
             at /root/.conan2/p/pybind0fa8fad1e2b0/p/include/pybind11/cast.h:1448:72
  12: _ZZN8pybind1112cpp_function10initializeIZNS0_C4IbN9xcore_llm12EnginePyBindEJEJNS_4nameENS_9is_methodENS_7siblingEEEEMT0_FT_DpT1_EDpRKT2_EUlPS4_E_bJSI_EJS5_S6_S7_EEEvOS9_PFS8_SB_ESH_ENKUlRNS_6detail13function_callEE1_clESP
             at /root/.conan2/p/pybind0fa8fad1e2b0/p/include/pybind11/pybind11.h:254:75
  13: _ZZN8pybind1112cpp_function10initializeIZNS0_C4IbN9xcore_llm12EnginePyBindEJEJNS_4nameENS_9is_methodENS_7siblingEEEEMT0_FT_DpT1_EDpRKT2_EUlPS4_E_bJSI_EJS5_S6_S7_EEEvOS9_PFS8_SB_ESH_ENUlRNS_6detail13function_callEE1_4_FUNESP
             at /root/.conan2/p/pybind0fa8fad1e2b0/p/include/pybind11/pybind11.h:224:21
  14: _ZN8pybind1112cpp_function10dispatcherEP7_objectS2_S2
             at /root/.conan2/p/pybind0fa8fad1e2b0/p/include/pybind11/pybind11.h:946:35
  15: <unknown>
  16: _PyObject_MakeTpCall
  17: <unknown>
  18: _PyEval_EvalFrameDefault
  19: <unknown>
  20: PyObject_Call
  21: _PyEval_EvalFrameDefault
  22: _PyFunction_Vectorcall
  23: _PyObject_FastCallDictTstate
  24: <unknown>
  25: <unknown>
  26: PyObject_Call
  27: _PyEval_EvalFrameDefault
  28: <unknown>
  29: _PyEval_EvalFrameDefault
  30: <unknown>
  31: PyEval_EvalCode
  32: <unknown>
  33: <unknown>
  34: <unknown>
  35: _PyRun_SimpleFileObject
  36: _PyRun_AnyFileObject
  37: Py_RunMain
  38: Py_BytesMain
  39: __libc_start_call_main
             at ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16
  40: __libc_start_main_impl
             at ./csu/../csu/libc-start.c:392:3
  41: _start
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
fatal runtime error: Rust panics must be rethrown
```
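If I understand correctly, this error is what older versions of the Rust `tokenizers` crate report when a tokenizer.json was written by a newer release that serializes BPE merges as two-element lists (`["éĶ", "¦"]`) instead of space-joined strings (`"éĶ ¦"`). Below is a workaround sketch that rewrites the file back to the legacy format, assuming the standard `model.merges` layout and that no merge token itself contains a space (which holds for byte-level BPE vocabularies like llama's):

```python
# Workaround sketch: rewrite list-style merges back to legacy space-joined
# strings so that an older Rust `tokenizers` crate can deserialize the file.
# Assumes the standard tokenizer.json layout (merges under model.merges);
# the path is a placeholder for the converted model directory.
import json

path = "llama31-awq/tokenizer.json"

with open(path, encoding="utf-8") as f:
    data = json.load(f)

merges = data["model"]["merges"]
if merges and isinstance(merges[0], list):
    # e.g. ["éĶ", "¦"] -> "éĶ ¦"
    data["model"]["merges"] = [" ".join(pair) for pair in merges]

with open(path, "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False)
```

Pinning the `tokenizers` package to an older release before running the conversion might also avoid writing the new format in the first place.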

This is my convert script:

```python
# Adapted from
# https://github.com/casper-hansen/AutoAWQ/blob/main/examples/quantize.py
# under MIT license

from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

import argparse

parser = argparse.ArgumentParser(description="auto-awq")
parser.add_argument("-i", "--in-path", type=str, help="input model path")
parser.add_argument("-o", "--out-path", type=str, help="output path")

args = parser.parse_args()

model_path = args.in_path
quant_path = args.out_path
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load model
model = AutoAWQForCausalLM.from_pretrained(model_path, **{"low_cpu_mem_usage": True, "use_cache": False})
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

# Quantize
model.quantize(tokenizer, quant_config=quant_config)

# Save quantized model
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```
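I run it like this (the script name and paths are placeholders):

```
python quantize.py -i /path/to/Meta-Llama-3.1-8B -o /path/to/llama31-awq
```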

Versions:

- transformers 4.47.1
- autoawq 0.2.7.post3
