
Unable to save llama2 after SmoothQuant #1600

Open
dellamuradario opened this issue Feb 2, 2024 · 1 comment

@dellamuradario

Hi all,

I'm attempting to follow the SmoothQuant tutorial for the Llama-2-7b model: https://github.com/intel/neural-compressor/tree/master/examples/onnxrt/nlp/huggingface_model/text_generation/llama/quantization/ptq_static

System configuration:
OS: Windows 11
Python: 3.10.11

My steps:

  1. CREATE PROJECT FOLDER: neural-compressor-tutorial
  2. CREATE VIRTUAL ENV: python -m venv neural-compressor-env
  3. DOWNLOAD: the example folder from the guide
  4. RUN: pip install neural-compressor and SKIP_RUNTIME=True pip install -r requirements.txt (successful)
  5. RUN: python prepare_model.py --input_model="meta-llama/Llama-2-7b-chat-hf" --output_model="./llama-2-7b-chat-hf" (successful)
  6. RUN WITH GIT BASH TERMINAL: bash run_quant.sh --input_model=C:/Users/Dario/Downloads/INTEL/neural-compressor-tutorial/llama-2-7b-chat-hf --output_model=C:/Users/Dario/Downloads/INTEL/neural-compressor-tutorial/output_model

TERMINAL LOG - ERROR:

2024-02-02 11:28:30.1017397 [E:onnxruntime:, inference_session.cc:1935 onnxruntime::InferenceSession::Initialize::<lambda_5a23845ba810e30de3b9e7b450415bf5>::operator ()] Exception during initialization: bad allocation
2024-02-02 11:28:30 [ERROR] Unexpected exception RuntimeException('[ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Exception during initialization: bad allocation') happened during tuning.
Traceback (most recent call last):
  File "C:\Users\Dario\Downloads\INTEL\neural-compressor-tutorial\neural-compressor-env\lib\site-packages\neural_compressor\quantization.py", line 234, in fit
    strategy.traverse()
  File "C:\Users\Dario\Downloads\INTEL\neural-compressor-tutorial\neural-compressor-env\lib\site-packages\neural_compressor\strategy\auto.py", line 140, in traverse
    super().traverse()
  File "C:\Users\Dario\Downloads\INTEL\neural-compressor-tutorial\neural-compressor-env\lib\site-packages\neural_compressor\strategy\strategy.py", line 483, in traverse
    self._setup_pre_tuning_algo_scheduler()
  File "C:\Users\Dario\Downloads\INTEL\neural-compressor-tutorial\neural-compressor-env\lib\site-packages\neural_compressor\strategy\strategy.py", line 361, in _setup_pre_tuning_algo_scheduler
    self.model = self._pre_tuning_algo_scheduler("pre_quantization")
  File "C:\Users\Dario\Downloads\INTEL\neural-compressor-tutorial\neural-compressor-env\lib\site-packages\neural_compressor\algorithm\algorithm.py", line 127, in __call__
    self._q_model = algo(self._origin_model, self._q_model, self._adaptor, self._dataloader, self._calib_iter)
  File "C:\Users\Dario\Downloads\INTEL\neural-compressor-tutorial\neural-compressor-env\lib\site-packages\neural_compressor\algorithm\smooth_quant.py", line 89, in __call__
    q_model = adaptor.smooth_quant(
  File "C:\Users\Dario\Downloads\INTEL\neural-compressor-tutorial\neural-compressor-env\lib\site-packages\neural_compressor\adaptor\onnxrt.py", line 228, in smooth_quant
    self.smooth_quant_model = self.sq.transform(**self.cur_sq_args)
  File "C:\Users\Dario\Downloads\INTEL\neural-compressor-tutorial\neural-compressor-env\lib\site-packages\neural_compressor\adaptor\ox_utils\smooth_quant.py", line 183, in transform
    self._dump_op_info(percentile, op_types, calib_iter, quantize_config)
  File "C:\Users\Dario\Downloads\INTEL\neural-compressor-tutorial\neural-compressor-env\lib\site-packages\neural_compressor\adaptor\ox_utils\smooth_quant.py", line 395, in _dump_op_info
    self.max_vals_per_channel, self.shape_info, self.tensors_to_node = augment.calib_smooth(
  File "C:\Users\Dario\Downloads\INTEL\neural-compressor-tutorial\neural-compressor-env\lib\site-packages\neural_compressor\adaptor\ox_utils\calibration.py", line 774, in calib_smooth
    _, output_dicts = self.get_intermediate_outputs()
  File "C:\Users\Dario\Downloads\INTEL\neural-compressor-tutorial\neural-compressor-env\lib\site-packages\neural_compressor\adaptor\ox_utils\calibration.py", line 254, in get_intermediate_outputs
    else onnxruntime.InferenceSession(self.model_wrapper.model_path + "_augment.onnx", so, providers=[backend])
  File "C:\Users\Dario\Downloads\INTEL\neural-compressor-tutorial\neural-compressor-env\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 419, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "C:\Users\Dario\Downloads\INTEL\neural-compressor-tutorial\neural-compressor-env\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 483, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Exception during initialization: bad allocation
2024-02-02 11:28:36 [ERROR] Specified timeout or max trials is reached! Not found any quantized model which meet accuracy goal. Exit.
model: decoder_model.onnx
args.output_model: C:/Users/Dario/Downloads/INTEL/neural-compressor-tutorial/output_model
Traceback (most recent call last):
  File "C:\Users\Dario\Downloads\INTEL\neural-compressor-tutorial\main.py", line 336, in <module>
    q_model.save(os.path.join(args.output_model, model))
AttributeError: 'NoneType' object has no attribute 'save'
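If I read the log correctly, the final AttributeError is a downstream effect rather than a separate bug: quantization.fit() seems to return None when no configuration meets the accuracy goal, and main.py then calls save() on that None unconditionally. A minimal sketch of a guard around the save step, assuming that return behavior (model, config, calib_dataloader and args stand in for the objects the example's main.py builds):

```python
import os

from neural_compressor import quantization

# model, config, calib_dataloader and args are assumed to be set up as in the
# example's main.py; only the save step is sketched here.
q_model = quantization.fit(model, config, calib_dataloader=calib_dataloader)

# fit() returns None when tuning exhausts its trials without meeting the
# accuracy goal, which is what produces
# "'NoneType' object has no attribute 'save'".
if q_model is None:
    raise RuntimeError("Tuning produced no quantized model; "
                       "resolve the 'bad allocation' error first.")
q_model.save(os.path.join(args.output_model, model))
```

This would not fix the underlying crash, but it would surface the real failure instead of the confusing AttributeError.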

What could be the solution? Did I miss any crucial steps during the installation or while executing the commands listed above?

Thank you for any suggestions.

@yuwenzho
Contributor

RUNTIME_EXCEPTION : Exception during initialization: bad allocation is raised when the InferenceSession is created. The exception points to a memory allocation issue: ONNX Runtime could not allocate enough memory to initialize the session. You can try tracking your memory consumption while calibration runs.
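A quick way to do that is to log the process RSS around session creation, e.g. with psutil (a minimal sketch; the model path is a placeholder for the _augment.onnx file that calibration writes):

```python
import onnxruntime as ort
import psutil

proc = psutil.Process()

def log_rss(tag: str) -> None:
    # Resident set size of the current process, in GiB.
    print(f"[{tag}] RSS = {proc.memory_info().rss / 2**30:.2f} GiB")

log_rss("before session")
# Placeholder path: substitute the augmented model produced during calibration.
sess = ort.InferenceSession("decoder_model_augment.onnx",
                            providers=["CPUExecutionProvider"])
log_rss("after session")
```

For scale: the fp32 decoder_model.onnx of a 7B-parameter model is on the order of 25 GB of weights alone, and calibration apparently loads an augmented copy with extra intermediate outputs, so exhausting RAM on a typical desktop is quite plausible.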
