I'm attempting to follow the SmoothQuant tutorial for the Llama-2-7b model: https://github.com/intel/neural-compressor/tree/master/examples/onnxrt/nlp/huggingface_model/text_generation/llama/quantization/ptq_static
System configuration:
OS: Windows 11
Python: 3.10.11
Run (Git Bash terminal):
bash run_quant.sh --input_model=C:/Users/Dario/Downloads/INTEL/neural-compressor-tutorial/llama-2-7b-chat-hf --output_model=C:/Users/Dario/Downloads/INTEL/neural-compressor-tutorial/output_model
Terminal log (error):
2024-02-02 11:28:30.1017397 [E:onnxruntime:, inference_session.cc:1935 onnxruntime::InferenceSession::Initialize::<lambda_5a23845ba810e30de3b9e7b450415bf5>::operator ()] Exception during initialization: bad allocation
2024-02-02 11:28:30 [ERROR] Unexpected exception RuntimeException('[ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Exception during initialization: bad allocation') happened during tuning.
Traceback (most recent call last):
  File "C:\Users\Dario\Downloads\INTEL\neural-compressor-tutorial\neural-compressor-env\lib\site-packages\neural_compressor\quantization.py", line 234, in fit
    strategy.traverse()
  File "C:\Users\Dario\Downloads\INTEL\neural-compressor-tutorial\neural-compressor-env\lib\site-packages\neural_compressor\strategy\auto.py", line 140, in traverse
    super().traverse()
  File "C:\Users\Dario\Downloads\INTEL\neural-compressor-tutorial\neural-compressor-env\lib\site-packages\neural_compressor\strategy\strategy.py", line 483, in traverse
    self._setup_pre_tuning_algo_scheduler()
  File "C:\Users\Dario\Downloads\INTEL\neural-compressor-tutorial\neural-compressor-env\lib\site-packages\neural_compressor\strategy\strategy.py", line 361, in _setup_pre_tuning_algo_scheduler
    self.model = self._pre_tuning_algo_scheduler("pre_quantization")
  File "C:\Users\Dario\Downloads\INTEL\neural-compressor-tutorial\neural-compressor-env\lib\site-packages\neural_compressor\algorithm\algorithm.py", line 127, in __call__
    self._q_model = algo(self._origin_model, self._q_model, self._adaptor, self._dataloader, self._calib_iter)
  File "C:\Users\Dario\Downloads\INTEL\neural-compressor-tutorial\neural-compressor-env\lib\site-packages\neural_compressor\algorithm\smooth_quant.py", line 89, in __call__
    q_model = adaptor.smooth_quant(
  File "C:\Users\Dario\Downloads\INTEL\neural-compressor-tutorial\neural-compressor-env\lib\site-packages\neural_compressor\adaptor\onnxrt.py", line 228, in smooth_quant
    self.smooth_quant_model = self.sq.transform(**self.cur_sq_args)
  File "C:\Users\Dario\Downloads\INTEL\neural-compressor-tutorial\neural-compressor-env\lib\site-packages\neural_compressor\adaptor\ox_utils\smooth_quant.py", line 183, in transform
    self._dump_op_info(percentile, op_types, calib_iter, quantize_config)
  File "C:\Users\Dario\Downloads\INTEL\neural-compressor-tutorial\neural-compressor-env\lib\site-packages\neural_compressor\adaptor\ox_utils\smooth_quant.py", line 395, in _dump_op_info
    self.max_vals_per_channel, self.shape_info, self.tensors_to_node = augment.calib_smooth(
  File "C:\Users\Dario\Downloads\INTEL\neural-compressor-tutorial\neural-compressor-env\lib\site-packages\neural_compressor\adaptor\ox_utils\calibration.py", line 774, in calib_smooth
    _, output_dicts = self.get_intermediate_outputs()
  File "C:\Users\Dario\Downloads\INTEL\neural-compressor-tutorial\neural-compressor-env\lib\site-packages\neural_compressor\adaptor\ox_utils\calibration.py", line 254, in get_intermediate_outputs
    else onnxruntime.InferenceSession(self.model_wrapper.model_path + "_augment.onnx", so, providers=[backend])
  File "C:\Users\Dario\Downloads\INTEL\neural-compressor-tutorial\neural-compressor-env\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 419, in __init__
    self._create_inference_session(providers, provider_options, disabled_optimizers)
  File "C:\Users\Dario\Downloads\INTEL\neural-compressor-tutorial\neural-compressor-env\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 483, in _create_inference_session
    sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Exception during initialization: bad allocation
2024-02-02 11:28:36 [ERROR] Specified timeout or max trials is reached! Not found any quantized model which meet accuracy goal. Exit.
model: decoder_model.onnx
args.output_model: C:/Users/Dario/Downloads/INTEL/neural-compressor-tutorial/output_model
Traceback (most recent call last):
  File "C:\Users\Dario\Downloads\INTEL\neural-compressor-tutorial\main.py", line 336, in <module>
    q_model.save(os.path.join(args.output_model, model))
AttributeError: 'NoneType' object has no attribute 'save'
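For what it's worth, the trailing AttributeError is only a symptom of the first failure: quantization.fit() returns None when no configuration meets the accuracy goal, and calling .save() on that None is what crashes main.py at line 336. A defensive sketch of the save step (save_quantized is a hypothetical helper I made up, not part of the example script):

```python
import os
import sys


def save_quantized(q_model, output_dir, model_name):
    """Save a tuned model, failing loudly when tuning produced none.

    Hypothetical guard: fit() returning None means tuning failed
    (here, because of the earlier 'bad allocation' error), so there
    is nothing to save and we should surface that instead of an
    AttributeError on None.
    """
    if q_model is None:
        sys.exit(
            "Quantization did not produce a model; fix the "
            "'bad allocation' error before saving."
        )
    q_model.save(os.path.join(output_dir, model_name))
```

This doesn't fix the root cause, but it turns the confusing secondary traceback into a clear message.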
What could be the solution? Did I miss any crucial steps during the installation or while executing the commands listed above?
Thank you for any suggestions.
RUNTIME_EXCEPTION : Exception during initialization: bad allocation is raised while creating the InferenceSession. The exception points to a memory allocation failure: calibration loads an augmented copy of the model (model_path + "_augment.onnx"), which is very memory-hungry for a 7B-parameter model. You can try tracking your memory consumption while the script runs.
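One way to watch memory around the failing step is a small context manager. This sketch uses the stdlib tracemalloc module, which only sees Python-heap allocations; ONNX Runtime's native buffers (where the bad allocation actually happens) must be watched with an OS-level tool such as Task Manager, or with psutil.Process().memory_info() if psutil is installed. The usage lines around InferenceSession are illustrative, not taken from the example script:

```python
import tracemalloc
from contextlib import contextmanager


@contextmanager
def track_python_memory(label):
    """Print the peak Python-heap allocation made inside the block.

    Caveat: tracemalloc only instruments the Python allocator, so
    native allocations (e.g. ONNX Runtime loading model weights)
    do not show up here; use an OS-level monitor for those.
    """
    tracemalloc.start()
    try:
        yield
    finally:
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        print(f"{label}: peak Python allocations ~{peak / 1e6:.1f} MB")


# Hypothetical usage around the call that fails in calibration.py:
# with track_python_memory("create augmented InferenceSession"):
#     sess = onnxruntime.InferenceSession("decoder_model.onnx_augment.onnx", so)

# Self-contained demo of the context manager itself:
with track_python_memory("demo: 10 MB buffer"):
    buf = bytearray(10_000_000)
```

If available RAM is the bottleneck, closing other applications, enlarging the Windows page file, or running on a machine with more memory are the usual workarounds.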