AssertionError of act_observer when using SmoothQuant for Llama-13b #2033

Open
kyang-06 opened this issue Oct 16, 2024 · 0 comments

When I tried SmoothQuant with the following sample code snippet:

from neural_compressor.torch.quantization import SmoothQuantConfig, convert, prepare


def run_fn(model):
    # Calibration pass: run the example inputs through the prepared model.
    model(example_inputs)


# fp32_model and example_inputs must be defined before this point.
quant_config = SmoothQuantConfig(alpha=0.5)
prepared_model = prepare(fp32_model, quant_config=quant_config, example_inputs=example_inputs)
run_fn(prepared_model)
q_model = convert(prepared_model)

I got the following error:

AssertionError                            Traceback (most recent call last)
Cell In[7], line 11
      9 quant_config = SmoothQuantConfig(alpha=0.5)
     10 print(quant_config)
---> 11 prepared_model = prepare(model, quant_config=quant_config, example_inputs=example_prompts)
     12 run_fn(prepared_model)
     13 q_model = convert(prepared_model)
...
...
File ~/anaconda3/envs/intel-arc-py39/lib/python3.9/site-packages/intel_extension_for_pytorch/quantization/_smooth_quant.py:85, in SmoothQuantActivationObserver.__init__(self, act_observer, act_ic_observer, smooth_quant_enabled, dtype, qscheme, reduce_range, quant_min, quant_max, alpha, factory_kwargs, eps)
     75     self.act_obs = HistogramObserver(
     76         dtype=dtype,
     77         qscheme=qscheme,
   (...)
     82         eps=eps,
     83     )
     84 else:
---> 85     assert isinstance(act_observer, UniformQuantizationObserverBase), 'act_observer:' + str(act_observer)
     86     self.act_obs = act_observer
     87 # if smooth_quant_enabled is false, this observer acts as
     88 # a normal per-tensor observer

AssertionError: act_observer:<class 'torch.ao.quantization.observer.MinMaxObserver'>
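
One possible reading of this message, offered as an assumption rather than a confirmed diagnosis: the text after `act_observer:` is how Python prints a class object, which suggests the observer class itself was passed instead of a constructed instance. The sketch below reproduces that behavior with hypothetical stand-in classes that share only the names and the inheritance relationship of the real observers; it does not import or call torch or intel_extension_for_pytorch.

```python
# Hypothetical stand-ins: they mirror only the subclass relationship,
# not the behavior of the real torch.ao.quantization observers.
class UniformQuantizationObserverBase:
    pass


class MinMaxObserver(UniformQuantizationObserverBase):
    pass


def check_act_observer(act_observer):
    """Return True when the argument would pass the isinstance assertion."""
    return isinstance(act_observer, UniformQuantizationObserverBase)


# Passing the class object itself fails the check,
# because a class is an instance of `type`, not of its own base class.
passed_class = check_act_observer(MinMaxObserver)

# Passing a constructed instance satisfies the check.
passed_instance = check_act_observer(MinMaxObserver())

print(passed_class, passed_instance)
# A class object prints in the "<class '...'>" form seen in the error message.
print(str(MinMaxObserver))
```

If this reading is right, the observer is being handed over uninstantiated somewhere between `prepare` and `SmoothQuantActivationObserver.__init__`, which could point to a mismatch between the installed package versions.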

Below is my environment:

torch                            2.1.0a0+cxx11.abi
neural_compressor                3.0.2
neural_compressor_3x_pt          2.6
intel-extension-for-pytorch      2.1.10+xpu
intel-extension-for-transformers 1.2.1
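
For anyone trying to compare environments, a listing like the one above can be collected with the standard library alone. This is a minimal sketch: the helper name `collect_versions` is hypothetical, and the distribution names are copied from the listing above on the assumption that they match the installed metadata.

```python
from importlib.metadata import PackageNotFoundError, version


def collect_versions(packages):
    """Map each distribution name to its installed version string."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # Report missing distributions instead of raising.
            found[name] = "not installed"
    return found


# Names taken from the environment listing above (assumed to match
# the installed distribution metadata).
PACKAGES = [
    "torch",
    "neural_compressor",
    "neural_compressor_3x_pt",
    "intel-extension-for-pytorch",
    "intel-extension-for-transformers",
]

for name, ver in collect_versions(PACKAGES).items():
    print(f"{name:<33}{ver}")
```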