Thank you for making the code publicly available to evaluate various quantization configurations.
I have a few questions:
First, using this code, is it possible to evaluate W4A8KV8 or W4A8KV16 instead of W4A8KV4 in lmquant/projects/llm/scripts/qoq.sh?
Second, can I run accuracy evaluations for other models such as Qwen2.5-32B or Gemma2? If so, are there default values of alpha and beta for QoQ that do not cause significant accuracy loss?
Thanks!
I am still confused about the dtype of opts and the enable_attn option of smooth. I notice that the smoothquant_dynamic configuration uses sint8 as the dtype of opts but sets enable_attn to false. My understanding is that the SmoothQuant setting is W8A8 without KV quantization, but I am not sure that is right. If it is, does setting enable_attn to false disable KV quantization regardless of the value of dtype for opts? In other words, does the dtype of opts only take effect when enable_attn is true?
I also do not understand what 'opts' is for. Is it only for KV quantization?