
support for W4A8KV8/16 and other models #19

Open
KKwanhee opened this issue Sep 30, 2024 · 2 comments

Comments


KKwanhee commented Sep 30, 2024

Thank you for making the code publicly available to evaluate various quantization configurations.

I have a few questions:

First, using this code, is it possible to evaluate W4A8KV8 or W4A8KV16 instead of W4A8KV4 in lmquant/projects/llm/scripts/qoq.sh?

Second, can I run accuracy evaluations for other models such as Qwen2.5-32B or Gemma2? If so, are there default values for the alpha and beta of qoq that do not cause significant accuracy loss?

Thanks!


synxlin (Contributor) commented Nov 8, 2024

Hi,

For your first question, you can evaluate W4A8KV8 or W4A8KV16 by directly setting the dtype parameter of the opts field in the configuration (see here).
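For illustration, the change might look roughly like the fragment below. This is a hypothetical sketch: the field names opts and dtype come from this thread, but the exact nesting and the dtype spellings (sint8/sint16) are assumptions and may differ from the actual lmquant config files.

```yaml
# Hypothetical sketch -- not the verbatim lmquant configuration.
# Changing the dtype under the opts field switches the KV-cache precision.
opts:
  dtype: sint8    # W4A8KV8; use sint16 for W4A8KV16 (KV4 would be sint4)
```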

For your second question, we currently do not have tuned smooth alpha and beta values for Qwen2.5 or Gemma2. You may try the default settings here and here.

@JamesTheZ commented

Thanks @synxlin.

I am still confused about the dtype of opts and the enable_attn of smooth. I notice that the configuration in smoothquant_dynamic uses sint8 as the dtype of opts but sets enable_attn to false. My understanding is that this SmoothQuant setting is W8A8 without quantizing the KV cache; I am not sure if that is right. If so, does setting enable_attn to false disable KV quantization regardless of the dtype value in opts? And does the dtype of opts only take effect when enable_attn is true?

I also do not understand what opts is for. Is it only used for KV quantization?
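To make the question concrete, the combination being asked about could be sketched as below. The field names opts, dtype, enable_attn, and the value sint8 are taken from this thread; the nesting and everything else is an assumption, not the actual smoothquant_dynamic file.

```yaml
# Hypothetical sketch of the configuration being discussed.
opts:
  dtype: sint8        # does this still matter when enable_attn is false?
smooth:
  enable_attn: false  # presumed to disable KV-cache quantization entirely
```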
