
support for W4A8KV8/16 and other models #19

Open
KKwanhee opened this issue Sep 30, 2024 · 2 comments

Comments


KKwanhee commented Sep 30, 2024

Thank you for making the code publicly available to evaluate various quantization configurations.

I have a few questions:

First, using this code, is it possible to evaluate W4A8KV8 or W4A8KV16 instead of W4A8KV4 in lmquant/projects/llm/scripts/qoq.sh?

Second, can I run accuracy evaluations for other models such as Qwen2.5-32B or Gemma2? If so, are there default values for the alpha and beta of qoq that do not cause significant accuracy loss?

Thanks!


synxlin (Contributor) commented Nov 8, 2024

Hi,

For your first question, you can evaluate W4A8KV8 or W4A8KV16 by directly setting the dtype parameter of the opts field in the configuration (see here).
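For illustration, the change might look roughly like the fragment below. This is a hypothetical sketch: the field names opts and dtype come from this thread, but the exact nesting and the dtype spellings (sint8/sint16) are assumptions and may differ from the actual lmquant config files.

```yaml
# Hypothetical sketch -- not the verbatim lmquant configuration.
# Changing the dtype under the opts field switches the KV-cache precision.
opts:
  dtype: sint8    # W4A8KV8; use sint16 for W4A8KV16 (KV4 would be sint4)
```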

For your second question, we currently do not have tuned smooth alpha and beta values for Qwen2.5 or Gemma2. You may try the default settings here and here.

@JamesTheZ commented

Thanks @synxlin.

I am still confused about the dtype of opts and the enable_attn of smooth. I notice that the configuration in smoothquant_dynamic uses sint8 as the dtype of opts but sets enable_attn to false. My understanding is that this SmoothQuant setting is W8A8 without quantizing the KV cache; I am not sure if that is right. If so, does setting enable_attn to false disable KV quantization regardless of the dtype value in opts? And does the dtype of opts only take effect when enable_attn is true?

I also do not understand what opts is for. Is it only used for KV quantization?
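To make the question concrete, the combination being asked about could be sketched as below. The field names opts, dtype, enable_attn, and the value sint8 are taken from this thread; the nesting and everything else is an assumption, not the actual smoothquant_dynamic file.

```yaml
# Hypothetical sketch of the configuration being discussed.
opts:
  dtype: sint8        # does this still matter when enable_attn is false?
smooth:
  enable_attn: false  # presumed to disable KV-cache quantization entirely
```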
