Hello,

I tried to quantize Llama-3.1-8B-Instruct with:

`quant_config = { "zero_point": False, "q_group_size": 128, "w_bit": 4, "version": "GEMM" }`
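For reference, the quantization was run roughly like the standard AutoAWQ example below (the output path is just a placeholder):

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "meta-llama/Llama-3.1-8B-Instruct"
quant_path = "llama-3.1-8b-instruct-awq"  # placeholder output directory

quant_config = {"zero_point": False, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the fp16 model and tokenizer, quantize with the config above, then save.
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```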
However, I found that the GEMM path has `assert scales is not None and zeros is not None` in the source code.
I don't think the code should require zeros when `"zero_point": False` (absmax quantization) is used.
To work around it, I set zeros to a zero tensor, e.g. `torch.zeros_like(scales)`. With that change, AWQ quantization appeared to complete successfully.
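Concretely, the workaround was roughly equivalent to the monkey-patch below (a sketch; the exact import path and `from_linear` signature may vary between AutoAWQ versions):

```python
import torch
from awq.modules.linear.gemm import WQLinear_GEMM

# Substitute dummy (all-zero) zero points when the absmax path returns zeros=None,
# so the `assert scales is not None and zeros is not None` in from_linear passes.
_orig_from_linear = WQLinear_GEMM.from_linear.__func__

def _patched_from_linear(cls, linear, w_bit, group_size,
                         init_only=False, scales=None, zeros=None):
    if not init_only and zeros is None and scales is not None:
        zeros = torch.zeros_like(scales)
    return _orig_from_linear(cls, linear, w_bit, group_size,
                             init_only=init_only, scales=scales, zeros=zeros)

WQLinear_GEMM.from_linear = classmethod(_patched_from_linear)
```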
However, when I forwarded some prompts, the model produced gibberish.
It did not even yield a usable WikiText perplexity number.
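(The perplexity check was a standard WikiText-2 evaluation along these lines — a generic sketch, assuming the quantized checkpoint loads through transformers:)

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

quant_path = "llama-3.1-8b-instruct-awq"  # placeholder: path to the quantized model
tokenizer = AutoTokenizer.from_pretrained(quant_path)
model = AutoModelForCausalLM.from_pretrained(
    quant_path, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

# Concatenate the WikiText-2 test split and score it in non-overlapping 2048-token chunks.
test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
enc = tokenizer("\n\n".join(test["text"]), return_tensors="pt")

seq_len = 2048
nlls, n_tokens = [], 0
for begin in range(0, enc.input_ids.size(1), seq_len):
    input_ids = enc.input_ids[:, begin:begin + seq_len].to(model.device)
    if input_ids.size(1) < 2:
        break
    with torch.no_grad():
        loss = model(input_ids, labels=input_ids).loss  # mean NLL for this chunk
    nlls.append(loss.float() * input_ids.size(1))
    n_tokens += input_ids.size(1)

ppl = torch.exp(torch.stack(nlls).sum() / n_tokens)
print(f"WikiText-2 perplexity: {ppl.item():.2f}")
```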
Could you look into this and clarify?
Thank you.
Cornelii changed the title from `zero_point=False in quant_fig dict` to `"zero_point":False in quant_fig dict` on Nov 8, 2024.
I've realized that AutoAWQ does not support absmax quantization yet. Adding it would require additional contributions to the kernel and the int4 packing/compression.
Absmax quantization can still be tested for quality (not speed) by modifying the code as follows.
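For example, something like the sketch below fake-quantizes the Linear weights in place with absmax, so quality can be checked without touching the INT4 GEMM kernel (the helper is illustrative and not part of AutoAWQ):

```python
import torch
import torch.nn as nn

@torch.no_grad()
def fake_absmax_quantize_(model: nn.Module, w_bit: int = 4, group_size: int = 128):
    """Illustrative in-place absmax (symmetric, no zero point) fake quantization."""
    max_int = 2 ** (w_bit - 1) - 1   # 7 for 4-bit
    min_int = -(2 ** (w_bit - 1))    # -8 for 4-bit
    for name, module in model.named_modules():
        if not isinstance(module, nn.Linear) or "lm_head" in name:
            continue
        w = module.weight.data
        out_features, in_features = w.shape
        assert in_features % group_size == 0, name
        w = w.view(out_features, in_features // group_size, group_size)
        # Per-group absmax scale, symmetric around zero (no zero point).
        scales = w.abs().amax(dim=-1, keepdim=True).clamp(min=1e-5) / max_int
        w = torch.clamp(torch.round(w / scales), min_int, max_int) * scales
        module.weight.data = w.view(out_features, in_features).to(module.weight.dtype)
```

Running this on the fp16 model and then measuring generation quality or WikiText perplexity gives an absmax baseline. The real GEMM path presumably still needs signed (or shifted) int4 packing plus matching kernel changes, which would also explain why forcing zeros to 0 through the existing unsigned, zero-point-based packing produced gibberish above.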