Hello,

I tried to quantize Llama-3.1-8B-Instruct with:

`quant_config = { "zero_point": False, "q_group_size": 128, "w_bit": 4, "version": "GEMM" }`
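For reference, the quantization was run roughly like the standard AutoAWQ example below (the output path is just a placeholder):

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "meta-llama/Llama-3.1-8B-Instruct"
quant_path = "llama-3.1-8b-instruct-awq"  # placeholder output directory

quant_config = {"zero_point": False, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

# Load the fp16 model and tokenizer, quantize with the config above, then save.
model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path)

model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```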
However, I found that the GEMM path has `assert scales is not None and zeros is not None` in the source code.
I don't think the code should require zeros when `"zero_point": False` (absmax quantization) is used.
To work around it, I set zeros to a zero tensor, e.g. `torch.zeros_like(scales)`. With that change, AWQ quantization appeared to complete successfully.
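Concretely, the workaround was roughly equivalent to the monkey-patch below (a sketch; the exact import path and `from_linear` signature may vary between AutoAWQ versions):

```python
import torch
from awq.modules.linear.gemm import WQLinear_GEMM

# Substitute dummy (all-zero) zero points when the absmax path returns zeros=None,
# so the `assert scales is not None and zeros is not None` in from_linear passes.
_orig_from_linear = WQLinear_GEMM.from_linear.__func__

def _patched_from_linear(cls, linear, w_bit, group_size,
                         init_only=False, scales=None, zeros=None):
    if not init_only and zeros is None and scales is not None:
        zeros = torch.zeros_like(scales)
    return _orig_from_linear(cls, linear, w_bit, group_size,
                             init_only=init_only, scales=scales, zeros=zeros)

WQLinear_GEMM.from_linear = classmethod(_patched_from_linear)
```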
However, when I forwarded some prompts, the model produced gibberish.
It did not even yield a usable WikiText perplexity number.
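(The perplexity check was a standard WikiText-2 evaluation along these lines — a generic sketch, assuming the quantized checkpoint loads through transformers:)

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

quant_path = "llama-3.1-8b-instruct-awq"  # placeholder: path to the quantized model
tokenizer = AutoTokenizer.from_pretrained(quant_path)
model = AutoModelForCausalLM.from_pretrained(
    quant_path, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

# Concatenate the WikiText-2 test split and score it in non-overlapping 2048-token chunks.
test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
enc = tokenizer("\n\n".join(test["text"]), return_tensors="pt")

seq_len = 2048
nlls, n_tokens = [], 0
for begin in range(0, enc.input_ids.size(1), seq_len):
    input_ids = enc.input_ids[:, begin:begin + seq_len].to(model.device)
    if input_ids.size(1) < 2:
        break
    with torch.no_grad():
        loss = model(input_ids, labels=input_ids).loss  # mean NLL for this chunk
    nlls.append(loss.float() * input_ids.size(1))
    n_tokens += input_ids.size(1)

ppl = torch.exp(torch.stack(nlls).sum() / n_tokens)
print(f"WikiText-2 perplexity: {ppl.item():.2f}")
```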
Could you look into this and clarify?
Thank you.
Cornelii changed the title from `zero_point=False in quant_fig dict` to `"zero_point":False in quant_fig dict` on Nov 8, 2024.
I've realized that AutoAWQ does not support absmax quantization yet. Adding it would require additional contributions to the kernel and the int4 packing/compression.
Absmax quantization can still be tested for quality (not speed) by modifying the code as follows.
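For example, something like the sketch below fake-quantizes the Linear weights in place with absmax, so quality can be checked without touching the INT4 GEMM kernel (the helper is illustrative and not part of AutoAWQ):

```python
import torch
import torch.nn as nn

@torch.no_grad()
def fake_absmax_quantize_(model: nn.Module, w_bit: int = 4, group_size: int = 128):
    """Illustrative in-place absmax (symmetric, no zero point) fake quantization."""
    max_int = 2 ** (w_bit - 1) - 1   # 7 for 4-bit
    min_int = -(2 ** (w_bit - 1))    # -8 for 4-bit
    for name, module in model.named_modules():
        if not isinstance(module, nn.Linear) or "lm_head" in name:
            continue
        w = module.weight.data
        out_features, in_features = w.shape
        assert in_features % group_size == 0, name
        w = w.view(out_features, in_features // group_size, group_size)
        # Per-group absmax scale, symmetric around zero (no zero point).
        scales = w.abs().amax(dim=-1, keepdim=True).clamp(min=1e-5) / max_int
        w = torch.clamp(torch.round(w / scales), min_int, max_int) * scales
        module.weight.data = w.view(out_features, in_features).to(module.weight.dtype)
```

Running this on the fp16 model and then measuring generation quality or WikiText perplexity gives an absmax baseline. The real GEMM path presumably still needs signed (or shifted) int4 packing plus matching kernel changes, which would also explain why forcing zeros to 0 through the existing unsigned, zero-point-based packing produced gibberish above.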