
"zero_point":False in quant_fig dict #643

Open
Cornelii opened this issue Nov 8, 2024 · 1 comment


Cornelii commented Nov 8, 2024

Hello,

I tried to quantize Llama-3.1-8B-Instruct with:
quant_config = {"zero_point": False, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

But I found that the GEMM version has `assert scales is not None and zeros is not None` in the source code.

I think zeros=None should not be rejected when "zero_point": False (i.e. absmax quantization).

To work around this, I set zeros to torch.zeros_like(scales). The model then appeared to be AWQ-quantized successfully.
However, when I forwarded some prompts, the output was nonsense, and it did not even yield a number for wikitext perplexity.

Could you look into this?

Thank you.

Author

Cornelii commented Nov 11, 2024

I've realized that AutoAWQ does not support absmax quantization yet; it would need additional contributions to the kernel and the int4 packing/compression.
Absmax quantization can still be tested (not in terms of speed) by modifying the code as follows.

In `pseudo_quantize_tensor` (quantizer.py):

```python
else:
    max_val = w.abs().amax(dim=1, keepdim=True)
    max_val = max_val.clamp(min=1e-5)
    max_int = 2 ** (self.w_bit - 1) - 1
    min_int = -(2 ** (self.w_bit - 1))
    scales = max_val / max_int
    # original: w = torch.clamp(torch.round(w / scales), min_int, max_int) * scales

    # shift the symmetric range into the unsigned range the kernel expects
    zeros = -1 * min_int * torch.ones_like(scales)
    w = (
        torch.clamp(torch.round(w / scales) + zeros, min_int - min_int, max_int - min_int)
        - zeros
    ) * scales
    zeros = zeros.view(org_w_shape[0], -1)
```
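For reference, here is a minimal, self-contained sketch of the same absmax (symmetric) fake-quantization as a standalone function. The name `absmax_quantize` and the per-row grouping are illustrative assumptions, not AutoAWQ's actual API:

```python
import torch

def absmax_quantize(w: torch.Tensor, n_bits: int = 4):
    """Symmetric (absmax) fake-quantization per row: no zero point is needed
    because the integer range is centered on zero."""
    max_int = 2 ** (n_bits - 1) - 1            # e.g. 7 for 4-bit
    min_int = -(2 ** (n_bits - 1))             # e.g. -8 (never hit after rounding)
    scales = w.abs().amax(dim=1, keepdim=True).clamp(min=1e-5) / max_int
    q = torch.clamp(torch.round(w / scales), min_int, max_int)
    return q * scales, scales                  # dequantized weights and scales

w = torch.randn(2, 8)
w_dq, scales = absmax_quantize(w)
# round-to-nearest bounds the per-row error by half a quantization step
assert torch.all((w - w_dq).abs() <= scales / 2 + 1e-6)
```

Since rounding is to the nearest integer and the largest magnitude per row maps exactly to `max_int`, the reconstruction error is bounded by half a scale step, which is why no zeros tensor is required in this scheme.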
