Enable gptqmodel #35012
Conversation
@SunMarc GPTQModel is intended to replace AutoGPTQ entirely, given the lack of progress in that repo, but for the sake of compatibility they can co-exist in parallel until this integration is merged and everything is stable and tested. Later we can initiate a deprecation plan for AutoGPTQ, which is no longer actively developed or maintained.
Hey @jiqing-feng, thanks for adding this!
Signed-off-by: jiqing-feng <[email protected]>
Thanks for this PR. Left a couple of comments. Note that we also need to modify the Dockerfile for our quantization tests if we decide to deprecate auto-gptq, and it would be nice to include a new version of a Colab notebook that works with gptqmodel.
```diff
 gptq_supports_cpu = (
     is_auto_gptq_available()
     and version.parse(importlib.metadata.version("auto-gptq")) > version.parse("0.4.2")
 ) or is_gptqmodel_available()
 if not gptq_supports_cpu and not torch.cuda.is_available():
     raise RuntimeError("GPU is required to quantize or run quantize model.")
-elif not (is_optimum_available() and is_auto_gptq_available()):
+elif not (is_optimum_available() and (is_auto_gptq_available() or is_gptqmodel_available())):
     raise ImportError(
-        "Loading a GPTQ quantized model requires optimum (`pip install optimum`) and auto-gptq library (`pip install auto-gptq`)"
+        "Loading a GPTQ quantized model requires optimum (`pip install optimum`) and auto-gptq or gptqmodel library (`pip install auto-gptq` or `pip install gptqmodel`)"
     )
-elif version.parse(importlib.metadata.version("auto_gptq")) < version.parse("0.4.2"):
+elif is_auto_gptq_available() and version.parse(importlib.metadata.version("auto_gptq")) < version.parse(
+    "0.4.2"
+):
     raise ImportError(
-        "You need a version of auto_gptq >= 0.4.2 to use GPTQ: `pip install --upgrade auto-gptq`"
+        "You need a version of auto_gptq >= 0.4.2 to use GPTQ: `pip install --upgrade auto-gptq` or use gptqmodel by `pip install gptqmodel`"
     )
```
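For reviewers who want to try the new path locally, here is a minimal sketch of loading an already-quantized GPTQ checkpoint once `optimum` and `gptqmodel` are installed. The model repo id and the `pip` line are only illustrative assumptions, not part of this PR.

```python
# pip install --upgrade optimum gptqmodel  # the exact minimum versions are set by this PR/optimum
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any GPTQ checkpoint on the Hub should work; this repo id is just an example.
model_id = "TheBloke/Llama-2-7B-Chat-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# With gptqmodel installed, the environment check above accepts it in place of auto-gptq.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("GPTQ lets you run 4-bit models", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```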
Can you add a message mentioning that auto-gptq will be deprecated? I think we can do it two versions of transformers from now. For optimum, maybe we can deprecate it a bit later than transformers, to make sure that we can still revert if there is a big issue.
done.
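For context, such a deprecation notice could look roughly like the sketch below. The wording, logger name, and helper name are illustrative assumptions, not the text that was merged.

```python
import logging

logger = logging.getLogger(__name__)

def _warn_autogptq_deprecation() -> None:
    # Illustrative only: tell users that the auto-gptq path is slated for removal
    # and point them at gptqmodel as the replacement.
    logger.warning(
        "auto-gptq support will be deprecated in upcoming transformers releases; "
        "please switch to gptqmodel (`pip install gptqmodel`) instead."
    )
```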
Don't forget that users need to use the latest version of optimum with gptqmodel.
I have added version constraints for optimum and gptqmodel. The constraints can be updated once the new gptqmodel and optimum releases are out.
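As a rough illustration of what "limiting the versions" means here, a sketch of such a gate is below. The minimum version strings and the helper name are placeholders; the real pins live in the PR and in optimum.

```python
import importlib.metadata

from packaging import version

# Placeholder minimums for illustration; the actual pins are defined in the PR/optimum.
MIN_OPTIMUM_VERSION = "1.23.99"
MIN_GPTQMODEL_VERSION = "1.4.99"

def check_gptqmodel_stack() -> None:
    # Fail early if the installed optimum/gptqmodel pair is too old for the new code path.
    if version.parse(importlib.metadata.version("optimum")) < version.parse(MIN_OPTIMUM_VERSION):
        raise ImportError(f"gptqmodel support requires optimum >= {MIN_OPTIMUM_VERSION}")
    if version.parse(importlib.metadata.version("gptqmodel")) < version.parse(MIN_GPTQMODEL_VERSION):
        raise ImportError(f"gptqmodel support requires gptqmodel >= {MIN_GPTQMODEL_VERSION}")
```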
@SunMarc This PR in its current state is not passing our internal tests. @jiqing-feng will merge some of our changes that pass both the inference and quantization tests. Please delay your review until then, since there are substantial changes relative to the current code/PR.
* gptqmodel needs to use checkpoint_format
* fix quantize
* Update quantization_config.py
* Update quantization_config.py
* Update quantization_config.py
---------
Co-authored-by: ZX-ModelCloud <[email protected]>
Co-authored-by: Qubitium-ModelCloud <[email protected]>
* revert quantizer_gptq.py change
* pass **kwargs
Testing changes include a refactor: CPU tests no longer need @require_torch_gpu.
Signed-off-by: jiqing-feng <[email protected]>
* revert quantizer_gptq.py change
* pass **kwargs
* add meta info
* cleanup
* cleanup
* Update quantization_config.py
* hf_select_quant_linear pass checkpoint_format and meta
* fix GPTQTestCUDA
* Update test_gptq.py
* gptqmodel.hf_select_quant_linear() now does not select ExllamaV2
* cleanup
* add backend
* cleanup
* cleanup
* no need check exllama version
* Update quantization_config.py
* lower checkpoint_format and backend
* check none
* cleanup
* Update quantization_config.py
* fix self.use_exllama == False
* spell
* fix unittest
* fix unittest
---------
Co-authored-by: LRL <[email protected]>
Co-authored-by: Qubitium-ModelCloud <[email protected]>
@SunMarc Review can start. There may be some testing code tweaks, but I do not foresee any major changes from this point forward other than passing flaky tests and/or fixing some testing bugs.
@ArthurZucker @SunMarc @MekkCyber Is there anything else required of us to move this PR forward? Thanks! We still have a lingering PEFT PR that is contingent on this PR being merged first.
Hey, super sorry, reviewing in a bit. I hope I was not the blocker!
Hi @SunMarc @ArthurZucker @MekkCyber. The optimum PR has been merged, so this PR should be ready to merge.
Mostly wondering if it would not make more sense to create a separate backend, as we can now treat them as different libs, no? 🤗
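For reference, the commit log above mentions adding a `backend` field to the config, which is the alternative to a separate quantizer class. A hypothetical sketch of how a user would pass such a hint is below; the accepted values are defined by the PR/gptqmodel, not by this sketch.

```python
from transformers import GPTQConfig

# Hypothetical usage: a single GPTQ integration routes to different kernels
# via a `backend` hint instead of a second backend class.
# The value "auto" is an assumption used only for illustration.
quant_config = GPTQConfig(bits=4, backend="auto")
```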
docs/source/en/quantization/gptq.md
* Model support: GPTQModel continues to support all of the latest released LLM models.
* Multi-modal support: GPTQModel supports accurate quantization of Qwen2-VL and Ovis 1.6-VL image-to-text models.
* Platform support: validated macOS Apple Silicon and Windows 11 support.
* Hardware support: Apple Silicon M1+, Intel/AMD CPU, and Intel Datacenter Max + Arc GPUs.
* Asymmetric support: asymmetric quantization can potentially introduce lower quantization errors compared to symmetric quantization. However, it is not backward compatible with AutoGPTQ, and not all kernels, such as Marlin, support asymmetric quantization.
* IPEX kernel for Intel/AMD accelerated CPU and Intel GPU (Datacenter Max + Arc) support.
* Updated Marlin kernel from Neural Magic, optimized for A100 (Ampere).
* Updated kernels with auto-padding for legacy model support and models with non-uniform in/out-features.
* Faster quantization, lower memory usage, and more accurate default quantization via the GPTQModel quantization APIs.
* User- and developer-friendly APIs.
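Since CPU support is one of the headline items in the list above, here is a small sketch of quantizing on a CPU-only machine with the new stack. The model id, calibration dataset, and output directory are example choices, not prescribed by this PR.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

# Example model; any causal LM from the Hub can be substituted.
model_id = "facebook/opt-125m"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# "c4" is one of the built-in calibration datasets; a list of strings also works.
quant_config = GPTQConfig(bits=4, dataset="c4", tokenizer=tokenizer)

# With gptqmodel installed, no CUDA GPU is required for this step.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="cpu",
    quantization_config=quant_config,
)
model.save_pretrained("opt-125m-gptq")
```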
nice! 🤗
For docs, we'd better put autogptq and gptqmodel in the same section, because autogptq is no longer maintained and we might deprecate autogptq in the future if it is incompatible.
To add to what @jiqing-feng mentioned: for this PR, backward compatibility and minimal code change in transformers/optimum/peft is the target. Deprecation of AutoGPTQ is fully planned, with good reason. Looking forward, long term, there is no reason to keep AutoGPTQ or to spend time re-architecting and splitting the HF GPTQ integration into two backends. The AutoGPTQ core maintainer has been MIA, literally unreachable by anyone, including @fxmarty (second maintainer), who did almost all the work in 2024 while the project was still active, until he ran out of spare time to work on it. I was invited by fxmarty to help as a third, part-time, restricted maintainer, but I later decided to build GPTQModel instead, following my own vision, unburdened by the legacy API and what I considered questionable foundation code. GPTQModel runs 100% CI feature and model coverage for each release; AutoGPTQ has no CI. We have bugs too, but there are so many hidden bugs in AutoGPTQ that we fixed that we have lost count. Transformers/Optimum/PEFT only use the kernel part of the AutoGPTQ code base, so the full short- and long-term problems there are not visible here.
Thanks for updating!
Co-authored-by: Steven Liu <[email protected]>
@stevhliu Thanks for the doc/text corrections.
I suppose this PR is ready to be merged as we got enough approvals; please let me know if there is anything I need to change.
Hi @ArthurZucker @Rocketknight1. Please let me know if there is anything I need to change before merging. Thanks!
@SunMarc @MekkCyber I think you have the ultimate authority here. Take one last look and feel free to merge it if you're happy!
Sounds good! I'll merge it then! cc @MekkCyber for visibility
We are going to replace `auto_gptq` with `gptqmodel`. Start with the quantizer check; we also need to change optimum: huggingface/optimum#2064. We intended to deprecate AutoGPTQ in this PR, but considering users' behavior, we would like to keep support for auto_gptq for the next few versions and emit a deprecation warning.
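The co-existence described above could look roughly like the sketch below. It assumes the availability helpers used in the diff are importable from `transformers.utils`; the helper name and warning text are illustrative, not the merged code.

```python
import logging

from transformers.utils import is_auto_gptq_available, is_gptqmodel_available

logger = logging.getLogger(__name__)

def pick_gptq_library() -> str:
    # Prefer gptqmodel when both libraries are installed; keep auto-gptq working
    # for a few more releases, but tell users it is on the way out.
    if is_gptqmodel_available():
        return "gptqmodel"
    if is_auto_gptq_available():
        logger.warning(
            "auto-gptq support is deprecated and will be removed in a future release; "
            "please migrate to gptqmodel (`pip install gptqmodel`)."
        )
        return "auto-gptq"
    raise ImportError("Neither gptqmodel nor auto-gptq is installed.")
```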