[INTEGRATION] Add GPTQModel support into transformers + optimum + peft #729
Comments
Sounds good to me @jiqing-feng! Also, after discussing a bit with other members at HF, I think it is better to put off any mention of deprecating AutoGPTQ for now. If both libraries are installed, we can use the GPTQModel library and show a clear warning. Thanks for working on this! |
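The "prefer GPTQModel, but warn when both are installed" policy described above could look roughly like the sketch below. The helper names `is_installed` and `select_gptq_backend` are hypothetical and are not part of transformers, optimum, or peft; only the import names `gptqmodel` and `auto_gptq` come from the two libraries.

```python
import importlib.util
import warnings


def is_installed(package: str) -> bool:
    """Return True if a top-level package can be found in the current environment."""
    return importlib.util.find_spec(package) is not None


def select_gptq_backend(has_gptqmodel: bool, has_auto_gptq: bool) -> str:
    """Pick a GPTQ backend name (hypothetical helper, illustration only)."""
    if has_gptqmodel and has_auto_gptq:
        # Both libraries present: prefer gptqmodel and tell the user clearly.
        warnings.warn(
            "Both gptqmodel and auto_gptq are installed; using gptqmodel.",
            UserWarning,
            stacklevel=2,
        )
        return "gptqmodel"
    if has_gptqmodel:
        return "gptqmodel"
    if has_auto_gptq:
        return "auto_gptq"
    raise ImportError("Neither gptqmodel nor auto_gptq is installed.")
```

A caller would pass `is_installed("gptqmodel")` and `is_installed("auto_gptq")` as the two flags; keeping the flags as parameters makes the policy easy to test without installing either library.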
@jiqing-feng The internal GPTQModel code refactor to support hf/optimum is passing internal tests. The Transformers/Optimum PRs need the following merges: @SunMarc We will do more testing tomorrow and let you know when everything is kosher so you can review proper code that is passing all tests. Update: the above 2 PRs, both No. 2 for transformers/optimum, have been merged. |
Status update. We have started testing the above Transformer/Optimum/Peft tests under Nvidia/cuda GPU using gptqmodel[main] + the following PRs: https://github.com/jiqing-feng/transformers/pull/3/files |
@jiqing-feng @SunMarc One issue we found is that all the gptq tests in optimum are super flaky, not only between transformers/torch versions but also between gptqmodel and auto-gptq. In gptqmodel we moved away from string comparison: we run eval harness benchmarks directly and check against a regression floor value on a fixed benchmark. This is not going to be addressed in this PR but needs to be addressed in future PRs. |
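A regression-floor check of the kind described above might be sketched as follows. The helper name `assert_no_regression`, the task name, and the numbers in the usage note are all hypothetical; this is not gptqmodel's actual test code, only an illustration of comparing benchmark scores against a fixed floor instead of comparing generated strings.

```python
from typing import List, Mapping


def assert_no_regression(
    scores: Mapping[str, float],
    floors: Mapping[str, float],
    tolerance: float = 0.0,
) -> None:
    """Raise AssertionError if any benchmark score falls below its floor.

    `scores` maps task name -> measured metric, `floors` maps task name ->
    minimum acceptable metric. `tolerance` allows a small slack below the floor.
    """
    failures: List[str] = []
    for task, floor in floors.items():
        if task not in scores:
            failures.append(f"{task}: no score reported")
        elif scores[task] < floor - tolerance:
            failures.append(f"{task}: {scores[task]:.4f} < floor {floor:.4f}")
    if failures:
        raise AssertionError("benchmark regression: " + "; ".join(failures))
```

For example, `assert_no_regression({"arc_challenge": 0.31}, {"arc_challenge": 0.30})` passes, while a measured score of 0.25 against the same floor raises. Unlike an exact string match on generated text, a floor tolerates harmless numerical variation across devices and library versions.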
Update: All 3 pending (not yet ready) PRs that we are testing for @jiqing-feng's PRs. Transformers: https://github.com/jiqing-feng/transformers/pull/3/files |
Final update for today. We are trying to find the least painful way to fix the peft/gptqmodel non-ipex paths. There are solutions but none are great, since only three kernels (torch/cuda/triton) actually support an accelerated + trainable forward. But selecting the right kernel at transformers load, then optimum quantize, followed finally by peft/training mode are very disjointed steps in transformers: there appears to be no auto-hook to register so that gptqmodel internals can switch a non-trainable kernel into a trainable kernel. Can an nn.Module actually register a hook to know it is going into training mode, beyond a boolean self.trainable state that is only checked in forward()? We will test the best available method tomorrow. Ran out of time today. |
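On the question of whether a module can learn that it is entering training mode: in plain PyTorch, `nn.Module.eval()` calls `train(False)`, and `train()` is called recursively on every child module, so overriding `train()` gives each layer a callback at mode-switch time. The sketch below illustrates only that mechanism. `SwitchableQuantLinear` and its `kernel` attribute are hypothetical stand-ins, and this is not necessarily the approach gptqmodel ended up using.

```python
import torch
from torch import nn


class SwitchableQuantLinear(nn.Module):
    """Toy stand-in for a quantized linear layer with two kernel choices."""

    def __init__(self, in_features: int, out_features: int) -> None:
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        # nn.Module starts in training mode without calling train(),
        # so the initial kernel is set by hand.
        self.kernel = "trainable"

    def train(self, mode: bool = True) -> "SwitchableQuantLinear":
        # Called by the parent's train()/eval(), so no manual hook is needed.
        super().train(mode)
        self.kernel = "trainable" if mode else "inference_only"
        return self

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # A real layer would dispatch on self.kernel here.
        return self.linear(x)


model = nn.Sequential(SwitchableQuantLinear(4, 2))
model.eval()   # propagates train(False) to the child layer
kernel_after_eval = model[0].kernel
model.train()  # propagates train(True) to the child layer
kernel_after_train = model[0].kernel
```

The limitation is that this only reacts to `train()`/`eval()` calls; code that sets `module.training` directly, or that freezes parameters without changing mode, would bypass the override.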
Best-fit solution has been found. We are refactoring gptqmodel code at the moment so peft compat can be added cleanly. |
Almost there. All code is 99.8% ready for all 3 PRs. Internal testing is starting. @jiqing-feng Once all our tests pass, I will let you know so you can merge into your 3 PRs before we start the final round of testing. Tracking: Transformers: https://github.com/jiqing-feng/transformers/pull/3/files |
@jiqing-feng All 3 PRs are passing all cpu/gpu tests internally here. There are some string output mismatches, but that is random variability across different CPUs. CPU tests may fail, with tensors on cuda, when they are run in an env where both cpu + cuda devices are exposed despite Once these 3 PRs are merged into the larger PRs, I will write a lengthy explanation of some of the obvious and not so obvious small/large changes we require/pushed. |
All merged, will check it again on CPU. |
All tests passing. Both gptqmodel internal tests and transformer/optimum/peft tests passing for cpu/xpu/cuda. |
GPTQModel v1.4.0 has been released. If any changes are required for the above upstream PRs, we will make them and cut a new release. |
Function Upstreaming
optimum
2064 <-- MERGED
transformers
35012 <-- MERGED
peft
2247 <-- PENDING
Tests
optimum
test_quantization
RUN_SLOW=1 pytest tests/gptq/test_quantization.py
transformers
test_gptq
RUN_SLOW=1 pytest tests/quantization/gptq/test_gptq.py
peft
PeftGPTQGPUTests
pytest tests/test_gpu_examples.py::PeftGPTQTests
and pytest tests/test_common_gpu.py::PeftCommonTests::test_lora_gptq_quantization_from_pretrained_safetensors
I suppose we don't need new unit tests for gptq in HF; we just need to pass all existing gptq tests with the gptqmodel lib. Please confirm.
cc @Qubitium @SunMarc