Currently only supports LoRA-related techniques, but more are in the pipeline to be added:
Plugin | Description | Depends | Loading | Augmentation | Callbacks |
---|---|---|---|---|---|
autogptq | Loads 4bit GPTQ-LoRA with quantized GPTQ as base | AutoGPTQ | ✅ | ✅ | |
bnb | Loads 4bit QLoRA with quantized bitsandbytes Linear4 | Huggingface bitsandbytes |
✅ | ✅ |
- fix upcasting (resulting in slowdown) issue for
bnb
plugin, originally discovered by inventors of Unsloth. bnb
properly configured to work with FSDP following this guide.triton_v2
kernels are not yet properly integrated into huggingface optimum.triton_v2
kernels are the only 4bit kernels that work for training.
GPTQ-LORA depends on an AutoGPTQ backend to run. There are 2 backend options
- Current Implementation
- This is an extracted local subset from ModelCloud's refactored fork.
- It removes redundant code to simplify build and installation of the plugin
- Legacy Implementation
-
This requires building the package from the official AutoGPTQ repository
-
To replicate this implementation, follow the installation below
- The legacy implementation of GPTQ-LORA uses an external AutoGPTQ package, you must ensure the specific commit is installed
pip install git+https://github.com/AutoGPTQ/AutoGPTQ.git@ea829c7bbe83561c2b1de26795b6592992373ef7
- To construct the plugin, in the configuration object that is passed to the plugin - set
use_external_lib: True
(otherwise defaults to use the local AutoGPTQ package)
peft: quantization: auto_gptq: kernel: triton_v2 from_quantized: True use_external_lib: True
- The legacy implementation of GPTQ-LORA uses an external AutoGPTQ package, you must ensure the specific commit is installed
-
- Models with sliding windows (e.g., Mistral, Mixtral) will have memory and throughout issues.
- GPTQ-LORA sometimes observed to have
nan
grad norms in the begining of training, but training proceeds well otherwise. low_cpu_mem_usage
temporarily disabled for AutoGPTQ until bug withmake_sure_no_tensor_in_meta_device
is resolved.