
fixed bug in peft installation for gptqmodel #81

Conversation


@achew010 achew010 commented Sep 3, 2024

Description

This is a fix for the gptq-peft plugin to follow the official PEFT implementation, where specifying target_modules='all-linear' installs adapters on all linear layers. Note that HF by default will not install adapters on lm_head for all-linear. This applies both to our locally maintained gptqmodel and to the open-source auto_gptq.
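
For reference, the stock PEFT behavior this mirrors can be seen on a non-quantized model. The snippet below is a minimal sketch, assuming peft>=0.8 (where "all-linear" is a supported target_modules value); the model name is arbitrary and this is not the plugin code itself.

import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Any small causal LM works for the illustration; the model name is arbitrary.
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m", torch_dtype=torch.float16)

# "all-linear" targets every linear layer except the output embedding (lm_head).
config = LoraConfig(r=16, lora_alpha=16, lora_dropout=0.1, target_modules="all-linear")
peft_model = get_peft_model(model, config)
peft_model.print_trainable_parameters()  # lm_head carries no LoRA adapter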

  • Run with adapters installed on all linear modules: --target_modules 'all-linear'
python -m tuning.sft_trainer --model_name_or_path TheBloke/Mistral-7B-v0.1-GPTQ --packing False --max_seq_len 4096 --auto_gptq triton_v2 True --training_data_path orca/benchmark_outputs/data/cache_TheBloke_Mistral_7B_v0.1_GPTQ.json --use_flash_attn True --include_tokens_per_second True --num_train_epochs 1 --gradient_checkpointing True --evaluation_strategy no --save_strategy no --weight_decay 0.01 --warmup_steps 10 --adam_epsilon 1e-4 --lr_scheduler_type linear --logging_strategy steps --logging_steps 10 --learning_rate 2e-4 --fp16 True --torch_dtype float16 --peft_method lora --r 16 --lora_alpha 16 --lora_dropout 0.1 --target_modules all-linear --gradient_accumulation_steps 2 --per_device_train_batch_size 4 --output_dir tmp --skip_memory_metrics False

Adapters are installed on all linear layers except lm_head, as reflected in the number of trainable parameters:

***** Running training *****
  Num examples = 2,000
  Num Epochs = 1
  Instantaneous batch size per device = 4
  Total train batch size (w. parallel, distributed & accumulation) = 8
  Gradient Accumulation steps = 2
  Total optimization steps = 250
  Number of trainable parameters = 41,943,040
  • Run with explicitly specified target modules --target_modules q_proj k_proj v_proj o_proj, so only the attention projections receive adapters.
python -m tuning.sft_trainer --model_name_or_path TheBloke/Mistral-7B-v0.1-GPTQ --packing False --max_seq_len 4096 --auto_gptq triton_v2 True --training_data_path orca/benchmark_outputs/data/cache_TheBloke_Mistral_7B_v0.1_GPTQ.json --use_flash_attn True --include_tokens_per_second True --num_train_epochs 1 --gradient_checkpointing True --evaluation_strategy no --save_strategy no --weight_decay 0.01 --warmup_steps 10 --adam_epsilon 1e-4 --lr_scheduler_type linear --logging_strategy steps --logging_steps 10 --learning_rate 2e-4 --fp16 True --torch_dtype float16 --peft_method lora --r 16 --lora_alpha 16 --lora_dropout 0.1 --target_modules q_proj k_proj v_proj o_proj --gradient_accumulation_steps 2 --per_device_train_batch_size 4 --output_dir tmp --skip_memory_metrics False

A smaller number of trainable parameters is reported:

***** Running training *****
  Num examples = 2,000
  Num Epochs = 1
  Instantaneous batch size per device = 4
  Total train batch size (w. parallel, distributed & accumulation) = 8
  Gradient Accumulation steps = 2
  Total optimization steps = 250
  Number of trainable parameters = 13,631,488
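
As a sanity check on both counts above, the arithmetic below is a minimal sketch that assumes the standard Mistral-7B dimensions (hidden size 4096, 8 KV heads giving a 1024-dim k/v projection, MLP size 14336, 32 layers) and LoRA rank r=16; each adapted linear layer contributes r * (d_in + d_out) trainable parameters.

# LoRA parameter count per adapted linear layer: r * (d_in + d_out)
r, layers = 16, 32
dims = {
    "q_proj": (4096, 4096), "k_proj": (4096, 1024),
    "v_proj": (4096, 1024), "o_proj": (4096, 4096),
    "gate_proj": (4096, 14336), "up_proj": (4096, 14336),
    "down_proj": (14336, 4096),
}

def lora_params(d_in, d_out):
    return r * (d_in + d_out)

all_linear = layers * sum(lora_params(*d) for d in dims.values())
attn_only = layers * sum(lora_params(*dims[k]) for k in ("q_proj", "k_proj", "v_proj", "o_proj"))
print(all_linear)  # 41943040 -> matches the all-linear run (lm_head excluded)
print(attn_only)   # 13631488 -> matches the q_proj/k_proj/v_proj/o_proj run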

@achew010 achew010 force-pushed the fms-acceleration-peft-bug-fixes branch 3 times, most recently from d7ca2f4 to dc11779 Compare September 4, 2024 08:00
@achew010 achew010 force-pushed the fms-acceleration-peft-bug-fixes branch from dc11779 to b707814 Compare September 4, 2024 08:11
@achew010 achew010 marked this pull request as ready for review September 4, 2024 08:16
@achew010 achew010 requested a review from fabianlim as a code owner September 4, 2024 08:16
@fabianlim fabianlim left a comment

minor formatting suggestion and a question

if ignore_lm_head and lm_head_name not in ignore:
    ignore.append(lm_head_name)
results = set()
for n, m in model.named_modules():
    if isinstance(m, torch.nn.Linear):
        ...
    if isinstance(m, QuantLinearTriton):
        ...

are we sure QuantLinearTriton is the only quant linear module that is used by gptqmodel?

@achew010 achew010 Sep 4, 2024

@fabianlim we limited gptqmodel to only use QuantLinearTriton in this function, but I think checking against the parent class BaseQuantLinear would more properly cover all of gptqmodel's QuantLinear modules. I can set it to the parent class instead.


OK, can we make the change then?
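
As discussed, the broader check would key off the shared parent class rather than one specific backend. The sketch below is illustrative only: the import path for BaseQuantLinear is assumed and may differ between gptqmodel versions, and find_linear_module_names is a hypothetical helper, not the plugin's actual function.

import torch
from gptqmodel.nn_modules.qlinear import BaseQuantLinear  # assumed import path

def find_linear_module_names(model, ignore=()):
    # Collect the short names of all linear-like modules, skipping any in `ignore`
    # (e.g. lm_head). BaseQuantLinear covers every gptqmodel QuantLinear backend
    # (triton, exllama, marlin, ...), not just QuantLinearTriton.
    results = set()
    for name, module in model.named_modules():
        if isinstance(module, (torch.nn.Linear, BaseQuantLinear)):
            short_name = name.split(".")[-1]
            if short_name not in ignore:
                results.add(short_name)
    return sorted(results)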

@achew010 achew010 force-pushed the fms-acceleration-peft-bug-fixes branch from 57be3e0 to 4666f2c Compare September 4, 2024 11:57
@achew010 achew010 force-pushed the fms-acceleration-peft-bug-fixes branch from 4666f2c to e0a3589 Compare September 4, 2024 11:59
@@ -318,7 +320,7 @@ def augmentation(
         model = get_gptq_peft_model(
             model,
             peft_config=peft_config,
-            auto_find_all_linears=peft_config.target_modules is None,
+            auto_find_all_linears=(peft_config.target_modules == PEFT_ALL_LINEAR),
@aluu317: Perhaps we should check for both ["all-linear"] and "all-linear"?


Fair point, @aluu317.

I see that @achew010 already tested with the common usage of the CLI, which would be

--target_modules all-linear

which seems to be parsed as a single string "all-linear".

We could guard against the case ["all-linear"] though.
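
A minimal sketch of the guard being discussed follows; requested_all_linear is a hypothetical name and PEFT_ALL_LINEAR is assumed to be the constant "all-linear", so this is illustrative rather than the exact code that landed.

PEFT_ALL_LINEAR = "all-linear"

def requested_all_linear(target_modules):
    # Accept both the plain string "all-linear" and the single-element
    # list ["all-linear"]; anything else means explicit target modules.
    if isinstance(target_modules, str):
        return target_modules == PEFT_ALL_LINEAR
    if isinstance(target_modules, (list, tuple)):
        if PEFT_ALL_LINEAR not in target_modules:
            return False
        assert len(target_modules) == 1, (
            f"`{PEFT_ALL_LINEAR}` must exist alone in target modules"
        )
        return True
    return False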

@achew010 achew010 requested a review from aluu317 September 5, 2024 11:01
@achew010 achew010 commented Sep 5, 2024

@aluu317 I made changes to check for both 'all-linear' and ['all-linear']. Could you review it before we merge and release?

@aluu317 aluu317 left a comment

Thank you!!

if isinstance(tm, list):
    if PEFT_ALL_LINEAR not in tm:
        return False
    assert len(tm) == 1, f"`{PEFT_ALL_LINEAR}` must exist alone in target modules"

Ahh interesting. So we don't allow someone passing target_modules as ["all_linear", "lm_head"]? We allow this in our doc for LoRA since HF allows it for LoRA: https://github.com/foundation-model-stack/fms-hf-tuning?tab=readme-ov-file#lora-tuning-example (see the section "How to specify lm_head as a target module", example 3). I can update the doc for QLoRA, however. Or should we keep consistent with LoRA and allow both values?

Up to you! I will approve the PR so it's not blocked on me for merging for this release, but we can open new PRs to amend this behavior.


@aluu317 in your doc, I understand ["all_linear", "lm_head"] as equivalent to ["all_linear"], is that correct? That is, all-linear by default includes the lm_head also?


No, what that doc means is that running ["all_linear", "lm_head"] is the same as running ["all_linear"] only; an adapter will not be produced for lm_head either way.


Sorry, I am not clear: are you saying ["all_linear", "lm_head"] will not install adapters onto lm_head?

@fabianlim

@aluu317 I see you have updated your documentation to be consistent with this PR. I feel that in the long run we should just have the same behavior as regular LoRA, otherwise it will be too confusing for users. Let's continue these clarifications to get there, but for now I will merge this.

@fabianlim fabianlim merged commit 2e2736b into foundation-model-stack:main Sep 5, 2024
6 checks passed