
fixed bug in peft installation for gptqmodel #81

Conversation


@achew010 achew010 commented Sep 3, 2024

Description

This is a fix for the gptq-peft plugin to follow the official PEFT implementation, where specifying target_modules='all-linear' installs adapters on all linear layers. Note that HF by default will not install adapters on lm_head for all-linear. This applies both to our locally maintained gptqmodel and to the open-source auto_gptq.
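
For reference, the stock PEFT behavior this mirrors can be seen on a non-quantized model. The snippet below is a minimal sketch, assuming peft>=0.8 (where "all-linear" is a supported target_modules value); the model name is arbitrary and this is not the plugin code itself.

import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Any small causal LM works for the illustration; the model name is arbitrary.
model = AutoModelForCausalLM.from_pretrained("facebook/opt-125m", torch_dtype=torch.float16)

# "all-linear" targets every linear layer except the output embedding (lm_head).
config = LoraConfig(r=16, lora_alpha=16, lora_dropout=0.1, target_modules="all-linear")
peft_model = get_peft_model(model, config)
peft_model.print_trainable_parameters()  # lm_head carries no LoRA adapter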

  • Run with adapters installed on all linear modules: --target_modules 'all-linear'
python -m tuning.sft_trainer --model_name_or_path TheBloke/Mistral-7B-v0.1-GPTQ --packing False --max_seq_len 4096 --auto_gptq triton_v2 True --training_data_path orca/benchmark_outputs/data/cache_TheBloke_Mistral_7B_v0.1_GPTQ.json --use_flash_attn True --include_tokens_per_second True --num_train_epochs 1 --gradient_checkpointing True --evaluation_strategy no --save_strategy no --weight_decay 0.01 --warmup_steps 10 --adam_epsilon 1e-4 --lr_scheduler_type linear --logging_strategy steps --logging_steps 10 --learning_rate 2e-4 --fp16 True --torch_dtype float16 --peft_method lora --r 16 --lora_alpha 16 --lora_dropout 0.1 --target_modules all-linear --gradient_accumulation_steps 2 --per_device_train_batch_size 4 --output_dir tmp --skip_memory_metrics False

Adapters are installed on all linear layers except lm_head, as reflected in the number of trainable parameters:

***** Running training *****
  Num examples = 2,000
  Num Epochs = 1
  Instantaneous batch size per device = 4
  Total train batch size (w. parallel, distributed & accumulation) = 8
  Gradient Accumulation steps = 2
  Total optimization steps = 250
  Number of trainable parameters = 41,943,040
  • Run with explicitly specified target modules --target_modules q_proj k_proj v_proj o_proj, so only the attention projections receive adapters.
python -m tuning.sft_trainer --model_name_or_path TheBloke/Mistral-7B-v0.1-GPTQ --packing False --max_seq_len 4096 --auto_gptq triton_v2 True --training_data_path orca/benchmark_outputs/data/cache_TheBloke_Mistral_7B_v0.1_GPTQ.json --use_flash_attn True --include_tokens_per_second True --num_train_epochs 1 --gradient_checkpointing True --evaluation_strategy no --save_strategy no --weight_decay 0.01 --warmup_steps 10 --adam_epsilon 1e-4 --lr_scheduler_type linear --logging_strategy steps --logging_steps 10 --learning_rate 2e-4 --fp16 True --torch_dtype float16 --peft_method lora --r 16 --lora_alpha 16 --lora_dropout 0.1 --target_modules q_proj k_proj v_proj o_proj --gradient_accumulation_steps 2 --per_device_train_batch_size 4 --output_dir tmp --skip_memory_metrics False

A smaller number of trainable parameters is reported:

***** Running training *****
  Num examples = 2,000
  Num Epochs = 1
  Instantaneous batch size per device = 4
  Total train batch size (w. parallel, distributed & accumulation) = 8
  Gradient Accumulation steps = 2
  Total optimization steps = 250
  Number of trainable parameters = 13,631,488
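
As a sanity check on both counts above, the arithmetic below is a minimal sketch that assumes the standard Mistral-7B dimensions (hidden size 4096, 8 KV heads giving a 1024-dim k/v projection, MLP size 14336, 32 layers) and LoRA rank r=16; each adapted linear layer contributes r * (d_in + d_out) trainable parameters.

# LoRA parameter count per adapted linear layer: r * (d_in + d_out)
r, layers = 16, 32
dims = {
    "q_proj": (4096, 4096), "k_proj": (4096, 1024),
    "v_proj": (4096, 1024), "o_proj": (4096, 4096),
    "gate_proj": (4096, 14336), "up_proj": (4096, 14336),
    "down_proj": (14336, 4096),
}

def lora_params(d_in, d_out):
    return r * (d_in + d_out)

all_linear = layers * sum(lora_params(*d) for d in dims.values())
attn_only = layers * sum(lora_params(*dims[k]) for k in ("q_proj", "k_proj", "v_proj", "o_proj"))
print(all_linear)  # 41943040 -> matches the all-linear run (lm_head excluded)
print(attn_only)   # 13631488 -> matches the q_proj/k_proj/v_proj/o_proj run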

@achew010 achew010 force-pushed the fms-acceleration-peft-bug-fixes branch 3 times, most recently from d7ca2f4 to dc11779 Compare September 4, 2024 08:00
@achew010 achew010 force-pushed the fms-acceleration-peft-bug-fixes branch from dc11779 to b707814 Compare September 4, 2024 08:11
@achew010 achew010 marked this pull request as ready for review September 4, 2024 08:16
@achew010 achew010 requested a review from fabianlim as a code owner September 4, 2024 08:16
@fabianlim fabianlim left a comment

minor formatting suggestion and a question

if ignore_lm_head and lm_head_name not in ignore:
    ignore.append(lm_head_name)
results = set()
for n, m in model.named_modules():
    if isinstance(m, torch.nn.Linear):
        ...
    if isinstance(m, QuantLinearTriton):
        ...

are we sure QuantLinearTriton is the only quant linear module that is used by gptqmodel?

@achew010 achew010 Sep 4, 2024

@fabianlim we limited gptqmodel to only use QuantLinearTriton in this function, but I think checking against the parent class BaseQuantLinear would more properly cover all of gptqmodel's QuantLinear modules. I can set it to the parent class instead.


OK, can we make the change then?
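
As discussed, the broader check would key off the shared parent class rather than one specific backend. The sketch below is illustrative only: the import path for BaseQuantLinear is assumed and may differ between gptqmodel versions, and find_linear_module_names is a hypothetical helper, not the plugin's actual function.

import torch
from gptqmodel.nn_modules.qlinear import BaseQuantLinear  # assumed import path

def find_linear_module_names(model, ignore=()):
    # Collect the short names of all linear-like modules, skipping any in `ignore`
    # (e.g. lm_head). BaseQuantLinear covers every gptqmodel QuantLinear backend
    # (triton, exllama, marlin, ...), not just QuantLinearTriton.
    results = set()
    for name, module in model.named_modules():
        if isinstance(module, (torch.nn.Linear, BaseQuantLinear)):
            short_name = name.split(".")[-1]
            if short_name not in ignore:
                results.add(short_name)
    return sorted(results)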

@achew010 achew010 force-pushed the fms-acceleration-peft-bug-fixes branch from 57be3e0 to 4666f2c Compare September 4, 2024 11:57
@achew010 achew010 force-pushed the fms-acceleration-peft-bug-fixes branch from 4666f2c to e0a3589 Compare September 4, 2024 11:59
@@ -318,7 +320,7 @@ def augmentation(
         model = get_gptq_peft_model(
             model,
             peft_config=peft_config,
-            auto_find_all_linears=peft_config.target_modules is None,
+            auto_find_all_linears=(peft_config.target_modules == PEFT_ALL_LINEAR),
@aluu317: Perhaps we should check for both ["all-linear"] and "all-linear"?


Fair point, @aluu317.

I see that @achew010 already tested with the common usage of the CLI, which would be

--target_modules all-linear

which seems to be parsed as a single string "all-linear".

We could guard against the case ["all-linear"] though.
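
A minimal sketch of the guard being discussed follows; requested_all_linear is a hypothetical name and PEFT_ALL_LINEAR is assumed to be the constant "all-linear", so this is illustrative rather than the exact code that landed.

PEFT_ALL_LINEAR = "all-linear"

def requested_all_linear(target_modules):
    # Accept both the plain string "all-linear" and the single-element
    # list ["all-linear"]; anything else means explicit target modules.
    if isinstance(target_modules, str):
        return target_modules == PEFT_ALL_LINEAR
    if isinstance(target_modules, (list, tuple)):
        if PEFT_ALL_LINEAR not in target_modules:
            return False
        assert len(target_modules) == 1, (
            f"`{PEFT_ALL_LINEAR}` must exist alone in target modules"
        )
        return True
    return False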

@achew010 achew010 requested a review from aluu317 September 5, 2024 11:01
@achew010 achew010 commented Sep 5, 2024

@aluu317 I made changes to check for both 'all-linear' and ['all-linear']. Could you review it before we merge and release?

@aluu317 aluu317 left a comment

Thank you!!

if isinstance(tm, list):
    if PEFT_ALL_LINEAR not in tm:
        return False
    assert len(tm) == 1, f"`{PEFT_ALL_LINEAR}` must exist alone in target modules"

Ahh interesting. So we don't allow someone passing target_modules as ["all_linear", "lm_head"]? We allow this in our doc for LoRA since HF allows it for LoRA: https://github.com/foundation-model-stack/fms-hf-tuning?tab=readme-ov-file#lora-tuning-example (see the section "How to specify lm_head as a target module", example 3). I can update the doc for QLoRA, however. Or should we keep consistent with LoRA and allow both values?

Up to you! I will approve the PR so it's not blocked on me for merging for this release, but we can open new PRs to amend this behavior.


@aluu317 in your doc, I understand ["all_linear", "lm_head"] as equivalent to ["all_linear"], is that correct? That is, all-linear by default includes the lm_head also?


No, what that doc means is that running ["all_linear", "lm_head"] is the same as running ["all_linear"] only; an adapter will not be produced for lm_head either way.


Sorry, I am not clear: are you saying ["all_linear", "lm_head"] will not install adapters onto lm_head?

@fabianlim

@aluu317 I see you have updated your documentation to be consistent with this PR. I feel that in the long run we should just have the same behavior as regular LoRA, otherwise it will be too confusing for users. Let's continue these clarifications to get there, but for now I will merge this.

@fabianlim fabianlim merged commit 2e2736b into foundation-model-stack:main Sep 5, 2024
6 checks passed