fixed bug in peft installation for gptqmodel #81
Conversation
minor formatting suggestion and a question
plugins/accelerated-peft/src/fms_acceleration_peft/framework_plugin_autogptq.py
```diff
 if ignore_lm_head and lm_head_name not in ignore:
     ignore.append(lm_head_name)
 results = set()
 for n, m in model.named_modules():
-    if isinstance(m, torch.nn.Linear):
+    if isinstance(m, QuantLinearTriton):
```
Are we sure `QuantLinearTriton` is the only quant linear module that is used by gptqmodel?
@fabianlim we limited gptqmodel to only use `QuantLinearTriton` in this function, but I think checking against the parent class `BaseQuantLinear` would more properly cover all of gptqmodel's QuantLinear modules. I can set it to the parent class instead.
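For illustration only, a minimal sketch of what the parent-class check could look like, based on the snippet above (the import path for `BaseQuantLinear` and the helper name are assumptions, not taken from this PR):

```python
# Sketch only: swap the concrete Triton class for the shared base class so any
# gptqmodel QuantLinear backend is matched. The import path below is assumed.
from gptqmodel.nn_modules.qlinear import BaseQuantLinear  # assumed location

def find_quant_linear_names(model, lm_head_name="lm_head", ignore_lm_head=True, ignore=None):
    ignore = list(ignore or [])
    if ignore_lm_head and lm_head_name not in ignore:
        ignore.append(lm_head_name)
    results = set()
    for n, m in model.named_modules():
        # BaseQuantLinear is the parent of QuantLinearTriton and the other
        # quantized linear implementations, so this covers all of them.
        if isinstance(m, BaseQuantLinear):
            name = n.split(".")[-1]
            if name not in ignore:
                results.add(name)
    return list(results)
```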
Ok, can we make the change then?
```diff
@@ -318,7 +320,7 @@ def augmentation(
     model = get_gptq_peft_model(
         model,
         peft_config=peft_config,
-        auto_find_all_linears=peft_config.target_modules is None,
+        auto_find_all_linears=(peft_config.target_modules == PEFT_ALL_LINEAR),
```
Perhaps we should check for both `["all-linear"]` and `"all-linear"`?
@aluu317 I made changes to check for both forms.
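For illustration, a check that accepts both the plain string and the single-item list form could look like the sketch below (the helper name is hypothetical; `PEFT_ALL_LINEAR` is the constant already used in this PR):

```python
PEFT_ALL_LINEAR = "all-linear"

def requests_all_linear(target_modules) -> bool:
    # Accept the bare string form, e.g. target_modules="all-linear".
    if isinstance(target_modules, str):
        return target_modules == PEFT_ALL_LINEAR
    # Accept the list form, e.g. target_modules=["all-linear"], but only when
    # the special value appears alone, mirroring the assert in the review
    # snippet further down.
    if isinstance(target_modules, (list, tuple)):
        if PEFT_ALL_LINEAR not in target_modules:
            return False
        assert len(target_modules) == 1, (
            f"`{PEFT_ALL_LINEAR}` must exist alone in target modules"
        )
        return True
    return False
```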
Thank you!!
```python
if isinstance(tm, list):
    if PEFT_ALL_LINEAR not in tm:
        return False
    assert len(tm) == 1, f"`{PEFT_ALL_LINEAR}` must exist alone in target modules"
```
Ahh interesting. So we don't allow someone passing `target_modules` as `["all_linear", "lm_head"]`? We allow this in our doc for LoRA, since HF allows it for LoRA: https://github.com/foundation-model-stack/fms-hf-tuning?tab=readme-ov-file#lora-tuning-example (see the section "How to specify lm_head as a target module", example 3). I can update the doc for QLoRA, however. Or should we keep it consistent with LoRA and allow both values?

Up to you! I will approve the PR so it's not blocked on me for merging in this release, but we can open new PRs to amend this behavior.
@aluu317 in your doc, I understand `["all_linear", "lm_head"]` as equivalent to `["all_linear"]`, is that correct? That is, `all-linear` by default also includes the `lm_head`?
No, what that doc means is that running `["all_linear", "lm_head"]` is the same as running `["all_linear"]` only; an adapter will not be produced on `lm_head` either way.
Sorry, I am not clear: are you saying `["all_linear", "lm_head"]` will not install adapters onto `lm_head`?
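As an aside, a quick way to see which modules actually receive LoRA adapters under `all-linear` with plain HF PEFT (the model name and setup here are illustrative, not from this PR):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Any small causal LM works for this check; the model name is just an example.
base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
peft_model = get_peft_model(base, LoraConfig(target_modules="all-linear"))

# LoRA-wrapped layers expose a `lora_A` attribute.
adapted = sorted({n for n, m in peft_model.named_modules() if hasattr(m, "lora_A")})
print(adapted)
print(any("lm_head" in n for n in adapted))  # expected: False, lm_head is skipped
```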
plugins/accelerated-peft/src/fms_acceleration_peft/gptqmodel/utils/peft.py
@aluu317 I see you have updated your documentation to be consistent with this PR. I feel that in the long run we should just have the same behavior as regular LoRA, otherwise it will be too confusing for users. Let's continue these clarifications to get there, but for now I will merge this.
Description
This is a fix for the gptq-peft plugin to follow the official PEFT implementation, where specifying `target_modules='all-linear'` will install adapters on all linear layers. Note that HF by default will not install adapters on `lm_head` for `all-linear`. This applies to both our locally maintained `gptqmodel` as well as the open-source `auto_gptq`.

- With `--target_modules 'all-linear'`, all linear layers except `lm_head` are installed with adapters, which is reflected in the size of the trainable parameters.
- With `--target_modules q_proj k_proj v_proj o_proj`, a smaller number of trainable parameters is displayed.
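For reference, the two settings above expressed as PEFT configs (a hedged sketch; rank and alpha values are placeholders):

```python
from peft import LoraConfig

# Adapters on every linear layer except lm_head (HF's special "all-linear" value).
cfg_all_linear = LoraConfig(r=8, lora_alpha=16, target_modules="all-linear")

# Adapters only on the attention projections; yields fewer trainable parameters.
cfg_attention_only = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```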