Please check that this issue hasn't been reported before.
I searched previous Bug Reports and didn't find any similar reports.
Expected Behavior
I'm testing the Falcon-7B fine-tuning example with the config file examples/falcon/config-7b-qlora.yml as is; I expect training to start without errors.
Current behaviour
As suggested in the README, I ran the command accelerate launch -m axolotl.cli.train examples/falcon/config-7b-qlora.yml
It first fails with the following error:
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/radhachitta/llm-finetuning/axolotl/src/axolotl/cli/train.py", line 43, in <module>
fire.Fire(do_cli)
File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/radhachitta/llm-finetuning/axolotl/src/axolotl/cli/train.py", line 26, in do_cli
parsed_cfg = load_cfg(config, **kwargs)
File "/home/radhachitta/llm-finetuning/axolotl/src/axolotl/cli/__init__.py", line 290, in load_cfg
validate_config(cfg)
File "/home/radhachitta/llm-finetuning/axolotl/src/axolotl/utils/config.py", line 349, in validate_config
raise ValueError(
ValueError: ``early_stopping_patience`` requires save_steps and eval_steps to be set. eval_steps should evenly divide save_steps.
Traceback (most recent call last):
File "/home/radhachitta/.local/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/home/radhachitta/.local/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
args.func(args)
File "/home/radhachitta/.local/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1023, in launch_command
simple_launcher(args)
File "/home/radhachitta/.local/lib/python3.10/site-packages/accelerate/commands/launch.py", line 643, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
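For context, the validator fires because the stock example sets evals_per_epoch and saves_per_epoch rather than the explicit eval_steps and save_steps that early_stopping_patience requires. One way to satisfy it, instead of unsetting the option as I did below, might be something along these lines (illustrative, untested values):

```yaml
# hypothetical values: eval_steps must evenly divide save_steps
early_stopping_patience: 3
eval_steps: 10
save_steps: 40
# remove evals_per_epoch / saves_per_epoch when using explicit step counts
```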
After unsetting early_stopping_patience (leaving it blank as early_stopping_patience:), this is the error:
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/radhachitta/llm-finetuning/axolotl/src/axolotl/cli/train.py", line 43, in <module>
fire.Fire(do_cli)
File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/radhachitta/llm-finetuning/axolotl/src/axolotl/cli/train.py", line 38, in do_cli
dataset_meta = load_datasets(cfg=parsed_cfg, cli_args=parsed_cli_args)
File "/home/radhachitta/llm-finetuning/axolotl/src/axolotl/cli/__init__.py", line 310, in load_datasets
tokenizer = load_tokenizer(cfg)
File "/home/radhachitta/llm-finetuning/axolotl/src/axolotl/utils/models.py", line 178, in load_tokenizer
raise ValueError(
ValueError: Please set lora_modules_to_save to `embed_tokens`, `lm_head` when using an adapter and changing the special tokens.
Traceback (most recent call last):
File "/home/radhachitta/.local/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/home/radhachitta/.local/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
args.func(args)
File "/home/radhachitta/.local/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1023, in launch_command
simple_launcher(args)
File "/home/radhachitta/.local/lib/python3.10/site-packages/accelerate/commands/launch.py", line 643, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/conda/bin/python3.10', '-m', 'axolotl.cli.train', 'examples/falcon/config-7b-qlora.yml']' returned non-zero exit status 1.
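This check fires because the special_tokens block in the config overrides the tokenizer's saved special tokens (notably bos_token: ">>ABSTRACT<<"), and axolotl then requires the embedding and output layers to be trained alongside the adapter. Note that the value presumably needs to be a YAML list; a comma-separated scalar such as embed_tokens, lm_head parses as a single string. A sketch of the list form:

```yaml
lora_modules_to_save:
  - embed_tokens
  - lm_head
```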
Finally, after setting lora_modules_to_save to embed_tokens and lm_head, this is the error:
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/opt/conda/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/radhachitta/llm-finetuning/axolotl/src/axolotl/cli/train.py", line 43, in <module>
fire.Fire(do_cli)
File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/home/radhachitta/llm-finetuning/axolotl/src/axolotl/cli/train.py", line 39, in do_cli
train(cfg=parsed_cfg, cli_args=parsed_cli_args, dataset_meta=dataset_meta)
File "/home/radhachitta/llm-finetuning/axolotl/src/axolotl/train.py", line 65, in train
model, peft_config = load_model(cfg, tokenizer, inference=cli_args.inference)
File "/home/radhachitta/llm-finetuning/axolotl/src/axolotl/utils/models.py", line 634, in load_model
model, lora_config = load_adapter(model, cfg, cfg.adapter)
File "/home/radhachitta/llm-finetuning/axolotl/src/axolotl/utils/models.py", line 670, in load_adapter
return load_lora(model, cfg, inference=inference)
File "/home/radhachitta/llm-finetuning/axolotl/src/axolotl/utils/models.py", line 756, in load_lora
model = get_peft_model(model, lora_config)
File "/home/radhachitta/.local/lib/python3.10/site-packages/peft/mapping.py", line 133, in get_peft_model
return MODEL_TYPE_TO_PEFT_MODEL_MAPPING[peft_config.task_type](model, peft_config, adapter_name=adapter_name)
File "/home/radhachitta/.local/lib/python3.10/site-packages/peft/peft_model.py", line 1041, in __init__
super().__init__(model, peft_config, adapter_name)
File "/home/radhachitta/.local/lib/python3.10/site-packages/peft/peft_model.py", line 123, in __init__
self.base_model = cls(model, {adapter_name: peft_config}, adapter_name)
File "/home/radhachitta/.local/lib/python3.10/site-packages/peft/tuners/lora/model.py", line 119, in __init__
super().__init__(model, config, adapter_name)
File "/home/radhachitta/.local/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 95, in __init__
self.inject_adapter(self.model, adapter_name)
File "/home/radhachitta/.local/lib/python3.10/site-packages/peft/tuners/tuners_utils.py", line 233, in inject_adapter
new_module = ModulesToSaveWrapper(target, adapter_name)
File "/home/radhachitta/.local/lib/python3.10/site-packages/peft/utils/other.py", line 177, in __init__
self.update(adapter_name)
File "/home/radhachitta/.local/lib/python3.10/site-packages/peft/utils/other.py", line 200, in update
self.modules_to_save[adapter_name].requires_grad_(True)
File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2440, in requires_grad_
p.requires_grad_(requires_grad)
RuntimeError: only Tensors of floating point dtype can require gradients
Traceback (most recent call last):
File "/home/radhachitta/.local/bin/accelerate", line 8, in <module>
sys.exit(main())
File "/home/radhachitta/.local/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
args.func(args)
File "/home/radhachitta/.local/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1023, in launch_command
simple_launcher(args)
File "/home/radhachitta/.local/lib/python3.10/site-packages/accelerate/commands/launch.py", line 643, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/conda/bin/python3.10', '-m', 'axolotl.cli.train', 'examples/falcon/config-7b-qlora.yml']' returned non-zero exit status 1.
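The final RuntimeError comes from PEFT calling requires_grad_(True) on the modules_to_save copies. With load_in_4bit: true, at least one of the wrapped modules apparently holds its weights in an integer dtype (bitsandbytes packs 4-bit weights into uint8 storage), and PyTorch only allows gradients on floating-point tensors. A minimal repro of just the PyTorch behavior, assuming uint8 storage and independent of axolotl:

```python
import torch

# Floating-point tensors may require gradients.
w = torch.zeros(4, dtype=torch.float16)
w.requires_grad_(True)  # OK

# Integer tensors may not -- the same RuntimeError as in the traceback above.
q = torch.zeros(4, dtype=torch.uint8)
try:
    q.requires_grad_(True)
except RuntimeError as err:
    print(err)  # only Tensors of floating point dtype can require gradients
```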
Steps to reproduce
I ran the command accelerate launch -m axolotl.cli.train examples/falcon/config-7b-qlora.yml with the changes to the YAML described above.
Config yaml
This is the final config-7b-qlora.yml, which results in the last error:
# 1b: tiiuae/falcon-rw-1b
# 40b: tiiuae/falcon-40b
base_model: tiiuae/falcon-7b
# required by falcon custom model code: https://huggingface.co/tiiuae/falcon-7b/tree/main
trust_remote_code: false
model_type: AutoModelForCausalLM
tokenizer_type: AutoTokenizer
is_falcon_derived_model: true
load_in_8bit: false
# enable 4bit for QLoRA
load_in_4bit: true
gptq: false
strict: false
push_dataset_to_hub:
datasets:
  - path: QingyiSi/Alpaca-CoT
    data_files:
      - Chain-of-Thought/formatted_cot_data/gsm8k_train.json
    type: "alpaca:chat"
dataset_prepared_path:
val_set_size: 0.05
# enable QLoRA
adapter: qlora
lora_model_dir:
sequence_len: 2048
max_packed_sequence_len:
# hyperparameters from QLoRA paper Appendix B.2
# "We find hyperparameters to be largely robust across datasets"
lora_r: 64
lora_alpha: 16
# 0.1 for models up to 13B
# 0.05 for 33B and 65B models
lora_dropout: 0.05
# add LoRA modules on all linear layers of the base model
lora_target_modules:
lora_target_linear: true
lora_fan_in_fan_out:
wandb_project:
wandb_entity:
wandb_watch:
wandb_name:
wandb_log_model:
output_dir: ./qlora-out
# QLoRA paper Table 9
# - 16 for 7b & 13b
# - 32 for 33b, 64 for 64b
# Max size tested on A6000
# - 7b: 40
# - 40b: 4
# decrease if OOM, increase for max VRAM utilization
micro_batch_size: 1
gradient_accumulation_steps: 2
num_epochs: 4
# Optimizer for QLoRA
optimizer: paged_adamw_32bit
torchdistx_path:
lr_scheduler: cosine
# QLoRA paper Table 9
# - 2e-4 for 7b & 13b
# - 1e-4 for 33b & 64b
learning_rate: 0.0002
train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: true
gradient_checkpointing: true
# stop training after this many evaluation losses have increased in a row
# https://huggingface.co/transformers/v4.2.2/_modules/transformers/trainer_callback.html#EarlyStoppingCallback
#early_stopping_patience: 3
early_stopping_patience:
lora_modules_to_save:
  - embed_tokens
  - lm_head
resume_from_checkpoint:
auto_resume_from_checkpoints: true
local_rank:
logging_steps: 1
xformers_attention: true
flash_attention:
gptq_groupsize:
gptq_model_v1:
warmup_steps: 10
evals_per_epoch: 4
saves_per_epoch: 1
debug:
deepspeed:
weight_decay: 0.000001
fsdp:
fsdp_config:
special_tokens:
  pad_token: "<|endoftext|>"
  bos_token: ">>ABSTRACT<<"
  eos_token: "<|endoftext|>"
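For reference, the special_tokens block above is what triggers the earlier lora_modules_to_save requirement: bos_token: ">>ABSTRACT<<" presumably differs from what the stock tiiuae/falcon-7b tokenizer ships. The saved defaults can be checked with a quick snippet like this (a sketch, run outside of axolotl):

```python
from transformers import AutoTokenizer

# Load the stock Falcon tokenizer and print its saved special tokens
tok = AutoTokenizer.from_pretrained("tiiuae/falcon-7b")
print(tok.special_tokens_map)  # compare against the special_tokens block above
```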
Possible solution
No response
Which Operating Systems are you using?
Linux
Python Version
3.10.12
axolotl branch-commit
main v0.3.0
Acknowledgements
My issue title is concise, descriptive, and in title casing.
I have searched the existing issues to make sure this bug has not been reported yet.
I am using the latest version of axolotl.
I have provided enough information for the maintainers to reproduce and diagnose the issue.