support to disable exllama for gptq #604

Merged
merged 3 commits into from
Sep 19, 2023
Conversation

winglian
Collaborator

fixes #599

Adding gptq_disable_exllama: true to the yml config should fix the issue with GPTQ.

@NanoCode012
Collaborator

NanoCode012 commented Sep 19, 2023

The author of the linked issue mentions

    raise ValueError('Expected a cuda device, but got: {}'.format(device))
ValueError: Expected a cuda device, but got: cpu

after setting it. Is that due to cuda oom?

@Napuh
Contributor

Napuh commented Sep 19, 2023

Executing the example yaml file in this branch throws an error related to modifying the LlamaConfig object:

  File "/home/axolotl/scripts/finetune.py", line 52, in <module>
    fire.Fire(do_cli)
  File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/opt/conda/lib/python3.10/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/home/axolotl/scripts/finetune.py", line 48, in do_cli
    train(cfg=parsed_cfg, cli_args=parsed_cli_args, dataset_meta=dataset_meta)
  File "/home/axolotl/src/axolotl/train.py", line 58, in train
    model, peft_config = load_model(cfg, tokenizer, inference=cli_args.inference)
  File "/home/axolotl/src/axolotl/utils/models.py", line 202, in load_model
    model_config["disable_exllama"] = cfg.gptq_disable_exllama
TypeError: 'LlamaConfig' object does not support item assignment
Traceback (most recent call last):
  File "/opt/conda/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 47, in main
    args.func(args)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/commands/launch.py", line 986, in launch_command
    simple_launcher(args)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/commands/launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/opt/conda/bin/python', 'scripts/finetune.py', 'examples/llama-2/gptq-lora.yml']' returned non-zero exit status 1.
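
The `TypeError` in the first traceback is what Python raises when `obj[key] = value` is used on an object that does not define `__setitem__`. Below is a minimal sketch of that failure and one way around it. It uses a hypothetical stand-in class, not the real `LlamaConfig`, and the nested `quantization_config` dict on the stand-in is an assumption made for illustration, not a statement about what this PR's final code does.

```python
class StandInConfig:
    """Hypothetical stand-in for a config object that stores settings as attributes."""

    def __init__(self):
        # Assumed for illustration: quantization settings kept in a nested dict.
        self.quantization_config = {"bits": 4}


def set_disable_exllama(config, value):
    """Set the flag on the nested dict instead of item-assigning on the object."""
    config.quantization_config["disable_exllama"] = value
    return config


config = StandInConfig()

try:
    # Same pattern as the failing line: item assignment on a plain object.
    config["disable_exllama"] = True
    item_assignment_failed = False
    error_message = ""
except TypeError as err:
    item_assignment_failed = True
    error_message = str(err)

# Works: the dict held as an attribute supports item assignment.
set_disable_exllama(config, True)
```

The object itself rejects `config[...] = ...`, while a dict stored on one of its attributes accepts it, so the key has to be set on the right level of the config.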

Also:

> The author of the linked issue mentions
>
>     raise ValueError('Expected a cuda device, but got: {}'.format(device))
> ValueError: Expected a cuda device, but got: cpu
>
> after setting it. Is that due to cuda oom?

I doubt it's a CUDA memory issue; I was running on an RTX 3090 24GB. It wasn't a fluke either, as the error persisted on two different machines.

@winglian
Collaborator Author

@Napuh I updated the fix, lmk if that works.

@Napuh
Contributor

Napuh commented Sep 19, 2023

Now it's throwing the same error as in #599, ValueError: Found modules on cpu/disk. Using Exllama backend requires all the modules to be on GPU.You can deactivate exllama backend by setting `disable_exllama=True` in the quantization config object.

I'm executing accelerate launch scripts/finetune.py examples/llama-2/gptq-lora.yml. Am I doing something wrong?

Also, the script keeps loading for about 5 minutes after the model finishes downloading, and no memory is ever allocated on the GPU (monitored manually via nvidia-smi).

@winglian
Collaborator Author

@Napuh hopefully this most recent commit resolves it.

@Napuh
Contributor

Napuh commented Sep 19, 2023

@winglian Latest commit solves the config issue with exllama, but now it's throwing ValueError: Expected a cuda device, but got: cpu, the same error as if I edit the config.json manually.

#456 may be related as it's the same error, but the scenario is different from this one.
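
The manual config.json edit mentioned above could look roughly like the following sketch. It assumes the flag lives under a `quantization_config` key in the model's config.json, which matches the wording of the exllama error message but is not confirmed by this thread; the helper name and the demo file contents are hypothetical. Per the comment above, this edit only gets past the exllama check and does not avoid the later CUDA device error.

```python
import json
import tempfile
from pathlib import Path


def disable_exllama_in_config(config_path):
    """Set disable_exllama under quantization_config (assumed layout) and rewrite the file."""
    path = Path(config_path)
    config = json.loads(path.read_text())
    # setdefault keeps any existing quantization settings and only adds the flag.
    config.setdefault("quantization_config", {})["disable_exllama"] = True
    path.write_text(json.dumps(config, indent=2))
    return config


# Demo on a throwaway file; the contents are made up for illustration.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "config.json"
    demo_path.write_text(json.dumps({"quantization_config": {"bits": 4}}))
    updated = disable_exllama_in_config(demo_path)
    reloaded = json.loads(demo_path.read_text())
```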

@winglian
Collaborator Author

> @winglian Latest commit solves the config issue with exllama, but now it's throwing ValueError: Expected a cuda device, but got: cpu, the same error as if I edit the config.json manually.
>
> #456 may be related as it's the same error, but the scenario is different from this one.

#609 should fix the issue with the device check when logging GPU utilization.
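
As a rough illustration of the kind of device check described here, the sketch below skips GPU memory logging when the device is not a CUDA device instead of letting a CUDA-only query raise. This is not the actual change in #609; both function names are hypothetical, and the query is a stand-in that only mimics the error message quoted in this thread.

```python
def query_gpu_memory(device):
    """Hypothetical stand-in for a CUDA-only memory query."""
    if not str(device).startswith("cuda"):
        # Mirrors the error message quoted in the thread.
        raise ValueError("Expected a cuda device, but got: {}".format(device))
    return 0.0  # placeholder value; a real query would ask the GPU runtime


def log_gpu_memory(device, log):
    """Log GPU memory only for CUDA devices; skip quietly for anything else."""
    if not str(device).startswith("cuda"):
        log.append("skipping GPU memory logging for device {}".format(device))
        return None
    usage = query_gpu_memory(device)
    log.append("GPU memory on {}: {:.3f}GB".format(device, usage))
    return usage


messages = []
cpu_result = log_gpu_memory("cpu", messages)      # skipped, no exception
cuda_result = log_gpu_memory("cuda:0", messages)  # goes through the stand-in query
```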

@winglian winglian merged commit faecff9 into main Sep 19, 2023
@winglian winglian deleted the gptq-disable-exllama branch September 19, 2023 21:51
mkeoliya pushed a commit to mkeoliya/axolotl that referenced this pull request Dec 15, 2023
* support to disable exllama for gptq

* update property instead of item

* fix config key
djsaunde pushed a commit that referenced this pull request Dec 17, 2024
* support to disable exllama for gptq

* update property instead of item

* fix config key
Development

Successfully merging this pull request may close these issues.

Llama2 GPTQ training does not work