Add AutoGPTQ quantization script #545
base: main
Conversation
# import debugpy
# debugpy.listen(('0.0.0.0', 5678))
# debugpy.wait_for_client()
# debugpy.breakpoint()
@Glavin001 clean up old code
scripts/quantize.py
Outdated
prompter = AlpacaPrompter()

# def load_data(data_path, tokenizer, n_samples, template=TEMPLATE):
def load_data(data_path, tokenizer, n_samples):
@Glavin001 Delete this. Have a new method using Axolotl built-in functions
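For illustration, a minimal sketch of what the replacement could look like if it samples calibration examples from the dataset Axolotl has already tokenized for training; the helper name and the assumption that each row carries `input_ids` / `attention_mask` lists are mine, not the script's actual code:

```python
import random

import torch


def get_calibration_examples(tokenized_dataset, n_samples=128):
    """Sample rows from an already-tokenized dataset into the format AutoGPTQ expects."""
    indices = random.sample(range(len(tokenized_dataset)), min(n_samples, len(tokenized_dataset)))
    examples = []
    for i in indices:
        row = tokenized_dataset[i]
        examples.append(
            {
                "input_ids": torch.tensor([row["input_ids"]]),
                "attention_mask": torch.tensor([row["attention_mask"]]),
            }
        )
    return examples
```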
scripts/quantize.py
Outdated
)

# TEMPLATE = "<|prompt|>{instruction}</s><|answer|>"
prompter = AlpacaPrompter()
@Glavin001 delete. Using Axolotl config and built-in functions now
scripts/quantize.py
Outdated
# huggingface_username = "CHANGE_ME"
## CHANGE ABOVE

quantize_config = BaseQuantizeConfig(
@Glavin001 : Add to Axolotl config?
cc @winglian @tmm1 @NanoCode012 : Would you recommend leaving this as default or adding to Axolotl config file as options?
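For reference, these are the knobs in question; a sketch of a typical AutoGPTQ quantization config, where each field could plausibly become an Axolotl config option (the values shown are common defaults, not necessarily what this PR uses):

```python
from auto_gptq import BaseQuantizeConfig

quantize_config = BaseQuantizeConfig(
    bits=4,          # quantization bit width (e.g. 4 or 8)
    group_size=128,  # number of weights sharing one quantization scale
    desc_act=False,  # activation-order quantization; more accurate but slower at inference
)
```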
scripts/quantize.py
Outdated
print("Done importing...") | ||
|
||
## CHANGE BELOW ## | ||
config_path: Path = Path("./examples/llama-2/lora.yml") |
@Glavin001 : Replace hard-coded path with the Axolotl callback current config
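Until the callback wiring lands, a minimal sketch of reading the config path from the command line instead of hard-coding it; the argument name and the use of `DictDefault` here are assumptions about how this script might be structured:

```python
import argparse
from pathlib import Path

import yaml

from axolotl.utils.dict import DictDefault

parser = argparse.ArgumentParser()
parser.add_argument("config", type=Path, help="Axolotl YAML config used for training")
args = parser.parse_args()

# Load the same YAML config the training run used, so quantization sees identical paths/settings.
with open(args.config, encoding="utf-8") as file:
    cfg = DictDefault(yaml.safe_load(file))
```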
configure_logging()
LOG = logging.getLogger("axolotl")

# logging.basicConfig(
Help Wanted
I couldn't get any logging to work from AutoGPTQ. Would be nice to fix logging.
This is my old code which works when not calling Axolotl's configure_logging()
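One thing that may be worth trying (a sketch, not verified against this branch): give AutoGPTQ's logger namespace its own handler and level after Axolotl's configure_logging() runs, since AutoGPTQ logs under its own package name rather than "axolotl":

```python
import logging

# Attach an explicit handler/level to AutoGPTQ's loggers so their progress
# messages are not swallowed by the root/axolotl logging configuration.
gptq_logger = logging.getLogger("auto_gptq")
gptq_logger.setLevel(logging.INFO)
if not gptq_logger.handlers:
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter("[%(asctime)s] [%(name)s] %(message)s"))
    gptq_logger.addHandler(handler)
```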
scripts/quantize.py
Outdated
print("Merged model not found. Merging...") | ||
# model, tokenizer = load_model(cfg, inference=True) | ||
# do_merge_lora_model_and_tokenizer(cfg=cfg, model=model, tokenizer=tokenizer) | ||
raise NotImplementedError("Merging model is not implemented yet.") |
@Glavin001 TODO: implement this, so quantization has a merged model to work with.
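If this gets implemented in-process rather than by re-running finetune.py --merge_lora, a minimal PEFT-based sketch might look like the following (paths, dtype, and the helper name are placeholders; the cfg wiring is omitted):

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer


def merge_lora_to_disk(base_model_name, lora_dir, output_dir):
    """Fold LoRA adapter weights into the base model and save a standalone copy."""
    base = AutoModelForCausalLM.from_pretrained(base_model_name, torch_dtype=torch.float16)
    model = PeftModel.from_pretrained(base, lora_dir)
    merged = model.merge_and_unload()
    merged.save_pretrained(output_dir, safe_serialization=True)
    AutoTokenizer.from_pretrained(base_model_name).save_pretrained(output_dir)
```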
# accelerate launch ./scripts/finetune.py ./examples/llama-2/lora.yml --merge_lora --lora_model_dir="./lora-out" --load_in_8bit=False --load_in_4bit=False
# CUDA_VISIBLE_DEVICES="1" accelerate launch ./scripts/finetune.py ./examples/llama-2/lora.yml --merge_lora --lora_model_dir="./lora-out" --load_in_8bit=False --load_in_4bit=False

# HUB_MODEL_ID="Glavin001/llama-2-7b-alpaca_2k_test" accelerate launch ./scripts/quantize.py
@Glavin001 delete test notes
cfg.wandb_project = os.environ.get("WANDB_PROJECT")

if os.environ.get("HUB_MODEL_ID") and len(os.environ.get("HUB_MODEL_ID", "")) > 0:
    cfg.hub_model_id = os.environ.get("HUB_MODEL_ID")
FYI: this is used for upcoming work of starting scripts/finetune.py and having it run without any custom / run-specific / user-specific info in the Axolotl config.
Might be better off in the setup_wandb_env_vars function.
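A rough sketch of what that could look like if the override moved into such a helper; the signature is assumed from the existing setup_wandb_env_vars(cfg), and everything else here is illustrative:

```python
import os


def setup_wandb_env_vars(cfg):
    # ... existing W&B environment-variable setup ...

    # Illustrative: allow run-specific values to come from the environment
    # instead of being baked into the shared Axolotl YAML config.
    if os.environ.get("WANDB_PROJECT"):
        cfg.wandb_project = os.environ["WANDB_PROJECT"]
    if os.environ.get("HUB_MODEL_ID"):
        cfg.hub_model_id = os.environ["HUB_MODEL_ID"]
```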
scripts/finetune.py
Outdated
train(cfg=parsed_cfg, cli_args=parsed_cli_args, dataset_meta=dataset_meta)
# tokenizer = None
should_quantize = True
TODO: @Glavin001 should make this based on the config
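Something along these lines, assuming a quantize field is added to the Axolotl config and/or the planned --quantize CLI flag (both names come from this PR's task list, but the exact wiring is a guess):

```python
# Read the flag from config / CLI args instead of hard-coding it.
should_quantize = bool(parsed_cfg.get("quantize", False)) or bool(
    getattr(parsed_cli_args, "quantize", False)
)
```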
log_gpu_memory()

do_merge_lora(cfg=parsed_cfg, cli_args=parsed_cli_args)
Help Wanted
I kept getting:
Expected a cuda device, but got: cpu
when calling do_merge_lora. Running nvidia-smi always showed lots of GPU memory still taken up / unreleased.
Let's wait until #521 is merged.
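For what it's worth, a common (though often insufficient) pattern for this: drop all references to the training model/trainer first (e.g. `model = None`), then clear PyTorch's CUDA cache. This is a generic sketch, not something verified to fix the error above:

```python
import gc

import torch


def free_cuda_cache():
    """Call after dropping references to the training model/trainer."""
    gc.collect()
    torch.cuda.empty_cache()
    if torch.cuda.is_available():
        allocated_gb = torch.cuda.memory_allocated() / 1e9
        print(f"GPU memory still allocated: {allocated_gb:.2f} GB")
```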
Closes #491
Demo
How to try yourself
Create a quantized model with Axolotl in 3 steps:
1️⃣ Train
2️⃣ Merge
3️⃣ 🆕 Quantize
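Roughly what the quantize step does with AutoGPTQ; a simplified sketch of the flow rather than the exact contents of scripts/quantize.py (paths, the calibration text, and config values are placeholders):

```python
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
from transformers import AutoTokenizer

merged_dir = "./lora-out/merged"  # placeholder: output of the merge step
quantized_dir = "./quantized"     # placeholder: where the GPTQ weights will be written

tokenizer = AutoTokenizer.from_pretrained(merged_dir)
quantize_config = BaseQuantizeConfig(bits=4, group_size=128, desc_act=False)

# Load the merged full-precision model in a form AutoGPTQ can quantize.
model = AutoGPTQForCausalLM.from_pretrained(merged_dir, quantize_config)

# Calibration data: a handful of tokenized prompts, ideally drawn from the training set.
examples = [tokenizer("Below is an instruction that describes a task. Write a response.")]

model.quantize(examples)
model.save_quantized(quantized_dir, use_safetensors=True)
tokenizer.save_pretrained(quantized_dir)
```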
Progress:
Look for the per-layer progress in the logs; for example, 4/32 shows 4 of 32 layers quantized.

Task list
- Will add another callback to automatically merge and quantize upon completion, if enabled in config
- Could not release the model from GPU memory, so couldn't run the merge directly after training in the same process; needed to make them separate steps
- --quantize CLI option