Add AutoGPTQ quantization script #545 (Draft)

Wants to merge 24 commits into base: main

Commits (24)
84d4476  WIP Add training callback to send predictions to WandB table (Glavin001, Sep 3, 2023)
766875f  Merge branch 'main' of github.com:OpenAccess-AI-Collective/axolotl in… (Glavin001, Sep 5, 2023)
0c743e3  WIP improve wandb table reporting callback (Glavin001, Sep 5, 2023)
5a7f301  WIP improve wandb table reporting callback (cont) (Glavin001, Sep 5, 2023)
8c7b7c5  Add VSCode launching for debugging (Glavin001, Sep 5, 2023)
88c31f1  Add tiny llama example (Glavin001, Sep 5, 2023)
06a44de  WIP attempt to improve post-eval prediction generation for table (Glavin001, Sep 7, 2023)
ab3cffa  WIP attempt to improve post-eval prediction generation for table - pa… (Glavin001, Sep 8, 2023)
b22d1c6  WIP batch generation (Glavin001, Sep 8, 2023)
6f3216e  WIP attempt to handle sample_packing using position_ids for wandb pre… (Glavin001, Sep 8, 2023)
e9eae77  WIP add code for debugging (Glavin001, Sep 8, 2023)
83e6b29  Fix sample_packing support for wandb prediction table (Glavin001, Sep 9, 2023)
aaf4d1e  Clean up code for PR review (Glavin001, Sep 9, 2023)
e4c1a2e  WIP Add AutoGPTQ quantization script (Glavin001, Sep 9, 2023)
19a30cf  WIP Integrate quantization into finetune script (Glavin001, Sep 10, 2023)
894a4be  Add --quantize option to finetune script, fix auto_gptq logging (Glavin001, Sep 11, 2023)
24c0483  Disable quantizing directly after fine tuning (Glavin001, Sep 11, 2023)
14d26e1  Add eval_table_size, eval_table_max_new_tokens configs & clean up code (Glavin001, Sep 12, 2023)
c6c54ee  Clean up PR, delete VSCode config, add tiny-llama example (Glavin001, Sep 12, 2023)
dee3d54  Add eval_table_size, eval_table_max_new_tokens configs & clean up code (Glavin001, Sep 12, 2023)
09b16d8  Clean up PR, delete VSCode config, add tiny-llama example (Glavin001, Sep 12, 2023)
578d8b6  Merge branch 'feat/wandb-pred-table' of github.com:Glavin001/axolotl … (Glavin001, Sep 12, 2023)
cf23998  WIP quantize model & push model (Glavin001, Sep 12, 2023)
8a26ab3  WIP (Glavin001, Sep 12, 2023)
37 changes: 37 additions & 0 deletions .vscode/launch.json
@@ -0,0 +1,37 @@
{
    // Use IntelliSense to learn about possible attributes.
    // Hover to view descriptions of existing attributes.
    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python: Remote Attach",
            "type": "python",
            "request": "attach",
            "connect": {
                "host": "0.0.0.0",
                "port": 5678
            },
            "pathMappings": [
                {
                    "localRoot": "${workspaceFolder}",
                    "remoteRoot": "/workspace/axolotl/"
                }
            ],
            "justMyCode": false
        },
        {
            "name": "train",
            "type": "python",
            "request": "launch",
            "module": "accelerate.commands.launch",
            "args": [
                "${workspaceFolder}/scripts/finetune.py",
                // "${file}",
                "${workspaceFolder}/examples/llama-2/tiny-random.yml",
            ], // other args come after finetune.py
            "console": "integratedTerminal",
            // "env": {"CUDA_LAUNCH_BLOCKING": "1"}
        },
    ]
}
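The "Python: Remote Attach" configuration above expects a debug server listening on port 5678, the same port the docker-compose change below publishes. A minimal sketch of how the training process could open that listener with debugpy (assuming debugpy is installed; this helper is not part of the PR's visible diff):

```python
# Minimal debugpy listener sketch so VS Code's "Python: Remote Attach" can connect.
# Assumes `pip install debugpy`; not taken from this PR's diff.
import debugpy

debugpy.listen(("0.0.0.0", 5678))  # matches the 5678:5678 mapping in docker-compose.yaml
print("Waiting for debugger to attach on port 5678 ...")
debugpy.wait_for_client()          # block until VS Code attaches
# From here on, breakpoints set in VS Code (with "justMyCode": false) will be hit.
```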
2 changes: 1 addition & 1 deletion README.md
@@ -703,7 +703,7 @@ Pass the appropriate flag to the train command:
Add the flag below to the train command above

```bash
--merge_lora --lora_model_dir="./completed-model" --load_in_8bit=False --load_in_4bit=False
--merge_lora --lora_model_dir="./completed-model"
```

If you run out of CUDA memory, you can try to merge in system RAM with
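For context on the `--merge_lora` flag shown in this README hunk: merging folds the trained LoRA adapter weights into the base model before any later step such as quantization. A rough sketch of that merge using PEFT (paths are placeholders, and this is not axolotl's actual implementation):

```python
# Illustrative LoRA merge sketch using PEFT; not axolotl's --merge_lora code path.
# Model and adapter paths are placeholders.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_dir = "meta-llama/Llama-2-7b-hf"   # base model (placeholder)
lora_model_dir = "./completed-model"          # trained LoRA adapter (placeholder)
merged_dir = "./merged-model"                 # output directory (placeholder)

base = AutoModelForCausalLM.from_pretrained(base_model_dir, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(base, lora_model_dir)

merged = model.merge_and_unload()   # fold LoRA deltas into the base weights
merged.save_pretrained(merged_dir)
AutoTokenizer.from_pretrained(base_model_dir).save_pretrained(merged_dir)
```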
9 changes: 6 additions & 3 deletions docker-compose.yaml
@@ -1,9 +1,10 @@
# version: '3.8'
services:
  axolotl:
    # build:
    #   context: .
    #   dockerfile: ./docker/Dockerfile
    image: winglian/axolotl:main-py3.10-cu118-2.0.1
    volumes:
      - .:/workspace/axolotl
      - ~/.cache/huggingface/:/root/.cache/huggingface/
@@ -15,6 +16,8 @@ services:
      - GIT_COMMITTER_NAME=${GIT_COMMITTER_NAME}
      - GIT_COMMITTER_EMAIL=${GIT_COMMITTER_EMAIL}
      - WANDB_API_KEY=${WANDB_API_KEY}
    ports:
      - "5678:5678"
    deploy:
      resources:
        reservations:
70 changes: 70 additions & 0 deletions examples/llama-2/llama-68.yml
@@ -0,0 +1,70 @@
base_model: JackFram/llama-68m
base_model_config: JackFram/llama-68m

model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
is_llama_derived_model: true

load_in_8bit: true
load_in_4bit: false
strict: false

datasets:
  - path: mhenrichsen/alpaca_2k_test
    type: alpaca
dataset_prepared_path: last_run_prepared
val_set_size: 0.01
output_dir: ./lora-out

# sequence_len: 4096
sequence_len: 2048
sample_packing: true

adapter: lora
lora_model_dir:
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project:
wandb_entity:
wandb_watch:
wandb_run_id:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 3
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
eval_steps: 20
eval_table_size: 5
save_steps:
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  bos_token: "<s>"
  eos_token: "</s>"
  unk_token: "<unk>"
70 changes: 70 additions & 0 deletions examples/llama-2/lora-short.yml
@@ -0,0 +1,70 @@
base_model: meta-llama/Llama-2-7b-hf
base_model_config: meta-llama/Llama-2-7b-hf
model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
is_llama_derived_model: true

load_in_8bit: true
load_in_4bit: false
strict: false

datasets:
  - path: mhenrichsen/alpaca_2k_test
    type: alpaca
dataset_prepared_path: last_run_prepared
# val_set_size: 0.01
val_set_size: 0.001
output_dir: ./lora-out

sequence_len: 4096
sample_packing: true

adapter: lora
lora_model_dir:
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project:
wandb_entity:
wandb_watch:
wandb_run_id:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 2
# num_epochs: 3
# num_epochs: 1
num_epochs: 0.1
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
eval_steps: 20
save_steps:
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  bos_token: "<s>"
  eos_token: "</s>"
  unk_token: "<unk>"
4 changes: 3 additions & 1 deletion examples/llama-2/lora.yml
@@ -27,7 +27,7 @@ lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project:
wandb_project: test-issue-490-7b-2
wandb_entity:
wandb_watch:
wandb_run_id:
@@ -56,6 +56,8 @@ flash_attention: true

warmup_steps: 10
eval_steps: 20
eval_table_size: 5
eval_table_max_new_tokens: 128
save_steps:
debug:
deepspeed:
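The eval_table_size and eval_table_max_new_tokens options added in this hunk control the WandB prediction table introduced by the earlier callback commits: how many eval prompts to sample and how many new tokens to generate per prompt. A rough sketch of what such a callback can look like (class and column names here are illustrative, not the PR's actual implementation):

```python
# Illustrative WandB prediction-table callback; not the PR's actual implementation.
# Assumes the tokenizer and a list of eval prompts are supplied by the caller.
import wandb
from transformers import TrainerCallback


class LogEvalPredictionsCallback(TrainerCallback):
    def __init__(self, tokenizer, eval_prompts, eval_table_size=5, eval_table_max_new_tokens=128):
        self.tokenizer = tokenizer
        self.eval_prompts = eval_prompts[:eval_table_size]   # eval_table_size
        self.max_new_tokens = eval_table_max_new_tokens      # eval_table_max_new_tokens

    def on_evaluate(self, args, state, control, model=None, **kwargs):
        table = wandb.Table(columns=["step", "prompt", "prediction"])
        for prompt in self.eval_prompts:
            inputs = self.tokenizer(prompt, return_tensors="pt").to(model.device)
            output_ids = model.generate(**inputs, max_new_tokens=self.max_new_tokens)
            # Decode only the newly generated tokens, not the echoed prompt.
            new_tokens = output_ids[0][inputs["input_ids"].shape[-1]:]
            prediction = self.tokenizer.decode(new_tokens, skip_special_tokens=True)
            table.add_data(state.global_step, prompt, prediction)
        wandb.log({"eval/predictions": table}, step=state.global_step)
```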
1 change: 1 addition & 0 deletions examples/llama-2/qlora.yml
@@ -58,6 +58,7 @@ flash_attention: true

warmup_steps: 10
eval_steps: 20
eval_table_size: 5
save_steps:
debug:
deepspeed:
69 changes: 69 additions & 0 deletions examples/llama-2/tiny-llama.yml
@@ -0,0 +1,69 @@
base_model: PY007/TinyLlama-1.1B-step-50K-105b
base_model_config: PY007/TinyLlama-1.1B-step-50K-105b

model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
is_llama_derived_model: true

load_in_8bit: true
load_in_4bit: false
strict: false

datasets:
  - path: mhenrichsen/alpaca_2k_test
    type: alpaca
dataset_prepared_path: last_run_prepared
val_set_size: 0.01
output_dir: ./lora-out

sequence_len: 4096
sample_packing: true

adapter: lora
lora_model_dir:
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project:
wandb_entity:
wandb_watch:
wandb_run_id:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 2
num_epochs: 3
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
eval_steps: 20
eval_table_size: 5
save_steps:
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  bos_token: "<s>"
  eos_token: "</s>"
  unk_token: "<unk>"
71 changes: 71 additions & 0 deletions examples/llama-2/tiny-puffed-llama.yml
@@ -0,0 +1,71 @@
base_model: PY007/TinyLlama-1.1B-step-50K-105b
base_model_config: PY007/TinyLlama-1.1B-step-50K-105b

model_type: LlamaForCausalLM
tokenizer_type: LlamaTokenizer
is_llama_derived_model: true

load_in_8bit: true
load_in_4bit: false
strict: false

datasets:
  # - path: mhenrichsen/alpaca_2k_test
  #   type: alpaca
  - path: LDJnr/Puffin
    type: sharegpt:chat
dataset_prepared_path: last_run_prepared
val_set_size: 0.01
output_dir: ./lora-tiny-puffed-out

sequence_len: 2048
sample_packing: true

adapter: lora
lora_model_dir:
lora_r: 32
lora_alpha: 16
lora_dropout: 0.05
lora_target_linear: true
lora_fan_in_fan_out:

wandb_project:
wandb_entity:
wandb_watch:
wandb_run_id:
wandb_log_model:

gradient_accumulation_steps: 4
micro_batch_size: 4
num_epochs: 2
optimizer: adamw_bnb_8bit
lr_scheduler: cosine
learning_rate: 0.0002

train_on_inputs: false
group_by_length: false
bf16: true
fp16: false
tf32: false

gradient_checkpointing: true
early_stopping_patience:
resume_from_checkpoint:
local_rank:
logging_steps: 1
xformers_attention:
flash_attention: true

warmup_steps: 10
eval_steps: 20
eval_table_size: 10
save_steps:
debug:
deepspeed:
weight_decay: 0.0
fsdp:
fsdp_config:
special_tokens:
  bos_token: "<s>"
  eos_token: "</s>"
  unk_token: "<unk>"