Merge pull request #86 from SeanLee97/improve/nli
update nli doc
SeanLee97 authored Jun 28, 2024
2 parents 2605aeb + 1c5515b commit bec0124
Showing 4 changed files with 62 additions and 31 deletions.
8 changes: 4 additions & 4 deletions angle_emb/angle_trainer.py
@@ -46,10 +46,10 @@
help='Specify dataset random seed, default None')
parser.add_argument('--workers', type=int, default=2,
help='Specify dataset workers, default 2')
- parser.add_argument('--cosine_w', type=float, default=1.0,
- help='Specify weight for cosine loss, default 1.0')
- parser.add_argument('--ibn_w', type=float, default=1.0,
- help='Specify weight for ibn loss, default 1.0')
+ parser.add_argument('--cosine_w', type=float, default=0.0,
+ help='Specify weight for cosine loss, default 0.0')
+ parser.add_argument('--ibn_w', type=float, default=30.0,
+ help='Specify weight for ibn loss, default 30.0')
parser.add_argument('--angle_w', type=float, default=1.0,
help='Specify weight for angle loss, default 1.0')
parser.add_argument('--angle_tau', type=float, default=20.0,
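The change above only flips default loss weights: the cosine objective is switched off (`--cosine_w 0.0`) and the in-batch-negative (ibn) objective gets a much larger weight (`--ibn_w 30.0`), while the angle objective keeps weight 1.0. As the help strings suggest, these weights scale a combined training loss; the sketch below is purely illustrative (the `combined_loss` helper is made up, not angle_emb code).

```python
# Rough sketch of how the CLI weights combine the three objectives
# (illustrative only; see angle_emb's loss code for the real implementation).
import torch

def combined_loss(cosine_loss: torch.Tensor,
                  ibn_loss: torch.Tensor,
                  angle_loss: torch.Tensor,
                  cosine_w: float = 0.0,   # new default: cosine term disabled
                  ibn_w: float = 30.0,     # new default: ibn term up-weighted
                  angle_w: float = 1.0) -> torch.Tensor:
    """Weighted sum of the cosine, in-batch-negative, and angle objectives."""
    return cosine_w * cosine_loss + ibn_w * ibn_loss + angle_w * angle_loss

# Example: with the new defaults the cosine term drops out entirely.
loss = combined_loss(torch.tensor(0.12), torch.tensor(0.05), torch.tensor(0.30))
print(loss)  # 30.0 * 0.05 + 1.0 * 0.30 = 1.8
```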
27 changes: 14 additions & 13 deletions docs/notes/training.rst
@@ -49,24 +49,25 @@ You can train a powerful sentence embedding model using the `angle-trainer` cli

.. code-block:: bash
- WANDB_MODE=disabled CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 --master_port=2345 -m angle_emb.angle_trainer \
- --train_name_or_path SeanLee97/all_nli_angle_format_b \
- --save_dir ckpts/billm-uae-large-nli \
- --model_name_or_path WhereIsAI/UAE-Large-V1 \
+ WANDB_MODE=disabled CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --nproc_per_node=8 --master_port=1234 -m angle_emb.angle_trainer \
+ --train_name_or_path SeanLee97/all_nli_angle_format_a \
+ --save_dir ckpts/bert-base-nli-test \
+ --model_name_or_path google-bert/bert-base-uncased \
--pooling_strategy cls \
- --maxlen 75 \
- --ibn_w 20.0 \
+ --maxlen 128 \
+ --ibn_w 30.0 \
--cosine_w 0.0 \
--angle_w 1.0 \
- --learning_rate 1e-6 \
- --push_to_hub 1 --hub_model_id SeanLee97/test-uae-large-nli --hub_private_repo 1 \
- --logging_steps 5 \
- --save_steps 50 \
+ --angle_tau 20.0 \
+ --learning_rate 5e-5 \
+ --push_to_hub 1 --hub_model_id SeanLee97/bert-base-nli-test-0728 --hub_private_repo 1 \
+ --logging_steps 10 \
+ --save_steps 100 \
--warmup_steps 50 \
- --batch_size 64 \
+ --batch_size 128 \
--seed 42 \
- --gradient_accumulation_steps 4 \
- --epochs 1 \
+ --gradient_accumulation_steps 16 \
+ --epochs 10 \
--fp16 1
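Once the run above finishes, the checkpoint saved under `ckpts/bert-base-nli-test` (or the repo pushed via `--hub_model_id`) can be loaded for inference. Below is a minimal usage sketch, assuming the standard `angle_emb` loading API (`AnglE.from_pretrained` / `encode`); adjust the path and pooling strategy to match your run.

```python
# Minimal sketch of loading the checkpoint trained by the command above
# (assumes the usual angle_emb API; paths are examples from this doc).
from angle_emb import AnglE

angle = AnglE.from_pretrained(
    'ckpts/bert-base-nli-test',   # or the hub id pushed via --hub_model_id
    pooling_strategy='cls',       # match the --pooling_strategy used for training
).cuda()

vecs = angle.encode(['A man is playing guitar.', 'Someone plays an instrument.'],
                    to_numpy=True)
print(vecs.shape)  # (2, hidden_size), e.g. (2, 768) for bert-base
```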
56 changes: 43 additions & 13 deletions examples/NLI/README.md
@@ -40,27 +40,57 @@ $ bash download_dataset.sh

## 4. Train script

- 1) use `train_angle.py`
+ ### 4.1 BERT
+
+ train:
+
+ Here is a training example for a BERT-based NLI model:

```bash
- CUDA_VISIBLE_DEVICES=1,2,3,4 torchrun --nproc_per_node=4 --master_port=1234 train_angle.py \
- --task NLI-STS --save_dir ckpts/NLI-STS-angle-llama-7b \
- --model_name NousResearch/Llama-2-7b-hf \
- --w2 35 --learning_rate 1e-4 --maxlen 50 \
- --lora_r 32 --lora_alpha 32 --lora_dropout 0.1 \
- --save_steps 500 --batch_size 120 --seed 42 --do_eval 0 --load_kbit 4 --gradient_accumulation_steps 4 --epochs 1
+ WANDB_MODE=disabled CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 torchrun --nproc_per_node=8 --master_port=1234 -m angle_emb.angle_trainer \
+ --train_name_or_path SeanLee97/all_nli_angle_format_a \
+ --save_dir ckpts/bert-base-nli-test \
+ --model_name_or_path google-bert/bert-base-uncased \
+ --pooling_strategy cls \
+ --maxlen 128 \
+ --ibn_w 30.0 \
+ --cosine_w 0.0 \
+ --angle_w 1.0 \
+ --angle_tau 20.0 \
+ --learning_rate 5e-5 \
+ --push_to_hub 1 --hub_model_id SeanLee97/bert-base-nli-test-0728 --hub_private_repo 1 \
+ --logging_steps 10 \
+ --save_steps 100 \
+ --warmup_steps 50 \
+ --batch_size 128 \
+ --seed 42 \
+ --gradient_accumulation_steps 16 \
+ --epochs 10 \
+ --fp16 1
```

+ eval:
+
+ ```bash
+ CUDA_VISIBLE_DEVICES=0 python eval_nli.py \
+ --model_name_or_path SeanLee97/bert-base-nli-test-0728 \
+ --pooling_strategy cls_avg
+ ```

- 2) use `angle-trainer`

- You need to transform the AllNLI dataset into jsonl format like {"text1": "", "text2": "", "label": 0/1}.
- For the label, we set `entailment` to `1`, `contradiction` to `0`, and skip `neutral`.
- Suppose the filename is `train.jsonl`; then you can train as follows:
+ **Tuning Tips**:
+
+ - prepare data into `DatasetFormats.A`
+ - try to increase epochs
+ - set gradient_accumulation_steps = n * n_gpus
+
+ ### 4.2 LLM-based

```bash
- CUDA_VISIBLE_DEVICES=1,2,3,4 torchrun --nproc_per_node=4 --master_port=1234 angle-trainer \
+ CUDA_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc_per_node=4 --master_port=1234 -m angle_emb.angle_trainer \
--model_name_or_path NousResearch/Llama-2-7b-hf \
- --train_name_or_path train.jsonl \
+ --train_name_or_path SeanLee97/all_nli_angle_format_b \
--save_dir ckpts/NLI-STS-angle-llama-7b \
--prompt_template 'Summarize sentence "{text}" in one word:"' \
--w2 35 --learning_rate 1e-4 --maxlen 50 \
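The removed prose above described the expected data layout, which appears to correspond to the `DatasetFormats.A` mentioned in the new tuning tips: one JSON object per line with `text1`, `text2`, and a binary `label`, where `entailment` maps to 1, `contradiction` to 0, and `neutral` pairs are skipped. Below is a rough conversion sketch under those rules; the raw field names (`premise`, `hypothesis`, `gold_label`) are assumptions about the source NLI dump, not something the README specifies.

```python
# Sketch of the jsonl conversion described in the removed README text:
# entailment -> 1, contradiction -> 0, neutral skipped.
import json

LABEL_MAP = {'entailment': 1, 'contradiction': 0}  # 'neutral' is dropped

def convert(raw_rows, out_path='train.jsonl'):
    with open(out_path, 'w', encoding='utf-8') as f:
        for row in raw_rows:
            label = LABEL_MAP.get(row['gold_label'])
            if label is None:        # skip neutral / unknown labels
                continue
            f.write(json.dumps({'text1': row['premise'],
                                'text2': row['hypothesis'],
                                'label': label}) + '\n')

# Tiny usage example with one hand-written row.
convert([{'premise': 'A man eats.', 'hypothesis': 'Someone is eating.',
          'gold_label': 'entailment'}])
```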
2 changes: 1 addition & 1 deletion examples/NLI/eval_nli.py
@@ -109,7 +109,7 @@ def main():
'tenacity': 3, 'epoch_size': 2}
elif args.mode == 'test':
# Full mode
- params = {'task_path': PATH_TO_DATA, 'usepytorch': True, 'kfold': 10, 'batch_size':16}
+ params = {'task_path': PATH_TO_DATA, 'usepytorch': True, 'kfold': 10, 'batch_size': 2}
params['classifier'] = {'nhid': 0, 'optim': 'adam', 'batch_size': 64,
'tenacity': 5, 'epoch_size': 4}
else:
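The only change here lowers the SentEval encoding batch size in full-test mode from 16 to 2. For context, a params dict like this is normally handed to SentEval's engine together with `batcher`/`prepare` callbacks; the sketch below uses a dummy batcher and a placeholder data path, so it only illustrates the wiring, not `eval_nli.py` itself.

```python
# Illustrative SentEval wiring (not the script's actual batcher/prepare).
import numpy as np
import senteval  # SentEval must be installed and its task data downloaded

PATH_TO_DATA = './SentEval/data'  # placeholder; eval_nli.py defines its own path

def prepare(params, samples):
    # No-op here; a real script could cache model state or build a vocab.
    return

def batcher(params, batch):
    # Dummy embeddings: one zero vector per sentence. The real batcher encodes
    # each batch with the trained sentence-embedding model.
    return np.zeros((len(batch), 768))

params = {'task_path': PATH_TO_DATA, 'usepytorch': True, 'kfold': 10, 'batch_size': 2}
params['classifier'] = {'nhid': 0, 'optim': 'adam', 'batch_size': 64,
                        'tenacity': 5, 'epoch_size': 4}

se = senteval.engine.SE(params, batcher, prepare)
print(se.eval(['STSBenchmark']))
```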
