modify train ruGPT3XL finetune example
1 parent 580d649, commit 47ccf82
Showing 8 changed files with 1,499 additions and 1,531 deletions.
@@ -1,42 +1,41 @@
#! /bin/bash
%%bash

# Model parallel size
MP_SIZE=1
# Change for multinode config
NUM_GPUS_PER_WORKER=1

gpt_options=" \
--train-data-path /path/2/train/data/files.list \
--max-files-per-process 20000 \
--logging-dir=/path/2/log/dir \
--train-data-path examples/train.list \
--test-data-path examples/valid.list \
--load-huggingface sberbank-ai/rugpt3xl \
--save /path/2/save/model \
--tokenizer-path sberbank-ai/rugpt3xl \
--cache-prefix p5 \
--save-interval 500 \
--no-load-optim \
--finetune \
--log-interval 100 \
--model-parallel-size 1 \
--logging-dir=examples/log/ \
--save examples/model \
--save-interval 200 \
--model-parallel-size ${MP_SIZE} \
--num-layers 24 \
--hidden-size 2048 \
--num-attention-heads 16 \
--batch-size 2 \
--batch-size 1 \
--seq-length 2048 \
--max-position-embeddings 2048 \
--train-iters 20000 \
--train-iters 1000 \
--distributed-backend nccl \
--lr 0.000015 \
--warmup 0.0 \
--lr-decay-style constant \
--lr 0.0002 \
--lr-decay-style cosine \
--weight-decay 1e-2 \
--warmup .01 \
--log-interval 50 \
--fp16 \
--sparse-mode alternating \
--checkpoint-activations \
--deepspeed-activation-checkpointing \
--sparse-mode alternating \
--deepspeed \
--deepspeed_config ../src/deepspeed_config/gpt3_xl_sparse_2048.json \
--deepspeed_config src/deepspeed_config/gpt3_xl_sparse_2048.json \
"

run_cmd="USE_DEEPSPEED=1 mpirun --np ${NUM_GPUS_PER_WORKER} python ../pretrain_gpt3.py $@ ${gpt_options}"
echo ${run_cmd}
eval ${run_cmd}
run_cmd="USE_DEEPSPEED=1 python -m torch.distributed.launch --nproc_per_node=${NUM_GPUS_PER_WORKER} pretrain_gpt3.py $@ ${gpt_options}"
echo "${run_cmd}"
eval "${run_cmd}"

set +x
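
The diff above contains both `eval ${run_cmd}` and `eval "${run_cmd}"`. The sketch below illustrates what that quoting difference can do in general. It is not taken from the repository: `demo_cmd` is a hypothetical stand-in for `run_cmd`, chosen so that it contains a quoted argument with a run of spaces. Unquoted, the shell word-splits the string before `eval` sees it and the inner spacing collapses; quoted, `eval` receives the string unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical command string (assumption, not from the script above):
# the second argument contains three consecutive spaces.
demo_cmd="printf '%s' 'a   b'"

# Unquoted: the expansion is split on whitespace into separate words,
# eval re-joins them with single spaces, and the inner spacing is lost.
unquoted_out=$(eval ${demo_cmd})

# Quoted: eval receives the original string as one argument,
# so the argument's internal spacing is preserved.
quoted_out=$(eval "${demo_cmd}")

echo "unquoted: [${unquoted_out}]"
echo "quoted:   [${quoted_out}]"
```

For the options listed in the diff, none of which carries a quoted value with internal whitespace, the two forms should behave the same, so the quoted form reads mainly as a defensive choice that also keeps unquoted glob characters from being expanded too early.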