
[ENHANCEMENT] Could you provide the EE-Tuning scripts for the llama-2-7b model? #19

Open

Noblezhong opened this issue Oct 9, 2024 · 3 comments

@Noblezhong

Hi! I am an undergraduate student who is interested in your team's project. When I ran the demo code in the EE-Tuning part, I found there are no tuning scripts for the llama2-7b model (only ones for 13B and 70B are provided), but there is a conversion script for llama2-7b:

add_exit_layers.sh

# For the llama2 7B model (32 layers):
# add an embedding-norm-mlp exit every 1/8 depth
python ${MEGATRON_ROOT_PATH}/tools/checkpoint/checkpoint_converter.py \
    --load-dir $LOAD_DIR \
    --save-dir $SAVE_DIR \
    --load-iteration $LOAD_ITER \
    --use-exit-norm \
    --use-exit-mlp \
    --conversion-type add-exit \
    --add-exit-layer-nums 4 8 12 16 20 24 28 32 \
    --megatron-path $MEGATRON_ROOT_PATH
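
(The variables in the command above have to be set beforehand; a minimal sketch with placeholder paths, just for illustration:)

MEGATRON_ROOT_PATH=/path/to/EE-LLM      # repo root containing tools/checkpoint/
LOAD_DIR=/path/to/llama-2-7b-megatron   # llama2-7b checkpoint in Megatron format
SAVE_DIR=/path/to/llama-2-7b-8exit      # output checkpoint with the added exit layers
LOAD_ITER=1                             # checkpoint iteration to load (depends on your checkpoint)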

So I think you could add a tuning script for llama2-7b. There may be others like me with limited hardware (an A40), or you could simply tell us how to modify the existing tuning scripts to adapt them to llama2-7b. Thanks!

@pan-x-c (Owner) commented Oct 11, 2024

Try this one

#!/bin/bash

PROJECT_NAME=EE-TUNE
GROUP_NAME=llama-2-7B-chat-8-EXIT-mlp-norm-pt

CURRENT_TIME=`date "+%m%d-%H%M"`

MASTER_NAME=${CURRENT_TIME}

export CUDA_DEVICE_MAX_CONNECTIONS=1
export OMP_NUM_THREADS=4

# Fill in these paths before running:
LOAD_PATH=        # llama2-7b checkpoint with the exit layers added (output of the conversion step)
CHECKPOINT_PATH=  # where the tuned checkpoints will be saved
TOKENIZER_PATH=   # llama2 tokenizer model file

DATA_HOME=        # root directory of the tuning datasets listed below

DATASET_ARXIV=${DATA_HOME}/redpajama-arxiv/all
DATASET_BOOKS=${DATA_HOME}/redpajama-book/all
DATASET_C4=${DATA_HOME}/redpajama-c4/all
DATASET_CC=${DATA_HOME}/redpajama-cc/all
DATASET_STACKEXCHANGE=${DATA_HOME}/redpajama-pile-stackexchange/all
DATASET_CODE=${DATA_HOME}/redpajama-stack-code/all
DATASET_WIKIPEDIA=${DATA_HOME}/redpajama-wiki/all

DATASET_PILE_EUROPARL=${DATA_HOME}/the-pile-europarl/all
DATASET_PILE_FREELAW=${DATA_HOME}/the-pile-freelaw/all
DATASET_PILE_HACKERNEWS=${DATA_HOME}/the-pile-hackernews/all
DATASET_PILE_NIH=${DATA_HOME}/the-pile-nih/all
DATASET_PILE_PHILPAPER=${DATA_HOME}/the-pile-philpaper/all
DATASET_PILE_PMA=${DATA_HOME}/the-pile-pubmed-abstract/all
DATASET_PILE_PMC=${DATA_HOME}/the-pile-pubmed-central/all
DATASET_PILE_USPTO=${DATA_HOME}/the-pile-uspto/all

DATA_PATH="\
    0.0362 ${DATASET_ARXIV} \
    0.0657 ${DATASET_BOOKS} \
    0.2264 ${DATASET_C4} \
    0.4491 ${DATASET_CC} \
    0.0246 ${DATASET_STACKEXCHANGE} \
    0.0810 ${DATASET_CODE} \
    0.0548 ${DATASET_WIKIPEDIA} \
    0.0010 ${DATASET_PILE_EUROPARL} \
    0.0162 ${DATASET_PILE_FREELAW} \
    0.0006 ${DATASET_PILE_HACKERNEWS} \
    0.0005 ${DATASET_PILE_NIH} \
    0.0006 ${DATASET_PILE_PHILPAPER} \
    0.0065 ${DATASET_PILE_PMA} \
    0.0318 ${DATASET_PILE_PMC} \
    0.0050 ${DATASET_PILE_USPTO} \
"


# llama-2-7b model configuration
NLAYERS=32
HIDDEN=4096
HEADS=32
SEQ=2048
FFN_SIZE=11008

# 4 GPUs per model replica (TP x PP), matching NPROC_PER_NODE below
TP=1
PP=4

MICRO_BATCH=4
GLOBAL_BATCH=16


MASTER_ADDR=127.0.0.1
MASTER_PORT=5900
WORLD_SIZE=1
RANK=0
NPROC_PER_NODE=4

TRAIN_ITER=40000
EVAL_INTERVAL=50000
SAVE_INTERVAL=20000

DIST_ARGS="
    --master_addr $MASTER_ADDR \
    --master_port $MASTER_PORT \
    --nproc_per_node $NPROC_PER_NODE \
    --nnodes $WORLD_SIZE \
    --node_rank $RANK \
    "

# --exit-layer-nums below must match the exit layers added by add_exit_layers.sh
GPT_ARGS="
    --tensor-model-parallel-size $TP \
    --pipeline-model-parallel-size $PP \
    --num-layers $NLAYERS \
    --hidden-size $HIDDEN \
    --num-attention-heads $HEADS \
    --seq-length $SEQ \
    --max-position-embeddings $SEQ \
    --micro-batch-size $MICRO_BATCH \
    --global-batch-size $GLOBAL_BATCH \
    --lr 0.0001 \
    --train-iters $TRAIN_ITER \
    --min-lr 1.0e-5 \
    --lr-warmup-fraction .01 \
    --adam-beta1 0.9 \
    --adam-beta2 0.95 \
    --adam-eps 1e-5 \
    --clip-grad 1.0 \
    --bf16 \
    --disable-bias-linear \
    --use-flash-attn \
    --normalization RMSNorm \
    --position-embedding-type rope \
    --swiglu \
    --exit-layer-nums 4 8 12 16 20 24 28 32 \
    --use-exit-norm \
    --use-exit-mlp \
    --untie-embeddings-and-output-weights \
    --untie-exit-output-weights \
    --padded-vocab-size 32000 \
    --ffn-hidden-size $FFN_SIZE \
    --finetune \
    --tune-exit-pipeline-parallel-size 4 \
    --tune-exit \
"

DATA_ARGS="
    --data-path $DATA_PATH \
    --tokenizer-type Llama2Tokenizer \
    --tokenizer-model $TOKENIZER_PATH \
    --split 990,9,1 \
"

OUTPUT_ARGS="
    --log-interval 10 \
    --log-timers-to-tracker \
    --save-interval $SAVE_INTERVAL \
    --eval-interval $EVAL_INTERVAL \
    --eval-iters 1 \
    --wandb-project $PROJECT_NAME \
    --wandb-group $GROUP_NAME \
    --wandb-exp-name $MASTER_NAME \
"

torchrun $DIST_ARGS \
    pretrain_early_exit_gpt.py \
    $GPT_ARGS \
    $DATA_ARGS \
    $OUTPUT_ARGS \
    --load $LOAD_PATH \
    --save $CHECKPOINT_PATH
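
(Note: as written, this script assumes a single node with 4 GPUs, since TP=1 x PP=4 and NPROC_PER_NODE=4. A minimal, untested sketch of what would presumably change for a single-GPU run such as the A40 mentioned above, assuming the 7B model plus exits fits in memory:)

TP=1                # tensor parallelism unchanged
PP=1                # no pipeline parallelism on a single GPU
NPROC_PER_NODE=1    # one process for the one GPU
# --tune-exit-pipeline-parallel-size would presumably also need to be lowered
# to match the new PP; check the EE-LLM documentation to confirm.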

@Noblezhong (Author)

thank you!


Marking as stale. No activity in 60 days.

@github-actions github-actions bot added the stale label Dec 10, 2024