In this project, we would like to check some common properties of the loss function during training on deep learning tasks. In theory, we usually assume the loss to be convex and smooth, but that might not be the case for deep neural networks. For simplicity, let us denote the model parameters at iteration $t$ by $x_t$. We track the following quantities:
- Convexity gap: we compute the additive convexity gap at every iterate as $f(x_t) - f(y) - \langle \nabla f(x_t), x_t - y \rangle$, where $x_t$ is the current iterate and $y$ is some reference point. We then report the average of this quantity over each epoch (convexity implies this gap is non-positive).
- Smoothness: we compute the smoothness constant $L = \|\nabla f(x_t) - \nabla f(y)\| / \|x_t - y\|$, where $x_t$ is the current iterate and $y$ is some reference point. We then report the maximum $L$ of each epoch.
- Ratio: we also compute the multiplicative convexity gap $\langle \nabla f(x_t), x_t - y \rangle / (f(x_t) - f(y))$. We then report the sum of the numerators divided by the sum of the denominators in each epoch (the function is "well-behaved" if this ratio stays a positive constant). A minimal sketch of how these quantities can be computed is given right after this list.
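As a concrete illustration, here is a minimal PyTorch sketch of how the three quantities could be accumulated over one epoch. The toy loss `f`, the helper `loss_and_grad`, and the choice of reference point `y` are placeholders and are not taken from the project code; the actual implementation in new_clm.py may differ.

```python
import torch

def f(x):
    # Toy non-convex loss standing in for the real training loss.
    return torch.sum(x ** 2) + 0.1 * torch.sum(torch.sin(3 * x))

def loss_and_grad(x):
    # Return f(x) and its gradient at x.
    x = x.detach().requires_grad_(True)
    loss = f(x)
    (grad,) = torch.autograd.grad(loss, x)
    return loss.detach(), grad

# Reference point y (e.g., the iterate at the start of the epoch).
y = torch.zeros(10)
f_y, g_y = loss_and_grad(y)

gaps, L_max = [], 0.0
num_sum, den_sum = 0.0, 0.0

# Pretend these are the iterates x_t visited during one epoch.
for t in range(100):
    x_t = torch.randn(10) * (1.0 - t / 100)
    f_x, g_x = loss_and_grad(x_t)

    inner = torch.dot(g_x, x_t - y)            # <grad f(x_t), x_t - y>
    gaps.append((f_x - f_y - inner).item())    # additive convexity gap
    L = (torch.norm(g_x - g_y) / torch.norm(x_t - y)).item()
    L_max = max(L_max, L)                      # running smoothness estimate
    num_sum += inner.item()                    # ratio numerator
    den_sum += (f_x - f_y).item()              # ratio denominator

print("avg convexity gap:", sum(gaps) / len(gaps))
print("max smoothness L :", L_max)
print("ratio            :", num_sum / den_sum)
```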
- For BU SCC (Boston University Shared Computing Cluster):
Before installing additional packages, we need to set up a virtual environment: create it with `python3 -m venv <env_name>`, then activate it with `source <env_name>/bin/activate`. Next, load the existing modules:
module load python3 pytorch cuda
To install the rest of the packages, go to the appropriate project directory and run `pip install -r requirements.txt`. If any packages are still missing, keep running `pip install <package_name>` until there are no errors left.
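As a quick sanity check that the environment and modules are set up correctly (this snippet is just a suggestion and not part of the project code), you can verify that PyTorch is importable and sees the GPU:

```python
import torch

# Should print the installed PyTorch version and True if CUDA is visible.
print(torch.__version__)
print(torch.cuda.is_available())
```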
To run new_clm.py on the SCC, we can use the following commands:
module load python3 cuda pytorch
source /projectnb/aclab/tranhp/venv/mynewenv/bin/activate
python /projectnb/aclab/tranhp/test_properties/transformers/new_clm.py --dataset_name the_pile --model_name_or_path gpt2 --streaming True --output_dir /projectnb/aclab/tranhp/test_properties/transformers/examples/pytorch/language-modeling/pile_1e-5/ --num_train_epochs 50 --checkpointing_steps epoch --name pile8_1e-5_900000 --weight_decay 0.01 --learning_rate 1e-5 --max_train_steps 1000000 --max_step 1000001
The results are logged to new_transformer_project in the optimizedlearning wandb workspace. To resume from a checkpoint, add the following arguments (change the last three arguments according to the current step):
python /projectnb/aclab/tranhp/test_properties/transformers/new_clm.py --dataset_name the_pile --model_name_or_path gpt2 --streaming True --output_dir /projectnb/aclab/tranhp/test_properties/transformers/examples/pytorch/language-modeling/pile_1e-5/ --num_train_epochs 50 --checkpointing_steps epoch --name pile8_1e-5_900000 --weight_decay 0.01 --learning_rate 1e-5 --max_train_steps 1000000 --max_step 1000001 --resume_from_checkpoint /projectnb/aclab/tranhp/test_properties/transformers/examples/pytorch/language-modeling/pile_1e-5/0_900000 --resume_from_checkpoint_torch /projectnb/aclab/tranhp/test_properties/transformers/examples/pytorch/language-modeling/pile_1e-5/0_900000.pth.tar --starting_step 900000
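For reference, resuming in this style typically amounts to reloading the model and optimizer state that was serialized at the checkpointing step and continuing from the recorded step. The sketch below uses made-up names (`save_checkpoint`, `load_checkpoint`, `ckpt.pth.tar`) and is only an illustration; the exact contents of the checkpoints written by new_clm.py may differ.

```python
import torch

def save_checkpoint(model, optimizer, step, ckpt_path):
    # Serialize everything needed to continue training from `step`.
    torch.save(
        {
            "step": step,
            "model_state": model.state_dict(),
            "optimizer_state": optimizer.state_dict(),
        },
        ckpt_path,
    )

def load_checkpoint(model, optimizer, ckpt_path):
    # Restore the saved states and return the step to resume from.
    ckpt = torch.load(ckpt_path, map_location="cpu")
    model.load_state_dict(ckpt["model_state"])
    optimizer.load_state_dict(ckpt["optimizer_state"])
    return ckpt["step"]

# Example usage with a toy model:
model = torch.nn.Linear(4, 4)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
save_checkpoint(model, optimizer, step=900000, ckpt_path="ckpt.pth.tar")
start_step = load_checkpoint(model, optimizer, "ckpt.pth.tar")
```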