optimizedlearning/test_properties

Project overview

In this project, we check some common properties of the loss function that arise while training deep learning models. In theory, the loss is often assumed to be convex and smooth, but this may fail to hold for deep neural networks. For simplicity, denote the current model parameters by $x \in \mathbb{R}^d$, a reference point by $y \in \mathbb{R}^d$, and the loss function by $f: \mathbb{R}^d \mapsto \mathbb{R}$. We then want to check the following (a short sketch of these computations follows the list).

  • Convexity gap: We compute the additive convexity gap at every iterate as $f(x_t) - f(y) - \langle \nabla f(x_t), x_t - y \rangle$, where $x_t$ is the current iterate and $y$ is some reference point. We then report the average of this quantity over each epoch (a non-positive gap is what convexity predicts).
  • Smoothness: We estimate the smoothness constant as $L = \|\nabla f(x_t) - \nabla f(y)\| / \|x_t - y\|$, where $x_t$ is the current iterate and $y$ is some reference point. We then report the maximum $L$ over each epoch.
  • Ratio: We also compute the multiplicative convexity gap $\langle \nabla f(x_t), x_t - y \rangle / (f(x_t) - f(y))$. We then report the sum of the numerators divided by the sum of the denominators in each epoch (our function is "well-behaved" if this ratio is a positive constant).
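
Below is a minimal sketch of how these three quantities could be computed at a single iterate. It is illustrative only: `model`, `ref_model` (the reference point $y$), `loss_fn`, and `batch` are assumed to exist, and none of these helper names are taken from this repository's code.

```python
import torch

def flat_params(model):
    # Concatenate all model parameters into one flat vector.
    return torch.cat([p.detach().reshape(-1) for p in model.parameters()])

def flat_grads(model):
    # Concatenate all gradients into one flat vector (assumes backward() ran).
    return torch.cat([p.grad.detach().reshape(-1) for p in model.parameters()])

def loss_and_grad(model, loss_fn, batch):
    # Hypothetical helper: evaluate the loss on one batch and populate .grad.
    model.zero_grad()
    loss = loss_fn(model, batch)
    loss.backward()
    return loss.detach(), flat_grads(model)

# x_t is the current iterate (the live model); y is a snapshot used as the
# reference point, e.g. a copy of the model from an earlier step.
f_x, g_x = loss_and_grad(model, loss_fn, batch)
f_y, g_y = loss_and_grad(ref_model, loss_fn, batch)
x, y = flat_params(model), flat_params(ref_model)

# Additive convexity gap: f(x_t) - f(y) - <grad f(x_t), x_t - y>.
# Non-positive values are what convexity predicts; average this per epoch.
convexity_gap = f_x - f_y - torch.dot(g_x, x - y)

# Smoothness estimate: ||grad f(x_t) - grad f(y)|| / ||x_t - y||.
# Track the maximum over each epoch.
L = torch.norm(g_x - g_y) / torch.norm(x - y)

# Multiplicative convexity gap: accumulate numerator and denominator
# separately across an epoch, then report
# (sum of numerators) / (sum of denominators).
ratio_num = torch.dot(g_x, x - y)
ratio_den = f_x - f_y
```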

Installing Packages

  1. For BU SCC

Before installing additional packages, set up a virtual environment: create one with `python3 -m venv <env_name>`, then activate it with `source <env_name>/bin/activate`. Next, load the required modules:

module load python3 pytorch cuda

To install the rest of the packages, go to the appropriate project directory and run `pip install -r requirements.txt`. If any packages are still missing, run `pip install <package_name>` for each one until no errors remain.

Running instructions

To run new_clm.py on the SCC, use the following commands:

module load python3 cuda pytorch
source /projectnb/aclab/tranhp/venv/mynewenv/bin/activate

python /projectnb/aclab/tranhp/test_properties/transformers/new_clm.py  --dataset_name the_pile  --model_name_or_path gpt2 --streaming True  --output_dir /projectnb/aclab/tranhp/test_properties/transformers/examples/pytorch/language-modeling/pile_1e-5/ --num_train_epochs 50  --checkpointing_steps epoch  --name pile8_1e-5_900000 --weight_decay 0.01 --learning_rate 1e-5 --max_train_steps 1000000 --max_step 1000001

The results are logged to new_transformer_project in the optimizedlearning wandb project. To resume from a checkpoint, add the following arguments (change the last three arguments according to the current step):

python /projectnb/aclab/tranhp/test_properties/transformers/new_clm.py  --dataset_name the_pile  --model_name_or_path gpt2 --streaming True  --output_dir /projectnb/aclab/tranhp/test_properties/transformers/examples/pytorch/language-modeling/pile_1e-5/ --num_train_epochs 50  --checkpointing_steps epoch  --name pile8_1e-5_900000 --weight_decay 0.01 --learning_rate 1e-5 --max_train_steps 1000000 --max_step 1000001 --resume_from_checkpoint /projectnb/aclab/tranhp/test_properties/transformers/examples/pytorch/language-modeling/pile_1e-5/0_900000  --resume_from_checkpoint_torch /projectnb/aclab/tranhp/test_properties/transformers/examples/pytorch/language-modeling/pile_1e-5/0_900000.pth.tar --starting_step 900000 
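
For reference, the sketch below shows how per-epoch metrics of this kind might be logged with wandb. It is a hypothetical example, not an excerpt from new_clm.py: the metric keys, the aggregation variables, and the exact mapping of the README's two names onto wandb's `project`/`group` arguments are assumptions.

```python
import wandb

# Hypothetical logging sketch: the project/group names follow the README's
# description, but their mapping onto wandb arguments, the run name, and the
# metric keys are all assumptions.
run = wandb.init(project="optimizedlearning", group="new_transformer_project",
                 name="pile8_1e-5_900000")
wandb.log({
    "epoch": epoch,                                    # current epoch index
    "convexity_gap_avg": gap_sum / num_steps,          # average additive gap
    "smoothness_max": max_L,                           # max smoothness estimate
    "convexity_ratio": ratio_num_sum / ratio_den_sum,  # ratio of sums
})
```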
