rm CastOutputToFloat when finetuning #81

Open

maybeluo wants to merge 1 commit into master

Conversation

maybeluo

  1. CastOutputToFloat seems unnecessary when finetuning. When computing the loss, the line lm_logits = lm_logits.to(torch.float32) already casts the half-precision logits to float32. I also compared the results with and without the CastOutputToFloat op, running finetuning twice for each setting; the loss curves are similar (I only tested the first 500 steps on a small proportion of the data from alpaca.json). A minimal sketch of the removed pattern is shown after this list.
  2. Remove an invisible space right after the backslash in README.md.
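For context, CastOutputToFloat in finetuning scripts of this kind is typically a thin nn.Sequential wrapper around lm_head that upcasts the logits to float32. The sketch below illustrates the pattern this PR removes; it follows the common LoRA/int8 finetuning recipe rather than quoting this repo's code, and the loss snippet in the comments is paraphrased, so treat names and exact lines as illustrative.

```python
import torch
import torch.nn as nn

class CastOutputToFloat(nn.Sequential):
    # Wraps a module (typically model.lm_head) and upcasts its output to float32.
    def forward(self, x):
        return super().forward(x).to(torch.float32)

# Typical use in the finetuning script (hypothetical `model` variable):
#   model.lm_head = CastOutputToFloat(model.lm_head)
#
# The PR argues this wrapper is redundant because the model's forward pass
# already upcasts the logits before computing the loss, along the lines of:
#   lm_logits = lm_logits.to(torch.float32)
#   loss = loss_fct(lm_logits.view(-1, lm_logits.size(-1)), labels.view(-1))
```

Since the loss path upcasts the logits itself, removing the wrapper should not change training numerically, which is consistent with the similar loss curves reported below.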

With CastOutputToFloat:

1st run:

{'loss': 2.2166, 'learning_rate': 9.140000000000001e-05, 'epoch': 5.0}
{'loss': 0.9893, 'learning_rate': 8.14e-05, 'epoch': 10.0}
{'loss': 0.394, 'learning_rate': 7.14e-05, 'epoch': 15.0}
{'loss': 0.1696, 'learning_rate': 6.14e-05, 'epoch': 20.0}
{'loss': 0.063, 'learning_rate': 5.14e-05, 'epoch': 25.0}
{'loss': 0.0224, 'learning_rate': 4.14e-05, 'epoch': 30.0}
{'loss': 0.0126, 'learning_rate': 3.1400000000000004e-05, 'epoch': 35.0}
{'loss': 0.0093, 'learning_rate': 2.1400000000000002e-05, 'epoch': 40.0}
{'loss': 0.0077, 'learning_rate': 1.1400000000000001e-05, 'epoch': 45.0}
{'loss': 0.0068, 'learning_rate': 1.4000000000000001e-06, 'epoch': 50.0}
{'train_runtime': 250.4525, 'train_samples_per_second': 3.993, 'train_steps_per_second': 1.996, 'train_loss': 0.389122870862484, 'epoch': 50.0}

2nd run:

{'loss': 2.2138, 'learning_rate': 9.120000000000001e-05, 'epoch': 5.0}
{'loss': 0.9816, 'learning_rate': 8.120000000000001e-05, 'epoch': 10.0}
{'loss': 0.4238, 'learning_rate': 7.12e-05, 'epoch': 15.0}
{'loss': 0.2162, 'learning_rate': 6.12e-05, 'epoch': 20.0}
{'loss': 0.0774, 'learning_rate': 5.1200000000000004e-05, 'epoch': 25.0}
{'loss': 0.0294, 'learning_rate': 4.12e-05, 'epoch': 30.0}
{'loss': 0.0156, 'learning_rate': 3.12e-05, 'epoch': 35.0}
{'loss': 0.0106, 'learning_rate': 2.12e-05, 'epoch': 40.0}
{'loss': 0.0086, 'learning_rate': 1.1200000000000001e-05, 'epoch': 45.0}
{'loss': 0.0075, 'learning_rate': 1.2000000000000002e-06, 'epoch': 50.0}
{'train_runtime': 250.4801, 'train_samples_per_second': 3.992, 'train_steps_per_second': 1.996, 'train_loss': 0.3984642353057861, 'epoch': 50.0}

Without CastOutputToFloat: 16.4G
1st run:

{'loss': 2.2132, 'learning_rate': 9.120000000000001e-05, 'epoch': 5.0}
{'loss': 0.9828, 'learning_rate': 8.120000000000001e-05, 'epoch': 10.0}
{'loss': 0.3797, 'learning_rate': 7.12e-05, 'epoch': 15.0}
{'loss': 0.1552, 'learning_rate': 6.12e-05, 'epoch': 20.0}
{'loss': 0.0552, 'learning_rate': 5.1200000000000004e-05, 'epoch': 25.0}
{'loss': 0.0218, 'learning_rate': 4.12e-05, 'epoch': 30.0}
{'loss': 0.0123, 'learning_rate': 3.12e-05, 'epoch': 35.0}
{'loss': 0.0091, 'learning_rate': 2.12e-05, 'epoch': 40.0}
{'loss': 0.0076, 'learning_rate': 1.1200000000000001e-05, 'epoch': 45.0}
{'loss': 0.0067, 'learning_rate': 1.2000000000000002e-06, 'epoch': 50.0}
{'train_runtime': 251.2769, 'train_samples_per_second': 3.98, 'train_steps_per_second': 1.99, 'train_loss': 0.38434655278921126, 'epoch': 50.0}

2nd run:

{'loss': 2.2095, 'learning_rate': 9.120000000000001e-05, 'epoch': 5.0}
{'loss': 0.9806, 'learning_rate': 8.120000000000001e-05, 'epoch': 10.0}
{'loss': 0.3772, 'learning_rate': 7.12e-05, 'epoch': 15.0}
{'loss': 0.1662, 'learning_rate': 6.12e-05, 'epoch': 20.0}
{'loss': 0.0543, 'learning_rate': 5.1200000000000004e-05, 'epoch': 25.0}
{'loss': 0.0235, 'learning_rate': 4.12e-05, 'epoch': 30.0}
{'loss': 0.0128, 'learning_rate': 3.12e-05, 'epoch': 35.0}
{'loss': 0.0094, 'learning_rate': 2.12e-05, 'epoch': 40.0}
{'loss': 0.0078, 'learning_rate': 1.1200000000000001e-05, 'epoch': 45.0}
{'loss': 0.007, 'learning_rate': 1.2000000000000002e-06, 'epoch': 50.0}

datalee commented Jul 31, 2023

mark
