Updating the Flash Attention version to fix cross entropy loss #812

ShashankMosaicML · 2023-12-19T21:54:26Z

The cross entropy loss of Flash Attention v2.3.2 (and earlier) throws an illegal memory access error when used with a large device train microbatch size × sequence length × vocabulary size. To fix this, we reverted to FA v1's CE loss in PR #795. However, we discovered that for very large values of (device train microbatch size × sequence length × vocabulary size), FA v1's CE loss runs into numerical precision errors, causing divergence.
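To get a feel for how quickly this product grows, the sketch below multiplies a few hypothetical settings and compares the result with the largest signed 32-bit integer. Overflowing 32-bit indexing is one common way large-tensor kernels end up reading out of bounds, but that this is the root cause of the FA v2.3.2 failure is an assumption, not something established in this PR. The sizes (including the 50,368-token vocabulary) are illustrative only.

```python
# Illustrative sizes only; these are NOT the configurations used in this PR.
INT32_MAX = 2**31 - 1


def logits_elements(microbatch_size: int, seq_len: int, vocab_size: int) -> int:
    """Number of elements in a (microbatch_size * seq_len, vocab_size) logits tensor."""
    return microbatch_size * seq_len * vocab_size


def exceeds_int32(microbatch_size: int, seq_len: int, vocab_size: int) -> bool:
    """True if a flat offset into the logits tensor would not fit in a signed int32."""
    return logits_elements(microbatch_size, seq_len, vocab_size) > INT32_MAX


for mb, seq, vocab in [(1, 2048, 50368), (8, 8192, 50368)]:
    n = logits_elements(mb, seq, vocab)
    print(f"{mb} x {seq} x {vocab} = {n:,} elements; exceeds int32: {n > INT32_MAX}")
```

The second hypothetical configuration gives 3,300,917,248 elements, already past 2^31 - 1, which shows how a kernel that passes small-scale tests can still fail only at larger microbatch sizes or sequence lengths.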

Newer versions of Flash Attention (v2.3.3 and higher) seem to have solved both of these problems, so this PR updates the repo to use FA v2.3.6 (the latest version) instead of FA v2.3.2.
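Beyond updating the dependency version, a training script could warn at startup if an older `flash-attn` is installed. The sketch below is hypothetical: the helper names are illustrative and not part of llm-foundry, and the 2.3.3 threshold is taken from the description above. It reads the installed version with the standard-library `importlib.metadata` and compares only the numeric release segment (a simplification, not a full PEP 440 parser).

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

# Threshold from the PR description: v2.3.3 and later appear to fix both CE issues.
MIN_FIXED_VERSION: Tuple[int, int, int] = (2, 3, 3)


def parse_release(ver: str) -> Tuple[int, ...]:
    """Parse the leading numeric release segment, e.g. '2.3.6+cu121' -> (2, 3, 6).

    Local suffixes ('+...') and pre/post-release tags ('rc1', 'post1') are ignored.
    """
    parts = []
    for piece in ver.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:  # e.g. 'post1': no leading number, stop here
            break
        parts.append(int(digits))
        if len(digits) < len(piece):  # e.g. '0rc1': keep 0, then stop
            break
    return tuple(parts)


def has_fixed_cross_entropy(ver: str) -> bool:
    """True if the given flash-attn version is at least MIN_FIXED_VERSION."""
    return parse_release(ver) >= MIN_FIXED_VERSION


def installed_flash_attn_version() -> Optional[str]:
    """Installed flash-attn version string, or None if the package is absent."""
    try:
        return version("flash-attn")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    installed = installed_flash_attn_version()
    if installed is None:
        print("flash-attn is not installed")
    elif not has_fixed_cross_entropy(installed):
        print(f"flash-attn {installed} predates 2.3.3; consider upgrading to 2.3.6")
    else:
        print(f"flash-attn {installed} is new enough for the CE loss fix")
```

Comparing integer tuples keeps the check dependency-free; a project that already depends on `packaging` could use `packaging.version.Version` instead for full version semantics.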

The blue loss curve below corresponds to the training run using FA v2.3.6's CE loss, and the pink curve corresponds to the training run using FA v1's CE loss.


setup.py

dakinggg · 2023-12-20T00:31:48Z

throughput, memory, and loss before and after:

dakinggg

LGTM, please add a PR description explaining stuff

ShashankMosaicML · 2023-12-20T00:57:09Z

> LGTM, please add a PR description explaining stuff

Done.

ShashankMosaicML and others added 17 commits October 9, 2023 10:27

04dd334 Merge pull request #1 from mosaicml/main (Pulling the latest commits from main fork)
87b2fdc Merge pull request #8 from mosaicml/main (Pulling from the main repo)
c9a42e4 Merge pull request #12 from mosaicml/main (Pulling from mosaicml/llm-foundry main)
ddea9ee Merge branch 'mosaicml:main' into main
0bcd8ee Merge pull request #13 from mosaicml/main (Merging from mosaic main)
f209b58 Merge pull request #14 from mosaicml/main (Pulling from mosaic main)
ec4378d Merge pull request #15 from mosaicml/main (Pulling from mosaic main.)
b436706 Merge branch 'mosaicml:main' into main
bcace03 ..
cf4aa58 Merge branch 'mosaicml:main' into main
7c35ce8 Merge branch 'mosaicml:main' into main
0a8ebfb ..
6f18a33 ..
f42d585 Merge branch 'mosaicml:main' into main
2f3f53c Merge branch 'mosaicml:main' into main
a452998 ..
7530205 ..

dakinggg reviewed Dec 19, 2023


setup.py (resolved)

setup.py (resolved)

ShashankMosaicML added 2 commits December 19, 2023 23:16

ce8da0e ..
f3990fa ..

ShashankMosaicML requested review from nik-mosaic and vchiley December 20, 2023 00:19

511f92f ..

ShashankMosaicML marked this pull request as ready for review December 20, 2023 00:22

ShashankMosaicML requested a review from a team as a code owner December 20, 2023 00:22

b5cb0fa Merge branch 'main' into shashank/update_FA_version_fix_CE

dakinggg approved these changes Dec 20, 2023


ShashankMosaicML merged commit 2ba9224 into mosaicml:main Dec 20, 2023
10 checks passed

ShashankMosaicML deleted the shashank/update_FA_version_fix_CE branch December 20, 2023 00:57
