Give example on how to handle gradient accumulation with cross-entropy #5282
| Job | Run time |
|---|---|
| | 1m 57s |
| | 2m 54s |
| | 2m 32s |
| | 2m 30s |
| | 2m 19s |
| | 2m 23s |
| | 2m 15s |
| | 3m 52s |
| | 3m 27s |
| | 9m 41s |
| | 2m 41s |
| | 3m 55s |
| | 3m 56s |
| | 3m 24s |
| | 3m 19s |
| | 3m 29s |
| | 3m 28s |
| | 4m 51s |
| | 4m 45s |
| | 11m 54s |
| Total | 1h 19m 32s |