Give example on how to handle gradient accumulation with cross-entropy #5283
| Job | Run time |
|---|---|
| | 1m 48s |
| | 3m 12s |
| | 2m 29s |
| | 2m 23s |
| | 2m 20s |
| | 2m 27s |
| | 2m 22s |
| | 4m 2s |
| | 3m 34s |
| | 9m 57s |
| | 2m 46s |
| | 4m 11s |
| | 3m 45s |
| | 3m 40s |
| | 3m 27s |
| | 3m 26s |
| | 3m 22s |
| | 5m 10s |
| | 4m 29s |
| | 10m 37s |
| Total | 1h 19m 27s |