forked from pytorch/pytorch
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
[release/2.4] [ROCm][layer_norm] Use __builtin_amdgcn_rcpf(x) instead of 1.f/x (#1688)

Replace the (more) exact calculation with a hardware approximation.

Benefits:
- Reduced code size.
- Improved performance for certain scenarios.

Experiments show only a small reduction in precision.
Experiments show no significant performance regressions.
bfloat16 and float16 calculations may benefit substantially from this change.

vectorized_layer_norm_kernel:
Gains performance especially for the tensor shapes listed below.
Lower values of dim1 do not change performance significantly.
dim1 = 8k-65k may gain considerable performance, but the gain declines gradually with size.

```
dim0    dim1
----    ----
1024    8192
1024    16384
1024    32768
1024    65536
1024    131072
1024    262144
1024    524288
```

Co-authored-by: Hashem Hashemi <[email protected]>
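For illustration only, the substitution described in the commit message could look roughly like the sketch below. It is not the code from this commit: the macro `RCP_SKETCH`, the struct `WelfordSketch`, and the functions `welford_update` and `rstd_sketch` are hypothetical names, and the assumption that the reciprocal sits in a Welford-style mean/variance update is mine, since the message does not say where `1.f/x` appears in the kernel. The only identifiers taken from the source are `__builtin_amdgcn_rcpf` and `1.f/x`. On a host compiler the fallback branch is used, so the arithmetic stays exact.

```cpp
#include <cmath>

// Hypothetical macro (not from the commit): use the AMD GPU reciprocal
// approximation in HIP device code, and the exact division everywhere else.
#if defined(__HIP_DEVICE_COMPILE__)
#define RCP_SKETCH(x) __builtin_amdgcn_rcpf(x)
#else
#define RCP_SKETCH(x) (1.f / (x))
#endif

// Running statistics for a Welford-style mean/variance computation.
// In real HIP code the functions below would also need __device__ qualifiers.
struct WelfordSketch {
  float mean = 0.f;   // running mean
  float m2 = 0.f;     // running sum of squared deviations from the mean
  float count = 0.f;  // number of samples seen so far
};

// Fold one sample into the running statistics. The division by the sample
// count is written as a multiplication by a reciprocal.
inline WelfordSketch welford_update(WelfordSketch s, float x) {
  s.count += 1.f;
  const float delta = x - s.mean;
  s.mean += delta * RCP_SKETCH(s.count);
  s.m2 += delta * (x - s.mean);
  return s;
}

// Reciprocal standard deviation (population variance) of n values.
inline float rstd_sketch(const float* data, int n, float eps) {
  WelfordSketch s;
  for (int i = 0; i < n; ++i) {
    s = welford_update(s, data[i]);
  }
  const float var = s.m2 * RCP_SKETCH(s.count);
  return RCP_SKETCH(std::sqrt(var + eps));
}
```

Keeping the choice behind a single macro means the exact and approximate paths can be compared by rebuilding with one definition changed, which is one way to run the kind of precision and performance experiments the message mentions.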
1 parent: 579c159
commit f0a620f
Showing 3 changed files with 28 additions and 0 deletions.