Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kernels with LDS bank conflicts #177

Open
rosenrodt opened this issue Apr 7, 2022 · 2 comments
Open

Kernels with LDS bank conflicts #177

rosenrodt opened this issue Apr 7, 2022 · 2 comments
Labels
Performance Issue performance issue due to regression or something fishy Under Investigation

Comments

@rosenrodt
Copy link
Contributor

rosenrodt commented Apr 7, 2022

Our kernel sees partial 2-way bank conflict for K-contiguous matrices, and full 2-way/4-way conflict for MN-contiguous matrices (dependent on tile sizes). Profiling has shown partial 2-way bank conflict for K-contiguous matrices doesn't result in observable impact on LDS issue latency. Therefore, our focus is to resolve bank conflict for MN-contiguous matrices.

The following LDS layout is chosen to avoid LDS write bank conflict for MN-contiguous matrices:

  • K0_M_1/K0_N_1 (fp32)
  • K0_M_2/K0_2_N (bf16/fp16)
  • K0_M_4/K0_N_4 (int8)

Conflict-free kernels are already added to C-shuffle device GEMM instances defined in library/src/tensor_operation_instance/gpu/gemm/device_gemm_xdl_c_shuffle_(type)_(layout)_instance.cpp. GridwiseGemm_k0mk1_k0nk1_mn_xdl_cshuffle_v1 (the underlying implementation of C-shuffle device GEMM) supports different K1 value for each individual A/B matrix to facilitate preferred conflict-free LDS layout for each A/B tile.

Other GEMM/Conv device kernels that do not yet facilitate the latest C-shuffle gridwise GEMM implementation still observe bank conflicts.

GEMM

  • Batched GEMM
  • Group GEMM
  • Split-K GEMM
  • Non-C-shuffle ordinary GEMM DeviceGemmXdl

Conv

  • Backward data (conflict in weight)
  • Backward weight (conflict in both output gradient and activation)
  • Forward prop maps to K-contiguous implicit GEMM layout so there's no observable bank conflict.
@weixingzhang
Copy link

@rosenrodt how can I check whether it is partial 2-way or full 2-way/4-way conflict? Thanks!

@asroy asroy added the Performance Issue performance issue due to regression or something fishy label May 7, 2022
@ppanchad-amd
Copy link

@rosenrodt Apologies for the lack of response. Can you please if this is still an issue on the latest ROCm 6.2? If not, please close the ticket. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Performance Issue performance issue due to regression or something fishy Under Investigation
Projects
None yet
Development

No branches or pull requests

4 participants