Limit specific instruction tests to proper platforms #1371

Open
junliume opened this issue Jul 3, 2024 · 3 comments


junliume commented Jul 3, 2024

reg_c.template AsType<float4_t>()(Number<0>{}) = __builtin_amdgcn_smfmac_f32_16x16x32_f16(

The line above is from #1309. This instruction should be built only for gfx94 platforms.
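One way to restrict such an instruction is a preprocessor guard around the code that uses it. The following is a minimal sketch, not the fix applied in the repository. It assumes the compiler defines __gfx940__, __gfx941__ or __gfx942__ only while compiling device code for the matching target; EXAMPLE_HAS_SMFMAC and example_has_smfmac are hypothetical names introduced for illustration.

```cpp
// Assumption: __gfx940__/__gfx941__/__gfx942__ are defined only while compiling
// device code for the matching target, and are undefined on the host and for
// other targets such as gfx90a or gfx1100.
#if defined(__gfx940__) || defined(__gfx941__) || defined(__gfx942__)
#define EXAMPLE_HAS_SMFMAC 1
#else
#define EXAMPLE_HAS_SMFMAC 0
#endif

// Hypothetical helper (not part of composable_kernel): reports whether the
// guarded code path is compiled in for the current compilation pass.
constexpr bool example_has_smfmac() { return EXAMPLE_HAS_SMFMAC == 1; }

#if EXAMPLE_HAS_SMFMAC
// Code that calls the __builtin_amdgcn_smfmac_* builtins would go here, so the
// passes for other targets never see it.
#endif
```

With a guard like this, a build with GPU_TARGETS="gfx1100;gfx90a;gfx942" compiles the guarded code only in the gfx942 pass, and a test can consult the helper to skip itself elsewhere.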

junliume added the bug (Something isn't working) label Jul 3, 2024

junliume commented Jul 3, 2024

@illsilin our CI should have one stage with GPU_TARGETS of "gfx1100;gfx90a;gfx942" :)

#1358 and #1372 are both for GPU_TARGETS="gfx1100;gfx90a;gfx942"

How to reproduce:

CXX=/opt/rocm/bin/amdclang++ cmake -DCMAKE_PREFIX_PATH=/opt/rocm -DCMAKE_BUILD_TYPE=Release -DGPU_TARGETS="gfx1100;gfx90a;gfx942" ..


junliume commented Jul 3, 2024

More problems are fixed in 4b81c7a.

Hence GPU_TARGETS MATCHES is very problematic: it only finds matches, so excluding other targets based on a match is very fragile.
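To make the fragility concrete, the sketch below models the behaviour in C++ rather than CMake. CMake stores a list as one semicolon-separated string, and MATCHES succeeds if the pattern is found anywhere in it, so a hit says nothing about the other entries. Both function names are hypothetical, and the stricter per-target check is only one possible alternative, not the approach taken in the repository.

```cpp
#include <sstream>
#include <string>

// Models if(GPU_TARGETS MATCHES "<text>") for a plain-text pattern: the whole
// semicolon-separated list is searched as a single string.
inline bool example_list_matches(const std::string& gpu_targets, const std::string& text) {
    return gpu_targets.find(text) != std::string::npos;
}

// Hypothetical stricter check: true only if every non-empty entry of the list
// starts with the given prefix.
inline bool example_all_targets_have_prefix(const std::string& gpu_targets,
                                            const std::string& prefix) {
    std::istringstream stream(gpu_targets);
    std::string target;
    bool saw_target = false;
    while (std::getline(stream, target, ';')) {
        if (target.empty()) continue;  // ignore empty list entries
        saw_target = true;
        if (target.compare(0, prefix.size(), prefix) != 0) return false;
    }
    return saw_target;
}
```

For "gfx1100;gfx90a;gfx942" and "gfx94", the first function returns true while the second returns false, which is the situation described above: the match succeeds even though two of the three targets cannot run the instruction.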

junliume added the urgency_blocker (blocking feature deliverables) label Jul 3, 2024

junliume commented Jul 3, 2024

FYI: additional issues are found when building client_example with multiple targets:

cd ${composable_kernel}/client_example/build
CXX=/opt/rocm/llvm/bin/clang++ cmake -DCMAKE_PREFIX_PATH="${composable_kernel}/install/;/opt/rocm/" -DCMAKE_BUILD_TYPE=release -DGPU_TARGETS="gfx1100;gfx90a" -DCMAKE_CXX_FLAGS=" -O3 " ..
make -j$(nproc)

This fails with the following error:

/data/driver/composable_kernel/install/include/ck/tensor_operation/gpu/block/blockwise_gemm_xdlops.hpp:171:57: note: expression evaluates to '256 == 128'
  171 |         static_assert(ThisThreadBlock::GetNumOfThread() == MWaves * NWaves * WaveSize,
      |                       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated when compiling for gfx1100.

The failure occurs because the code above should not be compiled for gfx1100 targets.
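The numbers in the diagnostic are consistent with a wave-size mismatch. The arithmetic can be sketched as follows, assuming a block of 256 threads, MWaves * NWaves equal to 4, and a wave size of 64 on gfx90a versus 32 on gfx1100. These values are inferred from '256 == 128' and are not confirmed in the issue; all names are hypothetical.

```cpp
// Assumed values, inferred from the diagnostic "256 == 128".
constexpr int example_block_size = 256;  // ThisThreadBlock::GetNumOfThread()
constexpr int example_wave_count = 4;    // MWaves * NWaves (assumption)

// Thread count the static_assert expects for a given wave size.
constexpr int example_expected_threads(int wave_size) {
    return example_wave_count * wave_size;
}

// With a wave size of 64 the check holds: 4 * 64 == 256.
static_assert(example_expected_threads(64) == example_block_size,
              "wave size 64 satisfies the check");
// With a wave size of 32 the right-hand side becomes 128, as in the error.
static_assert(example_expected_threads(32) == 128,
              "wave size 32 reproduces the mismatch");
```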

Meanwhile,
https://github.com/ROCm/composable_kernel/blob/fix_1371/client_example/25_wrapper/CMakeLists.txt#L5-L10
is problematic because there can be multiple targets, either by default or when set explicitly, such as GPU_TARGETS="gfx1100;gfx90a".
