This also affects HIP, so maybe I should move this issue there. Actually, it is rather an LLVM issue. I put it here because I encountered it with hcc. Many issues here in hcc also apply to HIP.
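```cpp
#include <hc.hpp>

int main()
{
    hc::array_view<int> data(1);
    parallel_for_each(hc::extent<1>(1), [=](hc::index<1> i) [[hc]]
    {
        // Bind the LLVM intrinsic llvm.amdgcn.update.dpp.i32 to a callable name.
        int __amdgcn_update_dpp(int old, int src, int dpp_ctrl, int row_mask, int bank_mask, bool bound_ctrl) [[hc]] asm("llvm.amdgcn.update.dpp.i32");
        int d = data[i[0]];
        // old = 0 (identity for +), dpp_ctrl = 1 (quad_perm:[1,0,0,0]),
        // row_mask = 14 (row 0 disabled), bank_mask = 15 (all banks enabled).
        d = __amdgcn_update_dpp(0, d, 1, 14, 15, false) + d;
        data[i[0]] = d;
    });
    return 0;
}
```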
dump-gfx900.isa:
This is probably happening because, when the "GCN DPP Combine" pass runs, the `v_add` instruction is still in the form `V_ADD_U32_e64` with an immediate value 0, and it seems the DPP combine pass will not combine it when the row and bank masks are not full:

Only later is `v_add` changed to `V_ADD_U32_e32`, and at that point DPP combining would work, as shown below with dpp_combine.mir.
But this works:
When I change the operation to, for example, xor or max:
Then DPP combining works:
Xor and max work because the `old` argument to `llvm.amdgcn.update.dpp` is the identity for the respective operation. When the source register is out of bounds or masked by the row or bank mask, `__amdgcn_update_dpp` "returns" the identity, so the xor/max operation is a no-op; hence `v_mov_dpp` can be combined with `v_xor` into `v_xor_dpp`, which behaves equivalently.

In the case of addition, the identity is zero, so it should also work.
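The equivalence argument above can be checked with a small host-side C++ model. This is a sketch under stated assumptions, not HCC or LLVM code: it assumes a 64-lane wavefront, treats `row_mask` as one bit per 16-lane row and `bank_mask` as one bit per 4-lane bank within each row, models only quad_perm `dpp_ctrl` values (0x00 to 0xFF, so no source lane is ever out of bounds and `bound_ctrl` plays no role), and assumes the fused `v_op_dpp` leaves disabled lanes holding the other operand `x`. The names `update_dpp`, `unfused` and `fused` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <functional>  // std::plus, std::bit_xor for instantiating the templates below

// Host-side model of one wavefront (assumption: wave64, as on gfx803/gfx900).
constexpr int kWave = 64;
using Wave = std::array<std::uint32_t, kWave>;

// A lane is enabled only if its 16-lane row is set in row_mask and its
// 4-lane bank (within the row) is set in bank_mask.
bool lane_enabled(int lane, unsigned row_mask, unsigned bank_mask) {
    const int row = lane / 16;
    const int bank = (lane % 16) / 4;
    return ((row_mask >> row) & 1u) != 0 && ((bank_mask >> bank) & 1u) != 0;
}

// Source lane for a quad_perm control: lane i of each quad reads lane
// ((dpp_ctrl >> 2*i) & 3) of the same quad. Shifts/rotates are not modelled.
int quad_perm_src(int lane, unsigned dpp_ctrl) {
    assert(dpp_ctrl <= 0xFFu);
    const int pos = lane % 4;
    return (lane & ~3) + static_cast<int>((dpp_ctrl >> (2 * pos)) & 3u);
}

// Model of update.dpp: disabled lanes return `old`, enabled lanes the permuted src.
Wave update_dpp(std::uint32_t old, const Wave& src, unsigned ctrl,
                unsigned row_mask, unsigned bank_mask) {
    Wave out{};
    for (int l = 0; l < kWave; ++l)
        out[l] = lane_enabled(l, row_mask, bank_mask) ? src[quad_perm_src(l, ctrl)] : old;
    return out;
}

// Two instructions: v_mov_dpp (with `old`) followed by a plain v_op.
template <class Op>
Wave unfused(Op op, std::uint32_t old, const Wave& src, const Wave& x,
             unsigned ctrl, unsigned row_mask, unsigned bank_mask) {
    const Wave moved = update_dpp(old, src, ctrl, row_mask, bank_mask);
    Wave out{};
    for (int l = 0; l < kWave; ++l) out[l] = op(moved[l], x[l]);
    return out;
}

// One v_op_dpp: disabled lanes are not written, so they keep the tied old
// operand, which this model assumes is the other operand x.
template <class Op>
Wave fused(Op op, const Wave& src, const Wave& x,
           unsigned ctrl, unsigned row_mask, unsigned bank_mask) {
    Wave out{};
    for (int l = 0; l < kWave; ++l)
        out[l] = lane_enabled(l, row_mask, bank_mask) ? op(src[quad_perm_src(l, ctrl)], x[l]) : x[l];
    return out;
}
```

With the masks from the example above (`dpp_ctrl = 1`, `row_mask = 14`, `bank_mask = 15`), `unfused(std::plus<std::uint32_t>{}, 0u, src, x, 1, 14, 15)` equals `fused(std::plus<std::uint32_t>{}, src, x, 1, 14, 15)` for any `src` and `x`, and the same holds for `std::bit_xor`. Passing a non-identity `old` such as 5 makes the results differ on the 16 lanes of row 0, which is why the combiner has to check `old` when the masks are not full.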
The test "old_is_0" from here demonstrates it: https://github.com/llvm/llvm-project/blob/master/llvm/test/CodeGen/AMDGPU/dpp_combine.mir
Result from `/opt/rocm/hcc/bin/llc -march=amdgcn -mcpu=gfx900 -run-pass=gcn-dpp-combine`:

Btw, this combining also happens on gfx803, where `v_add` modifies `vcc`, if I am not wrong. But that probably does not matter if the `vcc` from this `v_add` is not used.

But it seems it is not working when translating from LLVM IR.
Also, `llvm.amdgcn.update.dpp` is not the happiest solution: when I want, for example, to implement a parallel reduction with the binary operation as a template argument, I also need to define an identity value for each possible binary operation. Ideally, it should be easier to generate `_dpp` instructions without needing an identity.
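To illustrate the per-operation burden described above, here is a hypothetical host-side sketch (not an HCC or HIP API). The trait `dpp_identity` and the functor `max_op` are invented names: every operation a templated reduction accepts needs its own specialization supplying the value that would be passed as `old`. The reduction itself is modelled as a shift-based (Hillis-Steele) inclusive scan in which out-of-range lanes read the identity, mirroring what `update.dpp` returns for them; the last lane holds the full reduction.

```cpp
#include <array>
#include <cassert>   // used by the usage checks
#include <cstddef>
#include <functional>
#include <limits>

// Hypothetical functor for max (the standard library has no max function object).
struct max_op {
    template <class T>
    T operator()(T a, T b) const { return a < b ? b : a; }
};

// Hypothetical trait: the identity that must be passed as `old` for each
// supported operation. The primary template is left undefined, so using an
// operation without a specialization is a compile-time error.
template <class Op, class T> struct dpp_identity;

template <class T> struct dpp_identity<std::plus<T>, T>    { static constexpr T value = T(0); };
template <class T> struct dpp_identity<std::bit_xor<T>, T> { static constexpr T value = T(0); };
template <class T> struct dpp_identity<std::bit_or<T>, T>  { static constexpr T value = T(0); };
template <class T> struct dpp_identity<std::bit_and<T>, T> { static constexpr T value = static_cast<T>(~T(0)); };
template <class T> struct dpp_identity<max_op, T>          { static constexpr T value = std::numeric_limits<T>::lowest(); };

// Host-side model of a shift-based inclusive scan across N lanes. Lanes whose
// source lane (l - stride) is out of range read the identity instead.
template <class Op, class T, std::size_t N>
std::array<T, N> wave_inclusive_scan(std::array<T, N> v, Op op) {
    static_assert(N > 0, "need at least one lane");
    const T id = dpp_identity<Op, T>::value;
    for (std::size_t stride = 1; stride < N; stride *= 2) {
        std::array<T, N> shifted{};
        for (std::size_t l = 0; l < N; ++l)
            shifted[l] = (l >= stride) ? v[l - stride] : id;
        for (std::size_t l = 0; l < N; ++l)
            v[l] = op(shifted[l], v[l]);
    }
    return v;  // v[N - 1] is the reduction of all N values
}
```

For example, `wave_inclusive_scan(std::array<int, 4>{{-7, -3, -9, -4}}, max_op{})` gives `{-7, -3, -3, -3}`. If the identity for max were wrongly taken as 0, lane 0 would come out as 0 instead of -7, which is exactly the kind of per-operation detail a generic reduction should not have to get right by hand.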