If a previous VALU instruction modifies a VGPR read by DPP, two wait states are required. Note that this hazard affects only the operand that DPP reads. Consider instructions 2 and 3 in the example above; they consume the output from the previous VALU instruction by reading v1. But DPP applies to v0, and because v0 is unmodified, wait states are unnecessary.
The passage above is quoted from https://gpuopen.com/amd-gcn-assembly-cross-lane-operations/ and the "Vega" Instruction Set Architecture Reference Guide.
But when I tried to implement this using hc::__amdgcn_move_dpp, I got two NOPs between each pair of these instructions (dump-gfx900.isa):
These NOPs should not be needed, because the next instruction has no "DPP dependency" on the previous one. The DPP modifier here affects the first source operand.
Ideally this should generate 3 instructions as shown above without NOPs.
It is also interesting how the generated code changes with small modifications to the source:
Removing asm("s_nop 0");:
Changing 0 to i[0]:
This last variant correctly has only one NOP for the v4 dependency, so that, together with v_add_u32_dpp v3, v2, v2..., there are 2 independent instructions between the modification of v4 (v_mov_b32_e32 v4, v2) and the DPP read (v_mov_b32_dpp v4, v4...). It also uses add3, but this still looks worse than 3 v_add_dpp instructions.
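To make the expected NOP counts concrete, here is a rough Python sketch of the quoted hazard rule. It is not how the compiler decides where to place NOPs. It is a simplified model I am assuming for illustration: every instruction is a VALU instruction, each issues in one slot, and only the "VALU write followed by DPP read of the same VGPR" hazard is checked. The names Inst, dpp_src and nops_needed are hypothetical, and the operand lists are abbreviated from the instructions discussed above.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# From the quoted rule: a VALU write to a VGPR that a later DPP
# instruction reads through DPP needs two wait states in between.
DPP_WAIT_STATES = 2


@dataclass
class Inst:
    name: str                      # mnemonic, for readability only
    dst: str                       # VGPR written by this instruction
    srcs: Tuple[str, ...]          # VGPRs read (not used by the check itself)
    dpp_src: Optional[str] = None  # operand the DPP modifier applies to, if any


def nops_needed(insts: List[Inst]) -> List[int]:
    """For each instruction, return how many wait states must precede it."""
    last_write = {}  # VGPR -> issue slot of its most recent write
    slot = 0
    result = []
    for inst in insts:
        need = 0
        if inst.dpp_src is not None and inst.dpp_src in last_write:
            # Slots already occupied between the write and this DPP read.
            gap = slot - last_write[inst.dpp_src] - 1
            need = max(0, DPP_WAIT_STATES - gap)
        slot += need               # inserted wait states occupy slots too
        result.append(need)
        last_write[inst.dst] = slot
        slot += 1
    return result


# DPP applies to v0, which is never modified: no wait states expected.
chain = [
    Inst("v_add_dpp", "v1", ("v0", "v0"), dpp_src="v0"),
    Inst("v_add_dpp", "v1", ("v0", "v1"), dpp_src="v0"),
    Inst("v_add_dpp", "v1", ("v0", "v1"), dpp_src="v0"),
]

# The last variant: one independent instruction sits between the write
# of v4 and the DPP read of v4, so one more wait state is needed.
last_variant = [
    Inst("v_mov_b32_e32", "v4", ("v2",)),
    Inst("v_add_u32_dpp", "v3", ("v2", "v2"), dpp_src="v2"),
    Inst("v_mov_b32_dpp", "v4", ("v4",), dpp_src="v4"),
]

print(nops_needed(chain))
print(nops_needed(last_variant))
```

Under this model the chain of three DPP adds needs no wait states at all, while the last variant needs exactly one before v_mov_b32_dpp, which matches the single NOP observed there.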