-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support Short (8/16 bit) atomic RMW operations on RISCV #297
Conversation
8253ce1
to
4dd002b
Compare
src/jit/MemoryInl.h
Outdated
if (!(operationSize & SLJIT_32) && operationSize != SLJIT_MOV32) { | ||
compareTopFalse = sljit_emit_cmp(compiler, SLJIT_NOT_EQUAL, SLJIT_IMM, 0, srcExpectedPair.arg2, srcExpectedPair.arg2w); | ||
} | ||
if (noShortAtomic && size <= 2) { | ||
sljit_emit_op2(compiler, SLJIT_AND, maskReg, 0, baseReg, 0, SLJIT_IMM, 0x3); | ||
sljit_emit_op2(compiler, SLJIT_SHL, maskReg, 0, maskReg, 0, SLJIT_IMM, 3); // multiply by 8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
32 bit atomic should be present, so mutiply by 4 should be enough
src/jit/MemoryInl.h
Outdated
sljit_s32 operationSize = SLJIT_MOV; | ||
sljit_s32 size = 0; | ||
sljit_s32 offset = 0; | ||
sljit_s32 operation; | ||
uint32_t options = MemAddress::CheckNaturalAlignment | MemAddress::AbsoluteAddress; | ||
sljit_sw stackTmpStart = CompileContext::get(compiler)->stackTmpStart; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need this newline.
src/jit/ByteCodeParser.cpp
Outdated
case ByteCode::I32AtomicRmwAndOpcode: | ||
case ByteCode::I32AtomicRmwOrOpcode: | ||
case ByteCode::I32AtomicRmwXorOpcode: | ||
case ByteCode::I32AtomicRmwXchgOpcode: { | ||
info = Instruction::kIs32Bit; | ||
requiredInit = OTAtomicRmwI32; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be an OTAtomicRmwShort
which adds an extra tmp register when short atomic is not present. Could use the compiler option bits to check this.
4dd002b
to
28b7aeb
Compare
src/jit/MemoryInl.h
Outdated
maskReg = instr->requiredReg(3); | ||
} | ||
|
||
if (SLJIT_IS_IMM(memValueReg)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the case of unshared memory, the arguments are immediate zeroes
walrus/test/extended/threads/atomic.wast
Line 440 in 7f492b3
;; unshared memory is OK |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it always 0? Cannot use other immediates?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Upon further investigation memValueReg
is only an immediate in the RMW cases (not RMW Cmpxchg), but in my observation only in the unshared memory test cases. The only value I have seen it take has been 0, but that does not guarantee it will always be 0, but its value is not used in any case; its register is needed as a working TMP register.
But the deciding factor is not the memory being unshared, as the code works fine without the check for immediates even if I remove the shared flag from the very first module at the top of the test file for example.
However just returning in these cases is most likely incorrect; I have updated it so a TMP register is assigned in these cases instead.
src/jit/MemoryInl.h
Outdated
JITArg memValue(operands + 0); | ||
sljit_s32 memValueReg = SLJIT_EXTRACT_REG(memValue.arg); | ||
sljit_s32 maskReg; | ||
sljit_s32 shiftReg; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually these should be initialized to 0.
#if (defined SLJIT_32BIT_ARCHITECTURE && SLJIT_32BIT_ARCHITECTURE) | ||
sljit_emit_op2(compiler, SLJIT_AND32, baseReg, 0, baseReg, 0, SLJIT_IMM, ~0x3); | ||
#else /* !SLJIT_32BIT_ARCHITECTURE */ | ||
sljit_emit_op2(compiler, SLJIT_AND, baseReg, 0, baseReg, 0, SLJIT_IMM, ~0x3); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AND and AND32 is the same on 32 bit systems.
src/jit/MemoryInl.h
Outdated
} | ||
|
||
if (noShortAtomic && size <= 2) { | ||
sljit_emit_op2(compiler, SLJIT_AND32, shiftReg, 0, baseReg, 0, SLJIT_IMM, 0x3); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't baseReg is a word sized reg?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is; would the more proper way to do this be using SLJIT_AND
here, and then using an SLJIT_MOV32
from and to shiftReg
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, use an and(shiftreg, basereg, 0x3), then move32(shiftreg,shiftreg). The latter is optimized out if not necessary.
src/jit/MemoryInl.h
Outdated
|
||
if (noShortAtomic && size <= 2) { | ||
sljit_emit_op2(compiler, SLJIT_AND32, tmpReg, 0, SLJIT_TMP_DEST_REG, 0, maskReg, 0); | ||
sljit_emit_op2(compiler, SLJIT_LSHR32, SLJIT_TMP_DEST_REG, 0, tmpReg, 0, shiftReg, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If these operations are reserved, and used and immedate, then maskReg can be reused, if that helps in some code above.
Is the code tested on RISCV? |
28b7aeb
to
498e142
Compare
I have now tested it on both 32 and 64 bit RISCV. (using vorosl's WIP RISCV pathes so the project compiles, because currently the base project does not compile on 32 or 64 bit RISCV.) |
c7e61d1
to
c16ccd8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please change the status from draft to ready for review
src/jit/MemoryInl.h
Outdated
if (noShortAtomic && size <= 2) { | ||
sljit_emit_op2(compiler, SLJIT_AND, shiftReg, 0, baseReg, 0, SLJIT_IMM, 0x3); | ||
sljit_emit_op1(compiler, SLJIT_MOV32, shiftReg, 0, shiftReg, 0); | ||
sljit_emit_op1(compiler, SLJIT_MOV32, shiftReg, 0, shiftReg, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doing it once is enough.
Since RISCV does not support short atomic RMW operations, only 32 and 64 bit ones, we must modify the correct values inside the 32 bit values ourselves. Signed-off-by: Máté Tokodi [email protected]
c16ccd8
to
bd9a5fa
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Add support for short atomic RMW operations for RISCV
Since RISCV does not support short atomic RMW operations, only 32 and 64 bit ones, we must modify the correct values inside the 32 bit values ourselves.