-
Notifications
You must be signed in to change notification settings - Fork 569
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
i#6758: Fix AArch64 FP state at fcache events and attach #6757
Conversation
Fixes the slot used to save and restore FP regs at fcache enter and return events. PR #6725 adjusted the slots used during signal handling in core/unix/signal_linux_aarch64.c but did not adjust the same in fcache enter/exit and attach events. Prior to that the FP regs were simply stored in a continuous manner in signal handling code, and fcache enter/exit routines. The mismatch between slot usage caused failueres in the signalNNN1 tests on some systems.
This PR is probably not ready completely yet, but would be good to get some quick feedback since this is blocking downstream pulls. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change makes sense to me but probably best to have @AssadHashmi take a look as I am not an expert in a64 SIMD: as you can tell I'm confused by the svep field being a full dr_simd_t and other things.
#6761 is fixed now. Rerunning tests. |
PR #6757 fixed the way we read/write SVE register slots but unfortunately it is now broken on systems with 128-bit vector length. Both SVE vector and predicate registers use dr_simd_t slots which is a 64-byte type meant to store up to 512-bit vector registers. SVE predicate registers are always 1/8 the size of the vector register so for 512-bit vector length systems we only really need 64 / 8 = 8 bytes to store predicate registers. The ldr/str instructions we use to read and write the predicate register slots have a base+offset memory operand where the offset is a value in the range [-256, 255] scaled based by predicate register length. We read and write the registers by setting the base address to the address of the first slot, and setting the offset to n * sizeof(dr_simd_t) for each register Pn. For systems with 128-bit vector length, this means the predicate registers are 16 / 8 = 2 bytes so the maximum offset we can reach is 2 * 255 = 510 bytes. This means on 128-bit VL systems we can only reach the first 8 predicate registers (8 * sizeof(dr_simd_t) = 512). By changing the predicate register and FFR slots to use a new type dr_svep_t which is 1/8 the size of dr_simd_t we can fix this bug and save space. dr_svep_t is currently 8 bytes to correspond to 64 byte vectors, but even if we extend DynamoRIO to support the maximum SVE vector length of 2048-bits (256 bytes) dr_svep_t will only need to be increased to 256 / 8 = 32 bytes so the maximum offset (15 * 32 = 480 bytes) will always be in range. As this changes the size of the predicate register and FFR slots, this changes the size of the dr_mcontext_t structure and breaks backwards compatibility with earlier versions of DynamoRIO so the version number is increased to 10.90. Issues: #6760, #5365 Fixes: #6760
…6774) PR #6757 fixed the way we read/write SVE register slots but unfortunately it is now broken on systems with 128-bit vector length. Both SVE vector and predicate registers use dr_simd_t slots which is a 64-byte type meant to store up to 512-bit vector registers. SVE predicate registers are always 1/8 the size of the vector register so for 512-bit vector length systems we only really need 64 / 8 = 8 bytes to store predicate registers. The ldr/str instructions we use to read and write the predicate register slots have a base+offset memory operand where the offset is a value in the range [-256, 255] scaled based by predicate register length. We read and write the registers by setting the base address to the address of the first slot, and setting the offset to n * sizeof(dr_simd_t) for each register Pn. For systems with 128-bit vector length, this means the predicate registers are 16 / 8 = 2 bytes so the maximum offset we can reach is 2 * 255 = 510 bytes. This means on 128-bit VL systems we can only reach the first 8 predicate registers (8 * sizeof(dr_simd_t) = 512). By changing the predicate register and FFR slots to use a new type dr_svep_t which is 1/8 the size of dr_simd_t we can fix this bug and save space. dr_svep_t is currently 8 bytes to correspond to 64 byte vectors, but even if we extend DynamoRIO to support the maximum SVE vector length of 2048-bits (256 bytes) dr_svep_t will only need to be increased to 256 / 8 = 32 bytes so the maximum offset (15 * 32 = 480 bytes) will always be in range. As this changes the size of the predicate register and FFR slots, this changes the size of the dr_mcontext_t structure and breaks backwards compatibility with earlier versions of DynamoRIO so the version number is increased to 10.90. Issues: #6760, #5365 Fixes: #6760
Fixes the slot used to save and restore FP regs at fcache enter and return events. PR #6725 adjusted the slots used during signal handling in core/unix/signal_linux_aarch64.c but did not adjust the same in fcache enter/return and attach events. Prior to that PR, the FP regs were simply stored in a contiguous manner in signal handling code and fcache enter/return routines (instead of in their respective dr_simd_t struct member), which was a bit confusing.
The mismatch between slot usage in signal handling and fcache enter/return code caused failures in the signalNNN1 tests on some systems. Verified that those tests pass with this fix.
Also fixes the same issue in save_priv_mcontext_helper which is used in the dr_app_start API. Unit tests for this scenario will be added as part of #6759.
Issue: #5036, #6755, #5365
Fixes #6758