diff --git a/pocs/cpus/errata/amd/genoa-evex-rsp/README.md b/pocs/cpus/errata/amd/genoa-evex-rsp/README.md new file mode 100644 index 000000000..3d1b0a685 --- /dev/null +++ b/pocs/cpus/errata/amd/genoa-evex-rsp/README.md @@ -0,0 +1,86 @@ +# The EVEX.X bit can load the wrong RSP value into vector registers + +

+Tavis Ormandy
+

+ +> *This document is a Work In Progress and represents an errata currently under investigation* + +## Introduction + +We have observed an error on the AMD Zen 4 family of processors with +EVEX encoded instructions that access the stack pointer. + +The error can be observed with instructions that operate on both vector +registers and general purpose registers simultaneously, such as `vpinsrw`, +`vmovq`, `vctsi2ss`, and so on. + +The error only occurs if you use `RSP` with these instructions. + +It would be a valid but unusual operation to use `RSP` with these instructions, +we believe it is unlikely that any compiler generated code is affected. + +## Details + +If you attempt to load the value of `RSP` into a vector register, the value +actually loaded may lag behind the actual stack pointer. + +We have confirmed the bug is reproducible on the following SKU: + +- `Family=0x19 Model=0x11 Stepping=0x01 Patch=0xa10113b` + +You can verify the current Model, Family, Stepping and Patch level by +examining `/proc/cpuinfo`. + +### Reproducing + +The program `zenrsp.c` is the testcase. + +It should not produce any output unless an affected core detected. + +#### Building + +``` +$ gcc -mavx512vl -o zenrsp zenrsp.c +``` + +#### Running + +The normal expected output of `zenrsp` should be empty. + +On an affected CPU, the output might look like this: + +``` +$ ./zenrsp +after 11125090: 0x697e1d18 vs 0x697e1d20 +after 23257786: 0x697e1d18 vs 0x697e1d20 +after 34307607: 0x697e1d18 vs 0x697e1d20 +after 80446822: 0x697e1d18 vs 0x697e1d20 +after 85419804: 0x697e1d18 vs 0x697e1d20 +after 110056364: 0x697e1d18 vs 0x697e1d20 +after 140417725: 0x697e1d18 vs 0x697e1d20 +after 152543052: 0x697e1d18 vs 0x697e1d20 +after 163199133: 0x697e1d18 vs 0x697e1d20 +after 177559018: 0x697e1d18 vs 0x697e1d20 +``` + +This indicates that sometimes the wrong value was loaded into a vector register. + +### Analysis + +The code simply manipulates `rsp` with a `push`/`pop` sequence, then loads +the stackpointer into `xmm13` with the following instruction: + +``` +{evex} vmovq xmm13, rsp +``` + +We believe that stack operations are not correctly considered dependencies when +the EVEX.X bit is set. + +This results in stale values occasionally being loaded into registers. + +## Conclusion + +It is not clear if any code ever loads the stack pointer into vector registers, +but it is not impossible, and we document it here for reference. diff --git a/pocs/cpus/errata/amd/genoa-evex-rsp/zenrsp.c b/pocs/cpus/errata/amd/genoa-evex-rsp/zenrsp.c new file mode 100644 index 000000000..de88db907 --- /dev/null +++ b/pocs/cpus/errata/amd/genoa-evex-rsp/zenrsp.c @@ -0,0 +1,60 @@ +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define __aligned __attribute__((aligned(32))) + +static uint64_t vpinsrw_testcase(uint64_t *correct) +{ + uint64_t regstate[2] __aligned = {0}; + register __m128i r1 asm("xmm13"); + + _mm256_zeroall(); + + // Record stack pointer so we know the correct value. + asm volatile ("mov %%rsp, %0" : "=m"(*correct)); + + // Trigger bug + asm volatile (".intel_syntax noprefix \n" + // The bug is that these stack operations are ignored by the vmovq. + "push rax \n" // stack += 8 + "pop rax \n" // stack -= 8 + //"{evex} vmovq xmm13, rsp \n" + ".byte 0x62 \n" // evex + // RXBR00mm + ".byte 0b00110001 \n" // P0 + // Wvvvv1pp + ".byte 0b11111101 \n" // P1 + // zLLbVaaa + ".byte 0b00001000 \n" // P2 + ".byte 0x6e, 0xec \n" // movq + ".att_syntax prefix \n" + ); + + // Grab the first word, which should be equal to sp, right? + _mm_storeu_si128((void *) regstate, r1); + return regstate[0]; +} + +int main(int argc, char **argv) +{ + uint64_t correct; + uint64_t result; + + for (uint64_t i = 0 ;; i++) { + result = vpinsrw_testcase(&correct); + + if (correct != result) { + fprintf(stderr, "after %llu: %#x vs %#x\n", i, result, correct); + } + } + return 0; +} diff --git a/pocs/cpus/errata/amd/genoa-lps-hps/README.md b/pocs/cpus/errata/amd/genoa-lps-hps/README.md new file mode 100644 index 000000000..d9e4b9ae2 --- /dev/null +++ b/pocs/cpus/errata/amd/genoa-lps-hps/README.md @@ -0,0 +1,105 @@ +# EVEX encoded MOVLPS/MOVHPS can modify incorrect destination +

+Tavis Ormandy
+

+ +> *This document is a Work In Progress and represents an errata currently under investigation* + +## Introduction + +We have observed an error on the AMD Zen 4 family of processors with +EVEX encoded `VMOVLPS` and `VMOVHPS`. + +The `MOVLPS` and `MOVHPS` instructions load two 32-bit packed single precision +floats from the source operand into the low or high 64-bits of a vector +register. + +To illustrate this, consider this minimal example: + +```asm +section .data + a: dd 0x11111111, 0x22222222 + b: dd 0x33333333, 0x44444444 + +section .text + movhps xmm0, [rel a] + movlps xmm0, [rel b] +``` + +The result should be `xmm0` has the value `0x22222222111111114444444433333333`. + +## Details + +It is possible to use a three operand form of these instructions, where the two +merged source operands are placed in a third destination operand. For example: + +``` + vmovhps xmm0, xmm1, [rel a] +``` + +Consider this sequence: + +```asm +section .data + data: dd 0x11111111, 0x22222222, 0x33333333, 0x44444444 + zero: dd 0,0,0,0 + +section .text + vmovdqu xmm0, [rel data] + vmovlps xmm1, xmm0, [rel zero] + vmovhps xmm17, xmm0, [rel zero] +``` + +The expected result would be: + +``` +xmm0 = 0x44444444333333332222222211111111 +xmm1 = 0x44444444333333330000000000000000 +xmm17 = 0x00000000000000002222222211111111 +``` + +However, on genoa we non-deterministically get `xmm1=0`. + +- `Family=0x19 Model=0x11 Stepping=0x01 Patch=0xa10113b` + +You can verify the current Model, Family, Stepping and Patch level by +examining `/proc/cpuinfo`. + +### Reproducing + +The program `movhps.c` is the testcase. + +It should not produce any output unless an affected core detected. + +#### Building + +``` +$ gcc -mavx512vl -o movhps movhps.c +``` + +#### Running + +The normal expected output of `movhps` should be empty. + +On an affected CPU, the output might look like this: + +``` +$ ./movhps +After 1: 0000000000000000, 0000000000000000 +After 2: 0000000000000000, 0000000000000000 +After 1: 0000000000000000, 0000000000000000 +After 2: 0000000000000000, 0000000000000000 +After 1: 0000000000000000, 0000000000000000 +After 2: 0000000000000000, 0000000000000000 +``` + +This indicates that sometimes the wrong value was tested. + +### Conclusion + +It is possible for incorrect code to be generated when using compiler +intrinsics. It is not clear what values are being tested, or if it is possible +to infer any other state. + +AMD have indicated that they do not believe this is a security issue, but gave +no further explanation when asked. diff --git a/pocs/cpus/errata/amd/genoa-lps-hps/movhps.c b/pocs/cpus/errata/amd/genoa-lps-hps/movhps.c new file mode 100644 index 000000000..d8c3b1cac --- /dev/null +++ b/pocs/cpus/errata/amd/genoa-lps-hps/movhps.c @@ -0,0 +1,53 @@ +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define __aligned __attribute__((aligned(32))) + +#if !defined(__AVX512VL__) +# error You must compile this with -mavx512vl to get the needed intrinsics +#endif + +static const uint64_t kData[] = { 0x4444444444444444, 0x4242424242424242 }; +static const uint64_t kZero; + +static void vmovhps_testcase() +{ + uint64_t result[2] __aligned = {0}; + register __m128i r0 asm("xmm0"); + register __m128i r1 asm("xmm1"); + register __m128i r17 asm("xmm17"); + uint64_t count = 0; + + _mm256_zeroall(); + + do { + count++; + + // Trigger bug + asm volatile ("vmovdqu %1, %0" : "=v"(r0) : "m"(kData)); + asm volatile ("vmovlps %2, %1, %0" : "=v"(r1) : "v"(r0), "m"(kZero)); + asm volatile ("vmovhps %2, %1, %0" : "=v"(r17) : "v"(r0), "m"(kZero)); + } while (!_mm_testz_si128(r1, r1)); + + _mm_storeu_si128((void *) result, r1); + + fprintf(stderr, "After %llu: %016llx, %016llx\n", count, result[0], result[1]); + return; +} + +int main(int argc, char **argv) +{ + while (true) { + vmovhps_testcase(); + } + return 0; +}