DWARF Operation to Create Runtime Overlay Composite Location Description
It is common in SIMD vectorization for the compiler to generate code that promotes portions of an array into vector registers. For example, if the hardware has vector registers with 8 elements and 8-wide SIMD instructions, the compiler may vectorize a loop so that it executes 8 iterations concurrently for each vectorized loop iteration.

On the first iteration of the generated vectorized loop, iterations 0 to 7 of the source language loop are executed using SIMD instructions. On the next iteration of the generated vectorized loop, iterations 8 to 15 are executed, and so on.

If the source language loop accesses an array element based on the loop iteration index, the compiler may read the element into a register for the duration of that iteration. On the next iteration it reads the next element into the register, and so on. With SIMD, this generalizes to the compiler reading array elements 0 to 7 into a vector register on the first vectorized loop iteration, then array elements 8 to 15 on the next iteration, and so on.

The DWARF location description for the array needs to express that all elements are in memory, except the slice that has been promoted to the vector register. The starting position of the slice is a runtime value based on the iteration index modulo the vectorization size. This cannot be expressed by DW_OP_piece and DW_OP_bit_piece, which only allow constant offsets to be expressed.

Therefore, a new operator is defined that takes two location descriptions, an offset, and a size, and creates a composite that effectively uses the second location description as an overlay of the first, positioned according to the offset and size.

Consider an array that has been partially registerized such that the currently processed elements are held in registers, whereas the remainder of the array remains in memory.
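Before the concrete example, the overlay semantics described above can be sketched as a small C model. This is only an illustration under stated assumptions, not a DWARF consumer API: the `overlay_loc` structure and the `overlay_read_byte` function are hypothetical names, both locations are modeled as plain byte buffers, and the overlay is assumed to fit entirely within the base storage.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of an overlay composite (illustration only). */
struct overlay_loc {
    const uint8_t *base;    /* storage of the first (base) location        */
    size_t base_size;       /* number of base bytes being modeled          */
    const uint8_t *overlay; /* storage of the second (overlay) location    */
    size_t offset;          /* runtime byte offset of the overlay in base  */
    size_t size;            /* byte size of the overlaid region            */
};

/* Read one byte of the composite: bytes inside [offset, offset + size)
 * come from the overlay storage, all other bytes come from the base. */
uint8_t overlay_read_byte(const struct overlay_loc *loc, size_t pos)
{
    assert(pos < loc->base_size);
    if (pos >= loc->offset && pos - loc->offset < loc->size)
        return loc->overlay[pos - loc->offset];
    return loc->base[pos];
}
```

Unlike a piece-based composite, the `offset` field here is an ordinary runtime value, which is the property the new operator is meant to provide.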
Consider the loop in this C function, for example:

    #include <stdint.h>

    extern void foo(uint32_t dst[], uint32_t src[], int len) {
      for (int i = 0; i < len; ++i)
        dst[i] += src[i];
    }

Inside the loop body, the machine code loads src[i] and dst[i] into registers, adds them, and stores the result back into dst[i].

Considering the location of dst and src in the loop body, the elements dst[i] and src[i] are located in registers, while all other elements are located in memory. Let register R0 contain the base address of dst, register R1 contain i, and register R2 contain the registerized dst[i] element. We can describe the location of dst as a memory location with a register location overlaid at a runtime offset involving i:

    // 1. Memory location description of dst elements located in memory:
    DW_OP_breg0 0

    // 2. Register location description of element dst[i], which is located in R2:
    DW_OP_reg2

    // 3. Offset of the register within the memory of dst:
    DW_OP_breg1 0
    DW_OP_lit4
    DW_OP_mul

    // 4. The size of the register element:
    DW_OP_lit4

    // 5. Make a composite location description for dst that is the memory #1
    //    with the register #2 positioned as an overlay at offset #3 of size #4:
    DW_OP_overlay
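To make the effect of this expression concrete, here is a minimal C sketch of how a debugger might resolve an element of dst through the resulting composite. The `regs` structure and the `read_dst_element` function are hypothetical names introduced for illustration. The sketch assumes 4-byte elements as in the example and hard-codes the five steps rather than interpreting DWARF operations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of the registers used in the example. */
struct regs {
    const uint32_t *r0; /* R0: base address of dst            */
    uint64_t r1;        /* R1: loop index i                   */
    uint32_t r2;        /* R2: registerized value of dst[i]   */
};

/* Return element k of dst as seen through the overlay composite. */
uint32_t read_dst_element(const struct regs *r, size_t k, size_t len)
{
    assert(k < len);
    size_t offset = (size_t)r->r1 * 4; /* step 3: DW_OP_breg1 0, DW_OP_lit4, DW_OP_mul */
    size_t size = 4;                   /* step 4: DW_OP_lit4 */
    size_t pos = k * sizeof(uint32_t); /* byte position of element k in dst */

    if (pos >= offset && pos - offset < size)
        return r->r2;                  /* step 2: overlay, DW_OP_reg2 */
    return r->r0[k];                   /* step 1: base memory, DW_OP_breg0 0 */
}
```

With R1 holding i = 2, a request for dst[2] is served from R2 even if the copy in memory is stale, while every other element is read from memory at the address held in R0.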