From f36a8e130fb1f37515c67b0e4ff2beeb80968a7e Mon Sep 17 00:00:00 2001 From: Tony Tye Date: Thu, 5 Jan 2023 15:36:02 -0500 Subject: [PATCH] DWARF Operation to Create Runtime Overlay Composite Location Description It is common in SIMD vectorization for the compiler to generate code that promotes portions of an array into vector registers. For example, if the hardware has vector registers with 8 elements, and 8 wide SIMD instructions, the compiler may vectorize a loop so that is executes 8 iterations concurrently for each vectorized loop iteration. On the first iteration of the generated vectorized loop, iterations 0 to 7 of the source language loop will be executed using SIMD instructions. Then on the next iteration of the generated vectorized loop, iteration 8 to 15 will be executed, and so on. If the source language loop accesses an array element based on the loop iteration index, the compiler may read the element into a register for the duration of that iteration. Next iteration it will read the next element into the register, and so on. With SIMD, this generalizes to the compiler reading array elements 0 to 7 into a vector register on the first vectorized loop iteration, then array elements 8 to 15 on the next iteration, and so on. The DWARF location description for the array needs to express that all elements are in memory, except the slice that has been promoted to the vector register. The starting position of the slice is a runtime value based on the iteration index modulo the vectorization size. This cannot be expressed by DW_OP_piece and DW_OP_bit_piece which only allow constant offsets to be expressed. Therefore, a new operator is defined that takes two location descriptions, an offset and a size, and creates a composite that effectively uses the second location description as an overlay of the first, positioned according to the offset and size. Consider an array that has been partially registerized such that the currently processed elements are held in registers, whereas the remainder of the array remains in memory. Consider the loop in this C function, for example: extern void foo(uint32_t dst[], uint32_t src[], int len) { for (int i = 0; i < len; ++i) dst[i] += src[i]; } Inside the loop body, the machine code loads src[i] and dst[i] into registers, adds them, and stores the result back into dst[i]. Considering the location of dst and src in the loop body, the elements dst[i] and src[i] would be located in registers, all other elements are located in memory. Let register R0 contain the base address of dst, register R1 contain i, and register R2 contain the registerized dst[i] element. We can describe the location of dst as a memory location with a register location overlaid at a runtime offset involving i: // 1. Memory location description of dst elements located in memory: DW_OP_breg0 0 // 2. Register location description of element dst[i] is located in R2: DW_OP_reg2 // 3. Offset of the register within the memory of dst: DW_OP_breg1 0 DW_OP_lit4 DW_OP_mul // 4. The size of the register element: DW_OP_lit4 // 5. Make a composite location description for dst that is the memory #1 // with the register #2 positioned as an overlay at offset #3 of size #4: DW_OP_overlay --- ...verlay-composite-location-descriptions.txt | 145 ++++++++++++++++++ DWARF Specification.txt | 55 ++++++- 2 files changed, 198 insertions(+), 2 deletions(-) create mode 100644 016-overlay-composite-location-descriptions.txt diff --git a/016-overlay-composite-location-descriptions.txt b/016-overlay-composite-location-descriptions.txt new file mode 100644 index 0000000..771108a --- /dev/null +++ b/016-overlay-composite-location-descriptions.txt @@ -0,0 +1,145 @@ +Part 16: DWARF Operation to Create Runtime Overlay Composite Location Description + +PROBLEM DESCRIPTION + +It is common in SIMD vectorization for the compiler to generate code that +promotes portions of an array into vector registers. For example, if the +hardware has vector registers with 8 elements, and 8 wide SIMD instructions, the +compiler may vectorize a loop so that is executes 8 iterations concurrently for +each vectorized loop iteration. + +On the first iteration of the generated vectorized loop, iterations 0 to 7 of +the source language loop will be executed using SIMD instructions. Then on the +next iteration of the generated vectorized loop, iteration 8 to 15 will be +executed, and so on. + +If the source language loop accesses an array element based on the loop +iteration index, the compiler may read the element into a register for the +duration of that iteration. Next iteration it will read the next element into +the register, and so on. With SIMD, this generalizes to the compiler reading +array elements 0 to 7 into a vector register on the first vectorized loop +iteration, then array elements 8 to 15 on the next iteration, and so on. + +The DWARF location description for the array needs to express that all elements +are in memory, except the slice that has been promoted to the vector register. +The starting position of the slice is a runtime value based on the iteration +index modulo the vectorization size. This cannot be expressed by DW_OP_piece +and DW_OP_bit_piece which only allow constant offsets to be expressed. + +Therefore, a new operator is defined that takes two location descriptions, an +offset and a size, and creates a composite that effectively uses the second +location description as an overlay of the first, positioned according to the +offset and size. + +Consider an array that has been partially registerized such that the currently +processed elements are held in registers, whereas the remainder of the array +remains in memory. Consider the loop in this C function, for example: + + extern void foo(uint32_t dst[], uint32_t src[], int len) { + for (int i = 0; i < len; ++i) + dst[i] += src[i]; + } + +Inside the loop body, the machine code loads src[i] and dst[i] into registers, +adds them, and stores the result back into dst[i]. + +Considering the location of dst and src in the loop body, the elements dst[i] +and src[i] would be located in registers, all other elements are located in +memory. Let register R0 contain the base address of dst, register R1 contain i, +and register R2 contain the registerized dst[i] element. We can describe the +location of dst as a memory location with a register location overlaid at a +runtime offset involving i: + + // 1. Memory location description of dst elements located in memory: + DW_OP_breg0 0 + + // 2. Register location description of element dst[i] is located in R2: + DW_OP_reg2 + + // 3. Offset of the register within the memory of dst: + DW_OP_breg1 0 + DW_OP_lit4 + DW_OP_mul + + // 4. The size of the register element: + DW_OP_lit4 + + // 5. Make a composite location description for dst that is the memory #1 + // with the register #2 positioned as an overlay at offset #3 of size #4: + DW_OP_overlay + +PROPOSAL + +In Section 2.5.4.4.6 "Composite Location Description Operations" of [Allow +location description on the DWARF evaluation stack], add the following +operations after DW_OP_bit_piece and DW_OP_piece_end: + + ---------------------------------------------------------------------------- + 4. DW_OP_overlay + DW_OP_overlay pops four stack entries. The first must be an integral + type value that represents the overlay byte size value S. The second + must be an integral type value that represents the overlay byte offset + value O. The third must be a location description that represents the + overlay location description OL. The fourth must be a location + description that represents the base location description BL. + + The action is the same as for DW_OP_bit_overlay, except that the overlay + bit size BS and overlay bit offset BO used are S and O respectively + scaled by 8 (the byte size). + + 5. DW_OP_bit_overlay + DW_OP_bit_overlay pops four stack entries. The first must be an integral + type value that represents the overlay bit size value BS. The second + must be an integral type value that represents the overlay bit offset + value BO. The third must be a location description that represents the + overlay location description OL. The fourth must be a location + description that represents the base location description BL. + + The DWARF expression is ill-formed if BS or BO are negative values. + + rbss(L) is the minimum remaining bit storage size of L which is defined + as follows. LS is the location storage and LO is the location bit offset + specified by a single location description SL of L. The remaining bit + storage size RBSS of SL is the bit size of LS minus LO. rbss(L) is the + minimum RBSS of each single location description SL of L. + + The DWARF expression is ill-formed if rbss(BL) is less than BO plus BS. + + If BS is 0, then the operation pushes BL. + + If BO is 0 and BS equals rbss(BL), then the operation pushes OL. + + Otherwise, the operation is equivalent to performing the following steps + to push a composite location description. + + [non-normative] The composite location description is conceptually the + base location description BL with the overlay location description OL + positioned as an overlay starting at the overlay offset BO and covering + overlay bit size BS. + + 1. If BO is not 0 then push BL followed by performing the + DW_OP_bit_piece BO, 0 operation. + 2. Push OL followed by performing the DW_OP_bit_piece BS, 0 operation. + 3. If rbss(BL) is greater than BO plus BS, push BL followed by + performing the DW_OP_bit_piece (rbss(BL) - BO - BS), (BO + BS) + operation. + 4. Perform the DW_OP_piece_end operation. + ---------------------------------------------------------------------------- + +In Section 7.7.1 Operation Expressions of [Allow +location description on the DWARF evaluation stack], add the following +rows to Table 7.9 "DWARF Operation Encodings": + + ---------------------------------------------------------------------------- + + Table 7.9: DWARF Operation Encodings + ================================== ===== ======== ========================== + Operation Code Number Notes + of + Operands + ================================== ===== ======== ========================== + DW_OP_overlay TBA 0 + DW_OP_bit_overlay TBA 0 + ================================== ===== ======== ========================== + + ---------------------------------------------------------------------------- diff --git a/DWARF Specification.txt b/DWARF Specification.txt index 2f5e5c0..78a7704 100644 --- a/DWARF Specification.txt +++ b/DWARF Specification.txt @@ -1875,7 +1875,56 @@ defined to be compatible with the definitions in DWARF Version 5. updated to be a complete composite location description with the same parts. -4. DW_OP_extend +4. DW_OP_overlay + DW_OP_overlay pops four stack entries. The first must be an integral type + value that represents the overlay byte size value S. The second must be an + integral type value that represents the overlay byte offset value O. The + third must be a location description that represents the overlay location + description OL. The fourth must be a location description that represents + the base location description BL. + + The action is the same as for DW_OP_bit_overlay, except that the overlay + bit size BS and overlay bit offset BO used are S and O respectively scaled + by 8 (the byte size). + +5. DW_OP_bit_overlay + DW_OP_bit_overlay pops four stack entries. The first must be an integral + type value that represents the overlay bit size value BS. The second must + be an integral type value that represents the overlay bit offset value BO. + The third must be a location description that represents the overlay + location description OL. The fourth must be a location description that + represents the base location description BL. + + The DWARF expression is ill-formed if BS or BO are negative values. + + rbss(L) is the minimum remaining bit storage size of L which is defined as + follows. LS is the location storage and LO is the location bit offset + specified by a single location description SL of L. The remaining bit + storage size RBSS of SL is the bit size of LS minus LO. rbss(L) is the + minimum RBSS of each single location description SL of L. + + The DWARF expression is ill-formed if rbss(BL) is less than BO plus BS. + + If BS is 0, then the operation pushes BL. + + If BO is 0 and BS equals rbss(BL), then the operation pushes OL. + + Otherwise, the operation is equivalent to performing the following steps to + push a composite location description. + + [non-normative] The composite location description is conceptually the base + location description BL with the overlay location description OL positioned + as an overlay starting at the overlay offset BO and covering overlay bit + size BS. + + 1. If BO is not 0 then push BL followed by performing the DW_OP_bit_piece + BO, 0 operation. + 2. Push OL followed by performing the DW_OP_bit_piece BS, 0 operation. + 3. If rbss(BL) is greater than BO plus BS, push BL followed by performing + the DW_OP_bit_piece (rbss(BL) - BO - BS), (BO + BS) operation. + 4. Perform the DW_OP_piece_end operation. + +6. DW_OP_extend DW_OP_extend has two operands. The first is an unsigned LEB128 integer that represents the element bit size S. The second is an unsigned LEB128 integer that represents a count C. @@ -1893,7 +1942,7 @@ defined to be compatible with the definitions in DWARF Version 5. The DWARF expression is ill-formed if the element bit size or count are 0. -5. DW_OP_select_bit_piece +7. DW_OP_select_bit_piece DW_OP_select_bit_piece has two operands. The first is an unsigned LEB128 integer that represents the element bit size S. The second is an unsigned LEB128 integer that represents a count C. @@ -3486,6 +3535,8 @@ DW_OP_aspace_bregx TBA 2 ULEB128 register number, DW_OP_aspace_implicit_pointer TBA 2 4-byte or 8-byte offset of DIE, SLEB128 byte displacement +DW_OP_overlay TBA 0 +DW_OP_bit_overlay TBA 0 DW_OP_extend TBA 2 ULEB128 bit size, ULEB128 count DW_OP_select_bit_piece TBA 2 ULEB128 bit size,