DW_OP_entry_value deprecation #1
Comments
Could you give an example of how DW_OP_entry_value would be used for a non-trivial expression? Gdb only implements the operation if the expression has exactly one specific form. It seems impractical to require a consumer to halt at every subprogram entry and capture all DW_OP_entry_value expressions, which is what seems to be necessary for many non-trivial expressions. Consider expressions that read memory that may get clobbered by the subprogram's execution. Using CFI only seems to work if the expression refers solely to registers, hence the proposal of DW_OP_call_frame_entry_reg to cover this case.
I recognize that the cost to stop a GPU and transfer all those memory locations would be vastly higher than it is on a CPU, and I remember John DelSignore not liking having to insert a breakpoint to copy down the values even in the CPU case, so I understand the impracticality concern.
You would basically have to make the debugger stop at entry to every function, since you never know which function a thread will stop in, given SIGSEGV, SIGINT, etc. Also, it wouldn't work with core files.
I think it is kind of best effort, isn't it? I'll admit that I haven't dug into the contents of complex location lists, but what I understand from Jakub is that they are heavily dependent on these entry value expressions, and in spite of their limitations they do seem to work well in practice, providing a much broader range of coverage for variables than would be possible if they were not there.
I don't think it is helpful to have features in the specification that a consumer does not fully implement. If a producer heavily used a feature and the consumer did not support it, then things no longer work. The benefit of a standard is that it mandates what producers and consumers must support. Looking at gdb, it seems it only supports one case of DW_OP_entry_value, involving registers. So maybe that is the only case that is really needed, and hence the suggestion to define an operation supporting registers that all consumers can implement using CFI. Do you know of other cases besides registers in which DW_OP_entry_value is used? If so, what debugger supports them, since gdb seems limited to registers only?
Jakub was kind enough to provide me an example: consider the following compiled with -O2 -gdwarf-5, and put a breakpoint on the ++v; line. Only DW_OP_entry_value can recover the parameter's value there. When you look at the call site parameters, they are different:
The real crux of it is that in the last call by bar, its parameter from its calling scope needs to be referenced to find out what foo's x is. It is not just hoisting a value from the calling frame; it is a dynamic property. If you look at real-world code, it is very common that not all arguments are used up to the last instruction in each function, and when an argument isn't needed, there is no point, when optimizing, in keeping it in some register or memory. Yet when debugging, being able to determine those values is often essential.
It is common in SIMD vectorization for the compiler to generate code that promotes portions of an array into vector registers. For example, if the hardware has vector registers with 8 elements and 8-wide SIMD instructions, the compiler may vectorize a loop so that it executes 8 iterations concurrently for each vectorized loop iteration. On the first iteration of the generated vectorized loop, iterations 0 to 7 of the source language loop are executed using SIMD instructions; on the next iteration, iterations 8 to 15 are executed, and so on. If the source language loop accesses an array element based on the loop iteration index, the compiler may read the element into a register for the duration of that iteration, then read the next element into the register on the next iteration, and so on. With SIMD, this generalizes to the compiler reading array elements 0 to 7 into a vector register on the first vectorized loop iteration, elements 8 to 15 on the next, and so on. The DWARF location description for the array needs to express that all elements are in memory except the slice that has been promoted to the vector register. The starting position of the slice is a runtime value based on the iteration index modulo the vectorization size. This cannot be expressed by DW_OP_piece and DW_OP_bit_piece, which only allow constant offsets. Therefore, a new operator is defined that takes two location descriptions, an offset, and a size, and creates a composite that effectively uses the second location description as an overlay of the first, positioned according to the offset and size. Consider an array that has been partially registerized such that the currently processed elements are held in registers, whereas the remainder of the array remains in memory.
Consider the loop in this C function, for example:

```c
extern void foo(uint32_t dst[], uint32_t src[], int len) {
  for (int i = 0; i < len; ++i)
    dst[i] += src[i];
}
```

Inside the loop body, the machine code loads src[i] and dst[i] into registers, adds them, and stores the result back into dst[i]. Considering the location of dst and src in the loop body, the elements dst[i] and src[i] would be located in registers; all other elements are located in memory. Let register R0 contain the base address of dst, register R1 contain i, and register R2 contain the registerized dst[i] element. We can describe the location of dst as a memory location with a register location overlaid at a runtime offset involving i:

```
// 1. Memory location description of dst elements located in memory:
DW_OP_breg0 0
// 2. Register location description of element dst[i] located in R2:
DW_OP_reg2
// 3. Offset of the register within the memory of dst:
DW_OP_breg1 0
DW_OP_lit4
DW_OP_mul
// 4. The size of the register element:
DW_OP_lit4
// 5. Make a composite location description for dst that is the memory #1
// with the register #2 positioned as an overlay at offset #3 of size #4:
DW_OP_overlay
```
Consider the loop in this C function, for example: extern void foo(uint32_t dst[], uint32_t src[], int len) { for (int i = 0; i < len; ++i) dst[i] += src[i]; } Inside the loop body, the machine code loads src[i] and dst[i] into registers, adds them, and stores the result back into dst[i]. Considering the location of dst and src in the loop body, the elements dst[i] and src[i] would be located in registers, all other elements are located in memory. Let register R0 contain the base address of dst, register R1 contain i, and register R2 contain the registerized dst[i] element. We can describe the location of dst as a memory location with a register location overlaid at a runtime offset involving i: // 1. Memory location description of dst elements located in memory: DW_OP_breg0 0 // 2. Register location description of element dst[i] is located in R2: DW_OP_reg2 // 3. Offset of the register within the memory of dst: DW_OP_breg1 0 DW_OP_lit4 DW_OP_mul // 4. The size of the register element: DW_OP_lit4 // 5. Make a composite location description for dst that is the memory ccoutant#1 // with the register ccoutant#2 positioned as an overlay at offset ccoutant#3 of size ccoutant#4: DW_OP_overlay
It is common in SIMD vectorization for the compiler to generate code that promotes portions of an array into vector registers. For example, if the hardware has vector registers with 8 elements, and 8 wide SIMD instructions, the compiler may vectorize a loop so that is executes 8 iterations concurrently for each vectorized loop iteration. On the first iteration of the generated vectorized loop, iterations 0 to 7 of the source language loop will be executed using SIMD instructions. Then on the next iteration of the generated vectorized loop, iteration 8 to 15 will be executed, and so on. If the source language loop accesses an array element based on the loop iteration index, the compiler may read the element into a register for the duration of that iteration. Next iteration it will read the next element into the register, and so on. With SIMD, this generalizes to the compiler reading array elements 0 to 7 into a vector register on the first vectorized loop iteration, then array elements 8 to 15 on the next iteration, and so on. The DWARF location description for the array needs to express that all elements are in memory, except the slice that has been promoted to the vector register. The starting position of the slice is a runtime value based on the iteration index modulo the vectorization size. This cannot be expressed by DW_OP_piece and DW_OP_bit_piece which only allow constant offsets to be expressed. Therefore, a new operator is defined that takes two location descriptions, an offset and a size, and creates a composite that effectively uses the second location description as an overlay of the first, positioned according to the offset and size. Consider an array that has been partially registerized such that the currently processed elements are held in registers, whereas the remainder of the array remains in memory. 
Consider the loop in this C function, for example: extern void foo(uint32_t dst[], uint32_t src[], int len) { for (int i = 0; i < len; ++i) dst[i] += src[i]; } Inside the loop body, the machine code loads src[i] and dst[i] into registers, adds them, and stores the result back into dst[i]. Considering the location of dst and src in the loop body, the elements dst[i] and src[i] would be located in registers, all other elements are located in memory. Let register R0 contain the base address of dst, register R1 contain i, and register R2 contain the registerized dst[i] element. We can describe the location of dst as a memory location with a register location overlaid at a runtime offset involving i: // 1. Memory location description of dst elements located in memory: DW_OP_breg0 0 // 2. Register location description of element dst[i] is located in R2: DW_OP_reg2 // 3. Offset of the register within the memory of dst: DW_OP_breg1 0 DW_OP_lit4 DW_OP_mul // 4. The size of the register element: DW_OP_lit4 // 5. Make a composite location description for dst that is the memory ccoutant#1 // with the register ccoutant#2 positioned as an overlay at offset ccoutant#3 of size ccoutant#4: DW_OP_overlay
It is common in SIMD vectorization for the compiler to generate code that promotes portions of an array into vector registers. For example, if the hardware has vector registers with 8 elements, and 8 wide SIMD instructions, the compiler may vectorize a loop so that is executes 8 iterations concurrently for each vectorized loop iteration. On the first iteration of the generated vectorized loop, iterations 0 to 7 of the source language loop will be executed using SIMD instructions. Then on the next iteration of the generated vectorized loop, iteration 8 to 15 will be executed, and so on. If the source language loop accesses an array element based on the loop iteration index, the compiler may read the element into a register for the duration of that iteration. Next iteration it will read the next element into the register, and so on. With SIMD, this generalizes to the compiler reading array elements 0 to 7 into a vector register on the first vectorized loop iteration, then array elements 8 to 15 on the next iteration, and so on. The DWARF location description for the array needs to express that all elements are in memory, except the slice that has been promoted to the vector register. The starting position of the slice is a runtime value based on the iteration index modulo the vectorization size. This cannot be expressed by DW_OP_piece and DW_OP_bit_piece which only allow constant offsets to be expressed. Therefore, a new operator is defined that takes two location descriptions, an offset and a size, and creates a composite that effectively uses the second location description as an overlay of the first, positioned according to the offset and size. Consider an array that has been partially registerized such that the currently processed elements are held in registers, whereas the remainder of the array remains in memory. 
Consider the loop in this C function, for example: extern void foo(uint32_t dst[], uint32_t src[], int len) { for (int i = 0; i < len; ++i) dst[i] += src[i]; } Inside the loop body, the machine code loads src[i] and dst[i] into registers, adds them, and stores the result back into dst[i]. Considering the location of dst and src in the loop body, the elements dst[i] and src[i] would be located in registers, all other elements are located in memory. Let register R0 contain the base address of dst, register R1 contain i, and register R2 contain the registerized dst[i] element. We can describe the location of dst as a memory location with a register location overlaid at a runtime offset involving i: // 1. Memory location description of dst elements located in memory: DW_OP_breg0 0 // 2. Register location description of element dst[i] is located in R2: DW_OP_reg2 // 3. Offset of the register within the memory of dst: DW_OP_breg1 0 DW_OP_lit4 DW_OP_mul // 4. The size of the register element: DW_OP_lit4 // 5. Make a composite location description for dst that is the memory ccoutant#1 // with the register ccoutant#2 positioned as an overlay at offset ccoutant#3 of size ccoutant#4: DW_OP_overlay
It is common in SIMD vectorization for the compiler to generate code that promotes portions of an array into vector registers. For example, if the hardware has vector registers with 8 elements, and 8 wide SIMD instructions, the compiler may vectorize a loop so that is executes 8 iterations concurrently for each vectorized loop iteration. On the first iteration of the generated vectorized loop, iterations 0 to 7 of the source language loop will be executed using SIMD instructions. Then on the next iteration of the generated vectorized loop, iteration 8 to 15 will be executed, and so on. If the source language loop accesses an array element based on the loop iteration index, the compiler may read the element into a register for the duration of that iteration. Next iteration it will read the next element into the register, and so on. With SIMD, this generalizes to the compiler reading array elements 0 to 7 into a vector register on the first vectorized loop iteration, then array elements 8 to 15 on the next iteration, and so on. The DWARF location description for the array needs to express that all elements are in memory, except the slice that has been promoted to the vector register. The starting position of the slice is a runtime value based on the iteration index modulo the vectorization size. This cannot be expressed by DW_OP_piece and DW_OP_bit_piece which only allow constant offsets to be expressed. Therefore, a new operator is defined that takes two location descriptions, an offset and a size, and creates a composite that effectively uses the second location description as an overlay of the first, positioned according to the offset and size. Consider an array that has been partially registerized such that the currently processed elements are held in registers, whereas the remainder of the array remains in memory. 
Consider the loop in this C function, for example: extern void foo(uint32_t dst[], uint32_t src[], int len) { for (int i = 0; i < len; ++i) dst[i] += src[i]; } Inside the loop body, the machine code loads src[i] and dst[i] into registers, adds them, and stores the result back into dst[i]. Considering the location of dst and src in the loop body, the elements dst[i] and src[i] would be located in registers, all other elements are located in memory. Let register R0 contain the base address of dst, register R1 contain i, and register R2 contain the registerized dst[i] element. We can describe the location of dst as a memory location with a register location overlaid at a runtime offset involving i: // 1. Memory location description of dst elements located in memory: DW_OP_breg0 0 // 2. Register location description of element dst[i] is located in R2: DW_OP_reg2 // 3. Offset of the register within the memory of dst: DW_OP_breg1 0 DW_OP_lit4 DW_OP_mul // 4. The size of the register element: DW_OP_lit4 // 5. Make a composite location description for dst that is the memory ccoutant#1 // with the register ccoutant#2 positioned as an overlay at offset ccoutant#3 of size ccoutant#4: DW_OP_overlay
It is common in SIMD vectorization for the compiler to generate code that promotes portions of an array into vector registers. For example, if the hardware has vector registers with 8 elements, and 8 wide SIMD instructions, the compiler may vectorize a loop so that is executes 8 iterations concurrently for each vectorized loop iteration. On the first iteration of the generated vectorized loop, iterations 0 to 7 of the source language loop will be executed using SIMD instructions. Then on the next iteration of the generated vectorized loop, iteration 8 to 15 will be executed, and so on. If the source language loop accesses an array element based on the loop iteration index, the compiler may read the element into a register for the duration of that iteration. Next iteration it will read the next element into the register, and so on. With SIMD, this generalizes to the compiler reading array elements 0 to 7 into a vector register on the first vectorized loop iteration, then array elements 8 to 15 on the next iteration, and so on. The DWARF location description for the array needs to express that all elements are in memory, except the slice that has been promoted to the vector register. The starting position of the slice is a runtime value based on the iteration index modulo the vectorization size. This cannot be expressed by DW_OP_piece and DW_OP_bit_piece which only allow constant offsets to be expressed. Therefore, a new operator is defined that takes two location descriptions, an offset and a size, and creates a composite that effectively uses the second location description as an overlay of the first, positioned according to the offset and size. Consider an array that has been partially registerized such that the currently processed elements are held in registers, whereas the remainder of the array remains in memory. 
Consider the loop in this C function, for example: extern void foo(uint32_t dst[], uint32_t src[], int len) { for (int i = 0; i < len; ++i) dst[i] += src[i]; } Inside the loop body, the machine code loads src[i] and dst[i] into registers, adds them, and stores the result back into dst[i]. Considering the location of dst and src in the loop body, the elements dst[i] and src[i] would be located in registers, all other elements are located in memory. Let register R0 contain the base address of dst, register R1 contain i, and register R2 contain the registerized dst[i] element. We can describe the location of dst as a memory location with a register location overlaid at a runtime offset involving i: // 1. Memory location description of dst elements located in memory: DW_OP_breg0 0 // 2. Register location description of element dst[i] is located in R2: DW_OP_reg2 // 3. Offset of the register within the memory of dst: DW_OP_breg1 0 DW_OP_lit4 DW_OP_mul // 4. The size of the register element: DW_OP_lit4 // 5. Make a composite location description for dst that is the memory ccoutant#1 // with the register ccoutant#2 positioned as an overlay at offset ccoutant#3 of size ccoutant#4: DW_OP_overlay
It is common in SIMD vectorization for the compiler to generate code that promotes portions of an array into vector registers. For example, if the hardware has vector registers with 8 elements, and 8 wide SIMD instructions, the compiler may vectorize a loop so that is executes 8 iterations concurrently for each vectorized loop iteration. On the first iteration of the generated vectorized loop, iterations 0 to 7 of the source language loop will be executed using SIMD instructions. Then on the next iteration of the generated vectorized loop, iteration 8 to 15 will be executed, and so on. If the source language loop accesses an array element based on the loop iteration index, the compiler may read the element into a register for the duration of that iteration. Next iteration it will read the next element into the register, and so on. With SIMD, this generalizes to the compiler reading array elements 0 to 7 into a vector register on the first vectorized loop iteration, then array elements 8 to 15 on the next iteration, and so on. The DWARF location description for the array needs to express that all elements are in memory, except the slice that has been promoted to the vector register. The starting position of the slice is a runtime value based on the iteration index modulo the vectorization size. This cannot be expressed by DW_OP_piece and DW_OP_bit_piece which only allow constant offsets to be expressed. Therefore, a new operator is defined that takes two location descriptions, an offset and a size, and creates a composite that effectively uses the second location description as an overlay of the first, positioned according to the offset and size. Consider an array that has been partially registerized such that the currently processed elements are held in registers, whereas the remainder of the array remains in memory. 
Consider the loop in this C function, for example: extern void foo(uint32_t dst[], uint32_t src[], int len) { for (int i = 0; i < len; ++i) dst[i] += src[i]; } Inside the loop body, the machine code loads src[i] and dst[i] into registers, adds them, and stores the result back into dst[i]. Considering the location of dst and src in the loop body, the elements dst[i] and src[i] would be located in registers, all other elements are located in memory. Let register R0 contain the base address of dst, register R1 contain i, and register R2 contain the registerized dst[i] element. We can describe the location of dst as a memory location with a register location overlaid at a runtime offset involving i: // 1. Memory location description of dst elements located in memory: DW_OP_breg0 0 // 2. Register location description of element dst[i] is located in R2: DW_OP_reg2 // 3. Offset of the register within the memory of dst: DW_OP_breg1 0 DW_OP_lit4 DW_OP_mul // 4. The size of the register element: DW_OP_lit4 // 5. Make a composite location description for dst that is the memory ccoutant#1 // with the register ccoutant#2 positioned as an overlay at offset ccoutant#3 of size ccoutant#4: DW_OP_overlay
It is common in SIMD vectorization for the compiler to generate code that promotes portions of an array into vector registers. For example, if the hardware has vector registers with 8 elements, and 8 wide SIMD instructions, the compiler may vectorize a loop so that is executes 8 iterations concurrently for each vectorized loop iteration. On the first iteration of the generated vectorized loop, iterations 0 to 7 of the source language loop will be executed using SIMD instructions. Then on the next iteration of the generated vectorized loop, iteration 8 to 15 will be executed, and so on. If the source language loop accesses an array element based on the loop iteration index, the compiler may read the element into a register for the duration of that iteration. Next iteration it will read the next element into the register, and so on. With SIMD, this generalizes to the compiler reading array elements 0 to 7 into a vector register on the first vectorized loop iteration, then array elements 8 to 15 on the next iteration, and so on. The DWARF location description for the array needs to express that all elements are in memory, except the slice that has been promoted to the vector register. The starting position of the slice is a runtime value based on the iteration index modulo the vectorization size. This cannot be expressed by DW_OP_piece and DW_OP_bit_piece which only allow constant offsets to be expressed. Therefore, a new operator is defined that takes two location descriptions, an offset and a size, and creates a composite that effectively uses the second location description as an overlay of the first, positioned according to the offset and size. Consider an array that has been partially registerized such that the currently processed elements are held in registers, whereas the remainder of the array remains in memory. 
Consider the loop in this C function, for example: extern void foo(uint32_t dst[], uint32_t src[], int len) { for (int i = 0; i < len; ++i) dst[i] += src[i]; } Inside the loop body, the machine code loads src[i] and dst[i] into registers, adds them, and stores the result back into dst[i]. Considering the location of dst and src in the loop body, the elements dst[i] and src[i] would be located in registers, all other elements are located in memory. Let register R0 contain the base address of dst, register R1 contain i, and register R2 contain the registerized dst[i] element. We can describe the location of dst as a memory location with a register location overlaid at a runtime offset involving i: // 1. Memory location description of dst elements located in memory: DW_OP_breg0 0 // 2. Register location description of element dst[i] is located in R2: DW_OP_reg2 // 3. Offset of the register within the memory of dst: DW_OP_breg1 0 DW_OP_lit4 DW_OP_mul // 4. The size of the register element: DW_OP_lit4 // 5. Make a composite location description for dst that is the memory ccoutant#1 // with the register ccoutant#2 positioned as an overlay at offset ccoutant#3 of size ccoutant#4: DW_OP_overlay
It is common in SIMD vectorization for the compiler to generate code that promotes portions of an array into vector registers. For example, if the hardware has vector registers with 8 elements, and 8 wide SIMD instructions, the compiler may vectorize a loop so that is executes 8 iterations concurrently for each vectorized loop iteration. On the first iteration of the generated vectorized loop, iterations 0 to 7 of the source language loop will be executed using SIMD instructions. Then on the next iteration of the generated vectorized loop, iteration 8 to 15 will be executed, and so on. If the source language loop accesses an array element based on the loop iteration index, the compiler may read the element into a register for the duration of that iteration. Next iteration it will read the next element into the register, and so on. With SIMD, this generalizes to the compiler reading array elements 0 to 7 into a vector register on the first vectorized loop iteration, then array elements 8 to 15 on the next iteration, and so on. The DWARF location description for the array needs to express that all elements are in memory, except the slice that has been promoted to the vector register. The starting position of the slice is a runtime value based on the iteration index modulo the vectorization size. This cannot be expressed by DW_OP_piece and DW_OP_bit_piece which only allow constant offsets to be expressed. Therefore, a new operator is defined that takes two location descriptions, an offset and a size, and creates a composite that effectively uses the second location description as an overlay of the first, positioned according to the offset and size. Consider an array that has been partially registerized such that the currently processed elements are held in registers, whereas the remainder of the array remains in memory. 
Consider the loop in this C function, for example: extern void foo(uint32_t dst[], uint32_t src[], int len) { for (int i = 0; i < len; ++i) dst[i] += src[i]; } Inside the loop body, the machine code loads src[i] and dst[i] into registers, adds them, and stores the result back into dst[i]. Considering the location of dst and src in the loop body, the elements dst[i] and src[i] would be located in registers, all other elements are located in memory. Let register R0 contain the base address of dst, register R1 contain i, and register R2 contain the registerized dst[i] element. We can describe the location of dst as a memory location with a register location overlaid at a runtime offset involving i: // 1. Memory location description of dst elements located in memory: DW_OP_breg0 0 // 2. Register location description of element dst[i] is located in R2: DW_OP_reg2 // 3. Offset of the register within the memory of dst: DW_OP_breg1 0 DW_OP_lit4 DW_OP_mul // 4. The size of the register element: DW_OP_lit4 // 5. Make a composite location description for dst that is the memory ccoutant#1 // with the register ccoutant#2 positioned as an overlay at offset ccoutant#3 of size ccoutant#4: DW_OP_overlay
It is common in SIMD vectorization for the compiler to generate code that promotes portions of an array into vector registers. For example, if the hardware has vector registers with 8 elements, and 8 wide SIMD instructions, the compiler may vectorize a loop so that is executes 8 iterations concurrently for each vectorized loop iteration. On the first iteration of the generated vectorized loop, iterations 0 to 7 of the source language loop will be executed using SIMD instructions. Then on the next iteration of the generated vectorized loop, iteration 8 to 15 will be executed, and so on. If the source language loop accesses an array element based on the loop iteration index, the compiler may read the element into a register for the duration of that iteration. Next iteration it will read the next element into the register, and so on. With SIMD, this generalizes to the compiler reading array elements 0 to 7 into a vector register on the first vectorized loop iteration, then array elements 8 to 15 on the next iteration, and so on. The DWARF location description for the array needs to express that all elements are in memory, except the slice that has been promoted to the vector register. The starting position of the slice is a runtime value based on the iteration index modulo the vectorization size. This cannot be expressed by DW_OP_piece and DW_OP_bit_piece which only allow constant offsets to be expressed. Therefore, a new operator is defined that takes two location descriptions, an offset and a size, and creates a composite that effectively uses the second location description as an overlay of the first, positioned according to the offset and size. Consider an array that has been partially registerized such that the currently processed elements are held in registers, whereas the remainder of the array remains in memory. 
Consider the loop in this C function, for example: extern void foo(uint32_t dst[], uint32_t src[], int len) { for (int i = 0; i < len; ++i) dst[i] += src[i]; } Inside the loop body, the machine code loads src[i] and dst[i] into registers, adds them, and stores the result back into dst[i]. Considering the location of dst and src in the loop body, the elements dst[i] and src[i] would be located in registers, all other elements are located in memory. Let register R0 contain the base address of dst, register R1 contain i, and register R2 contain the registerized dst[i] element. We can describe the location of dst as a memory location with a register location overlaid at a runtime offset involving i: // 1. Memory location description of dst elements located in memory: DW_OP_breg0 0 // 2. Register location description of element dst[i] is located in R2: DW_OP_reg2 // 3. Offset of the register within the memory of dst: DW_OP_breg1 0 DW_OP_lit4 DW_OP_mul // 4. The size of the register element: DW_OP_lit4 // 5. Make a composite location description for dst that is the memory ccoutant#1 // with the register ccoutant#2 positioned as an overlay at offset ccoutant#3 of size ccoutant#4: DW_OP_overlay
It is common in SIMD vectorization for the compiler to generate code that promotes portions of an array into vector registers. For example, if the hardware has vector registers with 8 elements, and 8 wide SIMD instructions, the compiler may vectorize a loop so that is executes 8 iterations concurrently for each vectorized loop iteration. On the first iteration of the generated vectorized loop, iterations 0 to 7 of the source language loop will be executed using SIMD instructions. Then on the next iteration of the generated vectorized loop, iteration 8 to 15 will be executed, and so on. If the source language loop accesses an array element based on the loop iteration index, the compiler may read the element into a register for the duration of that iteration. Next iteration it will read the next element into the register, and so on. With SIMD, this generalizes to the compiler reading array elements 0 to 7 into a vector register on the first vectorized loop iteration, then array elements 8 to 15 on the next iteration, and so on. The DWARF location description for the array needs to express that all elements are in memory, except the slice that has been promoted to the vector register. The starting position of the slice is a runtime value based on the iteration index modulo the vectorization size. This cannot be expressed by DW_OP_piece and DW_OP_bit_piece which only allow constant offsets to be expressed. Therefore, a new operator is defined that takes two location descriptions, an offset and a size, and creates a composite that effectively uses the second location description as an overlay of the first, positioned according to the offset and size. Consider an array that has been partially registerized such that the currently processed elements are held in registers, whereas the remainder of the array remains in memory. 
Consider the loop in this C function, for example: extern void foo(uint32_t dst[], uint32_t src[], int len) { for (int i = 0; i < len; ++i) dst[i] += src[i]; } Inside the loop body, the machine code loads src[i] and dst[i] into registers, adds them, and stores the result back into dst[i]. Considering the location of dst and src in the loop body, the elements dst[i] and src[i] would be located in registers, all other elements are located in memory. Let register R0 contain the base address of dst, register R1 contain i, and register R2 contain the registerized dst[i] element. We can describe the location of dst as a memory location with a register location overlaid at a runtime offset involving i: // 1. Memory location description of dst elements located in memory: DW_OP_breg0 0 // 2. Register location description of element dst[i] is located in R2: DW_OP_reg2 // 3. Offset of the register within the memory of dst: DW_OP_breg1 0 DW_OP_lit4 DW_OP_mul // 4. The size of the register element: DW_OP_lit4 // 5. Make a composite location description for dst that is the memory ccoutant#1 // with the register ccoutant#2 positioned as an overlay at offset ccoutant#3 of size ccoutant#4: DW_OP_overlay
It is common in SIMD vectorization for the compiler to generate code that promotes portions of an array into vector registers. For example, if the hardware has vector registers with 8 elements, and 8 wide SIMD instructions, the compiler may vectorize a loop so that is executes 8 iterations concurrently for each vectorized loop iteration. On the first iteration of the generated vectorized loop, iterations 0 to 7 of the source language loop will be executed using SIMD instructions. Then on the next iteration of the generated vectorized loop, iteration 8 to 15 will be executed, and so on. If the source language loop accesses an array element based on the loop iteration index, the compiler may read the element into a register for the duration of that iteration. Next iteration it will read the next element into the register, and so on. With SIMD, this generalizes to the compiler reading array elements 0 to 7 into a vector register on the first vectorized loop iteration, then array elements 8 to 15 on the next iteration, and so on. The DWARF location description for the array needs to express that all elements are in memory, except the slice that has been promoted to the vector register. The starting position of the slice is a runtime value based on the iteration index modulo the vectorization size. This cannot be expressed by DW_OP_piece and DW_OP_bit_piece which only allow constant offsets to be expressed. Therefore, a new operator is defined that takes two location descriptions, an offset and a size, and creates a composite that effectively uses the second location description as an overlay of the first, positioned according to the offset and size. Consider an array that has been partially registerized such that the currently processed elements are held in registers, whereas the remainder of the array remains in memory. 
Consider the loop in this C function, for example: extern void foo(uint32_t dst[], uint32_t src[], int len) { for (int i = 0; i < len; ++i) dst[i] += src[i]; } Inside the loop body, the machine code loads src[i] and dst[i] into registers, adds them, and stores the result back into dst[i]. Considering the location of dst and src in the loop body, the elements dst[i] and src[i] would be located in registers, all other elements are located in memory. Let register R0 contain the base address of dst, register R1 contain i, and register R2 contain the registerized dst[i] element. We can describe the location of dst as a memory location with a register location overlaid at a runtime offset involving i: // 1. Memory location description of dst elements located in memory: DW_OP_breg0 0 // 2. Register location description of element dst[i] is located in R2: DW_OP_reg2 // 3. Offset of the register within the memory of dst: DW_OP_breg1 0 DW_OP_lit4 DW_OP_mul // 4. The size of the register element: DW_OP_lit4 // 5. Make a composite location description for dst that is the memory ccoutant#1 // with the register ccoutant#2 positioned as an overlay at offset ccoutant#3 of size ccoutant#4: DW_OP_overlay
It is common in SIMD vectorization for the compiler to generate code that promotes portions of an array into vector registers. For example, if the hardware has vector registers with 8 elements, and 8 wide SIMD instructions, the compiler may vectorize a loop so that is executes 8 iterations concurrently for each vectorized loop iteration. On the first iteration of the generated vectorized loop, iterations 0 to 7 of the source language loop will be executed using SIMD instructions. Then on the next iteration of the generated vectorized loop, iteration 8 to 15 will be executed, and so on. If the source language loop accesses an array element based on the loop iteration index, the compiler may read the element into a register for the duration of that iteration. Next iteration it will read the next element into the register, and so on. With SIMD, this generalizes to the compiler reading array elements 0 to 7 into a vector register on the first vectorized loop iteration, then array elements 8 to 15 on the next iteration, and so on. The DWARF location description for the array needs to express that all elements are in memory, except the slice that has been promoted to the vector register. The starting position of the slice is a runtime value based on the iteration index modulo the vectorization size. This cannot be expressed by DW_OP_piece and DW_OP_bit_piece which only allow constant offsets to be expressed. Therefore, a new operator is defined that takes two location descriptions, an offset and a size, and creates a composite that effectively uses the second location description as an overlay of the first, positioned according to the offset and size. Consider an array that has been partially registerized such that the currently processed elements are held in registers, whereas the remainder of the array remains in memory. 
Consider the loop in this C function, for example: extern void foo(uint32_t dst[], uint32_t src[], int len) { for (int i = 0; i < len; ++i) dst[i] += src[i]; } Inside the loop body, the machine code loads src[i] and dst[i] into registers, adds them, and stores the result back into dst[i]. Considering the location of dst and src in the loop body, the elements dst[i] and src[i] would be located in registers, all other elements are located in memory. Let register R0 contain the base address of dst, register R1 contain i, and register R2 contain the registerized dst[i] element. We can describe the location of dst as a memory location with a register location overlaid at a runtime offset involving i: // 1. Memory location description of dst elements located in memory: DW_OP_breg0 0 // 2. Register location description of element dst[i] is located in R2: DW_OP_reg2 // 3. Offset of the register within the memory of dst: DW_OP_breg1 0 DW_OP_lit4 DW_OP_mul // 4. The size of the register element: DW_OP_lit4 // 5. Make a composite location description for dst that is the memory ccoutant#1 // with the register ccoutant#2 positioned as an overlay at offset ccoutant#3 of size ccoutant#4: DW_OP_overlay
It is common in SIMD vectorization for the compiler to generate code that promotes portions of an array into vector registers. For example, if the hardware has vector registers with 8 elements, and 8 wide SIMD instructions, the compiler may vectorize a loop so that is executes 8 iterations concurrently for each vectorized loop iteration. On the first iteration of the generated vectorized loop, iterations 0 to 7 of the source language loop will be executed using SIMD instructions. Then on the next iteration of the generated vectorized loop, iteration 8 to 15 will be executed, and so on. If the source language loop accesses an array element based on the loop iteration index, the compiler may read the element into a register for the duration of that iteration. Next iteration it will read the next element into the register, and so on. With SIMD, this generalizes to the compiler reading array elements 0 to 7 into a vector register on the first vectorized loop iteration, then array elements 8 to 15 on the next iteration, and so on. The DWARF location description for the array needs to express that all elements are in memory, except the slice that has been promoted to the vector register. The starting position of the slice is a runtime value based on the iteration index modulo the vectorization size. This cannot be expressed by DW_OP_piece and DW_OP_bit_piece which only allow constant offsets to be expressed. Therefore, a new operator is defined that takes two location descriptions, an offset and a size, and creates a composite that effectively uses the second location description as an overlay of the first, positioned according to the offset and size. Consider an array that has been partially registerized such that the currently processed elements are held in registers, whereas the remainder of the array remains in memory. 
Consider the loop in this C function, for example: extern void foo(uint32_t dst[], uint32_t src[], int len) { for (int i = 0; i < len; ++i) dst[i] += src[i]; } Inside the loop body, the machine code loads src[i] and dst[i] into registers, adds them, and stores the result back into dst[i]. Considering the location of dst and src in the loop body, the elements dst[i] and src[i] would be located in registers, all other elements are located in memory. Let register R0 contain the base address of dst, register R1 contain i, and register R2 contain the registerized dst[i] element. We can describe the location of dst as a memory location with a register location overlaid at a runtime offset involving i: // 1. Memory location description of dst elements located in memory: DW_OP_breg0 0 // 2. Register location description of element dst[i] is located in R2: DW_OP_reg2 // 3. Offset of the register within the memory of dst: DW_OP_breg1 0 DW_OP_lit4 DW_OP_mul // 4. The size of the register element: DW_OP_lit4 // 5. Make a composite location description for dst that is the memory ccoutant#1 // with the register ccoutant#2 positioned as an overlay at offset ccoutant#3 of size ccoutant#4: DW_OP_overlay
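To make the overlay semantics concrete, here is a minimal sketch of how a consumer might *read* such a composite: take the bytes from the memory backing store, then patch in the register bytes at the runtime offset. The `read_overlay` helper and the dictionary-backed memory/register model are purely illustrative, not part of any real DWARF consumer.

```python
# Hypothetical sketch of reading an overlay composite location.
# read_mem/read_reg stand in for target access; all names are illustrative.

def read_overlay(read_mem, read_reg, base_addr, overlay_reg,
                 overlay_offset, overlay_size, total_size):
    """Read total_size bytes backed by memory at base_addr, except for
    overlay_size bytes at overlay_offset, which live in overlay_reg."""
    data = bytearray(read_mem(base_addr, total_size))
    data[overlay_offset:overlay_offset + overlay_size] = \
        read_reg(overlay_reg)[:overlay_size]
    return bytes(data)

# Example: dst has 4 uint32 elements in memory; dst[i] (i == 2) is held
# in R2, so bytes 8..12 of the composite come from the register.
mem = {0x1000: bytes([1, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 4, 0, 0, 0])}
regs = {2: bytes([3, 0, 0, 0])}

i = 2
result = read_overlay(lambda addr, n: mem[addr][:n],
                      lambda r: regs[r],
                      0x1000, 2, i * 4, 4, 16)
# Little-endian uint32 elements recovered: [1, 2, 3, 4]
```

A write through the composite would be symmetric: stores landing inside the overlaid slice go to the register, all others to memory.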
DW_OP_entry_value is a DWARF expression operator.
However, DW_TAG_call_* may specify a DWARF expression which can describe the location or value of a parameter when called from a particular location. I can see how the evaluation of DW_OP_entry_value can refer back to an expression for the parameter at the call site, but I do not see how you can deprecate a DWARF expression operator with a tag.
Is the idea to replace DW_OP_entry_value with some sort of indirect DWARF expression call that refers to the value at a particular call site? I can see implementing DW_OP_entry_value by making reference to a call site where the value or location is stored, rather than setting a breakpoint on function entry or virtually unwinding the stack. However, since different call sites could have different calling conventions for non-standard (static) calls, I don't see how such an indirection could be implemented in a DWARF expression itself.
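For the register-only case that gdb supports, the call-site-based resolution strategy can be sketched roughly as follows: virtually unwind one frame, find the DW_TAG_call_site information for the caller's PC, and evaluate the matching DW_AT_call_value expression in the caller's context. The `Frame`, `CallSite`, and `CallSiteParam` classes below are hypothetical stand-ins for real debugger structures, not gdb's actual API.

```python
# Hedged sketch: resolving DW_OP_entry_value(DW_OP_regN) via call-site data.

class CallSiteParam:
    def __init__(self, location_reg, call_value_expr):
        self.location_reg = location_reg        # DW_AT_location: DW_OP_regN
        self.call_value_expr = call_value_expr  # DW_AT_call_value expression

class CallSite:
    def __init__(self, pc, params):
        self.pc = pc
        self.params = params

class Frame:
    def __init__(self, pc, caller=None):
        self.pc = pc
        self._caller = caller
    def unwind_caller(self):
        if self._caller is None:
            raise LookupError("cannot unwind past outermost frame")
        return self._caller  # in reality: virtual unwind via CFI

def resolve_entry_value(reg, frame, call_sites, eval_expr):
    """Return the value register `reg` held on entry to frame's function."""
    caller = frame.unwind_caller()
    site = call_sites.get(caller.pc)  # DW_TAG_call_site at the call PC
    if site is None:
        raise LookupError("no call-site information; value unavailable")
    for p in site.params:
        if p.location_reg == reg:
            # Evaluate DW_AT_call_value in the *caller's* frame context.
            return eval_expr(p.call_value_expr, caller)
    raise LookupError(f"no DW_AT_call_value for register {reg}")

# Example: the caller at PC 0x400123 recorded that R5 was loaded from an
# expression we model with a toy evaluator that just returns 7.
caller = Frame(pc=0x400123)
callee = Frame(pc=0x401000, caller=caller)
sites = {0x400123: CallSite(0x400123, [CallSiteParam(5, "DW_OP_lit7")])}
entry_r5 = resolve_entry_value(5, callee, sites, lambda expr, frm: 7)
# entry_r5 == 7
```

Note how this only works when the producer emitted call-site information for the particular call and the consumer can unwind to the caller; a core file or a missing call site leaves the value unavailable, which is part of the "best effort" character discussed above.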
Can you explain how you would rewrite the DWARF expressions that currently make use of DW_OP_entry_value if it were removed?
I can also see how, if location lists were using the overlay operator, you could unwind the overlays to the PC of the function entry rather than virtually unwinding the stack. But the overlay operator hasn't been introduced yet.