DW_OP_entry_value deprecation #1

Open
woodard opened this issue Nov 28, 2022 · 6 comments

Comments

@woodard
Contributor

woodard commented Nov 28, 2022

DW_OP_entry_value is a DWARF expression operator.
However, DW_TAG_call_* may specify a DWARF expression which can describe the location or value of a parameter when called from a particular location. I can see how the evaluation of DW_OP_entry_value can refer back to an expression for the parameter at the call site, but I do not see how you can deprecate a DWARF expression operator with a tag.

Is the idea to replace DW_OP_entry_value with some sort of indirect DWARF expression call that refers to the value at a particular call site? I can see implementing DW_OP_entry_value by making reference to a call site where the value or location is stored, rather than setting a breakpoint on function entry or virtually unwinding the stack. However, since different call sites could have different calling conventions for non-standard (static) calls, I don't see how such an indirection could be implemented in a DWARF expression itself.

Can you explain how you would rewrite the DWARF expressions that currently make use of DW_OP_entry_value if it were removed?

I can also see how, if location lists were using the overlay operator, you could unwind the overlays to the PC of the function entry rather than virtually unwinding the stack. But the overlay operator hasn't been introduced yet.

@t-tye
Contributor

t-tye commented Nov 29, 2022

Could you give an example of how DW_OP_entry_value would be used for a non-trivial expression? Gdb only implements the operation if the expression is exactly DW_OP_reg* or DW_OP_breg*; DW_OP_deref*. If the purpose is to get parameter values on entry to the subprogram, then the call site attributes could be used. If the value of a register on entry to the subprogram is required, then the proposed DW_OP_call_frame_entry_reg (not part of this proposal) can be used.

It seems impractical to require a consumer to halt at every subprogram entry and capture all DW_OP_entry_value expressions, which is what seems to be necessary for many non-trivial expressions. Consider expressions that read memory that may get clobbered by the subprogram's execution. Using CFI only seems to work if the expression refers only to registers, hence the proposal of DW_OP_call_frame_entry_reg to cover this case.
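
To make the register-only case concrete, here is a sketch of how the common form could be rewritten. This is illustrative only: the operand and exact semantics of the proposed operator are assumptions here, since it is not part of this proposal.

    // Register-only case as commonly emitted today: the value the
    // register held on entry to the subprogram.
    DW_OP_entry_value: (DW_OP_reg5 (rdi)); DW_OP_stack_value

    // Possible shape of a rewrite with the proposed operator, assuming it
    // takes a register number operand and yields the register's value on
    // entry as recovered from CFI.
    DW_OP_call_frame_entry_reg 5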

@woodard
Contributor Author

woodard commented Nov 29, 2022

I recognize that the cost to stop a GPU and transfer all those memory locations would be vastly higher than it is on a CPU, and I remember John DelSignore not liking having to insert a breakpoint to copy down the values even in the CPU case, so I understand the impracticality concern.

@palves

palves commented Nov 29, 2022

You would basically have to make the debugger stop at entry to all functions, as you never know which function a thread will stop in, given SIGSEGV, SIGINT, etc.

Also, it wouldn't work with core files.

@woodard
Contributor Author

woodard commented Nov 30, 2022

You would basically have to make the debugger stop at entry to all functions, as you never know which function a thread will stop in, given SIGSEGV, SIGINT, etc.

Also, it wouldn't work with core files.

I think it is kind of best effort, isn't it? I'm going to admit that I haven't dug into the contents of complex location lists, but from what I understand from Jakub, they are heavily dependent on these entry value expressions, and in spite of their limitations, they do seem to work well in practice, providing a much broader range of coverage for variables than would be possible if they were not there.

@t-tye
Contributor

t-tye commented Nov 30, 2022

I don't think it is helpful to have features in the specification that a consumer does not fully implement. If a producer heavily used a feature, and the consumer did not support it, then things would no longer work. The benefit of a standard is that it mandates what producers and consumers must support.

Looking at gdb, it seems it only supports one case of DW_OP_entry_value, involving registers. So maybe that is the only case that is really needed. The suggestion is therefore to define an operation covering registers that all consumers can support using CFI.

Do you know of other cases besides registers in which DW_OP_entry_value is used? If so, what debugger supports it, since gdb seems limited to registers only?

@woodard
Contributor Author

woodard commented Nov 30, 2022

Jakub was kind enough to provide me an example:

Consider -O2 -gdwarf-5

volatile int v;
__attribute__((noipa)) void baz (int x)
{
}
__attribute__((noipa)) void foo (int x)
{
  int y = 2 * x;
  baz (5);
  ++v;
}
__attribute__((noipa)) void bar (int x)
{
  foo (1);
  foo (2);
  foo (3);
  foo (4 + x);
}
int
main ()
{
  bar (0);
}

and put a breakpoint on the ++v; line. Only DW_OP_entry_value can
ensure you can see the values of x and y correctly there.
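
Roughly, once rdi has been clobbered by the call to baz, the locations describing x and y have to fall back to expressions of this shape (an illustrative sketch, not the exact compiler output):

    // x: the value rdi held on entry to foo
    DW_OP_entry_value: (DW_OP_reg5 (rdi)); DW_OP_stack_value

    // y = 2 * x: recomputed from the entry value of rdi
    DW_OP_entry_value: (DW_OP_reg5 (rdi)); DW_OP_lit2; DW_OP_mul; DW_OP_stack_value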

When you look at the call site parameters, they are different. In the first call the parameter is described by:

 <3><b9>: Abbrev Number: 1 (DW_TAG_call_site_parameter)
    <ba>   DW_AT_location    : 1 byte block: 55         (DW_OP_reg5 (rdi))
    <bc>   DW_AT_call_value  : 1 byte block: 31         (DW_OP_lit1)
in the second by
 <3><d0>: Abbrev Number: 1 (DW_TAG_call_site_parameter)
    <d1>   DW_AT_location    : 1 byte block: 55         (DW_OP_reg5 (rdi))
    <d3>   DW_AT_call_value  : 1 byte block: 32         (DW_OP_lit2)
in the third by
 <3><e7>: Abbrev Number: 1 (DW_TAG_call_site_parameter)
    <e8>   DW_AT_location    : 1 byte block: 55         (DW_OP_reg5 (rdi))
    <ea>   DW_AT_call_value  : 1 byte block: 33         (DW_OP_lit3)
and in the fourth by
 <3><fa>: Abbrev Number: 1 (DW_TAG_call_site_parameter)
    <fb>   DW_AT_location    : 1 byte block: 55         (DW_OP_reg5 (rdi))
    <fd>   DW_AT_call_value  : 5 byte block: a3 1 55 23 4       (DW_OP_entry_value: (DW_OP_reg5 (rdi)); DW_OP_plus_uconst: 4)

The real crux of it is that in the last call by bar, bar's own parameter from its calling scope needs to be referenced to be able to find out what foo's x is. It is not just hoisting it from the calling frame; it is a dynamic property.

If you look at real-world code, it is very common that not all arguments are used up to the last instruction in each function, and when an argument is no longer needed, there is no point, when optimizing, in keeping it in some register or memory. Yet when debugging, being able to determine those values is often essential.
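
As a sketch of the common pattern (illustrative ranges, not output from the example above), such a parameter's location list keeps the register while it is live and then falls back to an entry-value expression:

    DW_AT_location : (location list)
      [subprogram entry .. last use)  DW_OP_reg5 (rdi)
      [last use .. subprogram end)    DW_OP_entry_value: (DW_OP_reg5 (rdi)); DW_OP_stack_value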

t-tye added a commit to t-tye/dwarf-locations that referenced this issue Jan 5, 2023
It is common in SIMD vectorization for the compiler to generate code that
promotes portions of an array into vector registers. For example, if the
hardware has vector registers with 8 elements, and 8-wide SIMD instructions, the
compiler may vectorize a loop so that it executes 8 iterations concurrently for
each vectorized loop iteration.

On the first iteration of the generated vectorized loop, iterations 0 to 7 of
the source language loop will be executed using SIMD instructions. Then on the
next iteration of the generated vectorized loop, iterations 8 to 15 will be
executed, and so on.

If the source language loop accesses an array element based on the loop
iteration index, the compiler may read the element into a register for the
duration of that iteration. Next iteration it will read the next element into
the register, and so on. With SIMD, this generalizes to the compiler reading
array elements 0 to 7 into a vector register on the first vectorized loop
iteration, then array elements 8 to 15 on the next iteration, and so on.

The DWARF location description for the array needs to express that all elements
are in memory, except the slice that has been promoted to the vector register.
The starting position of the slice is a runtime value based on the iteration
index modulo the vectorization size. This cannot be expressed by DW_OP_piece
and DW_OP_bit_piece, which only allow constant offsets to be expressed.

Therefore, a new operator is defined that takes two location descriptions, an
offset and a size, and creates a composite that effectively uses the second
location description as an overlay of the first, positioned according to the
offset and size.

Consider an array that has been partially registerized such that the currently
processed elements are held in registers, whereas the remainder of the array
remains in memory. Consider the loop in this C function, for example:

    extern void foo(uint32_t dst[], uint32_t src[], int len) {
        for (int i = 0; i < len; ++i)
            dst[i] += src[i];
    }

Inside the loop body, the machine code loads src[i] and dst[i] into registers,
adds them, and stores the result back into dst[i].

Considering the location of dst and src in the loop body, the elements dst[i]
and src[i] would be located in registers, while all other elements are located in
memory. Let register R0 contain the base address of dst, register R1 contain i,
and register R2 contain the registerized dst[i] element. We can describe the
location of dst as a memory location with a register location overlaid at a
runtime offset involving i:

    // 1. Memory location description of dst elements located in memory:
    DW_OP_breg0 0

    // 2. Register location description of element dst[i], which is located in R2:
    DW_OP_reg2

    // 3. Offset of the register within the memory of dst:
    DW_OP_breg1 0
    DW_OP_lit4
    DW_OP_mul

    // 4. The size of the register element:
    DW_OP_lit4

    // 5. Make a composite location description for dst that is the memory #1
    //    with the register #2 positioned as an overlay at offset #3 of size #4:
    DW_OP_overlay
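
As a rough illustration of the intended overlay semantics, here is how a consumer might resolve a byte read against such a composite. This is a minimal sketch, not part of the proposal; read_memory, read_register, and the struct layout are assumptions.

    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical helpers a consumer would provide. */
    extern void read_memory(uint64_t addr, void *buf, size_t len);
    extern void read_register(int regno, void *buf, size_t len);

    /* Overlay composite: base location is memory at base_addr, with register
       regno overlaid at byte ovl_offset for ovl_size bytes. */
    struct overlay_loc {
        uint64_t base_addr;   /* from DW_OP_breg0 0 */
        int      regno;       /* R2 in the example above */
        uint64_t ovl_offset;  /* from DW_OP_breg1 0; DW_OP_lit4; DW_OP_mul */
        uint64_t ovl_size;    /* from DW_OP_lit4 */
    };

    /* Read one byte at `offset` within the composite describing dst. */
    static uint8_t overlay_read_byte(const struct overlay_loc *loc, uint64_t offset)
    {
        uint8_t byte;
        if (offset >= loc->ovl_offset && offset < loc->ovl_offset + loc->ovl_size) {
            /* The byte falls inside the overlay: take it from the register
               (assume registers of at most 16 bytes for this sketch). */
            uint8_t regbuf[16];
            read_register(loc->regno, regbuf, sizeof regbuf);
            byte = regbuf[offset - loc->ovl_offset];
        } else {
            /* Otherwise it comes from the underlying memory location. */
            read_memory(loc->base_addr + offset, &byte, 1);
        }
        return byte;
    }
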
t-tye added a commit to t-tye/dwarf-locations that referenced this issue Jan 6, 2023
t-tye added a commit to t-tye/dwarf-locations that referenced this issue Jan 7, 2023
t-tye added a commit to t-tye/dwarf-locations that referenced this issue Jan 7, 2023
t-tye added a commit to t-tye/dwarf-locations that referenced this issue Jan 10, 2023
t-tye added a commit to t-tye/dwarf-locations that referenced this issue Jan 10, 2023
t-tye added a commit to t-tye/dwarf-locations that referenced this issue Jan 10, 2023
t-tye added a commit to t-tye/dwarf-locations that referenced this issue Jan 10, 2023
t-tye added a commit to t-tye/dwarf-locations that referenced this issue Jan 10, 2023
t-tye added a commit to t-tye/dwarf-locations that referenced this issue Jan 11, 2023
t-tye added a commit to t-tye/dwarf-locations that referenced this issue Jan 11, 2023
t-tye added a commit to t-tye/dwarf-locations that referenced this issue Jan 12, 2023
t-tye added a commit to t-tye/dwarf-locations that referenced this issue Jan 20, 2023
t-tye added a commit to t-tye/dwarf-locations that referenced this issue Jan 20, 2023
t-tye added a commit to t-tye/dwarf-locations that referenced this issue Jan 20, 2023
t-tye added a commit to t-tye/dwarf-locations that referenced this issue Jan 20, 2023
t-tye added a commit to t-tye/dwarf-locations that referenced this issue Jan 21, 2023
It is common in SIMD vectorization for the compiler to generate code that
promotes portions of an array into vector registers. For example, if the
hardware has vector registers with 8 elements, and 8 wide SIMD instructions, the
compiler may vectorize a loop so that is executes 8 iterations concurrently for
each vectorized loop iteration.

On the first iteration of the generated vectorized loop, iterations 0 to 7 of
the source language loop will be executed using SIMD instructions. Then on the
next iteration of the generated vectorized loop, iteration 8 to 15 will be
executed, and so on.

If the source language loop accesses an array element based on the loop
iteration index, the compiler may read the element into a register for the
duration of that iteration. Next iteration it will read the next element into
the register, and so on. With SIMD, this generalizes to the compiler reading
array elements 0 to 7 into a vector register on the first vectorized loop
iteration, then array elements 8 to 15 on the next iteration, and so on.

The DWARF location description for the array needs to express that all elements
are in memory, except the slice that has been promoted to the vector register.
The starting position of the slice is a runtime value based on the iteration
index modulo the vectorization size. This cannot be expressed by DW_OP_piece
and DW_OP_bit_piece which only allow constant offsets to be expressed.

Therefore, a new operator is defined that takes two location descriptions, an
offset and a size, and creates a composite that effectively uses the second
location description as an overlay of the first, positioned according to the
offset and size.

Consider an array that has been partially registerized such that the currently
processed elements are held in registers, whereas the remainder of the array
remains in memory. Consider the loop in this C function, for example:

    extern void foo(uint32_t dst[], uint32_t src[], int len) {
        for (int i = 0; i < len; ++i)
        dst[i] += src[i];
    }

Inside the loop body, the machine code loads src[i] and dst[i] into registers,
adds them, and stores the result back into dst[i].

Considering the location of dst and src in the loop body, the elements dst[i]
and src[i] would be located in registers, all other elements are located in
memory. Let register R0 contain the base address of dst, register R1 contain i,
and register R2 contain the registerized dst[i] element. We can describe the
location of dst as a memory location with a register location overlaid at a
runtime offset involving i:

    // 1. Memory location description of dst elements located in memory:
    DW_OP_breg0 0

    // 2. Register location description of element dst[i] is located in R2:
    DW_OP_reg2

    // 3. Offset of the register within the memory of dst:
    DW_OP_breg1 0
    DW_OP_lit4
    DW_OP_mul

    // 4. The size of the register element:
    DW_OP_lit4

    // 5. Make a composite location description for dst that is the memory ccoutant#1
    //    with the register ccoutant#2 positioned as an overlay at offset ccoutant#3 of size ccoutant#4:
    DW_OP_overlay
t-tye added a commit to t-tye/dwarf-locations that referenced this issue Jan 24, 2023
It is common in SIMD vectorization for the compiler to generate code that
promotes portions of an array into vector registers. For example, if the
hardware has vector registers with 8 elements, and 8 wide SIMD instructions, the
compiler may vectorize a loop so that is executes 8 iterations concurrently for
each vectorized loop iteration.

On the first iteration of the generated vectorized loop, iterations 0 to 7 of
the source language loop will be executed using SIMD instructions. Then on the
next iteration of the generated vectorized loop, iteration 8 to 15 will be
executed, and so on.

If the source language loop accesses an array element based on the loop
iteration index, the compiler may read the element into a register for the
duration of that iteration. Next iteration it will read the next element into
the register, and so on. With SIMD, this generalizes to the compiler reading
array elements 0 to 7 into a vector register on the first vectorized loop
iteration, then array elements 8 to 15 on the next iteration, and so on.

The DWARF location description for the array needs to express that all elements
are in memory, except the slice that has been promoted to the vector register.
The starting position of the slice is a runtime value based on the iteration
index modulo the vectorization size. This cannot be expressed by DW_OP_piece
and DW_OP_bit_piece which only allow constant offsets to be expressed.

Therefore, a new operator is defined that takes two location descriptions, an
offset and a size, and creates a composite that effectively uses the second
location description as an overlay of the first, positioned according to the
offset and size.

Consider an array that has been partially registerized such that the currently
processed elements are held in registers, whereas the remainder of the array
remains in memory. Consider the loop in this C function, for example:

    extern void foo(uint32_t dst[], uint32_t src[], int len) {
        for (int i = 0; i < len; ++i)
        dst[i] += src[i];
    }

Inside the loop body, the machine code loads src[i] and dst[i] into registers,
adds them, and stores the result back into dst[i].

Considering the location of dst and src in the loop body, the elements dst[i]
and src[i] would be located in registers, all other elements are located in
memory. Let register R0 contain the base address of dst, register R1 contain i,
and register R2 contain the registerized dst[i] element. We can describe the
location of dst as a memory location with a register location overlaid at a
runtime offset involving i:

    // 1. Memory location description of dst elements located in memory:
    DW_OP_breg0 0

    // 2. Register location description of element dst[i] is located in R2:
    DW_OP_reg2

    // 3. Offset of the register within the memory of dst:
    DW_OP_breg1 0
    DW_OP_lit4
    DW_OP_mul

    // 4. The size of the register element:
    DW_OP_lit4

    // 5. Make a composite location description for dst that is the memory ccoutant#1
    //    with the register ccoutant#2 positioned as an overlay at offset ccoutant#3 of size ccoutant#4:
    DW_OP_overlay
t-tye added a commit to t-tye/dwarf-locations that referenced this issue Jan 31, 2023
It is common in SIMD vectorization for the compiler to generate code that
promotes portions of an array into vector registers. For example, if the
hardware has vector registers with 8 elements, and 8 wide SIMD instructions, the
compiler may vectorize a loop so that is executes 8 iterations concurrently for
each vectorized loop iteration.

On the first iteration of the generated vectorized loop, iterations 0 to 7 of
the source language loop will be executed using SIMD instructions. Then on the
next iteration of the generated vectorized loop, iteration 8 to 15 will be
executed, and so on.

If the source language loop accesses an array element based on the loop
iteration index, the compiler may read the element into a register for the
duration of that iteration. Next iteration it will read the next element into
the register, and so on. With SIMD, this generalizes to the compiler reading
array elements 0 to 7 into a vector register on the first vectorized loop
iteration, then array elements 8 to 15 on the next iteration, and so on.

The DWARF location description for the array needs to express that all elements
are in memory, except the slice that has been promoted to the vector register.
The starting position of the slice is a runtime value based on the iteration
index modulo the vectorization size. This cannot be expressed by DW_OP_piece
and DW_OP_bit_piece which only allow constant offsets to be expressed.

Therefore, a new operator is defined that takes two location descriptions, an
offset and a size, and creates a composite that effectively uses the second
location description as an overlay of the first, positioned according to the
offset and size.

Consider an array that has been partially registerized such that the currently
processed elements are held in registers, whereas the remainder of the array
remains in memory. Consider the loop in this C function, for example:

    extern void foo(uint32_t dst[], uint32_t src[], int len) {
        for (int i = 0; i < len; ++i)
        dst[i] += src[i];
    }

Inside the loop body, the machine code loads src[i] and dst[i] into registers,
adds them, and stores the result back into dst[i].

Considering the location of dst and src in the loop body, the elements dst[i]
and src[i] would be located in registers, all other elements are located in
memory. Let register R0 contain the base address of dst, register R1 contain i,
and register R2 contain the registerized dst[i] element. We can describe the
location of dst as a memory location with a register location overlaid at a
runtime offset involving i:

    // 1. Memory location description of dst elements located in memory:
    DW_OP_breg0 0

    // 2. Register location description of element dst[i] is located in R2:
    DW_OP_reg2

    // 3. Offset of the register within the memory of dst:
    DW_OP_breg1 0
    DW_OP_lit4
    DW_OP_mul

    // 4. The size of the register element:
    DW_OP_lit4

    // 5. Make a composite location description for dst that is the memory ccoutant#1
    //    with the register ccoutant#2 positioned as an overlay at offset ccoutant#3 of size ccoutant#4:
    DW_OP_overlay
t-tye added a commit to t-tye/dwarf-locations that referenced this issue Jan 31, 2023
It is common in SIMD vectorization for the compiler to generate code that
promotes portions of an array into vector registers. For example, if the
hardware has vector registers with 8 elements, and 8 wide SIMD instructions, the
compiler may vectorize a loop so that is executes 8 iterations concurrently for
each vectorized loop iteration.

On the first iteration of the generated vectorized loop, iterations 0 to 7 of
the source language loop will be executed using SIMD instructions. Then on the
next iteration of the generated vectorized loop, iteration 8 to 15 will be
executed, and so on.

If the source language loop accesses an array element based on the loop
iteration index, the compiler may read the element into a register for the
duration of that iteration. Next iteration it will read the next element into
the register, and so on. With SIMD, this generalizes to the compiler reading
array elements 0 to 7 into a vector register on the first vectorized loop
iteration, then array elements 8 to 15 on the next iteration, and so on.

The DWARF location description for the array needs to express that all elements
are in memory, except the slice that has been promoted to the vector register.
The starting position of the slice is a runtime value based on the iteration
index modulo the vectorization size. This cannot be expressed by DW_OP_piece
and DW_OP_bit_piece which only allow constant offsets to be expressed.

Therefore, a new operator is defined that takes two location descriptions, an
offset and a size, and creates a composite that effectively uses the second
location description as an overlay of the first, positioned according to the
offset and size.

Consider an array that has been partially registerized such that the currently
processed elements are held in registers, whereas the remainder of the array
remains in memory. Consider the loop in this C function, for example:

    extern void foo(uint32_t dst[], uint32_t src[], int len) {
        for (int i = 0; i < len; ++i)
        dst[i] += src[i];
    }

Inside the loop body, the machine code loads src[i] and dst[i] into registers,
adds them, and stores the result back into dst[i].

Considering the location of dst and src in the loop body, the elements dst[i]
and src[i] would be located in registers, all other elements are located in
memory. Let register R0 contain the base address of dst, register R1 contain i,
and register R2 contain the registerized dst[i] element. We can describe the
location of dst as a memory location with a register location overlaid at a
runtime offset involving i:

    // 1. Memory location description of dst elements located in memory:
    DW_OP_breg0 0

    // 2. Register location description of element dst[i] is located in R2:
    DW_OP_reg2

    // 3. Offset of the register within the memory of dst:
    DW_OP_breg1 0
    DW_OP_lit4
    DW_OP_mul

    // 4. The size of the register element:
    DW_OP_lit4

    // 5. Make a composite location description for dst that is the memory ccoutant#1
    //    with the register ccoutant#2 positioned as an overlay at offset ccoutant#3 of size ccoutant#4:
    DW_OP_overlay
t-tye added a commit to t-tye/dwarf-locations that referenced this issue Feb 1, 2023
It is common in SIMD vectorization for the compiler to generate code that
promotes portions of an array into vector registers. For example, if the
hardware has vector registers with 8 elements, and 8 wide SIMD instructions, the
compiler may vectorize a loop so that is executes 8 iterations concurrently for
each vectorized loop iteration.

On the first iteration of the generated vectorized loop, iterations 0 to 7 of
the source language loop will be executed using SIMD instructions. Then on the
next iteration of the generated vectorized loop, iteration 8 to 15 will be
executed, and so on.

If the source language loop accesses an array element based on the loop
iteration index, the compiler may read the element into a register for the
duration of that iteration. Next iteration it will read the next element into
the register, and so on. With SIMD, this generalizes to the compiler reading
array elements 0 to 7 into a vector register on the first vectorized loop
iteration, then array elements 8 to 15 on the next iteration, and so on.

The DWARF location description for the array needs to express that all elements
are in memory, except the slice that has been promoted to the vector register.
The starting position of the slice is a runtime value based on the iteration
index modulo the vectorization size. This cannot be expressed by DW_OP_piece
and DW_OP_bit_piece which only allow constant offsets to be expressed.

Therefore, a new operator is defined that takes two location descriptions, an
offset and a size, and creates a composite that effectively uses the second
location description as an overlay of the first, positioned according to the
offset and size.

Consider an array that has been partially registerized such that the currently
processed elements are held in registers, whereas the remainder of the array
remains in memory. Consider the loop in this C function, for example:

    extern void foo(uint32_t dst[], uint32_t src[], int len) {
        for (int i = 0; i < len; ++i)
        dst[i] += src[i];
    }

Inside the loop body, the machine code loads src[i] and dst[i] into registers,
adds them, and stores the result back into dst[i].

Considering the location of dst and src in the loop body, the elements dst[i]
and src[i] would be located in registers, all other elements are located in
memory. Let register R0 contain the base address of dst, register R1 contain i,
and register R2 contain the registerized dst[i] element. We can describe the
location of dst as a memory location with a register location overlaid at a
runtime offset involving i:

    // 1. Memory location description of dst elements located in memory:
    DW_OP_breg0 0

    // 2. Register location description of element dst[i] is located in R2:
    DW_OP_reg2

    // 3. Offset of the register within the memory of dst:
    DW_OP_breg1 0
    DW_OP_lit4
    DW_OP_mul

    // 4. The size of the register element:
    DW_OP_lit4

    // 5. Make a composite location description for dst that is the memory ccoutant#1
    //    with the register ccoutant#2 positioned as an overlay at offset ccoutant#3 of size ccoutant#4:
    DW_OP_overlay
t-tye added a commit to t-tye/dwarf-locations that referenced this issue Feb 2, 2023
It is common in SIMD vectorization for the compiler to generate code that
promotes portions of an array into vector registers. For example, if the
hardware has vector registers with 8 elements, and 8 wide SIMD instructions, the
compiler may vectorize a loop so that is executes 8 iterations concurrently for
each vectorized loop iteration.

On the first iteration of the generated vectorized loop, iterations 0 to 7 of
the source language loop will be executed using SIMD instructions. Then on the
next iteration of the generated vectorized loop, iteration 8 to 15 will be
executed, and so on.

If the source language loop accesses an array element based on the loop
iteration index, the compiler may read the element into a register for the
duration of that iteration. Next iteration it will read the next element into
the register, and so on. With SIMD, this generalizes to the compiler reading
array elements 0 to 7 into a vector register on the first vectorized loop
iteration, then array elements 8 to 15 on the next iteration, and so on.

The DWARF location description for the array needs to express that all elements
are in memory, except the slice that has been promoted to the vector register.
The starting position of the slice is a runtime value based on the iteration
index modulo the vectorization size. This cannot be expressed by DW_OP_piece
and DW_OP_bit_piece which only allow constant offsets to be expressed.

Therefore, a new operator is defined that takes two location descriptions, an
offset and a size, and creates a composite that effectively uses the second
location description as an overlay of the first, positioned according to the
offset and size.

Consider an array that has been partially registerized such that the currently
processed elements are held in registers, whereas the remainder of the array
remains in memory. Consider the loop in this C function, for example:

    extern void foo(uint32_t dst[], uint32_t src[], int len) {
        for (int i = 0; i < len; ++i)
        dst[i] += src[i];
    }

Inside the loop body, the machine code loads src[i] and dst[i] into registers,
adds them, and stores the result back into dst[i].

Considering the location of dst and src in the loop body, the elements dst[i]
and src[i] would be located in registers, all other elements are located in
memory. Let register R0 contain the base address of dst, register R1 contain i,
and register R2 contain the registerized dst[i] element. We can describe the
location of dst as a memory location with a register location overlaid at a
runtime offset involving i:

    // 1. Memory location description of dst elements located in memory:
    DW_OP_breg0 0

    // 2. Register location description of element dst[i] is located in R2:
    DW_OP_reg2

    // 3. Offset of the register within the memory of dst:
    DW_OP_breg1 0
    DW_OP_lit4
    DW_OP_mul

    // 4. The size of the register element:
    DW_OP_lit4

    // 5. Make a composite location description for dst that is the memory ccoutant#1
    //    with the register ccoutant#2 positioned as an overlay at offset ccoutant#3 of size ccoutant#4:
    DW_OP_overlay
t-tye added a commit to t-tye/dwarf-locations that referenced this issue Feb 2, 2023
It is common in SIMD vectorization for the compiler to generate code that
promotes portions of an array into vector registers. For example, if the
hardware has vector registers with 8 elements, and 8 wide SIMD instructions, the
compiler may vectorize a loop so that is executes 8 iterations concurrently for
each vectorized loop iteration.

On the first iteration of the generated vectorized loop, iterations 0 to 7 of
the source language loop will be executed using SIMD instructions. Then on the
next iteration of the generated vectorized loop, iteration 8 to 15 will be
executed, and so on.

If the source language loop accesses an array element based on the loop
iteration index, the compiler may read the element into a register for the
duration of that iteration. Next iteration it will read the next element into
the register, and so on. With SIMD, this generalizes to the compiler reading
array elements 0 to 7 into a vector register on the first vectorized loop
iteration, then array elements 8 to 15 on the next iteration, and so on.

The DWARF location description for the array needs to express that all elements
are in memory, except the slice that has been promoted to the vector register.
The starting position of the slice is a runtime value based on the iteration
index modulo the vectorization size. This cannot be expressed by DW_OP_piece
and DW_OP_bit_piece which only allow constant offsets to be expressed.

Therefore, a new operator is defined that takes two location descriptions, an
offset and a size, and creates a composite that effectively uses the second
location description as an overlay of the first, positioned according to the
offset and size.

Consider an array that has been partially registerized such that the currently
processed elements are held in registers, whereas the remainder of the array
remains in memory. Consider the loop in this C function, for example:

    extern void foo(uint32_t dst[], uint32_t src[], int len) {
        for (int i = 0; i < len; ++i)
        dst[i] += src[i];
    }

Inside the loop body, the machine code loads src[i] and dst[i] into registers,
adds them, and stores the result back into dst[i].

Considering the location of dst and src in the loop body, the elements dst[i]
and src[i] would be located in registers, all other elements are located in
memory. Let register R0 contain the base address of dst, register R1 contain i,
and register R2 contain the registerized dst[i] element. We can describe the
location of dst as a memory location with a register location overlaid at a
runtime offset involving i:

    // 1. Memory location description of dst elements located in memory:
    DW_OP_breg0 0

    // 2. Register location description of element dst[i] is located in R2:
    DW_OP_reg2

    // 3. Offset of the register within the memory of dst:
    DW_OP_breg1 0
    DW_OP_lit4
    DW_OP_mul

    // 4. The size of the register element:
    DW_OP_lit4

    // 5. Make a composite location description for dst that is the memory ccoutant#1
    //    with the register ccoutant#2 positioned as an overlay at offset ccoutant#3 of size ccoutant#4:
    DW_OP_overlay
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants