forked from ccoutant/dwarf-locations
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path020-simd-hardware.txt
193 lines (148 loc) · 9.61 KB
/
020-simd-hardware.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
Part 16: Support for Source Language Optimizations that Result in Concurrent Iteration Execution
PROBLEM DESCRIPTION
A compiler can perform loop optimizations that result in the generated code
executing multiple iterations concurrently. For example, software pipelining
schedules multiple iterations in an interleaved fashion to allow the
instructions of one iteration to hide the latencies of the instructions of
another iteration. Another example is vectorization that can exploit SIMD
hardware to allow a single instruction to execute multiple iterations using
vector registers.
Note that although this is similar to SIMT execution, the way a client debugger
uses the information is fundamentally different. In SIMT execution the debugger
needs to present the concurrent execution as distinct source language threads
that the user can list and switch focus between. With iteration concurrency
optimizations, such as software pipelining and vectorized SIMD, the debugger
must not present the concurrency as distinct source language threads. Instead,
it must inform the user that multiple loop iterations are executing in parallel
and allow the user to select between them.
In general, SIMT execution fixes the number of concurrent executions per target
architecture thread. However, both software pipelining and SIMD vectorization
may vary the number of concurrent iterations for different loops executed by a
single source language thread.
It is possible for the compiler to use both SIMT concurrency and iteration
concurrency techniques in the code of a single source language thread.
Therefore, a DWARF operation, DW_OP_push_iteration, is added to denote the
current concurrent iteration instance, much like DW_OP_push_object_address
denotes the current object.
In addition, a way is needed for the compiler to communicate how many source
language loop iterations are executing concurrently. Therefore, a
DW_AT_iterations DWARF attribute is added.
PROPOSAL
In Section 2.2 "Attribute Types", add the following row to Table 2.2 "Attribute
names":
----------------------------------------------------------------------------
Table 2.2: Attribute names
=========================== ====================================
Attribute Usage
=========================== ====================================
DW_AT_iterations Concurrent iteration count (see 3.3.5 "Low-Level Information")
=========================== ====================================
----------------------------------------------------------------------------
In Section 2.5.1 "DWARF Expression Evaluation Context" of [Support for Source
Languages Mapped to SIMT Hardware], add the following after the description of
"A current lane":
----------------------------------------------------------------------------
4. A current iteration
The 0 based source language iteration instance to be used in evaluating
a user presented expression. This applies to target architectures that
support optimizations that result in executing multiple source language
loop iterations concurrently.
[non-normative] For example, software pipelining and SIMD vectorization.
It is required for operations that are related to source language loop
iterations.
[non-normative] For example, the DW_OP_push_iteration operation.
If specified, it must be consistent with the value of the
DW_AT_iterations attribute of the subprogram corresponding to context's
frame and program location. It is consistent if the value is greater
than or equal to 0 and less than the, possibly default, value of the
DW_AT_iterations attribute. Otherwise the result is undefined.
----------------------------------------------------------------------------
In Section 2.5.4.3.1 "Literal Operations" of [Support for Source Languages
Mapped to SIMT Hardware], add the following operation after the DW_OP_push_lane
description:
----------------------------------------------------------------------------
9. DW_OP_push_iteration
DW_OP_push_iteration pushes the current iteration as a value with the
generic type.
[non-normative] For source language implementations with optimizations
that cause multiple loop iterations to execute concurrently, this is the
zero-based iteration number that corresponds to the source language
concurrent loop iteration upon which the user is focused.
The value must be greater than or equal to 0 and less than the value of
the DW_AT_iterations attribute, otherwise the DWARF expression is
ill-formed. See Section 3.3.5 "Low-Level Information".
----------------------------------------------------------------------------
In Section 3.3.5 "Low-Level Information" of [Support for Source Languages Mapped
to SIMT Hardware], add the following after the description of the DW_AT_lanes
attribute:
----------------------------------------------------------------------------
A DW_TAG_subprogram, DW_TAG_inlined_subroutine, or DW_TAG_entry_point
debugger information entry may have a DW_AT_iterations attribute whose value
is an integer constant or a DWARF expression E. Its value is the number of
source language loop iterations executing concurrently by the target
architecture for a single source language thread of execution.
[non-normative] A compiler may generate code that executes more than one
iteration of a source language loop concurrently using optimization
techniques such as software pipelining or SIMD vectorization. The number of
concurrent iterations may vary for different loop nests in the same
subprogram. Typically, this attribute will use a loclist to express
different values at different program locations.
If the attribute is an integer constant, then the value is the constant. The
DWARF is ill-formed if the constant is less than or equal to 0.
Otherwise, E is evaluated with a context that has a result kind of a
location description, an unspecified object, the compilation unit that
contains E, an empty initial stack, and other context elements corresponding
to the source language thread of execution upon which the user is focused,
if any. The DWARF is ill-formed if the result is not a location description
comprised of one implicit location description, that when read as the
generic type, results in a value V that is less than or equal to 0. The
result of the attribute is the value V.
If not present, the default value of 1 is used.
----------------------------------------------------------------------------
In Section 6.4.2 "Call Frame Instructions" of [Support for Source Languages
Mapped to SIMT Hardware] proposal, replace the following bullet:
----------------------------------------------------------------------------
* DW_OP_push_lane is not allowed because the call frame instructions
describe the actions for the whole target architecture thread, not the
lanes independently.
----------------------------------------------------------------------------
with:
----------------------------------------------------------------------------
* DW_OP_push_lane and DW_OP_push_iteration are not allowed because the call
frame instructions describe the actions for the whole target architecture
thread, not the lanes or iterations independently.
----------------------------------------------------------------------------
In Section 7.5.4 "Attribute Encodings", add the following row to Table 7.5
"Attribute encodings":
----------------------------------------------------------------------------
Table 7.5: Attribute encodings
================================== ====== ==================================
Attribute Name Value Classes
================================== ====== ==================================
DW_AT_iterations TBA constant, exprloc, loclist
================================== ====== ==================================
----------------------------------------------------------------------------
In Section 7.7.1 "Operation Expressions" of [General Support for Address Spaces]
proposal, add the following row to Table 7.9 "DWARF Operation Encodings":
----------------------------------------------------------------------------
Table 7.9: DWARF Operation Encodings
================================== ===== ======== ==========================
Operation Code Number Notes
of
Operands
================================== ===== ======== ==========================
DW_OP_push_iteration TBA 0
================================== ===== ======== ==========================
----------------------------------------------------------------------------
In Appendix A "Attributes by Tag Value (Informative)", add the following to
Table A.1 Attributes by tag value":
----------------------------------------------------------------------------
Table A.1: Attributes by tag value
============================= =============================
Tag Name Applicable Attributes
============================= =============================
DW_TAG_entry_point DW_AT_iterations
DW_TAG_inlined_subroutine DW_AT_iterations
DW_TAG_subprogram DW_AT_iterations
============================= =============================
----------------------------------------------------------------------------