Cellgen: OpenMP-like support for the Cell processor
Scott Schneider, http://www.cs.vt.edu/~scschnei
See the project page, http://www.cs.vt.edu/~scschnei/cellgen, for publications.
Compiling
---------
Cellgen uses Boost.Spirit for parsing, which relies on deeply nested templates. When debugging
information is turned on (-g), the levels of nesting are not compiled away and remain in the
executable. Consequently, including debugging information increases the executable size by an
order of magnitude.
Cellgen also relies on other Boost libraries, but they should be installed on most Linux systems.
Other than that, a simple "make" should do.
Usage
-----
cellgen foo.cellgen [-n <# SPEs>] [-I <include file for PPE/SPE>]
Brief Programming Tutorial
--------------------------
Cellgen shares semantics with OpenMP, but legal OpenMP code is not legal Cellgen code, and
vice-versa. This section presents a brief tutorial of Cellgen, which serves both to give
the reader an intuitive feel for the programming model and to highlight supported features.
Regarding mechanics, Cellgen is a source-to-source compiler: it accepts C code and emits
C code. The current workflow requires a programmer to call "cellgen" on a "*.cellgen" file,
which will produce code for both the PPE and SPE. Currently, we rely on the sophisticated
Makefiles provided by the IBM SDK to produce executable code.
In all of these code examples, we assume the Cellgen blocks of code reside in a legal C program.
The Basics
----------
All Cellgen code is preceded by a "#pragma cell" directive. Cellgen ignores all other lines of
code until it reaches that pragma. The Cellgen code is also enclosed in braces. The simplest
Cellgen code transfers no data in or out of the SPE:
#pragma cell
{
    printf("Hello world");
}
This code will print the string "Hello world" from each SPE. All code within a Cellgen region
will be executed on the SPE, and all code outside will be executed on the PPE. In code terms:
printf("I will always execute on the PPE.");
#pragma cell
{
    printf("I will always execute on each SPE.");
}
In the previous two examples, the SPEs all behaved the same. While the Cellgen model is to
distribute the same code to each SPE, the power comes from giving each SPE different data. In
the following example, each SPE executes different parts of the iteration space for a loop.
#pragma cell private(int N = N)
{
    int i;
    for (i = 0; i < N; ++i) {
        printf("iteration %d\n", i);
    }
}
In this case, each SPE executes a subset of the iteration space [0, N).
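To make the idea of "a subset of the iteration space" concrete, here is a minimal sketch in plain
C of one way a loop of n iterations could be divided among SPEs. The helper name "spe_range", the
"iter_range" type, and the contiguous block distribution are assumptions made for illustration;
this README does not say how Cellgen actually assigns iterations.

```c
#include <assert.h>

/* Half-open range [begin, end) of loop iterations assigned to one SPE. */
typedef struct {
    int begin;
    int end;
} iter_range;

/*
 * Hypothetical helper (not part of Cellgen): split [0, n) into num_spes
 * contiguous chunks whose sizes differ by at most one iteration.
 */
iter_range spe_range(int n, int num_spes, int spe_id)
{
    assert(n >= 0 && num_spes > 0);
    assert(spe_id >= 0 && spe_id < num_spes);

    int base = n / num_spes;   /* iterations every SPE receives */
    int extra = n % num_spes;  /* the first `extra` SPEs take one more */

    iter_range r;
    r.begin = spe_id * base + (spe_id < extra ? spe_id : extra);
    r.end = r.begin + base + (spe_id < extra ? 1 : 0);
    return r;
}
```

With n = 10 and 4 SPEs, this split yields the ranges [0, 3), [3, 6), [6, 8) and [8, 10), which
together cover every iteration exactly once.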
Computations with Flat Arrays
-----------------------------
None of the prior examples performed any interesting computations or even transferred any data
beyond loop parameters. The following example multiplies each element of a single-dimensional
array by a constant.
int vector[N];
int factor; // presumably set elsewhere
#pragma cell shared(int* v = vector) private(int f = factor, int N = N)
{
    int i;
    for (i = 0; i < N; ++i) {
        v[i] = v[i] * f;
    }
}
This code sample introduces several new concepts. First, in order to pass data into a Cell
region, we must specify if it is "shared" or "private". Variables declared "shared" will have
their data distributed among all SPEs, streamed in or out as needed. Cellgen performs reference
analysis to determine how to stream the variables. In this example, the data for "vector"
will be both streamed in and out of the SPEs; its result will be visible to code beyond the
Cell region. Variables declared "private" will be transferred to each SPE once, and all SPEs
will have their own local copy.
Each SPE will carry out its computation in parallel, and there is an implicit barrier at the end
of the Cell region. Note that all of the iterations of the loop are *independent*. Currently,
Cellgen can only handle independent loops.
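As a rough mental model of what "streamed in or out" means for a shared variable, the sketch
below processes one SPE's share of the array through a small local buffer. It is a sequential
stand-in written in plain C, not Cellgen output: the function name "scale_range", the buffer
size CHUNK, and the use of memcpy in place of DMA transfers are all assumptions for illustration.

```c
#include <assert.h>
#include <string.h>

#define CHUNK 4  /* assumed local buffer size, in elements */

/*
 * Sequential stand-in for one SPE's share of the work: stream the
 * elements [begin, end) of the shared array through a small local
 * buffer, scaling each by the private value f. memcpy stands in for
 * the DMA get and put; no Cell SDK call is used here.
 */
void scale_range(int *shared, int begin, int end, int f)
{
    assert(shared != NULL && begin >= 0 && begin <= end);

    int local[CHUNK];
    for (int start = begin; start < end; start += CHUNK) {
        size_t count = (size_t)(end - start < CHUNK ? end - start : CHUNK);

        memcpy(local, shared + start, count * sizeof(int));  /* "get" */
        for (size_t i = 0; i < count; ++i) {
            local[i] *= f;
        }
        memcpy(shared + start, local, count * sizeof(int));  /* "put" */
    }
}
```

Because the iterations are independent, calling scale_range once per disjoint range, in any
order, gives the same result as the original loop over the whole array.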
Reductions
----------
The result from the previous example was an entire array. Cellgen can also handle reductions,
where the computation relies on a large dataset, but the result is reduced to a single value.
int vector[N];
int sum = 0;
#pragma cell shared(int* v = vector) reduction(+: int s = sum) private(int N = N)
{
    int i;
    for (i = 0; i < N; ++i) {
        s += v[i];
    }
}
After all SPEs have finished, "sum" contains the summation of all elements of "vector". Cellgen
supports reductions for addition ("+") and multiplication ("*").
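A reduction can be pictured as each SPE accumulating a private partial result, with the partial
results combined once all SPEs have finished. The sketch below models that sequentially in plain
C. The function name "reduce_sum", the contiguous split of the array, and the folding of the
initial value into the result (as OpenMP does) are assumptions for illustration, not a
description of the code Cellgen generates.

```c
#include <assert.h>

/*
 * Sequential model of a "+" reduction: each simulated SPE accumulates
 * a private partial sum over its contiguous share of v, and the
 * partials are then folded into the initial value.
 * num_spes and the contiguous split are assumptions for illustration.
 */
int reduce_sum(const int *v, int n, int num_spes, int initial)
{
    assert(v != 0 && n >= 0 && num_spes > 0);

    int total = initial;
    for (int spe = 0; spe < num_spes; ++spe) {
        /* Contiguous share of [0, n) for this simulated SPE. */
        int begin = (int)((long long)n * spe / num_spes);
        int end = (int)((long long)n * (spe + 1) / num_spes);

        int partial = 0;  /* identity for "+"; it would be 1 for "*" */
        for (int i = begin; i < end; ++i) {
            partial += v[i];
        }
        total += partial;  /* combine step, after all SPEs are done */
    }
    return total;
}
```

The result does not depend on the number of simulated SPEs, because integer addition gives the
same total regardless of how the elements are grouped.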
Multidimensional Arrays
-----------------------
Dense matrices are usually implemented with multidimensional arrays in C. Cellgen can handle
multidimensional arrays, but it requires more information than with flat arrays, and some
programmer assistance is required with column accesses.
To start with, we shall consider row accesses. The following code multiplies each element of
a 3-dimensional array by a constant factor:
int matrix[N1][N2][N3];
int factor;
#pragma cell shared(int* m = matrix[N1][N2][N3]) private(int f = factor)
{
    int i, j, k;
    for (i = 0; i < N1; ++i) {
        for (j = 0; j < N2; ++j) {
            for (k = 0; k < N3; ++k) {
                m[i][j][k] = m[i][j][k] * f;
            }
        }
    }
}
Cellgen needs to know the dimensions of the matrix, which are provided in the "shared"
directive. The dimensions can be either constants or variables only known at runtime. Cellgen
requires the matrix dimensions so that it can compute addresses for the DMAs which will get
and put values in main memory. All of the dimensions of the matrix are implicitly passed as
private variables.
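The reason the dimensions are needed can be seen from the standard row-major layout that C uses
for multidimensional arrays. The sketch below computes the byte offset of one element from the
start of the array; the function name "offset_3d" is hypothetical, and the code illustrates the
address arithmetic only, not the DMA code Cellgen emits.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Byte offset of element [i][j][k] from the start of a row-major
 * array with dimensions [n1][n2][n3] and elements of elem_size bytes.
 * n1 is only needed for the bounds check.
 */
size_t offset_3d(size_t n1, size_t n2, size_t n3,
                 size_t i, size_t j, size_t k, size_t elem_size)
{
    assert(i < n1 && j < n2 && k < n3);
    return ((i * n2 + j) * n3 + k) * elem_size;
}
```

Note that the offset depends on n2 and n3 but not on n1, which is why the inner dimensions must
be known before any address can be computed.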
Column accesses currently require more work from the programmer. Because DMA lists work best
with addresses that are 16-byte aligned, Cellgen expects the programmer to pad their data. The
following code performs the same computation, but accesses columns:
typedef struct {
    int num;
    char pad[12];
} int16b_t;
int16b_t matrix[N1][N2][N3];
int factor;
#pragma cell shared(int16b_t* m = matrix[N1][N2][N3]) private(int f = factor)
{
    int i, j, k;
    for (i = 0; i < N2; ++i) {
        for (j = 0; j < N3; ++j) {
            for (k = 0; k < N1; ++k) {
                m[k][i][j].num = m[k][i][j].num * f;
            }
        }
    }
}
In the future, Cellgen will handle data padding so that row and column accesses appear the same
to programmers.
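The effect of the padding can be checked with a little arithmetic. The sketch below, in plain C,
assumes a 4-byte int (so the padded struct is 16 bytes) and an array whose base address is itself
16-byte aligned; the helper names "column_stride" and "elements_16b_aligned" are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Same layout as the padded element above, assuming a 4-byte int. */
typedef struct {
    int num;
    char pad[12];
} int16b_t;

/* Bytes between m[k][i][j] and m[k+1][i][j] in an [n1][n2][n3] array. */
size_t column_stride(size_t n2, size_t n3, size_t elem_size)
{
    return n2 * n3 * elem_size;
}

/* Returns 1 if every element offset is a multiple of 16 bytes. */
int elements_16b_aligned(size_t elem_size)
{
    return elem_size % 16 == 0;
}
```

With a bare int, consecutive elements sit 4 bytes apart, so most of them do not start on a
16-byte boundary. With the padded struct, every element does, at the cost of four times the
memory per value.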
Further examples are available in the "unit_tests" directory.