.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.47.13.
.TH TOPLEV.PY "1" "April 2020" "toplev.py toplev" "User Commands"
.SH NAME
toplev.py \- manual page for toplev.py toplev
.SH DESCRIPTION
usage: toplev [options] perf\-arguments
.PP
Estimate on which part of the CPU pipeline a workload bottlenecks using the TopDown model.
The bottlenecks are expressed as a tree with different levels.
Requires a modern Intel CPU.
.SH ENVIRONMENT
.TP
\fB\-\-force\-cpu\fR {snb,jkt,ivb,ivt,hsw,hsx,slm,bdw,bdx,simple,skl,knl,skx,clx,icl}
Force CPU type
.TP
\fB\-\-force\-topology\fR findsysoutput
Use specified topology file (find \fI\,/sys/devices\/\fP)
.TP
\fB\-\-force\-cpuinfo\fR cpuinfo
Use specified cpuinfo file (\fI\,/proc/cpuinfo\/\fP)
.TP
\fB\-\-force\-hypervisor\fR
Assume running under hypervisor (no uncore, no
offcore, no PEBS)
.TP
\fB\-\-no\-uncore\fR
Disable uncore events
.TP
\fB\-\-no\-check\fR
Do not check that PMU units exist
.SS "Additional information:"
.TP
\fB\-\-print\-group\fR, \fB\-g\fR
Print event group assignments
.TP
\fB\-\-raw\fR
Print raw values
.TP
\fB\-\-valcsv\fR VALCSV, \fB\-V\fR VALCSV
Write raw counter values into CSV file
.TP
\fB\-\-stats\fR
Show statistics on what events counted
.SS "Sampling:"
.TP
\fB\-\-show\-sample\fR
Show command line to rerun workload with sampling
.TP
\fB\-\-run\-sample\fR
Automatically rerun workload with sampling
.TP
\fB\-\-sample\-args\fR SAMPLE_ARGS
Extra arguments to pass to perf record for sampling.
Use + to specify \-
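.IP
A hedged sketch, assuming each + in the argument is translated to \- before it reaches perf record, so that "+b" requests perf record \-b (program is a placeholder workload):
.nf
toplev.py \-l2 \-\-run\-sample \-\-sample\-args "+b" program
.fi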
.TP
\fB\-\-sample\-repeat\fR SAMPLE_REPEAT
Repeat measurement and sampling N times. This
interleaves counting and sampling. Useful for
background collection with \fB\-a\fR sleep X.
.TP
\fB\-\-sample\-basename\fR SAMPLE_BASENAME
Base name of sample perf.data files
.PP
Other perf arguments are allowed (see the perf documentation).
Perf arguments that conflict with toplev options can be used after \fB\-\-\fR.
.PP
Some caveats:
.PP
toplev defaults to measuring the full system and shows data
for all CPUs. Use taskset to limit the workload to known CPUs if needed.
In some cases (idle system, single threaded workload) \fB\-\-single\-thread\fR
can also be used.
.PP
The lower (deeper) levels of the measurement tree are less reliable
than the higher levels. They also rely on counter multiplexing
and cannot run each equation in a single group, which can cause larger
measurement errors with workloads that are not in a steady state.
.PP
(If you don't understand this terminology: it means measurements
at deeper levels of the tree, selected with larger \fB\-l\fR values, are less accurate,
and toplev works best with programs that primarily
do the same thing over and over.)
.PP
If the program is very reproducible, such as a simple kernel,
it is also possible to use \fB\-\-no\-multiplex\fR. In this case the
workload is rerun multiple times until all data is collected.
Do not use together with sleep.
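.PP
A minimal sketch, assuming ./kernel is a hypothetical reproducible benchmark that can safely be rerun several times:
.nf
toplev.py \-l3 \-\-no\-multiplex ./kernel
.fi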
.PP
toplev needs a new enough perf tool and has specific requirements on
the kernel. See http://github.com/andikleen/pmu\-tools/wiki/toplev\-kernel\-support
.PP
Other CPUs can be forced with FORCECPU=name.
This usually requires setting the correct event map with EVENTMAP=...
.br
The topology can be overridden with TOPOLOGY=file (sysfs filenames) and CPUINFO=file
(\fI\,/proc/cpuinfo\/\fP replacement).
.br
Valid CPU names: snb jkt ivb ivt hsw hsx slm bdw bdx simple skl knl skx clx icl
.SH EXAMPLES
toplev.py \-l2 program
.br
Measure the whole system at level 2 while program is running.
.PP
toplev.py \-l1 \-\-single\-thread program
.br
Measure a single\-threaded program. The system must be idle.
.PP
toplev.py \-l3 \-\-no\-desc \-I 100 \-x, sleep X
.br
Measure the whole system for X seconds every 100ms, outputting in CSV format.
.PP
toplev.py \-\-all \-\-core C0 taskset \-c 0,1 program
.br
Measure program running on core 0 with all nodes and metrics enabled.
.SS "optional arguments:"
.TP
\-h, \-\-help
show this help message and exit
.SS "General operation:"
.TP
\-\-interval INTERVAL, \-I INTERVAL
Measure every INTERVAL ms instead of only once
.TP
\-\-no\-multiplex
Do not multiplex, but run the workload multiple times
as needed. Requires reproducible workloads.
.TP
\-\-single\-thread, \-S
Measure workload as single thread. Workload must run
single threaded. In SMT mode the other thread must be
idle.
.TP
\-\-fast, \-F
Skip sanity checks to optimize CPU consumption
.TP
\-\-import _IMPORT
Import specified perf stat output file instead of
running perf. Must be for the same CPU, same arguments,
same /proc/cpuinfo, same topology, unless overridden
.TP
\-\-gen\-script
Generate script to collect perfmon information for
\-\-import later
.SS "Measurement filtering:"
.TP
\-\-kernel
Only measure kernel code
.TP
\-\-user
Only measure user code
.TP
\-\-core CORE
Limit output to cores. Comma list of Sx\-Cx\-Tx. All
parts optional.
.SS "Select events:"
.TP
\-\-level LEVEL, \-l LEVEL
Measure up to level N (max 6)
.TP
\-\-metrics, \-m
Print extra metrics
.TP
\-\-sw
Measure perf Linux metrics
.TP
\-\-no\-util
Do not measure CPU utilization
.TP
\-\-tsx
Measure TSX metrics
.TP
\-\-all
Measure everything available
.TP
\-\-frequency
Measure frequency
.TP
\-\-power
Display power metrics
.TP
\-\-nodes NODES
Include or exclude nodes (with + to add, \-|^ to
remove, comma separated list, wildcards allowed)
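.IP
A hedged sketch, assuming Memory_Bound and Frontend_Bound are node names reported by \-\-list\-nodes on the target CPU (names vary by CPU; program is a placeholder workload). It is meant to add the nodes matching Memory_Bound* to a level 1 measurement and to remove Frontend_Bound:
.nf
toplev.py \-l1 \-\-nodes '+Memory_Bound*,^Frontend_Bound' program
.fi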
.TP
\-\-reduced
Use reduced server subset of nodes/metrics
.TP
\-\-metric\-group METRIC_GROUP
Add (+) or remove (\-|^) metric groups of metrics,
comma separated list from \-\-list\-metric\-groups.
.SS "Query nodes:"
.TP
\-\-list\-metrics
List all metrics
.TP
\-\-list\-nodes
List all nodes
.TP
\-\-list\-metric\-groups
List metric groups
.TP
\-\-list\-all
List every supported node/metric/metricgroup
.SS "Workarounds:"
.TP
\-\-no\-group
Don't use groups
.TP
\-\-force\-events
Assume kernel supports all events. May give wrong
results.
.TP
\-\-ignore\-errata
Do not disable events with errata
.TP
\-\-handle\-errata
Disable events with errata
.SS "Output:"
.TP
\-\-per\-core
Aggregate output per core
.TP
\-\-per\-socket
Aggregate output per socket
.TP
\-\-per\-thread
Aggregate output per CPU thread
.TP
\-\-global
Aggregate output for all CPUs
.TP
\-\-no\-desc
Do not print event descriptions
.TP
\-\-desc
Force event descriptions
.TP
\-\-verbose, \-v
Print all results even when below threshold or
exceeding boundaries. Note this can result in bogus
values, as the TopDown methodology relies on
thresholds to correctly characterize workloads.
.TP
\-\-csv CSV, \-x CSV
Enable CSV mode with specified delimiter
.TP
\-\-output OUTPUT, \-o OUTPUT
Set output file
.TP
\-\-split\-output
Generate multiple output files, one for each specified
aggregation option (with \-o)
.TP
\-\-graph
Automatically graph interval output with tl\-barplot.py
.TP
\-\-graph\-cpu GRAPH_CPU
CPU to graph using \-\-graph
.TP
\-\-title TITLE
Set title of graph
.TP
\-\-quiet
Avoid unnecessary status output
.TP
\-\-long\-desc
Print long descriptions instead of abbreviated ones.
.TP
\-\-columns
Print CPU output in multiple columns for each node
.TP
\-\-summary
Print summary at the end. Only useful with \-I
.TP
\-\-no\-area
Hide area column
.TP
\-\-perf\-output PERF_OUTPUT
Save perf stat output in specified file
.TP
\-\-no\-perf
Don't print perf command line
.TP
\-\-print
Only print perf command line. Don't run it.