<!-- Creator : groff version 1.22.4 -->
<!-- CreationDate: Wed Jan 29 11:26:07 2020 -->
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta name="generator" content="groff -Thtml, see www.gnu.org">
<meta http-equiv="Content-Type" content="text/html; charset=US-ASCII">
<meta name="Content-Style" content="text/css">
<style type="text/css">
p { margin-top: 0; margin-bottom: 0; vertical-align: top }
pre { margin-top: 0; margin-bottom: 0; vertical-align: top }
table { margin-top: 0; margin-bottom: 0; vertical-align: top }
h1 { text-align: center }
</style>
<title>PERF_EVENT_OPEN</title>
</head>
<body>
<h1 align="center">PERF_EVENT_OPEN</h1>
<a href="#NAME">NAME</a><br>
<a href="#SYNOPSIS">SYNOPSIS</a><br>
<a href="#DESCRIPTION">DESCRIPTION</a><br>
<a href="#RETURN VALUE">RETURN VALUE</a><br>
<a href="#ERRORS">ERRORS</a><br>
<a href="#VERSION">VERSION</a><br>
<a href="#CONFORMING TO">CONFORMING TO</a><br>
<a href="#NOTES">NOTES</a><br>
<a href="#BUGS">BUGS</a><br>
<a href="#EXAMPLE">EXAMPLE</a><br>
<a href="#SEE ALSO">SEE ALSO</a><br>
<a href="#COLOPHON">COLOPHON</a><br>
<hr>
<h2>NAME
<a name="NAME"></a>
</h2>
<p style="margin-left:11%; margin-top: 1em">perf_event_open
- set up performance monitoring</p>
<h2>SYNOPSIS
<a name="SYNOPSIS"></a>
</h2>
<p style="margin-left:11%; margin-top: 1em"><b>#include
&lt;linux/perf_event.h&gt; <br>
#include &lt;linux/hw_breakpoint.h&gt;</b></p>
<p style="margin-left:11%; margin-top: 1em"><b>int
perf_event_open(struct perf_event_attr *</b><i>attr</i><b>,
<br>
pid_t</b> <i>pid</i><b>, int</b> <i>cpu</i><b>, int</b>
<i>group_fd</i><b>, <br>
unsigned long</b> <i>flags</i><b>);</b></p>
<p style="margin-left:11%; margin-top: 1em"><i>Note</i>:
There is no glibc wrapper for this system call; see
NOTES.</p>
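<p style="margin-left:11%; margin-top: 1em">Because no
wrapper is provided, callers usually reach the system call
through <b>syscall</b>(2). The following is a minimal sketch
of such a wrapper; the function name is simply this
sketch's choice and is not supplied by any library:</p>

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Thin wrapper: forwards its arguments unchanged to the raw system call.
 * Returns a new file descriptor, or -1 with errno set on failure. */
static long
perf_event_open(struct perf_event_attr *attr, pid_t pid, int cpu,
                int group_fd, unsigned long flags)
{
    return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}
```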
<h2>DESCRIPTION
<a name="DESCRIPTION"></a>
</h2>
<p style="margin-left:11%; margin-top: 1em">Given a list of
parameters, <b>perf_event_open</b>() returns a file
descriptor, for use in subsequent system calls
(<b>read</b>(2), <b>mmap</b>(2), <b>prctl</b>(2),
<b>fcntl</b>(2), etc.).</p>
<p style="margin-left:11%; margin-top: 1em">A call to
<b>perf_event_open</b>() creates a file descriptor that
allows measuring performance information. Each file
descriptor corresponds to one event that is measured; these
can be grouped together to measure multiple events
simultaneously.</p>
<p style="margin-left:11%; margin-top: 1em">Events can be
enabled and disabled in two ways: via <b>ioctl</b>(2) and
via <b>prctl</b>(2). When an event is disabled it does not
count or generate overflows but does continue to exist and
maintain its count value.</p>
<p style="margin-left:11%; margin-top: 1em">Events come in
two flavors: counting and sampling. A <i>counting</i> event
is one that is used for counting the aggregate number of
events that occur. In general, counting event results are
gathered with a <b>read</b>(2) call. A <i>sampling</i> event
periodically writes measurements to a buffer that can then
be accessed via <b>mmap</b>(2).</p>
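<p style="margin-left:11%; margin-top: 1em">As an
illustration of a counting event, the sketch below opens a
counter for retired instructions, enables it around a piece
of work with <b>ioctl</b>(2), and collects the result with
<b>read</b>(2). The helper names <i>count_instructions</i>
and <i>busy_loop</i> are hypothetical, and the call may fail
on systems where hardware counters are unavailable or
restricted:</p>

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical workload used only to give the counter something to count. */
static void
busy_loop(void)
{
    for (volatile int i = 0; i < 100000; i++)
        ;
}

/* Count user-space instructions retired by the calling thread while
 * work() runs. Returns 0 and stores the count in *count on success,
 * or -1 if the event could not be opened or read. */
static int
count_instructions(void (*work)(void), uint64_t *count)
{
    struct perf_event_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_HW_INSTRUCTIONS;
    attr.disabled = 1;          /* start stopped; enabled explicitly below */
    attr.exclude_kernel = 1;
    attr.exclude_hv = 1;

    /* pid == 0, cpu == -1: the calling thread on any CPU. */
    int fd = (int)syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0UL);
    if (fd == -1)
        return -1;

    ioctl(fd, PERF_EVENT_IOC_RESET, 0);
    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
    work();
    ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);

    /* With read_format left at 0, read(2) yields a single 64-bit count. */
    ssize_t n = read(fd, count, sizeof(*count));
    close(fd);
    return n == (ssize_t)sizeof(*count) ? 0 : -1;
}
```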
<p style="margin-left:11%; margin-top: 1em"><b>Arguments</b>
<br>
The <i>pid</i> and <i>cpu</i> arguments allow specifying
which process and CPU to monitor: <b><br>
pid == 0</b> and <b>cpu == -1</b></p>
<p style="margin-left:22%;">This measures the calling
process/thread on any CPU.</p>
<p style="margin-left:11%;"><b>pid == 0</b> and <b>cpu
>= 0</b></p>
<p style="margin-left:22%;">This measures the calling
process/thread only when running on the specified CPU.</p>
<p style="margin-left:11%;"><b>pid > 0</b> and <b>cpu ==
-1</b></p>
<p style="margin-left:22%;">This measures the specified
process/thread on any CPU.</p>
<p style="margin-left:11%;"><b>pid > 0</b> and <b>cpu
>= 0</b></p>
<p style="margin-left:22%;">This measures the specified
process/thread only when running on the specified CPU.</p>
<p style="margin-left:11%;"><b>pid == -1</b> and <b>cpu
>= 0</b></p>
<p style="margin-left:22%;">This measures all
processes/threads on the specified CPU. This requires
<b>CAP_SYS_ADMIN</b> capability or a
<i>/proc/sys/kernel/perf_event_paranoid</i> value of less
than 1.</p>
<p style="margin-left:11%;"><b>pid == -1</b> and <b>cpu ==
-1</b></p>
<p style="margin-left:22%;">This setting is invalid and
will return an error.</p>
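<p style="margin-left:11%; margin-top: 1em">The
combinations above can be summarized in a small lookup
helper. This is only an illustrative sketch: the function
name is hypothetical, and rejecting values below -1 is an
assumption of the sketch rather than something stated
above:</p>

```c
#include <stddef.h>
#include <sys/types.h>

/* Map a (pid, cpu) pair to a description of what would be measured.
 * Returns NULL for the combination documented as invalid. */
static const char *
describe_target(pid_t pid, int cpu)
{
    if (pid < -1 || cpu < -1)
        return NULL;    /* assumption: outside the documented cases */
    if (pid == -1 && cpu == -1)
        return NULL;    /* documented as invalid */
    if (pid == -1)
        return "all processes/threads on the specified CPU";
    if (pid == 0)
        return cpu == -1 ? "calling process/thread on any CPU"
                         : "calling process/thread on the specified CPU";
    return cpu == -1 ? "specified process/thread on any CPU"
                     : "specified process/thread on the specified CPU";
}
```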
<p style="margin-left:11%; margin-top: 1em">When <i>pid</i>
is greater than zero, permission to perform this system call
is governed by a ptrace access mode
<b>PTRACE_MODE_READ_REALCREDS</b> check; see
<b>ptrace</b>(2).</p>
<p style="margin-left:11%; margin-top: 1em">The
<i>group_fd</i> argument allows event groups to be created.
An event group has one event which is the group leader. The
leader is created first, with <i>group_fd</i> = -1. The rest
of the group members are created with subsequent
<b>perf_event_open</b>() calls with <i>group_fd</i> being
set to the file descriptor of the group leader. (A single
event on its own is created with <i>group_fd</i> = -1 and is
considered to be a group with only 1 member.) An event group
is scheduled onto the CPU as a unit: it will be put onto the
CPU only if all of the events in the group can be put onto
the CPU. This means that the values of the member events can
be meaningfully compared—added, divided (to get
ratios), and so on—with each other, since they have
counted events for the same set of executed
instructions.</p>
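<p style="margin-left:11%; margin-top: 1em">A sketch of
building a two-event group follows: the leader (cycles) is
opened with <i>group_fd</i> = -1 and the member
(instructions) is opened with the leader's file
descriptor. The helper names are hypothetical, and opening
hardware events may fail where counters are unavailable or
restricted:</p>

```c
#define _GNU_SOURCE
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Open one generalized hardware event for the calling thread on any CPU. */
static int
open_hw_counter(__u64 config, int group_fd)
{
    struct perf_event_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_HARDWARE;
    attr.size = sizeof(attr);
    attr.config = config;
    attr.exclude_kernel = 1;
    attr.exclude_hv = 1;

    return (int)syscall(__NR_perf_event_open, &attr, 0, -1, group_fd, 0UL);
}

/* fds[0] receives the leader, fds[1] the member.
 * Returns 0 on success, -1 if either event could not be opened. */
static int
open_cycles_instructions_group(int fds[2])
{
    fds[0] = open_hw_counter(PERF_COUNT_HW_CPU_CYCLES, -1);
    if (fds[0] == -1)
        return -1;

    fds[1] = open_hw_counter(PERF_COUNT_HW_INSTRUCTIONS, fds[0]);
    if (fds[1] == -1) {
        close(fds[0]);
        fds[0] = -1;
        return -1;
    }
    return 0;
}
```

<p style="margin-left:11%; margin-top: 1em">Because both
events are scheduled together, dividing the instruction
count by the cycle count gives a ratio taken over the same
stretch of execution.</p>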
<p style="margin-left:11%; margin-top: 1em">The
<i>flags</i> argument is formed by ORing together zero or
more of the following values: <b><br>
PERF_FLAG_FD_CLOEXEC</b> (since Linux 3.14)</p>
<p style="margin-left:22%;">This flag enables the
close-on-exec flag for the created event file descriptor, so
that the file descriptor is automatically closed on
<b>execve</b>(2). Setting the close-on-exec flag at
creation time, rather than later with <b>fcntl</b>(2),
avoids potential race conditions where the calling thread
invokes <b>perf_event_open</b>() and <b>fcntl</b>(2) at the
same time as another thread calls <b>fork</b>(2) then
<b>execve</b>(2).</p>
<p style="margin-left:11%;"><b>PERF_FLAG_FD_NO_GROUP</b></p>
<p style="margin-left:22%;">This flag tells the event to
ignore the <i>group_fd</i> parameter except for the purpose
of setting up output redirection using the
<b>PERF_FLAG_FD_OUTPUT</b> flag.</p>
<p style="margin-left:11%;"><b>PERF_FLAG_FD_OUTPUT</b>
(broken since Linux 2.6.35)</p>
<p style="margin-left:22%;">This flag re-routes the
event’s sampled output to instead be included in the
mmap buffer of the event specified by <i>group_fd</i>.</p>
<p style="margin-left:11%;"><b>PERF_FLAG_PID_CGROUP</b>
(since Linux 2.6.39)</p>
<p style="margin-left:22%;">This flag activates
per-container system-wide monitoring. A container is an
abstraction that isolates a set of resources for
finer-grained control (CPUs, memory, etc.). In this mode,
the event is measured only if the thread running on the
monitored CPU belongs to the designated container (cgroup).
The cgroup is identified by passing a file descriptor opened
on its directory in the cgroupfs filesystem. For instance,
if the cgroup to monitor is called <i>test</i>, then a file
descriptor opened on <i>/dev/cgroup/test</i> (assuming
cgroupfs is mounted on <i>/dev/cgroup</i>) must be passed as
the <i>pid</i> parameter. cgroup monitoring is available
only for system-wide events and may therefore require extra
permissions.</p>
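<p style="margin-left:11%; margin-top: 1em">The cgroup mode
can be sketched as follows. The helper name and the choice
of a CPU-clock software event are this sketch's own; the
caller supplies the cgroup directory path, and the sketch
assumes the cgroup descriptor may be closed once the event
has been created:</p>

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Open a CPU-clock event restricted to threads of one cgroup on one CPU.
 * Returns the event descriptor, or -1 with errno set on failure. */
static int
open_cgroup_cpu_clock(const char *cgroup_dir, int cpu)
{
    int cgroup_fd = open(cgroup_dir, O_RDONLY);
    if (cgroup_fd == -1)
        return -1;

    struct perf_event_attr attr;
    memset(&attr, 0, sizeof(attr));
    attr.type = PERF_TYPE_SOFTWARE;
    attr.size = sizeof(attr);
    attr.config = PERF_COUNT_SW_CPU_CLOCK;

    /* In cgroup mode the pid argument carries the cgroup descriptor. */
    int fd = (int)syscall(__NR_perf_event_open, &attr, cgroup_fd, cpu, -1,
                          PERF_FLAG_PID_CGROUP);

    int saved_errno = errno;
    close(cgroup_fd);   /* assumption: not needed once the event exists */
    errno = saved_errno;
    return fd;
}
```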
<p style="margin-left:11%; margin-top: 1em">The
<i>perf_event_attr</i> structure provides detailed
configuration information for the event being created.</p>
<p style="margin-left:17%; margin-top: 1em">struct
perf_event_attr { <br>
__u32 type; /* Type of event */ <br>
__u32 size; /* Size of attribute structure */ <br>
__u64 config; /* Type-specific configuration */</p>
<p style="margin-left:17%; margin-top: 1em">union { <br>
__u64 sample_period; /* Period of sampling */ <br>
__u64 sample_freq; /* Frequency of sampling */ <br>
};</p>
<p style="margin-left:17%; margin-top: 1em">__u64
sample_type; /* Specifies values included in sample */ <br>
__u64 read_format; /* Specifies values returned in read
*/</p>
<p style="margin-left:17%; margin-top: 1em">__u64 disabled
: 1, /* off by default */ <br>
inherit : 1, /* children inherit it */ <br>
pinned : 1, /* must always be on PMU */ <br>
exclusive : 1, /* only group on PMU */ <br>
exclude_user : 1, /* don’t count user */ <br>
exclude_kernel : 1, /* don’t count kernel */ <br>
exclude_hv : 1, /* don’t count hypervisor */ <br>
exclude_idle : 1, /* don’t count when idle */ <br>
mmap : 1, /* include mmap data */ <br>
comm : 1, /* include comm data */ <br>
freq : 1, /* use freq, not period */ <br>
inherit_stat : 1, /* per task counts */ <br>
enable_on_exec : 1, /* next exec enables */ <br>
task : 1, /* trace fork/exit */ <br>
watermark : 1, /* wakeup_watermark */ <br>
precise_ip : 2, /* skid constraint */ <br>
mmap_data : 1, /* non-exec mmap data */ <br>
sample_id_all : 1, /* sample_type all events */ <br>
exclude_host : 1, /* don’t count in host */ <br>
exclude_guest : 1, /* don’t count in guest */ <br>
exclude_callchain_kernel : 1, <br>
/* exclude kernel callchains */ <br>
exclude_callchain_user : 1, <br>
/* exclude user callchains */ <br>
mmap2 : 1, /* include mmap with inode data */ <br>
comm_exec : 1, /* flag comm events that are <br>
due to exec */ <br>
use_clockid : 1, /* use clockid for time fields */ <br>
context_switch : 1, /* context switch data */</p>
<p style="margin-left:17%; margin-top: 1em">__reserved_1 :
37;</p>
<p style="margin-left:17%; margin-top: 1em">union { <br>
__u32 wakeup_events; /* wakeup every n events */ <br>
__u32 wakeup_watermark; /* bytes before wakeup */ <br>
};</p>
<p style="margin-left:17%; margin-top: 1em">__u32 bp_type;
/* breakpoint type */</p>
<p style="margin-left:17%; margin-top: 1em">union { <br>
__u64 bp_addr; /* breakpoint address */ <br>
__u64 kprobe_func; /* for perf_kprobe */ <br>
__u64 uprobe_path; /* for perf_uprobe */ <br>
__u64 config1; /* extension of config */ <br>
};</p>
<p style="margin-left:17%; margin-top: 1em">union { <br>
__u64 bp_len; /* breakpoint length */ <br>
__u64 kprobe_addr; /* with kprobe_func == NULL */ <br>
__u64 probe_offset; /* for perf_[k,u]probe */ <br>
__u64 config2; /* extension of config1 */ <br>
}; <br>
__u64 branch_sample_type; /* enum perf_branch_sample_type */
<br>
__u64 sample_regs_user; /* user regs to dump on samples */
<br>
__u32 sample_stack_user; /* size of stack to dump on <br>
samples */ <br>
__s32 clockid; /* clock to use for time fields */ <br>
__u64 sample_regs_intr; /* regs to dump on samples */ <br>
__u32 aux_watermark; /* aux bytes before wakeup */ <br>
__u16 sample_max_stack; /* max frames in callchain */ <br>
__u16 __reserved_2; /* align to u64 */</p>
<p style="margin-left:17%; margin-top: 1em">};</p>
<p style="margin-left:11%; margin-top: 1em">The fields of
the <i>perf_event_attr</i> structure are described in more
detail below:</p>
<table width="100%" border="0" rules="none" frame="void"
cellspacing="0" cellpadding="0">
<tr valign="top" align="left">
<td width="11%"></td>
<td width="6%">
<p><i>type</i></p></td>
<td width="5%"></td>
<td width="78%">
<p>This field specifies the overall event type. It has one
of the following values:</p></td></tr>
</table>
<p style="margin-left:22%;"><b>PERF_TYPE_HARDWARE</b></p>
<p style="margin-left:32%;">This indicates one of the
"generalized" hardware events provided by the
kernel. See the <i>config</i> field definition for more
details.</p>
<p style="margin-left:22%;"><b>PERF_TYPE_SOFTWARE</b></p>
<p style="margin-left:32%;">This indicates one of the
software-defined events provided by the kernel (even if no
hardware support is available).</p>
<p style="margin-left:22%;"><b>PERF_TYPE_TRACEPOINT</b></p>
<p style="margin-left:32%;">This indicates a tracepoint
provided by the kernel tracepoint infrastructure.</p>
<p style="margin-left:22%;"><b>PERF_TYPE_HW_CACHE</b></p>
<p style="margin-left:32%;">This indicates a hardware cache
event. This has a special encoding, described in the
<i>config</i> field definition.</p>
<p style="margin-left:22%;"><b>PERF_TYPE_RAW</b></p>
<p style="margin-left:32%;">This indicates a
"raw" implementation-specific event in the
<i>config</i> field.</p>
<p style="margin-left:22%;"><b>PERF_TYPE_BREAKPOINT</b>
(since Linux 2.6.33)</p>
<p style="margin-left:32%;">This indicates a hardware
breakpoint as provided by the CPU. Breakpoints can be
read/write accesses to an address as well as execution of an
instruction address.</p>
<p style="margin-left:22%;">dynamic PMU</p>
<p style="margin-left:32%;">Since Linux 2.6.38,
<b>perf_event_open</b>() can support multiple PMUs. To
enable this, a value exported by the kernel can be used in
the <i>type</i> field to indicate which PMU to use. The
value to use can be found in the sysfs filesystem: there is
a subdirectory per PMU instance under
<i>/sys/bus/event_source/devices</i>. In each subdirectory
there is a <i>type</i> file whose content is an integer that
can be used in the <i>type</i> field. For instance,
<i>/sys/bus/event_source/devices/cpu/type</i> contains the
value for the core CPU PMU, which is usually 4.</p>
<p style="margin-left:22%;"><b>kprobe</b> and <b>uprobe</b>
(since Linux 4.17)</p>
<p style="margin-left:32%;">These two dynamic PMUs create a
kprobe/uprobe and attach it to the file descriptor generated
by <b>perf_event_open</b>(). The kprobe/uprobe will be destroyed on
the destruction of the file descriptor. See fields
<i>kprobe_func</i>, <i>uprobe_path</i>, <i>kprobe_addr</i>,
and <i>probe_offset</i> for more details.</p>
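<p style="margin-left:22%; margin-top: 1em">The sysfs
lookup described for dynamic PMUs can be sketched as a small
helper that reads the <i>type</i> file for a named PMU. The
function name and the fixed buffer size are choices made for
this sketch:</p>

```c
#include <stdio.h>

/* Read the integer stored in
 * /sys/bus/event_source/devices/<pmu_name>/type.
 * Returns the value, or -1 if the file is missing or unreadable. */
static int
read_pmu_type(const char *pmu_name)
{
    char path[256];
    int len = snprintf(path, sizeof(path),
                       "/sys/bus/event_source/devices/%s/type", pmu_name);
    if (len < 0 || (size_t)len >= sizeof(path))
        return -1;      /* name too long for the buffer */

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    int type;
    int ok = (fscanf(f, "%d", &type) == 1);
    fclose(f);
    return ok ? type : -1;
}
```

<p style="margin-left:22%; margin-top: 1em">The returned
value would then be stored in the <i>type</i> field, for
example after calling the helper with "cpu",
"kprobe", or "uprobe".</p>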
<table width="100%" border="0" rules="none" frame="void"
cellspacing="0" cellpadding="0">
<tr valign="top" align="left">
<td width="11%"></td>
<td width="6%">
<p><i>size</i></p></td>
<td width="5%"></td>
<td width="78%">
<p>The size of the <i>perf_event_attr</i> structure for
forward/backward compatibility. Set this using
<i>sizeof(struct perf_event_attr)</i> to allow the kernel to
see the struct size at the time of compilation.</p></td></tr>
</table>
<p style="margin-left:22%; margin-top: 1em">The related
define <b>PERF_ATTR_SIZE_VER0</b> is set to 64; this was the
size of the first published struct.
<b>PERF_ATTR_SIZE_VER1</b> is 72, corresponding to the
addition of breakpoints in Linux 2.6.33.
<b>PERF_ATTR_SIZE_VER2</b> is 80, corresponding to the
addition of branch sampling in Linux 3.4.
<b>PERF_ATTR_SIZE_VER3</b> is 96, corresponding to the
addition of <i>sample_regs_user</i> and
<i>sample_stack_user</i> in Linux 3.7.
<b>PERF_ATTR_SIZE_VER4</b> is 104, corresponding to the
addition of <i>sample_regs_intr</i> in Linux 3.19.
<b>PERF_ATTR_SIZE_VER5</b> is 112, corresponding to the
addition of <i>aux_watermark</i> in Linux 4.1.</p>
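<p style="margin-left:22%; margin-top: 1em">A common way to
prepare the structure is to zero it, then fill in
<i>size</i>, <i>type</i>, and <i>config</i>. The helper
below is a sketch with a hypothetical name; zeroing first
keeps every field that is not set explicitly, including the
reserved ones, at 0:</p>

```c
#include <linux/perf_event.h>
#include <string.h>

/* Zero the structure and set the three fields every event needs. */
static void
init_attr(struct perf_event_attr *attr, __u32 type, __u64 config)
{
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);   /* struct size as seen at compile time */
    attr->type = type;
    attr->config = config;
}
```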
<table width="100%" border="0" rules="none" frame="void"
cellspacing="0" cellpadding="0">
<tr valign="top" align="left">
<td width="11%"></td>
<td width="9%">
<p style="margin-top: 1em"><i>config</i></p></td>
<td width="2%"></td>
<td width="78%">
<p style="margin-top: 1em">This specifies which event you
want, in conjunction with the <i>type</i> field. The
<i>config1</i> and <i>config2</i> fields are also taken into
account in cases where 64 bits is not enough to fully
specify the event. The encoding of these fields is
event-dependent.</p> </td></tr>
</table>
<p style="margin-left:22%; margin-top: 1em">There are
various ways to set the <i>config</i> field that are
dependent on the value of the previously described
<i>type</i> field. What follows are various possible
settings for <i>config</i> separated out by <i>type</i>.</p>
<p style="margin-left:22%; margin-top: 1em">If <i>type</i>
is <b>PERF_TYPE_HARDWARE</b>, we are measuring one of the
generalized hardware CPU events. Not all of these are
available on all platforms. Set <i>config</i> to one of the
following:</p>
<p style="margin-left:29%;"><b>PERF_COUNT_HW_CPU_CYCLES</b></p>
<p style="margin-left:40%;">Total cycles. Be wary of what
happens during CPU frequency scaling.</p>
<p style="margin-left:29%;"><b>PERF_COUNT_HW_INSTRUCTIONS</b></p>
<p style="margin-left:40%;">Retired instructions. Be
careful: these counts can be affected by various issues, most
notably hardware interrupt counts.</p>
<p style="margin-left:29%;"><b>PERF_COUNT_HW_CACHE_REFERENCES</b></p>
<p style="margin-left:40%;">Cache accesses. Usually this
indicates Last Level Cache accesses but this may vary
depending on your CPU. This may include prefetches and
coherency messages; again this depends on the design of your
CPU.</p>
<p style="margin-left:29%;"><b>PERF_COUNT_HW_CACHE_MISSES</b></p>
<p style="margin-left:40%;">Cache misses. Usually this
indicates Last Level Cache misses; this is intended to be
used in conjunction with the
<b>PERF_COUNT_HW_CACHE_REFERENCES</b> event to calculate
cache miss rates.</p>
<p style="margin-left:29%;"><b>PERF_COUNT_HW_BRANCH_INSTRUCTIONS</b></p>
<p style="margin-left:40%;">Retired branch instructions.
Prior to Linux 2.6.35, this used the wrong event on AMD
processors.</p>
<p style="margin-left:29%;"><b>PERF_COUNT_HW_BRANCH_MISSES</b></p>
<p style="margin-left:40%;">Mispredicted branch
instructions.</p>
<p style="margin-left:29%;"><b>PERF_COUNT_HW_BUS_CYCLES</b></p>
<p style="margin-left:40%;">Bus cycles, which can be
different from total cycles.</p>
<p style="margin-left:29%;"><b>PERF_COUNT_HW_STALLED_CYCLES_FRONTEND</b>
(since Linux 3.0)</p>
<p style="margin-left:40%;">Stalled cycles during
issue.</p>
<p style="margin-left:29%;"><b>PERF_COUNT_HW_STALLED_CYCLES_BACKEND</b>
(since Linux 3.0)</p>
<p style="margin-left:40%;">Stalled cycles during
retirement.</p>
<p style="margin-left:29%;"><b>PERF_COUNT_HW_REF_CPU_CYCLES</b>
(since Linux 3.3)</p>
<p style="margin-left:40%;">Total cycles; not affected by
CPU frequency scaling.</p>
<p style="margin-left:22%; margin-top: 1em">If <i>type</i>
is <b>PERF_TYPE_SOFTWARE</b>, we are measuring software
events provided by the kernel. Set <i>config</i> to one of
the following:</p>
<p style="margin-left:29%;"><b>PERF_COUNT_SW_CPU_CLOCK</b></p>
<p style="margin-left:40%;">This reports the CPU clock, a
high-resolution per-CPU timer.</p>
<p style="margin-left:29%;"><b>PERF_COUNT_SW_TASK_CLOCK</b></p>
<p style="margin-left:40%;">This reports a clock count
specific to the task that is running.</p>
<p style="margin-left:29%;"><b>PERF_COUNT_SW_PAGE_FAULTS</b></p>
<p style="margin-left:40%;">This reports the number of page
faults.</p>
<p style="margin-left:29%;"><b>PERF_COUNT_SW_CONTEXT_SWITCHES</b></p>
<p style="margin-left:40%;">This counts context switches.
Until Linux 2.6.34, these were all reported as user-space
events; since then they have been reported as happening in
the kernel.</p>
<p style="margin-left:29%;"><b>PERF_COUNT_SW_CPU_MIGRATIONS</b></p>
<p style="margin-left:40%;">This reports the number of
times the process has migrated to a new CPU.</p>
<p style="margin-left:29%;"><b>PERF_COUNT_SW_PAGE_FAULTS_MIN</b></p>
<p style="margin-left:40%;">This counts the number of minor
page faults. These did not require disk I/O to handle.</p>
<p style="margin-left:29%;"><b>PERF_COUNT_SW_PAGE_FAULTS_MAJ</b></p>
<p style="margin-left:40%;">This counts the number of major
page faults. These required disk I/O to handle.</p>
<p style="margin-left:29%;"><b>PERF_COUNT_SW_ALIGNMENT_FAULTS</b>
(since Linux 2.6.33)</p>
<p style="margin-left:40%;">This counts the number of
alignment faults. These happen when unaligned memory
accesses happen; the kernel can handle these but it reduces
performance. This happens only on some architectures (never
on x86).</p>
<p style="margin-left:29%;"><b>PERF_COUNT_SW_EMULATION_FAULTS</b>
(since Linux 2.6.33)</p>
<p style="margin-left:40%;">This counts the number of
emulation faults. The kernel sometimes traps on
unimplemented instructions and emulates them for user space.
This can negatively impact performance.</p>
<p style="margin-left:29%;"><b>PERF_COUNT_SW_DUMMY</b>
(since Linux 3.12)</p>
<p style="margin-left:40%;">This is a placeholder event
that counts nothing. Informational sample record types such
as mmap or comm must be associated with an active event.
This dummy event allows gathering such records without
requiring a counting event.</p>
<p style="margin-left:22%; margin-top: 1em">If <i>type</i>
is <b>PERF_TYPE_TRACEPOINT</b>, then we are measuring kernel
tracepoints. The value to use in <i>config</i> can be
obtained from under debugfs <i>tracing/events/*/*/id</i> if
ftrace is enabled in the kernel.</p>
<p style="margin-left:22%; margin-top: 1em">If <i>type</i>
is <b>PERF_TYPE_HW_CACHE</b>, then we are measuring a
hardware CPU cache event. To calculate the appropriate
<i>config</i> value use the following equation:</p>
<p style="margin-left:28%; margin-top: 1em">(perf_hw_cache_id)
| (perf_hw_cache_op_id &lt;&lt; 8) | <br>
(perf_hw_cache_op_result_id &lt;&lt; 16)</p>
<p style="margin-left:28%; margin-top: 1em">where
<i>perf_hw_cache_id</i> is one of:</p>
<p style="margin-left:34%;"><b>PERF_COUNT_HW_CACHE_L1D</b></p>
<p style="margin-left:45%;">for measuring Level 1 Data
Cache</p>
<p style="margin-left:34%;"><b>PERF_COUNT_HW_CACHE_L1I</b></p>
<p style="margin-left:45%;">for measuring Level 1
Instruction Cache</p>
<p style="margin-left:34%;"><b>PERF_COUNT_HW_CACHE_LL</b></p>
<p style="margin-left:45%;">for measuring Last-Level
Cache</p>
<p style="margin-left:34%;"><b>PERF_COUNT_HW_CACHE_DTLB</b></p>
<p style="margin-left:45%;">for measuring the Data TLB</p>
<p style="margin-left:34%;"><b>PERF_COUNT_HW_CACHE_ITLB</b></p>
<p style="margin-left:45%;">for measuring the Instruction
TLB</p>
<p style="margin-left:34%;"><b>PERF_COUNT_HW_CACHE_BPU</b></p>
<p style="margin-left:45%;">for measuring the branch
prediction unit</p>
<p style="margin-left:34%;"><b>PERF_COUNT_HW_CACHE_NODE</b>
(since Linux 3.1)</p>
<p style="margin-left:45%;">for measuring local memory
accesses</p>
<p style="margin-left:28%; margin-top: 1em">and
<i>perf_hw_cache_op_id</i> is one of:</p>
<p style="margin-left:34%;"><b>PERF_COUNT_HW_CACHE_OP_READ</b></p>
<p style="margin-left:45%;">for read accesses</p>
<p style="margin-left:34%;"><b>PERF_COUNT_HW_CACHE_OP_WRITE</b></p>
<p style="margin-left:45%;">for write accesses</p>
<p style="margin-left:34%;"><b>PERF_COUNT_HW_CACHE_OP_PREFETCH</b></p>
<p style="margin-left:45%;">for prefetch accesses</p>
<p style="margin-left:28%; margin-top: 1em">and
<i>perf_hw_cache_op_result_id</i> is one of:</p>
<p style="margin-left:34%;"><b>PERF_COUNT_HW_CACHE_RESULT_ACCESS</b></p>
<p style="margin-left:45%;">to measure accesses</p>
<p style="margin-left:34%;"><b>PERF_COUNT_HW_CACHE_RESULT_MISS</b></p>
<p style="margin-left:45%;">to measure misses</p>
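<p style="margin-left:22%; margin-top: 1em">The encoding
above can be written as a small helper. This is a sketch
with a hypothetical function name; the enum types and
constants come from <i>linux/perf_event.h</i>:</p>

```c
#include <linux/perf_event.h>

/* Combine cache, operation, and result into a PERF_TYPE_HW_CACHE config:
 * cache id in bits 0-7, operation in bits 8-15, result in bits 16-23. */
static __u64
hw_cache_config(enum perf_hw_cache_id cache,
                enum perf_hw_cache_op_id op,
                enum perf_hw_cache_op_result_id result)
{
    return (__u64)cache | ((__u64)op << 8) | ((__u64)result << 16);
}
```

<p style="margin-left:22%; margin-top: 1em">For example,
Level 1 data cache read misses would be requested by
combining <b>PERF_COUNT_HW_CACHE_L1D</b>,
<b>PERF_COUNT_HW_CACHE_OP_READ</b>, and
<b>PERF_COUNT_HW_CACHE_RESULT_MISS</b>.</p>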
<p style="margin-left:22%; margin-top: 1em">If <i>type</i>
is <b>PERF_TYPE_RAW</b>, then a custom "raw"
<i>config</i> value is needed. Most CPUs support events that
are not covered by the "generalized" events. These
are implementation defined; see your CPU manual (for example
the Intel Volume 3B documentation or the AMD BIOS and Kernel
Developer Guide). The libpfm4 library can be used to
translate from the name in the architectural manuals to the
raw hex value <b>perf_event_open</b>() expects in this
field.</p>
<p style="margin-left:22%; margin-top: 1em">If <i>type</i>
is <b>PERF_TYPE_BREAKPOINT</b>, then leave <i>config</i> set
to zero. Its parameters are set in other places.</p>
<p style="margin-left:22%; margin-top: 1em">If <i>type</i>
is <b>kprobe</b> or <b>uprobe</b>, set <i>retprobe</i> (bit
0 of <i>config</i>, see
<i>/sys/bus/event_source/devices/[k,u]probe/format/retprobe</i>)
for kretprobe/uretprobe. See fields <i>kprobe_func</i>,
<i>uprobe_path</i>, <i>kprobe_addr</i>, and
<i>probe_offset</i> for more details.</p>
<p style="margin-left:11%;"><i>kprobe_func</i>,
<i>uprobe_path</i>, <i>kprobe_addr</i>, and
<i>probe_offset</i></p>
<p style="margin-left:22%;">These fields describe the
kprobe/uprobe for dynamic PMUs <b>kprobe</b> and
<b>uprobe</b>. For <b>kprobe</b>: use <i>kprobe_func</i> and
<i>probe_offset</i>, or use <i>kprobe_addr</i> and leave
<i>kprobe_func</i> as NULL. For <b>uprobe</b>: use
<i>uprobe_path</i> and <i>probe_offset</i>.</p>
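<p style="margin-left:22%; margin-top: 1em">A sketch of
filling in the attribute for a <b>kprobe</b> identified by
function name follows. The helper name is hypothetical, and
the PMU type value is assumed to have been read beforehand
from the <i>type</i> file of the kprobe PMU in sysfs:</p>

```c
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>

/* Describe a kprobe (or kretprobe) on a named kernel function.
 * The string behind func must stay valid until perf_event_open()
 * has been called with this attribute. */
static void
fill_kprobe_attr(struct perf_event_attr *attr, __u32 kprobe_pmu_type,
                 const char *func, __u64 offset, int is_return_probe)
{
    memset(attr, 0, sizeof(*attr));
    attr->size = sizeof(*attr);
    attr->type = kprobe_pmu_type;
    attr->config = is_return_probe ? 1 : 0;     /* retprobe is bit 0 */
    attr->kprobe_func = (__u64)(uintptr_t)func; /* pointer passed as u64 */
    attr->probe_offset = offset;
}
```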
<p style="margin-left:11%;"><i>sample_period</i>,
<i>sample_freq</i></p>
<p style="margin-left:22%;">A "sampling" event is
one that generates an overflow notification every N events,
where N is given by <i>sample_period</i>. A sampling event
has <i>sample_period</i> > 0. When an overflow occurs,
requested data is recorded in the mmap buffer. The
<i>sample_type</i> field controls what data is recorded on
each overflow.</p>
<p style="margin-left:22%; margin-top: 1em"><i>sample_freq</i>
can be used if you wish to use frequency rather than period.
In this case, you set the <i>freq</i> flag. The kernel will
adjust the sampling period to try to achieve the desired
rate. The rate of adjustment is a timer tick.</p>
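<p style="margin-left:22%; margin-top: 1em">Since
<i>sample_period</i> and <i>sample_freq</i> share storage,
the <i>freq</i> flag decides how the value is interpreted.
The two hypothetical helpers below sketch both
settings:</p>

```c
#include <linux/perf_event.h>

/* Request one sample every n events. */
static void
use_sample_period(struct perf_event_attr *attr, __u64 n)
{
    attr->freq = 0;
    attr->sample_period = n;
}

/* Request a target sampling rate and let the kernel adjust the period. */
static void
use_sample_freq(struct perf_event_attr *attr, __u64 rate)
{
    attr->freq = 1;
    attr->sample_freq = rate;
}
```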
<p style="margin-left:11%;"><i>sample_type</i></p>
<p style="margin-left:22%;">The various bits in this field
specify which values to include in the sample. They will be
recorded in a ring-buffer, which is available to user space
using <b>mmap</b>(2). The order in which the values are
saved in the sample is documented in the MMAP Layout
subsection below; it is not the <i>enum
perf_event_sample_format</i> order. <b><br>
PERF_SAMPLE_IP</b></p>
<p style="margin-left:32%;">Records instruction
pointer.</p>
<p style="margin-left:22%;"><b>PERF_SAMPLE_TID</b></p>
<p style="margin-left:32%;">Records the process and thread
IDs.</p>
<p style="margin-left:22%;"><b>PERF_SAMPLE_TIME</b></p>
<p style="margin-left:32%;">Records a timestamp.</p>
<p style="margin-left:22%;"><b>PERF_SAMPLE_ADDR</b></p>
<p style="margin-left:32%;">Records an address, if
applicable.</p>
<p style="margin-left:22%;"><b>PERF_SAMPLE_READ</b></p>
<p style="margin-left:32%;">Record counter values for all
events in a group, not just the group leader.</p>
<p style="margin-left:22%;"><b>PERF_SAMPLE_CALLCHAIN</b></p>
<p style="margin-left:32%;">Records the callchain (stack
backtrace).</p>
<p style="margin-left:22%;"><b>PERF_SAMPLE_ID</b></p>
<p style="margin-left:32%;">Records a unique ID for the
opened event’s group leader.</p>
<p style="margin-left:22%;"><b>PERF_SAMPLE_CPU</b></p>
<p style="margin-left:32%;">Records CPU number.</p>
<p style="margin-left:22%;"><b>PERF_SAMPLE_PERIOD</b></p>
<p style="margin-left:32%;">Records the current sampling
period.</p>
<p style="margin-left:22%;"><b>PERF_SAMPLE_STREAM_ID</b></p>
<p style="margin-left:32%;">Records a unique ID for the
opened event. Unlike <b>PERF_SAMPLE_ID</b>, the ID of the
event itself is returned, not that of the group leader. This
ID is the same as the one returned by
<b>PERF_FORMAT_ID</b>.</p>
<p style="margin-left:22%;"><b>PERF_SAMPLE_RAW</b></p>
<p style="margin-left:32%;">Records additional data, if
applicable. Usually returned by tracepoint events.</p>
<p style="margin-left:22%;"><b>PERF_SAMPLE_BRANCH_STACK</b>
(since Linux 3.4)</p>
<p style="margin-left:32%;">This provides a record of
recent branches, as provided by CPU branch sampling hardware
(such as Intel Last Branch Record). Not all hardware
supports this feature.</p>
<p style="margin-left:32%; margin-top: 1em">See the
<i>branch_sample_type</i> field for how to filter which
branches are reported.</p>
<p style="margin-left:22%;"><b>PERF_SAMPLE_REGS_USER</b>
(since Linux 3.7)</p>
<p style="margin-left:32%;">Records the current user-level
CPU register state (the values in the process before the
kernel was called).</p>
<p style="margin-left:22%;"><b>PERF_SAMPLE_STACK_USER</b>
(since Linux 3.7)</p>
<p style="margin-left:32%;">Records the user-level stack,
allowing stack unwinding.</p>
<p style="margin-left:22%;"><b>PERF_SAMPLE_WEIGHT</b>
(since Linux 3.10)</p>
<p style="margin-left:32%;">Records a hardware-provided
weight value that expresses how costly the sampled event
was. This allows the hardware to highlight expensive events
in a profile.</p>
<p style="margin-left:22%;"><b>PERF_SAMPLE_DATA_SRC</b>
(since Linux 3.10)</p>
<p style="margin-left:32%;">Records the data source: where
in the memory hierarchy the data associated with the sampled
instruction came from. This is available only if the
underlying hardware supports this feature.</p>
<p style="margin-left:22%;"><b>PERF_SAMPLE_IDENTIFIER</b>
(since Linux 3.12)</p>
<p style="margin-left:32%;">Places the <b>SAMPLE_ID</b>
value in a fixed position in the record, either at the
beginning (for sample events) or at the end (for non-sample
events).</p>
<p style="margin-left:32%; margin-top: 1em">This was
necessary because a sample stream may have records from
various different event sources with different
<i>sample_type</i> settings. Parsing the event stream
properly was not possible because the format of the record
was needed to find <b>SAMPLE_ID</b>, but the format could
not be found without knowing what event the sample belonged
to (causing a circular dependency).</p>
<p style="margin-left:32%; margin-top: 1em">The
<b>PERF_SAMPLE_IDENTIFIER</b> setting makes the event stream
always parsable by putting <b>SAMPLE_ID</b> in a fixed
location, even though it means having duplicate
<b>SAMPLE_ID</b> values in records.</p>
<p style="margin-left:22%;"><b>PERF_SAMPLE_TRANSACTION</b>
(since Linux 3.13)</p>
<p style="margin-left:32%;">Records reasons for
transactional memory abort events (for example, from Intel
TSX transactional memory support).</p>
<p style="margin-left:32%; margin-top: 1em">The
<i>precise_ip</i> setting must be greater than 0 and a
transactional memory abort event must be measured or no
values will be recorded. Also note that some perf_event
measurements, such as sampled cycle counting, may cause
extraneous aborts (by causing an interrupt during a
transaction).</p>
<p style="margin-left:22%;"><b>PERF_SAMPLE_REGS_INTR</b>
(since Linux 3.19)</p>
<p style="margin-left:32%;">Records a subset of the current
CPU register state as specified by <i>sample_regs_intr</i>.
Unlike <b>PERF_SAMPLE_REGS_USER</b>, the values returned
reflect kernel register state if the overflow happened while
kernel code was running. If the CPU supports hardware
sampling of register state (e.g., PEBS on Intel x86) and
<i>precise_ip</i> is set higher than zero, then the register
values returned are those captured by hardware at the time
of the sampled instruction’s retirement.</p>
<p style="margin-left:11%;"><i>read_format</i></p>
<p style="margin-left:22%;">This field specifies the format
of the data returned by <b>read</b>(2) on a
<b>perf_event_open</b>() file descriptor.</p>
<p style="margin-left:22%;"><b>PERF_FORMAT_TOTAL_TIME_ENABLED</b></p>
<p style="margin-left:32%;">Adds the 64-bit
<i>time_enabled</i> field. This can be used to calculate
estimated totals if the PMU is overcommitted and
multiplexing is happening.</p>
<p style="margin-left:22%;"><b>PERF_FORMAT_TOTAL_TIME_RUNNING</b></p>
<p style="margin-left:32%;">Adds the 64-bit
<i>time_running</i> field. This can be used to calculate
estimated totals if the PMU is overcommitted and
multiplexing is happening.</p>
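<p style="margin-left:32%; margin-top: 1em">As a rough
sketch of how the two time fields can be combined, a raw
count can be scaled by the ratio of <i>time_enabled</i> to
<i>time_running</i>. The helper name <i>scale_count</i> is
hypothetical and not part of the perf_event interface, and
the result is only an estimate that assumes the event rate
was similar while the counter was not scheduled.</p>

```c
#include <stdint.h>

/*
 * Hypothetical helper: estimate the total count for an event that was
 * multiplexed, given the values read with PERF_FORMAT_TOTAL_TIME_ENABLED
 * and PERF_FORMAT_TOTAL_TIME_RUNNING set.
 */
static uint64_t scale_count(uint64_t raw, uint64_t time_enabled,
                            uint64_t time_running)
{
    /* The event never ran, so there is nothing to extrapolate from. */
    if (time_running == 0)
        return 0;

    /* No multiplexing took place: the raw count is already complete. */
    if (time_running >= time_enabled)
        return raw;

    /* Floating point avoids 64-bit overflow of raw * time_enabled,
     * at the cost of some precision for very large counts. */
    return (uint64_t)((double)raw * (double)time_enabled /
                      (double)time_running);
}
```

<p style="margin-left:32%; margin-top: 1em">For example, a
raw count of 1000 from an event that ran for half of its
enabled time would be estimated as 2000.</p>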
<p style="margin-left:22%;"><b>PERF_FORMAT_ID</b></p>
<p style="margin-left:32%;">Adds a 64-bit unique value that
corresponds to the event group.</p>
<p style="margin-left:22%;"><b>PERF_FORMAT_GROUP</b></p>
<p style="margin-left:32%;">Allows all counter values in an
event group to be read with one read.</p>
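<p style="margin-left:32%; margin-top: 1em">The following is
a minimal sketch of how a buffer returned by <b>read</b>(2)
might be indexed when <b>PERF_FORMAT_GROUP</b> is set. It
assumes that only the four <i>read_format</i> bits described
above are in use (later kernels define further bits that add
more fields per entry), and the helper name
<i>group_value</i> is hypothetical.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/perf_event.h>

/*
 * Hypothetical helper: return the counter value of entry `idx` from a
 * PERF_FORMAT_GROUP read buffer. The layout assumed here is:
 *   nr, [time_enabled], [time_running], then nr entries of { value, [id] }
 * where the bracketed fields are present only if the matching
 * read_format bit is set.
 */
static uint64_t group_value(const uint64_t *buf, uint64_t read_format,
                            uint64_t idx)
{
    size_t pos = 0;
    size_t stride = 1;          /* each entry holds at least the value */
    uint64_t nr;

    assert(buf != NULL);
    assert(read_format & PERF_FORMAT_GROUP);

    nr = buf[pos++];            /* number of events in the group */
    assert(idx < nr);

    if (read_format & PERF_FORMAT_TOTAL_TIME_ENABLED)
        pos++;                  /* skip time_enabled */
    if (read_format & PERF_FORMAT_TOTAL_TIME_RUNNING)
        pos++;                  /* skip time_running */
    if (read_format & PERF_FORMAT_ID)
        stride++;               /* each entry also carries an id */

    return buf[pos + idx * stride];
}
```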
<p style="margin-left:11%;"><i>disabled</i></p>
<p style="margin-left:22%;">The <i>disabled</i> bit
specifies whether the counter starts out disabled or
enabled. If disabled, the event can later be enabled by
<b>ioctl</b>(2), <b>prctl</b>(2), or
<i>enable_on_exec</i>.</p>
<p style="margin-left:22%; margin-top: 1em">When creating
an event group, typically the group leader is initialized
with <i>disabled</i> set to 1 and any child events are
initialized with <i>disabled</i> set to 0. Despite
<i>disabled</i> being 0, the child events will not start
until the group leader is enabled.</p>
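<p style="margin-left:22%; margin-top: 1em">The convention
above can be sketched as follows. The helper name
<i>init_group_attr</i> is hypothetical, and the choice of
<b>PERF_TYPE_HARDWARE</b> events is only an assumption made
for illustration; the sketch fills in the attribute structure
and does not itself open any event.</p>

```c
#include <stdint.h>
#include <string.h>
#include <linux/perf_event.h>

/*
 * Hypothetical helper: prepare the attributes for one member of an
 * event group. The leader starts disabled; the other members leave
 * `disabled` at 0 but still wait for the leader to be enabled.
 */
static void init_group_attr(struct perf_event_attr *attr, uint64_t config,
                            int is_leader)
{
    memset(attr, 0, sizeof(*attr));
    attr->type = PERF_TYPE_HARDWARE;    /* assumption: hardware events */
    attr->size = sizeof(*attr);
    attr->config = config;
    attr->disabled = is_leader ? 1 : 0;
}
```

<p style="margin-left:22%; margin-top: 1em">A caller would
typically fill in one structure for the leader and one for
each remaining event, open the leader with <i>group_fd</i>
set to -1, open the others with <i>group_fd</i> set to the
leader’s file descriptor, and then enable the leader
with <b>ioctl</b>(2).</p>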
<p style="margin-left:11%;"><i>inherit</i></p>
<p style="margin-left:22%;">The <i>inherit</i> bit
specifies that this counter should count events of child
tasks as well as the task specified. This applies only to
new children, not to any existing children at the time the
counter is created (nor to any new children of existing
children).</p>
<p style="margin-left:22%; margin-top: 1em">The
<i>inherit</i> bit does not work with some <i>read_format</i>
settings, such as <b>PERF_FORMAT_GROUP</b>.</p>
<table width="100%" border="0" rules="none" frame="void"
cellspacing="0" cellpadding="0">
<tr valign="top" align="left">
<td width="11%"></td>
<td width="9%">
<p><i>pinned</i></p></td>
<td width="2%"></td>
<td width="78%">
<p>The <i>pinned</i> bit specifies that the counter should
always be on the CPU if at all possible. It applies only to
hardware counters and only to group leaders. If a pinned
counter cannot be put onto the CPU (e.g., because there are
not enough hardware counters or because of a conflict with
some other event), then the counter goes into an
’error’ state, where reads return end-of-file
(i.e., <b>read</b>(2) returns 0) until the counter is
subsequently enabled or disabled.</p></td></tr>
</table>
<p style="margin-left:11%;"><i>exclusive</i></p>
<p style="margin-left:22%;">The <i>exclusive</i> bit
specifies that when this counter’s group is on the
CPU, it should be the only group using the CPU’s
counters. In the future this may allow monitoring programs
to support PMU features that need to run alone so that they
do not disrupt other hardware counters.</p>
<p style="margin-left:22%; margin-top: 1em">Note that many
unexpected situations may prevent events with the
<i>exclusive</i> bit set from ever running. This includes
any users running a system-wide measurement as well as any
kernel use of the performance counters (including the
commonly enabled NMI Watchdog Timer interface).</p>
<p style="margin-left:11%;"><i>exclude_user</i></p>