forked from ThomasKaiser/sbc-bench
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path1ism.txt
344 lines (290 loc) · 18.9 KB
/
1ism.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
Distributor ID: Raspbian
Description: Raspbian GNU/Linux 9.4 (stretch)
Release: 9.4
Codename: stretch
Architecture: armhf
Linux raspberrypi 4.14.50-v7+ #1122 SMP Tue Jun 19 12:26:26 BST 2018 armv7l GNU/Linux
16:56:39 up 1:09, 1 user, load average: 0.28, 0.18, 0.55
Linux 4.14.50-v7+ (raspberrypi) 26/07/18 _armv7l_ (4 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
12.73 0.03 0.68 0.70 0.00 85.86
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
mmcblk0 6.76 194.66 80.51 814685 336953
total used free shared buff/cache available
Mem: 927M 72M 455M 2.4M 399M 799M
Swap: 99M 63M 36M
Filename Type Size Used Priority
/var/swap file 102396 65044 -2
tinymembench v0.4.9 (simple benchmark for memory throughput and latency)
==========================================================================
== Memory bandwidth tests ==
== ==
== Note 1: 1MB = 1000000 bytes ==
== Note 2: Results for 'copy' tests show how many bytes can be ==
== copied per second (adding together read and writen ==
== bytes would have provided twice higher numbers) ==
== Note 3: 2-pass copy means that we are using a small temporary buffer ==
== to first fetch data into it, and only then write it to the ==
== destination (source -> L1 cache, L1 cache -> destination) ==
== Note 4: If sample standard deviation exceeds 0.1%, it is shown in ==
== brackets ==
==========================================================================
C copy backwards : 1098.0 MB/s
C copy backwards (32 byte blocks) : 1097.9 MB/s
C copy backwards (64 byte blocks) : 1102.0 MB/s (0.5%)
C copy : 1112.3 MB/s (0.1%)
C copy prefetched (32 bytes step) : 1131.1 MB/s
C copy prefetched (64 bytes step) : 1131.5 MB/s
C 2-pass copy : 917.8 MB/s
C 2-pass copy prefetched (32 bytes step) : 937.5 MB/s
C 2-pass copy prefetched (64 bytes step) : 936.3 MB/s
C fill : 1529.8 MB/s
C fill (shuffle within 16 byte blocks) : 1531.4 MB/s (0.3%)
C fill (shuffle within 32 byte blocks) : 1532.1 MB/s (0.3%)
C fill (shuffle within 64 byte blocks) : 1532.1 MB/s (0.1%)
---
standard memcpy : 1132.0 MB/s (0.1%)
standard memset : 1532.7 MB/s (0.2%)
---
NEON read : 1881.7 MB/s (0.7%)
NEON read prefetched (32 bytes step) : 2049.6 MB/s (0.2%)
NEON read prefetched (64 bytes step) : 2049.8 MB/s (0.1%)
NEON read 2 data streams : 1766.1 MB/s (0.1%)
NEON read 2 data streams prefetched (32 bytes step) : 1906.2 MB/s (0.9%)
NEON read 2 data streams prefetched (64 bytes step) : 1910.6 MB/s (1.8%)
NEON copy : 1101.4 MB/s (0.7%)
NEON copy prefetched (32 bytes step) : 1138.9 MB/s
NEON copy prefetched (64 bytes step) : 1136.4 MB/s
NEON unrolled copy : 1112.3 MB/s (0.1%)
NEON unrolled copy prefetched (32 bytes step) : 1136.9 MB/s (0.1%)
NEON unrolled copy prefetched (64 bytes step) : 1138.0 MB/s (0.9%)
NEON copy backwards : 1094.7 MB/s (0.9%)
NEON copy backwards prefetched (32 bytes step) : 1127.5 MB/s
NEON copy backwards prefetched (64 bytes step) : 1126.2 MB/s
NEON 2-pass copy : 932.3 MB/s (0.2%)
NEON 2-pass copy prefetched (32 bytes step) : 957.6 MB/s
NEON 2-pass copy prefetched (64 bytes step) : 958.3 MB/s (0.1%)
NEON unrolled 2-pass copy : 914.1 MB/s (0.1%)
NEON unrolled 2-pass copy prefetched (32 bytes step) : 929.3 MB/s
NEON unrolled 2-pass copy prefetched (64 bytes step) : 928.5 MB/s
NEON fill : 1532.6 MB/s (0.1%)
NEON fill backwards : 1529.4 MB/s
VFP copy : 1111.2 MB/s (0.8%)
VFP 2-pass copy : 909.1 MB/s (0.6%)
ARM fill (STRD) : 1529.6 MB/s (0.6%)
ARM fill (STM with 8 registers) : 1533.2 MB/s (0.2%)
ARM fill (STM with 4 registers) : 1531.3 MB/s (0.1%)
ARM copy prefetched (incr pld) : 1135.6 MB/s
ARM copy prefetched (wrap pld) : 1132.7 MB/s
ARM 2-pass copy prefetched (incr pld) : 941.7 MB/s
ARM 2-pass copy prefetched (wrap pld) : 938.4 MB/s
==========================================================================
== Framebuffer read tests. ==
== ==
== Many ARM devices use a part of the system memory as the framebuffer, ==
== typically mapped as uncached but with write-combining enabled. ==
== Writes to such framebuffers are quite fast, but reads are much ==
== slower and very sensitive to the alignment and the selection of ==
== CPU instructions which are used for accessing memory. ==
== ==
== Many x86 systems allocate the framebuffer in the GPU memory, ==
== accessible for the CPU via a relatively slow PCI-E bus. Moreover, ==
== PCI-E is asymmetric and handles reads a lot worse than writes. ==
== ==
== If uncached framebuffer reads are reasonably fast (at least 100 MB/s ==
== or preferably >300 MB/s), then using the shadow framebuffer layer ==
== is not necessary in Xorg DDX drivers, resulting in a nice overall ==
== performance improvement. For example, the xf86-video-fbturbo DDX ==
== uses this trick. ==
==========================================================================
NEON read (from framebuffer) : 62.5 MB/s
NEON copy (from framebuffer) : 59.8 MB/s
NEON 2-pass copy (from framebuffer) : 60.4 MB/s
NEON unrolled copy (from framebuffer) : 61.8 MB/s
NEON 2-pass unrolled copy (from framebuffer) : 60.1 MB/s
VFP copy (from framebuffer) : 396.9 MB/s
VFP 2-pass copy (from framebuffer) : 340.8 MB/s (0.3%)
ARM copy (from framebuffer) : 193.6 MB/s (0.5%)
ARM 2-pass copy (from framebuffer) : 201.9 MB/s
==========================================================================
== Memory latency test ==
== ==
== Average time is measured for random memory accesses in the buffers ==
== of different sizes. The larger is the buffer, the more significant ==
== are relative contributions of TLB, L1/L2 cache misses and SDRAM ==
== accesses. For extremely large buffer sizes we are expecting to see ==
== page table walk with several requests to SDRAM for almost every ==
== memory access (though 64MiB is not nearly large enough to experience ==
== this effect to its fullest). ==
== ==
== Note 1: All the numbers are representing extra time, which needs to ==
== be added to L1 cache latency. The cycle timings for L1 cache ==
== latency can be usually found in the processor documentation. ==
== Note 2: Dual random read means that we are simultaneously performing ==
== two independent memory accesses at a time. In the case if ==
== the memory subsystem can't handle multiple outstanding ==
== requests, dual random read has the same timings as two ==
== single reads performed one after another. ==
==========================================================================
block size : single random read / dual random read
1024 : 0.0 ns / 0.0 ns
2048 : 0.0 ns / 0.0 ns
4096 : 0.0 ns / 0.0 ns
8192 : 0.0 ns / 0.0 ns
16384 : 0.0 ns / 0.0 ns
32768 : 0.0 ns / 0.0 ns
65536 : 5.4 ns / 9.2 ns
131072 : 8.2 ns / 13.1 ns
262144 : 9.7 ns / 14.8 ns
524288 : 13.8 ns / 21.1 ns
1048576 : 89.1 ns / 139.8 ns
2097152 : 129.3 ns / 180.9 ns
4194304 : 155.9 ns / 202.0 ns
8388608 : 169.9 ns / 211.4 ns
16777216 : 178.6 ns / 217.6 ns
33554432 : 184.0 ns / 222.0 ns
67108864 : 187.1 ns / 224.7 ns
7-Zip (a) [32] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_GB.UTF-8,Utf16=on,HugeFiles=on,32 bits,4 CPUs LE)
LE
CPU Freq: 1058 1166 1165 1173 1173 1173 1173 1173 1173
RAM size: 927 MB, # CPU hardware threads: 4
RAM usage: 882 MB, # Benchmark threads: 4
Compressing | Decompressing
Dict Speed Usage R/U Rating | Speed Usage R/U Rating
KiB/s % MIPS MIPS | KiB/s % MIPS MIPS
22: 613 100 597 597 | 14389 100 1228 1228
23: 597 100 609 609 | 14144 100 1224 1224
24: 578 100 622 622 | 13765 100 1209 1208
25: 562 100 643 642 | 13262 100 1181 1180
---------------------------------- | ------------------------------
Avr: 100 618 617 | 100 1210 1210
Tot: 100 914 914
7-Zip (a) [32] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_GB.UTF-8,Utf16=on,HugeFiles=on,32 bits,4 CPUs LE)
LE
CPU Freq: 1197 1198 1198 1199 1199 1199 1199 1199 1198
RAM size: 927 MB, # CPU hardware threads: 4
RAM usage: 882 MB, # Benchmark threads: 4
Compressing | Decompressing
Dict Speed Usage R/U Rating | Speed Usage R/U Rating
KiB/s % MIPS MIPS | KiB/s % MIPS MIPS
22: 1808 323 545 1759 | 55023 385 1220 4694
23: 1761 328 547 1795 | 54341 388 1210 4702
24: 1675 322 558 1801 | 53224 390 1198 4672
25: 1650 330 572 1884 | 50799 384 1177 4521
---------------------------------- | ------------------------------
Avr: 326 555 1810 | 387 1201 4647
Tot: 356 878 3229
7-Zip (a) [32] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_GB.UTF-8,Utf16=on,HugeFiles=on,32 bits,4 CPUs LE)
LE
CPU Freq: 1173 1174 1174 1174 1173 1172 1173 1173 1173
RAM size: 927 MB, # CPU hardware threads: 4
RAM usage: 882 MB, # Benchmark threads: 4
Compressing | Decompressing
Dict Speed Usage R/U Rating | Speed Usage R/U Rating
KiB/s % MIPS MIPS | KiB/s % MIPS MIPS
22: 1793 321 544 1745 | 55182 386 1220 4708
23: 1781 328 553 1815 | 54223 387 1211 4692
24: 1716 328 563 1846 | 53114 389 1197 4663
25: 1659 332 570 1895 | 50657 383 1177 4508
---------------------------------- | ------------------------------
Avr: 327 557 1825 | 386 1201 4643
Tot: 357 879 3234
7-Zip (a) [32] 16.02 : Copyright (c) 1999-2016 Igor Pavlov : 2016-05-21
p7zip Version 16.02 (locale=en_GB.UTF-8,Utf16=on,HugeFiles=on,32 bits,4 CPUs LE)
LE
CPU Freq: 1192 1189 1193 1193 1187 1192 1185 1192 1192
RAM size: 927 MB, # CPU hardware threads: 4
RAM usage: 882 MB, # Benchmark threads: 4
Compressing | Decompressing
Dict Speed Usage R/U Rating | Speed Usage R/U Rating
KiB/s % MIPS MIPS | KiB/s % MIPS MIPS
22: 1804 323 544 1756 | 55533 387 1224 4738
23: 1807 332 554 1841 | 54255 387 1212 4694
24: 1668 320 560 1794 | 53184 390 1197 4669
25: 1735 346 573 1982 | 50997 386 1177 4539
---------------------------------- | ------------------------------
Avr: 330 558 1843 | 387 1203 4660
Tot: 359 880 3252
Compression: 1810,1825,1843
Decompression: 4647,4643,4660
Total: 3229,3234,3252
OpenSSL 1.1.0f 25 May 2017
type 16 bytes 64 bytes 256 bytes 1024 bytes 8192 bytes 16384 bytes
aes-128-cbc 30443.07k 41855.19k 46518.27k 47647.74k 48207.19k 48229.03k
aes-128-cbc 30656.82k 42144.41k 46898.26k 48249.17k 48482.99k 48611.33k
aes-192-cbc 27670.34k 36476.54k 40087.98k 41066.84k 41227.61k 41298.60k
aes-192-cbc 26882.09k 35798.91k 39068.93k 40194.05k 40580.44k 40452.10k
aes-256-cbc 25480.00k 32825.15k 35353.34k 36172.46k 36413.44k 36574.55k
aes-256-cbc 25390.35k 32830.42k 35614.63k 36268.03k 36620.97k 36634.62k
Monitoring data for each run:
System health while running tinymembench:
Time fake/real load %cpu %sys %usr %nice %io %irq CPU VCore
16:56:39: 1400/1400MHz 0.28 14% 0% 12% 0% 0% 0% 60.1°C 1.3250V
16:57:39: 1400/1200MHz 0.67 21% 0% 21% 0% 0% 0% 62.8°C 1.2313V
16:58:39: 1400/1200MHz 0.99 25% 0% 25% 0% 0% 0% 65.5°C 1.2313V
16:59:39: 1400/1200MHz 1.00 25% 0% 25% 0% 0% 0% 65.5°C 1.2313V
17:00:39: 1400/1200MHz 1.00 25% 0% 25% 0% 0% 0% 65.5°C 1.2313V
17:01:39: 1400/1200MHz 1.00 25% 0% 25% 0% 0% 0% 63.4°C 1.2313V
17:02:40: 1400/1200MHz 1.00 25% 0% 25% 0% 0% 0% 64.5°C 1.2313V
17:03:40: 1400/1200MHz 1.00 25% 0% 25% 0% 0% 0% 64.5°C 1.2313V
17:04:40: 1400/1200MHz 1.00 25% 0% 25% 0% 0% 0% 63.4°C 1.2313V
System health while running 7-zip single core benchmark:
Time fake/real load %cpu %sys %usr %nice %io %irq CPU VCore
17:05:07: 1400/1200MHz 1.00 15% 0% 14% 0% 0% 0% 64.5°C 1.2313V
17:05:22: 1400/1200MHz 1.30 25% 0% 24% 0% 0% 0% 64.5°C 1.2313V
17:05:37: 1400/1200MHz 1.76 25% 0% 24% 0% 0% 0% 64.5°C 1.2313V
17:05:52: 1400/1200MHz 2.11 25% 0% 24% 0% 0% 0% 64.5°C 1.2313V
17:06:07: 1400/1200MHz 2.16 25% 0% 24% 0% 0% 0% 64.5°C 1.2313V
17:06:22: 1400/1200MHz 2.57 25% 0% 24% 0% 0% 0% 65.0°C 1.2313V
17:06:37: 1400/1200MHz 2.68 27% 0% 25% 0% 0% 0% 65.0°C 1.2313V
17:06:52: 1400/1200MHz 2.83 25% 0% 24% 0% 0% 0% 64.5°C 1.2313V
17:07:07: 1400/1200MHz 2.64 25% 0% 24% 0% 0% 0% 64.5°C 1.2313V
17:07:22: 1400/1200MHz 2.71 25% 0% 24% 0% 0% 0% 64.5°C 1.2313V
17:07:37: 1400/1200MHz 2.76 25% 0% 24% 0% 0% 0% 64.5°C 1.2313V
17:07:52: 1400/1200MHz 2.66 25% 0% 24% 0% 0% 0% 64.5°C 1.2313V
17:08:08: 1400/1200MHz 2.67 25% 0% 24% 0% 0% 0% 65.5°C 1.2313V
17:08:23: 1400/1200MHz 2.59 25% 0% 24% 0% 0% 0% 64.5°C 1.2313V
17:08:38: 1400/1200MHz 2.76 25% 0% 24% 0% 0% 0% 64.5°C 1.2313V
17:08:53: 1400/1200MHz 2.59 25% 0% 24% 0% 0% 0% 64.5°C 1.2313V
17:09:08: 1400/1200MHz 2.61 25% 0% 24% 0% 0% 0% 63.9°C 1.2313V
17:09:23: 1400/1200MHz 2.48 25% 0% 24% 0% 0% 0% 64.5°C 1.2313V
17:09:38: 1400/1200MHz 2.68 25% 0% 24% 0% 0% 0% 65.0°C 1.2313V
System health while running 7-zip multi core benchmark:
Time fake/real load %cpu %sys %usr %nice %io %irq CPU VCore
17:09:41: 1400/1200MHz 2.78 15% 0% 14% 0% 0% 0% 65.0°C 1.2313V
17:10:27: 1400/1200MHz 2.56 60% 1% 59% 0% 0% 0% 70.9°C 1.2313V
17:11:12: 1400/1200MHz 3.08 82% 1% 81% 0% 0% 0% 72.0°C 1.2313V
17:11:57: 1400/1200MHz 3.29 78% 1% 77% 0% 0% 0% 73.6°C 1.2313V
17:12:42: 1400/1200MHz 3.62 82% 1% 81% 0% 0% 0% 74.1°C 1.2313V
17:13:28: 1400/1200MHz 3.54 80% 1% 78% 0% 0% 0% 74.7°C 1.2313V
17:14:13: 1400/1200MHz 3.41 82% 1% 81% 0% 0% 0% 75.2°C 1.2313V
System health while running OpenSSL benchmark:
Time fake/real load %cpu %sys %usr %nice %io %irq CPU VCore
17:14:30: 1400/1200MHz 3.63 19% 0% 18% 0% 0% 0% 75.8°C 1.2313V
17:14:40: 1400/1200MHz 3.07 0% 0% 0% 0% 0% 0% 70.4°C 1.2313V
17:14:50: 1400/1200MHz 2.76 25% 0% 25% 0% 0% 0% 70.9°C 1.2313V
17:15:00: 1400/1200MHz 2.49 25% 0% 25% 0% 0% 0% 70.9°C 1.2313V
17:15:10: 1400/1200MHz 2.26 26% 0% 25% 0% 0% 0% 69.8°C 1.2313V
17:15:20: 1400/1200MHz 2.06 25% 0% 25% 0% 0% 0% 69.8°C 1.2313V
17:15:30: 1400/1200MHz 1.90 25% 0% 25% 0% 0% 0% 70.4°C 1.2313V
17:15:40: 1400/1200MHz 1.76 25% 0% 25% 0% 0% 0% 69.8°C 1.2313V
17:15:50: 1400/1200MHz 1.64 25% 0% 25% 0% 0% 0% 69.8°C 1.2313V
17:16:00: 1400/1200MHz 1.54 25% 0% 25% 0% 0% 0% 69.8°C 1.2313V
17:16:10: 1400/1200MHz 1.46 25% 0% 25% 0% 0% 0% 69.3°C 1.2313V
17:16:21: 1400/1200MHz 1.39 25% 0% 25% 0% 0% 0% 69.8°C 1.2313V
Linux 4.14.50-v7+ (raspberrypi) 26/07/18 _armv7l_ (4 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
18.19 0.03 0.67 0.56 0.00 80.55
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
mmcblk0 5.54 157.05 63.64 844101 342025
total used free shared buff/cache available
Mem: 927M 67M 821M 1.3M 38M 816M
Swap: 99M 64M 35M
Filename Type Size Used Priority
/var/swap file 102396 66388 -2
/boot/config.txt:
disable_overscan=1
dtparam=audio=on