Benchdnn drivers support a set of options that are available for every
driver. Some of them represent global state settings which modify the execution
behavior. Options may be supported in any mode
or be specific to correctness
or performance validation.
--allow-enum-tags-only=BOOL
instructs the driver to validate format tags
against the documented tags from dnnl_format_tag_t
enumeration only when
BOOL
is true
(the default). When BOOL
is false
, any valid format tag
is accepted when creating a testing object.
--batch=FILE
instructs the driver to take options and problem descriptors
from a FILE
. If several --batch
options are specified, the driver reads
input files consecutively. Nested inclusion of the --batch
option is
supported. The driver searches for a file by extracting a directory where FILE
is located and tries to open dirname(FILE)/FILE
. If the file is not found, it
tries to find the file in a default path
/path_to_benchdnn_binary/inputs/DRIVER/FILE
. If the file is still not
found, an error is reported. Note that --batch
option doesn't change the
previous state.
--canonical=BOOL
instructs the driver to print a canonical form of a
reproducer line. When BOOL
is false
(the default), the driver prints the
minimal reproducer line, omitting options and problem descriptor entries with
default values.
--cpu-isa-hints=HINTS
specifies the ISA specific hints to the CPU engine.
HINTS
values can be none
(the default), no_hints
or prefer_ymm
.
The none value respects the DNNL_CPU_ISA_HINTS environment variable setting,
while the other values override it. Settings other than none take effect
immediately after parsing, and subsequent attempts to set the hints
result in a runtime error.
--ctx-init=MAX_CONCURRENCY[:CORE_TYPE[:THREADS_PER_CORE]]
specifies the
threading context for a testing object creation.
- MAX_CONCURRENCY is a positive integer value or auto (the default) and specifies the maximum number of threads in the context.
- CORE_TYPE is a non-negative integer value or auto (the default) that specifies the type of cores used in the context for hybrid CPU systems, 0 being the largest cores available on the system (TBB runtime only).
- THREADS_PER_CORE is a positive integer value or auto that allows users to enable (value 2) or disable (value 1) hyper-threading (TBB runtime only).
--ctx-exe=MAX_CONCURRENCY[:CORE_TYPE[:THREADS_PER_CORE]]
specifies the
threading context for a testing object execution. The values follow the same
rules as those of the --ctx-init
option.
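The value syntax shared by --ctx-init and --ctx-exe can be sketched as a small parser. This is an illustrative model of the rules described above, not benchdnn's actual parsing code; the names ThreadContext and parse_ctx are hypothetical, None stands in for auto, and treating omitted trailing fields as auto is an assumption.

```python
from typing import NamedTuple, Optional


class ThreadContext(NamedTuple):
    # None stands for "auto" in every field (a modelling choice of this sketch).
    max_concurrency: Optional[int]
    core_type: Optional[int]
    threads_per_core: Optional[int]


def _field(text: str, minimum: int, name: str) -> Optional[int]:
    """Convert one colon-separated field, enforcing its lower bound."""
    if text == "auto":
        return None
    value = int(text)  # raises ValueError for non-integer input
    if value < minimum:
        raise ValueError(f"{name} must be >= {minimum} or 'auto', got {value}")
    return value


def parse_ctx(value: str) -> ThreadContext:
    """Parse MAX_CONCURRENCY[:CORE_TYPE[:THREADS_PER_CORE]]."""
    parts = value.split(":")
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"expected 1 to 3 fields, got {len(parts)}")
    # Assumption: fields that are left out behave like "auto".
    parts += ["auto"] * (3 - len(parts))
    return ThreadContext(
        _field(parts[0], 1, "MAX_CONCURRENCY"),    # positive integer
        _field(parts[1], 0, "CORE_TYPE"),          # non-negative integer
        _field(parts[2], 1, "THREADS_PER_CORE"),   # positive integer
    )
```

For instance, parse_ctx("auto:0:1") describes a context with automatic concurrency, the largest cores, and hyper-threading disabled.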
--engine=KIND[:INDEX]
specifies an engine kind KIND
to be used for
benchmarking. KIND
values can be cpu
(the default) or gpu
. An optional
non-negative integer INDEX may be specified after a colon, such as
--engine=gpu:1, which selects the second GPU device. Device enumeration
follows the library's enumeration order, which can be checked in the
ONEDNN_VERBOSE output. By default, INDEX is 0. If the index is greater than
or equal to the number of devices of the requested kind discovered on the
system, a runtime error occurs.
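The KIND[:INDEX] rules can be modelled in a few lines. This is only a sketch of the documented behaviour, not the driver's implementation; parse_engine and the device_counts mapping are hypothetical, and the device counts would in practice come from the library rather than from the caller.

```python
def parse_engine(value: str, device_counts: dict) -> tuple:
    """Parse KIND[:INDEX] and validate INDEX against the discovered devices.

    device_counts maps an engine kind ("cpu" or "gpu") to the number of
    devices of that kind (supplied by the caller in this sketch).
    """
    kind, sep, index_text = value.partition(":")
    if kind not in ("cpu", "gpu"):
        raise ValueError(f"unknown engine kind: {kind!r}")
    # INDEX defaults to 0 when no colon is present.
    index = int(index_text) if sep else 0
    if index < 0:
        raise ValueError("INDEX must be non-negative")
    # An index at or past the device count is reported as a runtime error.
    if index >= device_counts.get(kind, 0):
        raise RuntimeError(f"no {kind} device with index {index}")
    return kind, index
```

With two GPUs present, parse_engine("gpu:1", {"cpu": 1, "gpu": 2}) selects the second GPU, while "gpu:2" raises a RuntimeError.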
--mem-check=BOOL
instructs the driver to check whether the problem, including all service
memory allocations, fits in the device RAM.
The check is enabled when BOOL
is true
(the default) and disabled otherwise.
--memory-kind=KIND
specifies the memory kind to test with DPC++ and OpenCL
runtimes. KIND
values can be usm
(default), buffer
, usm_device
(to use malloc_device) or usm_shared
(to use malloc_shared).
--mode=MODE
specifies benchdnn mode to be used for benchmarking.
MODE
values can be:
- C or c for correctness testing (the default)
- P or p for performance testing
- F or f for fast performance testing, an alias for --mode=P --mode-modifier=PM --max-ms-per-prb=10
- CP or cp for both correctness and performance testing
- B or b for bitwise (numerical determinism) testing
- R or r for run mode
- I or i for initialization mode
- L or l for listing mode
Refer to modes for details.
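Since the F mode is described as an alias, its effect can be pictured as a rewrite of the command line. The sketch below only illustrates that equivalence; expand_mode and MODE_ALIASES are hypothetical names, and the driver may well implement the alias differently.

```python
# Alias table taken from the description of the F mode above.
MODE_ALIASES = {
    "F": ["--mode=P", "--mode-modifier=PM", "--max-ms-per-prb=10"],
}


def expand_mode(option: str) -> list:
    """Return the options an aliased --mode value stands for.

    Options other than --mode, and modes without an alias, pass through
    unchanged.
    """
    prefix = "--mode="
    if not option.startswith(prefix):
        return [option]
    mode = option[len(prefix):].upper()  # mode letters are case-insensitive
    return MODE_ALIASES.get(mode, [option])
```

Under this model, expand_mode("--mode=f") yields the three options listed for the alias, and every other option is returned as is.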
--mode-modifier=MODIFIER
specifies a modifier to a selected benchmarking mode.
MODIFIER
values can be:
- empty for no modifiers (the default)
- P or p for parallel backend object creation
- M or m for disabling usage of host memory (GPU only)
Refer to mode modifiers for details.
Note: The P
modifier sets the default value of scratchpad mode to user
.
For the benchdnn functionality to work properly, the recommendation is to
pass this option before the driver name so that the modifier is processed
before the execution flow starts and can propagate a new scratchpad value. The
flow is affected when the user passes descriptors directly. When using batch
files, no difference is observed since a batch file starts a new parsing cycle
underneath, and a scratchpad value is propagated.
--stream-kind=KIND
specifies the stream kind to test with DPC++ and OpenCL
runtimes by providing flags to the stream. The queue object is managed inside
the library. KIND
values can be def
(default), in_order
, or
out_of_order
. Refer to dnnl_stream_flags_t
for more information.
--repeats-per-prb=N
specifies the number of times N to run a given problem.
The default N is 1. When several problems are provided, each of them is
executed N times. The option is designed to help reproduce sporadic failures,
where effects such as race conditions or garbage values in memory may not be
triggered by a single run but show up over several runs.
--reset
instructs the driver to reset DRIVER-OPTIONS (not COMMON-OPTIONS!) to
their default values. The only exception is --perf-template
option which will
not be reset. COMMON-OPTIONS describe a global state and, thus, are not affected
by this option.
--skip-impl=STR
instructs the driver to jump to the next implementation
in the list if the name of the returned one matches STR
symbol-by-symbol.
STR
is a string literal with no spaces. When STR
is empty (the default), the
driver uses the first fetched implementation. STR
supports several patterns to
be matched against through the comma ,
delimiter between patterns. The name of
a fetched implementation is searched against all specified patterns; and if any
of the patterns match any part of the implementation name string, it counts as a
hit. For example, --skip-impl=ref,gemm
causes ref:any
or x64:gemm:jit
implementations to be skipped.
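The matching rule can be modelled as a substring test against each comma-separated pattern. This is a minimal sketch of the behaviour described above rather than the driver's code; should_skip is a hypothetical helper, and ignoring empty patterns is an assumption.

```python
def should_skip(impl_name: str, skip_impl: str) -> bool:
    """Return True when any pattern occurs anywhere in the implementation name.

    skip_impl is the raw option value, for example "ref,gemm". An empty
    value disables skipping, so the first fetched implementation is used.
    """
    # Assumption: empty patterns (e.g. from a trailing comma) are ignored.
    patterns = [p for p in skip_impl.split(",") if p]
    return any(p in impl_name for p in patterns)
```

With --skip-impl=ref,gemm, both ref:any and x64:gemm:jit produce a hit, while a name that contains neither pattern does not.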
--start=N
specifies the test index N
to start testing from. All tests
before the index N
will be skipped.
--verbose=N
, or a short form -vN
, specifies the driver verbosity level.
Additional information is printed to the stdout depending on a level N
. N
is
a non-negative integer value. The default value is 0
. Refer to
verbose for details.
--attr-same-pd-check=BOOL
instructs the driver to compare two primitive
descriptors - the one with user requested attributes and the one without any
attributes. When BOOL
is true
, the check returns an error if implementation
names mismatch for two descriptors. It indicates that appending an attribute
changes the implementation dispatching which is an undesired behavior. When
BOOL
is false
(the default), the check is disabled.
--check-ref-impl=BOOL
instructs the driver to compare the implementation name
string against the ref
string pattern. When BOOL
is set to true
, the check
returns an error if the name matches the reference pattern. By default, the
check is disabled. It's useful to catch unexpected fallbacks to slow reference
implementations from a big batch of problems.
--fast-ref=BOOL
instructs the driver to use an optimized implementation
from the library as a reference path for correctness comparison when BOOL
is
true
(the default). Refer to additional documentation
for more information.
--cold-cache=MODE
instructs the driver to enable a cold cache measurement
mode. When MODE
is set to none
(the default), cold cache is disabled.
When MODE
is set to wei
, cold cache is enabled for weights argument
only. This mode targets forward and backward by data propagation kinds. When
MODE
is set to all
, cold cache is enabled for each execution argument.
This targets any propagation kind but mostly bandwidth-limited functionality
to emulate first access to data or branching cases. When MODE
is set to
custom
, cold cache is enabled for specified arguments, but it requires source
code adjustments. Refer to cold cache for more information.
--fix-times-per-prb=N
specifies the number of rounds N to run per problem,
where N
is a non-negative integer value. When N
is set to 0
(the default),
the number of rounds will be established by the time criterion instead. For N
greater than 0
, the number of runs will be overridden by this setting. The
option makes performance profiling easier when a certain number of cycles is
desired or when a specific number of runs is expected.
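The interaction between a fixed round count and the time criterion can be sketched as follows. This is a simplified model under stated assumptions, not benchdnn's measurement loop: run_rounds, run_once and clock_ms are hypothetical names, and the real time criterion may involve further conditions that are not modelled here.

```python
import time


def run_rounds(run_once, fix_times: int, max_ms: int,
               clock_ms=lambda: time.monotonic() * 1000) -> int:
    """Execute run_once repeatedly and return the number of rounds performed.

    fix_times > 0: run exactly that many rounds, ignoring the time budget.
    fix_times == 0: keep running until max_ms milliseconds have elapsed
    (at least one round always runs in this sketch).
    """
    rounds = 0
    if fix_times > 0:
        for _ in range(fix_times):
            run_once()
            rounds += 1
        return rounds
    deadline = clock_ms() + max_ms
    while True:
        run_once()
        rounds += 1
        if clock_ms() >= deadline:
            break
    return rounds
```

Passing the clock as a parameter keeps the sketch deterministic under test, since a fake clock can stand in for the wall clock.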
--max-ms-per-prb=N
specifies the time limit N, in milliseconds, for running each problem. N
is a positive integer value in the [1e1, 6e4] range. When the provided
value is out of range, it is saturated to the nearest boundary value. The
default is 3e3, or 3 seconds. The option is useful, for example, to stabilize
the performance numbers reported for small problems on CPU.
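Saturation to the documented range amounts to a clamp. The snippet below is a sketch of that arithmetic only; the constant and function names are hypothetical.

```python
MIN_MS = 10        # 1e1, lower bound of the documented range
MAX_MS = 60_000    # 6e4, upper bound of the documented range
DEFAULT_MS = 3_000 # 3e3, i.e. 3 seconds


def saturate_max_ms(n: int = DEFAULT_MS) -> int:
    """Clamp a requested time limit to the [MIN_MS, MAX_MS] range."""
    return max(MIN_MS, min(MAX_MS, n))
```

A request of 5 ms is therefore raised to 10 ms, and a request of 100000 ms is lowered to 60000 ms.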
--num-streams=N
specifies the number N
of streams used for performance
benchmarking. The option applies to GPU only and uses a single stream by
default.
--perf-template=STR
specifies the format of a performance report. STR
values can be def
(the default), csv
or a custom set of supported flags.
Refer to performance report for details.