Compliance Warden

A pluggable compliance checker (ISOBMFF, HEIF/MIAF/AVIF, AV1 HDR10+, AV1-ISOBMFF)

Introduction

The Compliance Warden, often abbreviated as "CW", "cw", or "the warden", is a compliance checker. CW was developed as reference software for MPEG MIAF, AOM AVIF, and AOM AV1 HDR10+. It is meant to be extended to check MP4, CMAF, and many other file formats.

CW decouples the processing phases. First it parses the input to build an AST stored in very generic structures. Then it processes the AST to validate sets of rules attached to specifications. This approach offers a lot of flexibility and extensibility.

CW is written in modern C++. Binary test vectors are described in assembly (x86 NASM syntax), which keeps them editable as text. CW derives from a more generic effort called Abstract, started by contributors from the GPAC open-source project.

The Compliance Warden is distributed under the BSD-3 permissive license.

Useful information

Online version

An online version is available at https://gpac.github.io/ComplianceWarden-wasm/. Note that the software runs in your browser and does not upload any data from your computer.

Usage

New option parser (introduced in July 2023):

$ bin/cw.exe -h
Compliance Warden, version v32-master-rev14-g363d8d3

Usage:
    -s, --spec                              Specification name.
    -f, --format                            Output format: "raw" (default), or "json"
    -l, --list                              List available specifications or available rules.
    -v, --version                           Print version and exit.
    -h, --help                              Print usage and exit.
    -t, --test                              Don't print warnings when switching to legacy mode.

The old usage is deprecated and will be removed in v34:

$ bin/cw.exe
Compliance Warden, version v32-master-rev14-g363d8d3

Usage:
- Run conformance:          bin/cw.exe <spec> input.mp4 [json]
- List specifications:      bin/cw.exe list
- List specification rules: bin/cw.exe <spec> list
- Print version:            bin/cw.exe version

Specifications

CW is mainly sponsored by companies and standardization groups to validate specific versions of specifications they develop or use.

The master branch only references official specifications. Draft versions or updates are meant to live in separate branches. To learn more, please read the design principles.

However, once a specification is validated, new rules are accepted progressively.

$ bin/cw.exe --list
================================================================================
Specification name: av1hdr10plus
            detail: HDR10+ AV1 Metadata Handling Specification, 7 December 2022
https://github.com/AOMediaCodec/av1-hdr10plus/commit/63bacd21bc5f75ea6094fc11a03f0e743366fbdf
https://aomediacodec.github.io/av1-hdr10plus/
        depends on: "av1isobmff" specifications.
================================================================================

================================================================================
Specification name: av1isobmff
            detail: AV1 Codec ISO Media File Format Binding v1.2.0, 12 December 2019
https://github.com/AOMediaCodec/av1-isobmff/commit/ee2f1f0d2c342478206767fb4b79a39870c0827e
https://aomediacodec.github.io/av1-isobmff/v1.2.0.html
        depends on: "isobmff" specifications.
================================================================================

================================================================================
Specification name: avif
            detail: AVIF v1.0.0, 19 February 2019
https://aomediacodec.github.io/av1-avif/
        depends on: "miaf" specifications.
================================================================================

================================================================================
Specification name: isobmff
            detail: ISO Base Media File Format
MPEG-4 part 12 - ISO/IEC 14496-12 - m17277 (6th+FDAM1+FDAM2+COR1-R4)
        depends on: none.
================================================================================

================================================================================
Specification name: heif
            detail: HEIF - ISO/IEC 23008-12 - 2nd Edition N18310
        depends on: "isobmff" specifications.
================================================================================

================================================================================
Specification name: miaf
            detail: MIAF (Multi-Image Application Format)
MPEG-A part 22 - ISO/IEC 23000-22 - w18260 FDIS - Jan 2019
        depends on: "heif" specifications.
================================================================================
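The "depends on" lines above form a chain (for example avif, then miaf, then heif, then isobmff). As a minimal sketch of how such a chain can be flattened into an ordered list, the C++ below walks a table that mirrors the listing. The names `DependencyTable`, `listedDependencies`, `collect`, and `checkOrder` are hypothetical and are not part of CW; whether CW evaluates dependencies in this exact order is an assumption.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical table: specification name -> names it depends on.
using DependencyTable = std::map<std::string, std::vector<std::string>>;

// Mirrors the "depends on" lines printed by `bin/cw.exe --list`.
DependencyTable listedDependencies()
{
  return {
    { "isobmff", {} },
    { "heif", { "isobmff" } },
    { "miaf", { "heif" } },
    { "avif", { "miaf" } },
    { "av1isobmff", { "isobmff" } },
    { "av1hdr10plus", { "av1isobmff" } },
  };
}

// Depth-first walk: dependencies are appended before the spec that needs them.
void collect(const DependencyTable& table, const std::string& name,
             std::set<std::string>& seen, std::vector<std::string>& order)
{
  if(!seen.insert(name).second)
    return; // already visited (this also guards against cycles)

  auto it = table.find(name);
  assert(it != table.end() && "unknown specification name");

  for(auto& dep : it->second)
    collect(table, dep, seen, order);

  order.push_back(name);
}

// Returns the requested spec preceded by everything it depends on.
std::vector<std::string> checkOrder(const DependencyTable& table, const std::string& name)
{
  std::set<std::string> seen;
  std::vector<std::string> order;
  collect(table, name, seen, order);
  return order;
}
```

With this table, `checkOrder(listedDependencies(), "avif")` yields isobmff, heif, miaf, avif.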

Building

Prerequisites

NASM and a C++14 compiler.

Native build

Linux, Windows:

$ make -j

MacOS X and BSD-likes:

$ CXX=scripts/darwin.sh make -j

or

$ export CXX=scripts/darwin.sh
$ make -j

Cross-compiling

Simply override CXX to use your target toolchain.

Example for a 64-bit Windows target:

$ CXX=x86_64-w64-mingw32-g++ make

or

$ export CXX=x86_64-w64-mingw32-g++
$ make

Example for a 32-bit Linux target, running the full test script:

$ BIN=bin_32 CXX=i686-linux-gnu-g++-12 ./check

Emscripten (WASM)

em++ -std=c++14 -DCW_WASM bin/cw_version.cpp `find src -name '*.cpp'` -Isrc -o ComplianceWarden.js -O3 -s WASM=1 -s EXPORTED_FUNCTIONS=_specFindC,_specCheckC,_specListRulesC,_printVersion,_malloc,_free -s FORCE_FILESYSTEM=1 -s EXIT_RUNTIME=0 -s ALLOW_MEMORY_GROWTH=1 -s EXPORTED_RUNTIME_METHODS=stringToUTF8 -sASSERTIONS --pre-js scripts/wasm-fs-pre.js

See https://gpac.github.io/ComplianceWarden-wasm/ for a demo.

The HTML integration source code is hosted at https://github.com/gpac/ComplianceWarden-wasm.

Testing

The Compliance Warden includes known-good tests and known-bad tests. This helps keep the software robust against false positives.

./check

NB: don't forget to set CXX when your toolchain requires it, e.g. on Darwin (macOS): CXX=scripts/darwin.sh ./check.

Contributing

Build dependencies

  • GNU Bash
  • GNU g++ version 7+
  • GNU make
  • NASM 2.01+

Code formatter (optional)

Install clang-format.

Pre-commit: format, build and run tests before committing

./check

Ensure good code coverage

You need lcov.

scripts/cov.sh

Note: On Darwin (macOS) systems you may need to install the GNU versions of g++ and gcov (e.g. brew install gcc), then change ./scripts/darwin.sh to alias the GNU versions instead of the Clang versions.

Modifying test results

The tests (launched with ./check) will stop running on first error.

To update the test results, uncomment the # cp "$new" "$ref" line in the tests/run script. This prevents the tests from halting when an error occurs. Please review the changes carefully before updating the test results.

Code architecture

Repository file structure

check                      Top-level full-test script. Reformats + builds + tests.
                           Must pass without error before each commit.

src/                       Source files

tests/                     Integration tests (tests calling the entry points)
tests/run                  Entry point for the test script. Usage: "tests/run bin"
scripts/cov.sh             Coverage script. Generates a coverage report reflecting the
                           current status of the test suite.
scripts/sanitize.sh        Runs the test suite under asan+ubsan.

Principles

The Compliance Warden is made of three parts:

  • a file parser common_boxes.cpp that can be extended (or superseded) by each specification,
  • arrays of rules stored in specs/,
  • an application stored in src/app/cw.cpp that probes the files, launches the tests, and produces a human-readable report.

The parsing is decoupled from the rules. This allows a lot of flexibility such as:

  • the replacement of the parser by an external tool,
  • the implementation of rules in a different language.

The result of the parsing phase is comparable to an AST. This AST is then processed by the rules.

The data structures are generic, which makes them easy to serialize. This is useful when plugging in new languages or building new tests.
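To illustrate why generic structures serialize easily, here is a minimal sketch. `Field`, `Node`, `serialize`, and `toText` are hypothetical names chosen for this example and do not reproduce the actual CW types or any CW output format; the point is only that a tree of (name, value, width) entries can be dumped with one short recursive function.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical generic field: a parsed value with its name and width in bits.
struct Field
{
  std::string name;
  int64_t value;
  int numBits;
};

// Hypothetical generic node: a four-character code, its fields, and child nodes.
struct Node
{
  std::string fourcc;
  std::vector<Field> fields;
  std::vector<Node> children;
};

// Writes one node and its subtree as indented text, two spaces per level.
void serialize(const Node& node, std::ostream& out, int depth = 0)
{
  assert(depth >= 0);
  const std::string indent(2 * depth, ' ');

  out << indent << "box " << node.fourcc << "\n";

  for(auto& f : node.fields)
    out << indent << "  " << f.name << "(" << f.numBits << ") = " << f.value << "\n";

  for(auto& child : node.children)
    serialize(child, out, depth + 1);
}

// Convenience wrapper returning the serialized tree as a string.
std::string toText(const Node& root)
{
  std::ostringstream out;
  serialize(root, out);
  return out.str();
}
```

Because the serializer knows nothing about any particular box type, the same text could be consumed by rules written in another language or compared against a reference file in a test.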

Tests

A test is a pair of a file format description in the NASM syntax (example) and a reference result (example).

Editing test vectors

Test vectors are written in x86 assembly, a textual, human-editable form. In practice only labels and two pseudo-instructions (db to write 8 bits and dd to write 32 bits) are used. See this example of an mdat box containing an AV1 OBU:

mdat_start:
    dd BE(mdat_end - mdat_start)
    dd "mdat"
    ; obu(0)
    db 0x0A ; forbidden(1) obu_type(4) obu_extension_flag(1) obu_has_size_field(1) obu_reserved_1bit(1) 
    db 0x0F ; leb128_byte(8) 
mdat_end:
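For readers less familiar with assembly, the following C++ sketch produces the same ten bytes as the listing above and decodes the first payload byte according to the bit widths given in its comment. It assumes that the `BE()` macro stores the 32-bit size in big-endian order; `writeBE32`, `buildMdat`, `ObuHeader`, and `decodeObuHeader` are hypothetical helper names, not CW functions.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Appends a 32-bit value in big-endian order, like `dd BE(...)` in the listing
// (assuming BE() means "big-endian").
void writeBE32(std::vector<uint8_t>& out, uint32_t value)
{
  for(int shift = 24; shift >= 0; shift -= 8)
    out.push_back(uint8_t(value >> shift));
}

// Builds the bytes described by the assembly: size, "mdat", then two payload bytes.
std::vector<uint8_t> buildMdat()
{
  const std::vector<uint8_t> payload = { 0x0A, 0x0F }; // the two `db` lines

  std::vector<uint8_t> box;
  writeBE32(box, uint32_t(8 + payload.size())); // mdat_end - mdat_start

  for(char c : { 'm', 'd', 'a', 't' }) // dd "mdat"
    box.push_back(uint8_t(c));

  box.insert(box.end(), payload.begin(), payload.end());

  assert(box.size() == 8 + payload.size());
  return box;
}

// Fields of the one-byte OBU header, most significant bit first:
// forbidden(1) obu_type(4) obu_extension_flag(1) obu_has_size_field(1) obu_reserved_1bit(1)
struct ObuHeader
{
  int forbidden, type, extensionFlag, hasSizeField, reserved;
};

ObuHeader decodeObuHeader(uint8_t b)
{
  return { (b >> 7) & 1, (b >> 3) & 15, (b >> 2) & 1, (b >> 1) & 1, b & 1 };
}
```

Decoding 0x0A this way gives obu_type 1 with obu_has_size_field set, and all other bits zero.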

At the time of creating the project, we couldn't find any way to create both valid and invalid editable test vectors. Annotated assembly (with symbols' names, num_bits, and values) looked like a sensible choice.

The easiest way to create a new test vector is to derive it from an existing one.

When this is not possible, one needs to disassemble an existing binary file. Contact [email protected] if you need some help. The assembly file then needs to be stripped of its data (generally by radically shortening the mdat box) and of its metadata (by removing unused boxes and strings).

The key point is that these test vectors are not intended to be valid media files. We may still want to add valid samples to the tests (e.g. by retrieving files and testing them); in that case, other tools (e.g. MP4Box -diso or gpac -i FILE inspect:deep:analyze=bs) already provide a deep view of what is in the file.

Adding a test

A test is a function:

  • Input is both a box tree (from the parsing phase) and a link to the report.
  • Output is written to the report: warnings, errors, and covered() to indicate that the rule was exercised by the input sample.

struct RuleDesc
{
  [...]
  // human-readable description of the rule
  const char* caption;

  // optional id from the specification
  const char* id = nullptr;

  // apply this rule to the file 'root',
  // will push the results (messages) to the 'out' report.
  void (* check)(Box const& root, IReport* out);
};
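As a rough sketch of what filling in a `RuleDesc` can look like, the example below defines simplified stand-ins for `Box` and `IReport` and one illustrative rule. The stand-in types, the `makeFourcc` helper, the `CollectingReport` class, and the rule itself (including its id `example.1`) are assumptions made for this example; the real definitions live in the CW sources and may differ, and the `[...]` part of `RuleDesc` is left out.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the parsed box tree.
struct Box
{
  uint32_t fourcc = 0;
  std::vector<Box> children;
};

// Hypothetical stand-in for the report interface.
struct IReport
{
  virtual ~IReport() = default;
  virtual void error(const char* msg) = 0;
  virtual void warning(const char* msg) = 0;
  virtual void covered() = 0;
};

// The fields shown in the README, without the elided part.
struct RuleDesc
{
  const char* caption;
  const char* id = nullptr;
  void (* check)(Box const& root, IReport* out);
};

// Packs four characters into a 32-bit code, first character in the top byte.
constexpr uint32_t makeFourcc(const char (&s)[5])
{
  return (uint32_t(uint8_t(s[0])) << 24) | (uint32_t(uint8_t(s[1])) << 16)
         | (uint32_t(uint8_t(s[2])) << 8) | uint32_t(uint8_t(s[3]));
}

// Illustrative rule, not taken from the CW rule set.
const RuleDesc exampleRule = {
  "Exactly one top-level 'ftyp' box is expected",
  "example.1",
  [](Box const& root, IReport* out) {
    assert(out);
    int count = 0;

    for(auto& box : root.children)
      if(box.fourcc == makeFourcc("ftyp"))
        ++count;

    if(count > 0)
      out->covered(); // the sample contains the box this rule is about

    if(count != 1)
      out->error("expected exactly one top-level 'ftyp' box");
  },
};

// Minimal report that stores messages, handy for unit-testing a rule.
struct CollectingReport : IReport
{
  std::vector<std::string> errors, warnings;
  bool wasCovered = false;

  void error(const char* msg) override { errors.push_back(msg); }
  void warning(const char* msg) override { warnings.push_back(msg); }
  void covered() override { wasCovered = true; }
};
```

A rule written this way only depends on the box tree and the report, so it can be exercised in isolation by passing a hand-built `Box` and a `CollectingReport`.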

Adding a specification

A specification is just a list of rules with some general information:

struct SpecDesc
{
  // short name for the spec (used for command-line spec selection).
  const char* name;

  // human-readable description of the spec (name, version, date, etc.).
  const char* caption;

  // list of specs which this spec depends on.
  std::vector<const char*> dependencies;

  // list of compliance checks for this spec.
  std::vector<RuleDesc> rules;

  // checks will only be executed if this returns true.
  bool (* valid)(Box const& root) = nullptr;
};
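The sketch below shows one plausible way to populate a `SpecDesc` and run its rules, honoring the `valid` predicate. The `Box` and `IReport` stand-ins, `makeExampleSpec`, `runSpec`, `CountingReport`, and the example rule are hypothetical and simplified. Treating a null `valid` as "always applicable" is an assumption, and the actual CW runner in src/app/cw.cpp may behave differently.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the CW types referenced by SpecDesc.
struct Box
{
  uint32_t fourcc = 0;
  std::vector<Box> children;
};

struct IReport
{
  virtual ~IReport() = default;
  virtual void error(const char* msg) = 0;
};

struct RuleDesc
{
  const char* caption;
  const char* id = nullptr;
  void (* check)(Box const& root, IReport* out);
};

struct SpecDesc
{
  const char* name;
  const char* caption;
  std::vector<const char*> dependencies;
  std::vector<RuleDesc> rules;
  bool (* valid)(Box const& root) = nullptr;
};

// Illustrative specification, not part of CW.
SpecDesc makeExampleSpec()
{
  SpecDesc spec {};
  spec.name = "example";
  spec.caption = "Example specification (illustration only)";
  spec.dependencies.push_back("isobmff");

  spec.rules.push_back(RuleDesc{
    "Every top-level box has a non-zero four-character code",
    "example.1",
    [](Box const& root, IReport* out) {
      for(auto& box : root.children)
        if(box.fourcc == 0)
          out->error("top-level box with a zero four-character code");
    } });

  // Only check files that contain at least one top-level box.
  spec.valid = [](Box const& root) { return !root.children.empty(); };
  return spec;
}

// Runs every rule of 'spec' on 'root'; returns the number of rules executed.
int runSpec(const SpecDesc& spec, Box const& root, IReport* out)
{
  assert(out);

  // Assumption: a null 'valid' means the spec always applies.
  if(spec.valid && !spec.valid(root))
    return 0;

  int executed = 0;

  for(auto& rule : spec.rules)
  {
    rule.check(root, out);
    ++executed;
  }

  return executed;
}

// Minimal report that counts errors.
struct CountingReport : IReport
{
  int errors = 0;
  void error(const char*) override { ++errors; }
};
```

Keeping `valid` separate from the rules lets a specification opt out of files it does not apply to without emitting any message.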

Limitations

Some aspects are not activated:

  • Brand presence checks are not fully activated. When activated, relaxed brands (e.g. 'MiPr') emit a lot of messages that bring little value to the user. Aggressive shall/should normative statements need to be balanced at standardization level.
  • Codec-level parsing is incomplete. In most cases it should be deferred to an external project that can analyze both the metadata and the data (e.g. GPAC). This hasn't been done due to licensing concerns.
  • Some rules related to pixel formats (e.g. color space computations and consistency) can only be checked by a player. Hence they are considered outside the scope of this project.
  • Some rules related to pixel formats are only processed for AV1, because we embed some codec-level parsing for AV1.
  • Some rules are not implemented due to missing content (e.g. Apple Audio Twos).

Acknowledgments

This work was initiated as part of the MPEG MIAF conformance software.

The Alliance for Open Media (AOM) sponsored the work on AVIF, AV1-ISOBMFF, and AV1 HDR10+.