Opus speech vs. music discriminator

Example fig

A command-line speech vs. music discriminator tool based on the built-in speech vs. music discriminator of the Opus codec. The tool calculates framewise music probabilities for a given audio file and can optionally produce a speech-music segmentation. The example figure above shows the segmentation, the framewise music probabilities, and the waveform of an example audio file.

The algorithm that calculates the framewise music probabilities was created by the Opus developers. Details: https://people.xiph.org/~xiphmont/demo/opus/demo3.shtml

Build instructions

The tool is built only when the custom-modes flag is enabled, which is the default in this fork. To compile, run the following in the root directory:

./autogen.sh
./configure
make

The resulting executable, opus_sm_demo, is statically linked against the Opus libraries, so it can be distributed as a standalone binary.

To disable custom-modes, and with it the build of this tool:

./configure --enable-custom-modes=no

Usage

The program prints its usage when run without arguments:

./opus_sm_demo
SM-Test speech music discriminator program

Usage: ./opus_sm_demo <infile> [outfile pmusic] [outfile labels] [sm min dur] [b min dur]

    infile           path to a 16 bit, 48KHz sample rate PCM WAVE file
    outfile pmusic   path of the music probability output file (default: stdout)
    outfile labels   path of the labels (m|s|b) output file
    sm min dur       speech & music labeled segments' min duration
    b min dur        both labeled segments' min duration
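Because all arguments are positional, a later optional argument can presumably only be given when the earlier ones are given too. The following is a minimal sketch of a helper that assembles the argument list in that order. The function name `build_command` and its parameter names are hypothetical, and the units of the two minimum durations are not stated in the usage text, so the values are passed through unchanged.

```python
def build_command(infile, pmusic_out=None, labels_out=None,
                  sm_min_dur=None, b_min_dur=None, exe="./opus_sm_demo"):
    """Build the argv list for opus_sm_demo from the usage line.

    Assumption: the optional arguments are strictly positional, so an
    argument may only be supplied if every argument before it is supplied.
    """
    names = ["outfile pmusic", "outfile labels", "sm min dur", "b min dur"]
    values = [pmusic_out, labels_out, sm_min_dur, b_min_dur]

    argv = [exe, infile]
    first_missing = None
    for name, value in zip(names, values):
        if value is None:
            # Remember the first omitted argument; nothing may follow it.
            first_missing = first_missing or name
        elif first_missing is not None:
            raise ValueError(
                f"'{name}' requires '{first_missing}' to be given as well"
            )
        else:
            argv.append(str(value))
    return argv
```

The returned list can be handed to `subprocess.run(build_command("input.wav", "pmusic.txt", "labels.txt"), check=True)`. With only `infile` given, the music probabilities go to stdout, as the usage text states.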

Note that only 16-bit, 48 kHz PCM WAVE files are supported, and the WAVE file should not contain any metadata. To convert an audio file to the expected format with ffmpeg:

ffmpeg -i input.flac -ar 48000 -y -map_metadata -1 -flags +bitexact -acodec pcm_s16le output.wav
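Before running the tool, it can help to confirm that a file really has the expected sample width and sample rate. The sketch below uses Python's standard `wave` module for this. The function name `check_wav_format` is hypothetical, and the check covers only the bit depth and sample rate in the header; it does not detect metadata chunks.

```python
import wave


def check_wav_format(path):
    """Return a list of problems found; an empty list means the
    sample width and sample rate match what opus_sm_demo expects."""
    problems = []
    try:
        with wave.open(path, "rb") as wav:
            # getsampwidth() is in bytes, so 2 bytes means 16 bit.
            if wav.getsampwidth() != 2:
                bits = 8 * wav.getsampwidth()
                problems.append(f"sample width is {bits} bit, expected 16 bit")
            if wav.getframerate() != 48000:
                rate = wav.getframerate()
                problems.append(f"sample rate is {rate} Hz, expected 48000 Hz")
    except wave.Error as exc:
        # Raised for files the wave module cannot parse as WAVE audio.
        problems.append(f"not a readable WAVE file: {exc}")
    return problems
```

Calling `check_wav_format("output.wav")` on a file produced by the ffmpeg command above should return an empty list; any entries describe what needs to be fixed by re-converting.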

That's all folks!