Speech-music discriminator based on open source Opus codec.

thomasdiesenreiter/opus_sm

Opus speech vs. music discriminator

[Figure: segmentation, framewise music probabilities, and waveform of an example audio file]

Command-line speech vs. music discriminator tool based on the built-in speech vs. music discriminator of the Opus codec. The tool calculates the framewise music probabilities for a given audio file and can optionally produce a speech-music segmentation. The example figure above shows the segmentation, the framewise music probabilities, and the waveform of an example audio file.

The algorithm that calculates the framewise music probabilities was created by the Opus developers. Details: https://people.xiph.org/~xiphmont/demo/opus/demo3.shtml

Build instructions

The tool is built when the custom-modes flag is enabled (it is enabled by default in this fork). Compilation steps (in the root directory):

./autogen.sh
./configure
make

The resulting executable, opus_sm_demo, is statically linked against the Opus libraries, so it can be distributed on its own.

To disable custom-modes, and thus this tool:

./configure --enable-custom-modes=no

Usage

The program prints its usage when it is run without arguments:

./opus_sm_demo
SM-Test speech music discriminator program

Usage: ./opus_sm_demo <infile> [outfile pmusic] [outfile labels] [sm min dur] [b min dur]

    infile           path to a 16 bit, 48KHz sample rate PCM WAVE file
    outfile pmusic   path of the music probability output file (default: stdout)
    outfile labels   path of the labels (m|s|b) output file
    sm min dur       speech & music labeled segments' min duration
    b min dur        both labeled segments' min duration
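
To illustrate what a minimum segment duration means for framewise probabilities, here is a minimal Python sketch. It is not the tool's own segmentation code: the function name `segment`, the 0.5 threshold, the frame length parameter, and the use of seconds for the minimum duration are all assumptions made for illustration, and the "both" (b) label is left out entirely.

```python
from itertools import groupby


def segment(p_music, frame_sec, min_dur_sec, threshold=0.5):
    """Turn framewise music probabilities into (start, end, label) segments.

    Hypothetical helper: labels are 'm' (music) or 's' (speech) only.
    Runs shorter than min_dur_sec are absorbed into the preceding segment.
    """
    # Assumed decision rule: probability above the threshold means music.
    labels = ["m" if p > threshold else "s" for p in p_music]

    # Run-length encode the framewise labels as [label, frame_count].
    runs = [[lab, sum(1 for _ in grp)] for lab, grp in groupby(labels)]

    merged = []
    for lab, n in runs:
        too_short = n * frame_sec < min_dur_sec
        if merged and (too_short or merged[-1][0] == lab):
            merged[-1][1] += n  # absorb into the previous segment
        else:
            merged.append([lab, n])

    # Convert frame counts to start and end times in seconds.
    segments, start = [], 0
    for lab, n in merged:
        segments.append((start * frame_sec, (start + n) * frame_sec, lab))
        start += n
    return segments
```

In this sketch a short run at the very beginning of the file is kept as is, since there is no preceding segment to absorb it.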

Only 16-bit, 48 kHz PCM WAVE files are supported, and the WAVE file should not contain any metadata. To convert an audio file to the expected format with ffmpeg:

ffmpeg -i input.flac -ar 48000 -y -map_metadata -1 -flags +bitexact -acodec pcm_s16le output.wav
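
Before running the tool, it may help to confirm that a converted file matches these constraints. The following Python sketch uses only the standard library; the function name `check_wav` is hypothetical, and treating every RIFF chunk other than `fmt ` and `data` as possible metadata is an assumption of this sketch, not a rule taken from the tool.

```python
import struct
import wave


def check_wav(path):
    """Return a list of problems; an empty list means the file looks fine."""
    problems = []

    # The wave module only opens PCM WAVE files and raises wave.Error otherwise.
    with wave.open(path, "rb") as w:
        if w.getsampwidth() != 2:
            problems.append(f"sample width is {8 * w.getsampwidth()} bit, expected 16")
        if w.getframerate() != 48000:
            problems.append(f"sample rate is {w.getframerate()} Hz, expected 48000")

    # Walk the RIFF chunks to spot anything besides the format and audio data.
    with open(path, "rb") as f:
        f.seek(12)  # skip 'RIFF', the file size, and 'WAVE'
        while True:
            header = f.read(8)
            if len(header) < 8:
                break
            chunk_id, size = struct.unpack("<4sI", header)
            if chunk_id not in (b"fmt ", b"data"):
                problems.append(f"extra chunk {chunk_id!r} (possibly metadata)")
            f.seek(size + (size & 1), 1)  # chunks are padded to an even length

    return problems
```

Calling `check_wav("output.wav")` on the file produced by the ffmpeg command above should return an empty list if the conversion worked as intended.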

That's all folks!
