This package contains sample tools for audio classification. The following functionality is provided:
- Training an audio classification model from a set of audio samples.
- Testing the classification rates of the model using a set of audio samples.
- Classifying an audio file and sending the results to an SMART EdgeNode server.
- Creating a feed description file that can be used to register the feed on a SMART EdgeNode server.
The following packages must be installed for these classification tools:
- Python3: http://python.org/ (tested with version 3.2.3)
- Speech Signal Processing Toolkit (SPTK): http://sp-tk.sourceforge.net/ (tested with version 3.6)
- SoX - Sound eXchange: http://sox.sourceforge.net/ (tested with version 14.4.0)
All the tools use the common configuration file smart/audio/Configuration.py. Before running any of the tools, review this file: verify that the "sptk" variable points to the folder containing the SPTK binaries and that the "sox" variable points to the sox binary.
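For reference, the relevant entries in Configuration.py might look like the following; the paths are example values only and must be adjusted to the local installation:

sptk = "/usr/local/SPTK/bin"   # folder containing the SPTK binaries (example path)
sox = "/usr/bin/sox"           # path to the sox binary (example path)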
The classification tools support any audio format supported by the sox utility. During operation, sox converts the audio files to 16-bit, single-channel, raw PCM at the sample rate specified in the configuration file.
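For reference, an equivalent manual conversion with sox would look like this (the 16000 Hz rate is only an example; the tools take the actual rate from the configuration file):

$ sox input.wav -b 16 -c 1 -e signed-integer -r 16000 -t raw output.raw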
For training and testing, the tools require a set of annotated audio files. For each audio file the user should create a label file. Label files are plain text files with the following format:
<t1> <t2> <audio class>
...
where t1 and t2 are the start and end times of the range in seconds and 'audio class' is a one-word label for the range. The same classes will be used for classification. The time ranges can overlap, so any given point in time can belong to zero, one, or more classes (see the parsing sketch below).
An example of a label file could be:
0.0 70.77634604 traffic
92.93723717 146.67739816 traffic
104.15618831 119.39180096 people
130.88776323 145.1538369 people
141.69119766 203.74169282 music
175.07103992 202.21813156 traffic
206.51180421 237.26004065 traffic
Label files can be created with the help of tools such as Audacity (http://audacity.sourceforge.net/) and Praat (http://www.fon.hum.uva.nl/praat/).
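Because the time ranges can overlap, reading a label file amounts to collecting (start, end, class) tuples and intersecting them with a query time. The following Python sketch illustrates this; it is not part of the package:

def read_labels(path):
    # Return a list of (start, end, audio_class) tuples from a label file.
    labels = []
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) != 3:
                continue  # skip blank or malformed lines
            t1, t2, audio_class = fields
            labels.append((float(t1), float(t2), audio_class))
    return labels

def classes_at(labels, t):
    # Return the set of classes whose ranges cover time t (may be empty).
    return {c for (t1, t2, c) in labels if t1 <= t <= t2}

labels = read_labels("audio1.lbl")
print(classes_at(labels, 110.0))  # for the label file above: {'traffic', 'people'}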
An audio set is described by a text file containing the paths to pairs of audio and label files. An example of an audio set file could be:
audio1.wav audio1.lbl
audio2.wav audio2.lbl
...
The train.py script is used for training the classification models. For example:
$ export PYTHONPATH="$PYTHONPATH:SmartCode"
$ python3 smart/audio/train.py -m Data/models Data/training_files.txt
This will train the classification models using the files listed in training_files.txt and write the results to the Data/models folder.
Use the '-h' switch for a description of all available options. This also works for all the other commands listed below.
The test.py script is used for testing the classification models. It compares the labels produced by the models to the manually produced labels. It is best to test the models with files that were not included in the training set. For example:
$ python3 smart/audio/test.py -m Data/models Data/testing_files.txt
This will test the classification models in the Data/models folder using the files listed in testing_files.txt. The output is the equal error rate (EER) for each class.
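The EER is the error rate at the decision threshold where the false-acceptance rate equals the false-rejection rate. The following sketch shows the idea for the per-frame scores of a single class; it is illustrative only and is not the package's implementation:

def equal_error_rate(scores, targets):
    # scores: per-frame detection scores; targets: True where the class is present.
    # Assumes both positive and negative frames exist. Sweeps the threshold from
    # high to low and returns the point where the false-acceptance and
    # false-rejection rates are closest to equal.
    pairs = sorted(zip(scores, targets), reverse=True)
    n_pos = sum(1 for t in targets if t)
    n_neg = len(targets) - n_pos
    fa, fr = 0, n_pos                  # highest threshold: everything rejected
    best_gap, eer = 1.0, 1.0
    for _, is_pos in pairs:            # accept one more frame at each step
        if is_pos:
            fr -= 1
        else:
            fa += 1
        far, frr = fa / n_neg, fr / n_pos
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer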
Feed description files are used to register a feed in an EdgeNode (see http://opensoftware.smartfp7.eu/projects/smart/wiki/EdgeNodePostCommands#CreateFeed). A utility is included for generating this file from the audio class information. This should be used after the model data is ready.
For example:
$ python3 smart/audio/description.py -m Data/models -f audio_feed_example -c microphone1 -e [email protected] -d "Audio classification from outdoor microphone" -t "audio,street,traffic,people" feed.xml
This will create the feed.xml file. It is recommended to edit this file and verify its content before posting it to the server.
The classify utility can process an audio file and post the results to the EdgeNode (see: http://opensoftware.smartfp7.eu/projects/smart/wiki/EdgeNodePostCommands#Append).
For example:
$ python3 smart/audio/classify.py -m Data/models -e http://dusk.ait.gr/couchdb/audio_feed_example -n microphone1 audio.wav
The models directory contains pre-trained model files. The models were trained using more than 10 hours of audio data collected in the SMART project. The models support the following audio classes:
- Applause
- Crowd
- Music
- Siren
- Speaker
- Traffic
A list of audio files and annotations is provided in Samples/sample_list.txt. These samples were used to generate the database for the article: Zvi Kons, Orith Toledo-Ronen: Audio event classification using deep neural networks. INTERSPEECH 2013.
We were told that some of the audio files listed in this set are no longer available. Since we don't know why they were removed, we cannot redistribute them.
SMART FP7 - Search engine for MultimediA enviRonment generated contenT
Webpage: http://smartfp7.eu
This Source Code Form is subject to the terms of the Mozilla Public License, v. 2.0. If a copy of the MPL was not distributed with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
The Original Code is Copyright (C) 2013 IBM Corp. All Rights Reserved
Contributor(s): Zvi Kons [email protected]