04. Emotion Classification from Speech
Priority: Medium
Emotion classification from speech: learn to map the acoustic sequence of a video clip to a frame-based or sequence-based emotion classification.
YB (lead), Yann, Nicolas, Razvan (we need to do this quickly because both leave in May), Guillaume A., Stephan
Meeting Reports (Brainstorming):
Audio features
Nicolas: I generated a first set of basic hand-crafted frame-level audio features to get us started, using Yaafe.
The features are in /data/lisa/data/audio_features/ in pickled numpy matrices (one file for each mp3 in the Train/Val folders). There are 3 subsets of features:
- raw : Only the magnitude spectrogram.
- minimal : Only MFCCs, MFCC derivatives, AutoCorrelation, Loudness, Flux and other low-dimensional perceptual features (concatenated).
- full : includes the above plus other features from the Yaafe library (ZCR, TemporalShapeStatistics, SpectralRolloff, SpectralShapeStatistics, SpectralFlatness, SpectralDecrease, SpectralFlatnessPerBand, SpectralCrestFactorPerBand, LPC, LSF, ComplexDomainOnsetDetection, Mel spectrum, MFCC second derivatives, Envelope, EnvelopeShapeStatistics, AmplitudeModulation, OBSI, OBSIR).
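Each pickled file holds one matrix of frame-level features for one clip. The snippet below shows the loading pattern with a small stand-in matrix; the filename is hypothetical, and the real files live under /data/lisa/data/audio_features/:

```python
import pickle

import numpy as np

# Stand-in feature matrix (the real files are one pickled numpy matrix
# per mp3 in the Train/Val folders, under /data/lisa/data/audio_features/).
dummy = np.random.randn(10, 13).astype(np.float32)
with open("clip_demo.minimal.pkl", "wb") as f:
    pickle.dump(dummy, f)

# Loading works the same way for the real feature files:
with open("clip_demo.minimal.pkl", "rb") as f:
    features = pickle.load(f)

# Rows are analysis frames, columns are the concatenated feature dimensions.
n_frames, n_features = features.shape
```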
Mostly default parameters were used, with an analysis window of ~25 ms and a hop size of 12.5 ms, as is common in the literature (though values vary). The audio input was normalized.
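The window and hop durations above translate to sample counts only once a sample rate is fixed; the 16 kHz rate below is an assumption for illustration, not something stated in the notes:

```python
def frame_params(sample_rate_hz, win_ms=25.0, hop_ms=12.5):
    """Convert window/hop durations (ms) to sample counts."""
    win = int(round(sample_rate_hz * win_ms / 1000.0))
    hop = int(round(sample_rate_hz * hop_ms / 1000.0))
    return win, hop

def n_frames(n_samples, win, hop):
    """Number of full analysis windows that fit in the signal."""
    if n_samples < win:
        return 0
    return 1 + (n_samples - win) // hop

# Assumed 16 kHz audio: 400-sample window, 200-sample hop,
# so one second of audio (16000 samples) yields 79 frames.
win, hop = frame_params(16000)
frames_per_second = n_frames(16000, win, hop)
```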
There are also whitened versions of the above, with .pca.pkl extensions. Each component was whitened independently (zero mean, unit variance, diagonal covariance, no dimensionality reduction) using the training-set distribution, in order to preserve the topology of the original feature space. PCA objects describing the applied transformation have been saved in the same directory with .pca extensions.
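Per-component whitening of this kind amounts to standardizing each feature dimension with statistics fit on the training set only. A minimal numpy sketch (the arrays here are stand-ins; the real transformation parameters live in the saved .pca objects):

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.normal(loc=3.0, scale=2.0, size=(1000, 8))  # stand-in feature matrix
valid = rng.normal(loc=3.0, scale=2.0, size=(200, 8))

# Fit statistics on the training distribution only...
mean = train.mean(axis=0)
std = train.std(axis=0)

# ...then standardize each component independently: zero mean, unit
# variance, no rotation and no dimensionality reduction, so the topology
# of the original feature space is preserved.
train_w = (train - mean) / std
valid_w = (valid - mean) / std
```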
Note that there are no clip-level features in those sets; no aggregation / pooling / multi-scale statistics whatsoever were performed.
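Since no aggregation was performed, anyone needing clip-level descriptors would have to pool the frame-level matrices themselves. One simple option is mean/std pooling over the frame axis (a sketch, not part of the released features):

```python
import numpy as np

def pool_clip(frames):
    """Aggregate a (n_frames, n_features) matrix into a single clip-level
    vector by concatenating the per-feature mean and standard deviation."""
    return np.concatenate([frames.mean(axis=0), frames.std(axis=0)])

clip = np.random.randn(120, 13)  # stand-in frame-level features for one clip
vec = pool_clip(clip)            # one vector of length 2 * 13 = 26
```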
Generation code in lisa_emotiw/emotiw/boulanni/audio_features.py
Vera Am Mittag dataset
/data/lisa/data/Vera_Am_Mittag/extracted_audio
See the Data Files entry for a description of this data.
Experimental Results