2013.04.05 Speech: Meeting Report

cotemyriam edited this page Apr 11, 2013 · 7 revisions

Present: YB, Nicolas, Guillaume A., Razvan, Yann, Stephan

  • Nicolas looked up mood classification from music and found that handcrafted features exist for it, so it might be a good idea to include them as input at some level.
    • Formalize what those potentially good features are and code them up. These are mostly frame-based features; Nicolas & Razvan could work on this. Nicolas proposes input-level aggregation features (inside a window: variance, mean; multi-scale = different window sizes, e.g. one for the whole sequence, one for short-term). Nicolas will implement some of the features.
  • Look for any literature on emotion recognition from speech (Yann & Nicolas). Yann will report on the papers by April 12th.
  • The output of our systems could be either frame-based or sequence-based (but we only have sequence-level targets in the competition data) [frame-based = one output per frame, but the input usually covers a whole window]. Some features may be sequence-level and could either be integrated frame-wise or at a higher, sequence-wise level of the system. Nicolas (with the help of Razvan) will code and apply the features (existing features by April 8, new features by the 17th).
  • CRUCIAL TASK: get other datasets (music+mood data, unlabeled speech with emotional content); see http://emotion-research.net/databases and maybe more. Guillaume A. will investigate the Vera Am Mittag data and Humaine data. We might also get data from the 'emotion challenge' at Interspeech 2009. Deadline: Tuesday 10th.
  • Define a benchmark validation set (combining the challenge validation set - speech part, and a subset of the other datasets we find). Guillaume A. will do it with Razvan (April 10).
  • Need to consider what object class we need for holding the different features and static information about them and the raw signal. Must be coordinated with PV who does it for videos (maybe subclass PV's classes). Nicolas just uses a matrix for the sequence. Razvan will work on the data format question (deadline = Tuesday 10th).
  • Deep MLP with an input window. Yann will do it. First results with MFCC by April 12th on the challenge dataset.
  • Deep Conv Net. Stephan & Yann will do it. Stephan will get familiar with convnets in Pylearn2 (April 12).
  • Try a bidirectional RNN. Razvan will do it.
  • Could use temporal coherence as a regularizer.
  • A stack of DAEs (denoising auto-encoders) could also be pre-trained on unlabeled data (Guillaume A.).
  • Razvan would like to try a deep MLP with recurrence (from top hidden layer back to the input of the deep net), with maxout/rectifiers in lower hidden layers and tanh on the top hidden layer.
  • Yann proposes to use a max pooling of the top hidden layer. He says it makes a huge difference compared to pooling the probabilistic outputs.
  • Everyone should learn more about Pylearn2 to see if it can handle the needs of this sub-project.
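
As an illustration of the input-level aggregation features Nicolas proposed (mean and variance inside a window, at several window sizes), here is a minimal sketch in plain Python. It is not the group's implementation: the function names `window_stats` and `multi_scale_features`, the `hop` parameter, and the choice of population variance are all assumptions made for the example. The sequence is assumed to be a matrix of frames, given as a list of equal-length feature vectors.

```python
from statistics import fmean, pvariance


def window_stats(frames, window, hop):
    """Mean and variance of each feature dimension inside sliding windows.

    frames: list of equal-length feature vectors, one per frame (hypothetical format).
    Returns one vector per window: all per-dimension means, then all variances.
    """
    if not frames:
        return []
    window = min(window, len(frames))  # never ask for more frames than exist
    out = []
    for start in range(0, len(frames) - window + 1, hop):
        chunk = frames[start:start + window]
        columns = list(zip(*chunk))  # one tuple per feature dimension
        means = [fmean(c) for c in columns]
        variances = [pvariance(c) for c in columns]  # population variance (assumption)
        out.append(means + variances)
    return out


def multi_scale_features(frames, short_window=3, hop=1):
    """Aggregate at two scales: short-term windows and the whole sequence."""
    return {
        "short_term": window_stats(frames, short_window, hop),
        "whole_sequence": window_stats(frames, len(frames), 1),
    }


# Toy sequence: 4 frames, 2 features per frame.
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
feats = multi_scale_features(frames, short_window=2, hop=1)
```

The short-term output keeps one vector per window and so stays frame-like, while the whole-sequence output is a single vector that could be fed in at a sequence-level stage of the system.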