Skip to content

Latest commit

 

History

History
97 lines (73 loc) · 4.41 KB

File metadata and controls

97 lines (73 loc) · 4.41 KB

Machine Learning and having it deep and structured

Build Status Coverage Status

About

Implementations and homeworks of the course Machine Learning and having it deep and structured of National Taiwan University (offered by Hung-yi Lee):

  • Constructed and trained variants of neural networks by Theano
  • Attemped to solve the sequence labeling problem in speech recognition (phoneme labeling)
  • Deep Neural Network (DNN) with dropout, maxout and momentum optimization
  • Bidirectional Recurrent Neural Network (RNN) with dropout and RMSProp optimization
  • Bidirectional Long-Short Term Memory (LSTM) with peephole and NAG optimization
  • Hidden Markov Model (HMM) on top of RNN to improve the performance

Course page

Syllabus

Neural Networks and Training:

  • What is Machine Learning, Deep Learning and Structured Learning?
  • Neural Network Basics | Backpropagation | Theano: DNN
  • Tips for Training Deep Neural Network
  • Neural Network with Memory | Theano: RNN
  • Training Recurrent Neural Network
  • Convolutional Neural Network (by Prof. Winston)

Structured Learning and Graphical Models:

  • Introduction of Structured Learning | Structured Linear Model | Structured SVM
  • Sequence Labeling Problem | Learning with Hidden Information
  • Graphical Model, Gibbs Sampling

Extensions, New Applications and Trends:

  • Markov Logic Network
  • Deep Learning for Human Language Processing, Language Modeling
  • Caffe | Deep Reinforcement Learning | Visual Question Answering
  • Unsupervised Learning
  • Attention-based Model

Content

Deep Neural Network (DNN)[kaggle]:

  • Construct and train a deep neural network to classify pronunciation units (phonemes) in each time frame of a speech.
  • Inputs: MFCC features
  • Activation function: Maxout (generalization of ReLU, "learnable" activation function)
  • Output layer: Softmax
  • Cost function: cross entropy
  • Optimization: Momentum
  • With Dropout technique

Bidirectional Recurrent Neural Network (RNN)[kaggle]:

  • Construct and train a bidirectional deep recurrent neural network to classify pronunciation units (phonemes) in each time frame of a speech.
  • Inputs: prediction probabilities of each class from previous DNN
  • Activation function: ReLU
  • Output layer: Softmax
  • Cost function: Mean Squared Error
  • Optimization: Root Mean Square Propagation (RMSProp)
  • With Dropout technique

Bidirectional Long-Short Term Memory (LSTM)[kaggle]:

  • Construct and train a bidirectional deep Long-Short Term Memory to classify pronunciation units (phonemes) in each time frame of a speech.
  • Inputs: prediction probabilities of each class from previous DNN
  • Optimization: Nesterov Accelerated Gradient (NAG)
  • With Peephole
  • Using grad_clip in theano to prevent gradient exploding

Structure Learning (output phone label sequence)[kaggle]:

  • On top of results of RNN / LSTM, applies Hidden Markov Model (HMM) to model the phone transition probabilities and further improves the performance of RNN / LSTM on this sequence labeling problem.
  • Input: the whole utterance as one training data
  • Output: phone label sequence

The performance is measured by Levenshtein distance (a.k.a. Edit distance).

Usage

Clone the repo and use the virtualenv:

git clone https://github.com/AaronYALai/Machine_Learning_and_Having_It_Deep_and_Structured

cd Machine_Learning_and_Having_It_Deep_and_Structured

virtualenv venv

source venv/bin/activate

Install all dependencies and run the model:

pip install -r requirements.txt

cd RNN_LSTM

python run_RNN.py