
🚧 Work in progress 🚧

Speech to text

A speech recognition project in Python using TensorFlow and Keras.

Model Architecture and Training Method

I train the model as a recurrent neural network that predicts a linear output. My dataset consists of audio clips, each paired with a sentence. I split each sentence into phonemes and associate each phoneme with its corresponding segment of audio.
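The phoneme-cutting step can be sketched as follows. This is a minimal illustration, not the project's code: it assumes each utterance comes with a hypothetical alignment list of `(phoneme, start_sample, end_sample)` tuples (for example from a forced aligner), and the helper names `build_vocab` and `cut_into_phoneme_segments` are made up for this example.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical alignment format: (phoneme, start_sample, end_sample), end exclusive.
Alignment = List[Tuple[str, int, int]]


def build_vocab(alignments: Sequence[Alignment]) -> Dict[str, int]:
    """Assign an integer id to every distinct phoneme, in sorted order."""
    phonemes = sorted({ph for alignment in alignments for ph, _, _ in alignment})
    return {ph: i for i, ph in enumerate(phonemes)}


def cut_into_phoneme_segments(
    samples: Sequence[float], alignment: Alignment, vocab: Dict[str, int]
) -> List[Tuple[list, int]]:
    """Slice one utterance into (audio_segment, phoneme_id) training pairs."""
    pairs = []
    for ph, start, end in alignment:
        # Reject spans that are empty or fall outside the recording.
        if not 0 <= start < end <= len(samples):
            raise ValueError(f"bad span for {ph!r}: {start}..{end}")
        pairs.append((list(samples[start:end]), vocab[ph]))
    return pairs
```

For example, with `alignment = [("h", 0, 3), ("ai", 3, 10)]`, `build_vocab([alignment])` gives `{"ai": 0, "h": 1}`, and cutting a 10-sample clip yields one 3-sample segment labelled `1` and one 7-sample segment labelled `0`. The `len(vocab)` used by the output layer below would then be the number of distinct phonemes.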

Model Architecture

The architecture is the following:

  • One 2D convolution with 8 filters (9x9) [ELU activation]
  • Max pooling: pool_size=[2,2]
  • LSTM with 128 units
  • Flatten layer
  • Dropout: 0.4
  • Fully connected: 256 [ELU]
  • Dropout: 0.2
  • Fully connected: len(vocab) [linear]
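The layer list above can be sketched in Keras roughly as follows. This is a hedged reconstruction, not the author's code. The input shape `(100, 40, 1)` (100 spectrogram frames by 40 frequency bins) and `vocab_size=30` are hypothetical. The `Reshape` before the LSTM and `return_sequences=True` are assumptions: a `Conv2D` output is 4D while an LSTM expects `(time, features)`, and the README does not say how that gap is bridged.

```python
import tensorflow as tf
from tensorflow.keras import layers


def build_model(input_shape=(100, 40, 1), vocab_size=30):
    """Sketch of the README's layer stack; input_shape and vocab_size are hypothetical."""
    inputs = tf.keras.Input(shape=input_shape)
    # One convolution of 8 filters (9x9), ELU activation, default 'valid' padding.
    x = layers.Conv2D(8, (9, 9), activation="elu")(inputs)
    # Max pooling with pool_size=[2,2].
    x = layers.MaxPooling2D(pool_size=(2, 2))(x)
    # Assumption: merge (frequency, channels) so the LSTM sees (time, features).
    x = layers.Reshape((x.shape[1], x.shape[2] * x.shape[3]))(x)
    # LSTM with 128 units; assumed to return the full sequence before Flatten.
    x = layers.LSTM(128, return_sequences=True)(x)
    x = layers.Flatten()(x)
    x = layers.Dropout(0.4)(x)
    x = layers.Dense(256, activation="elu")(x)
    x = layers.Dropout(0.2)(x)
    # No activation: a linear output of size len(vocab).
    outputs = layers.Dense(vocab_size)(x)
    return tf.keras.Model(inputs, outputs)


model = build_model()
```

With the assumed 100x40 input, the 9x9 convolution produces a 92x32x8 map and pooling reduces it to 46x16x8, so the LSTM sees 46 time steps of 128 features each; `model.summary()` prints these shapes. Because `Flatten` follows the LSTM, this sketch needs a fixed number of input frames. The README does not state the loss or any softmax, so none is added here.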