A simple project on speech recognition.
Sebastian Thomas (datascience at sebastianthomas dot de)
In this project, we intend to recognize a keyword out of a list of ten given keywords.
It extends the introductory TensorFlow tutorial on speech command recognition.
It uses version 0.0.2 of Pete Warden's speech_commands dataset. The dataset contains 105,829 WAV files, each at most 1 second long. Each file contains one spoken command out of a list of 35 commands.
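Because the clips last at most 1 second, their lengths can differ, while a classifier usually expects inputs of a fixed size. The following is a minimal sketch of zero-padding a clip to exactly one second. It assumes a sample rate of 16 kHz and a mono waveform already loaded as a 1-D array; the name `pad_to_one_second` is hypothetical and not taken from this project.

```python
import numpy as np

# Assumption: clips are sampled at 16 kHz, so one second is 16,000 samples.
SAMPLE_RATE = 16_000


def pad_to_one_second(waveform, target_len=SAMPLE_RATE):
    """Zero-pad (or truncate) a 1-D waveform to exactly `target_len` samples."""
    # Truncate first, in case a clip is slightly longer than expected.
    waveform = np.asarray(waveform, dtype=np.float32)[:target_len]
    padding = target_len - len(waveform)
    # Append zeros at the end; np.pad uses constant zero padding by default.
    return np.pad(waveform, (0, padding))


# Usage: a half-second clip becomes a one-second clip with trailing silence.
clip = np.ones(SAMPLE_RATE // 2, dtype=np.float32)
padded = pad_to_one_second(clip)
```

Padding at the end keeps the spoken word at its original position; padding on both sides would be an equally valid choice.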
For demonstration purposes, a REST API was implemented. It was inspired by a tutorial by Valerio Velardo from his series Deep Learning (Audio) Application: From Design to Deployment.
Data mining, analysis, training and evaluation of the classifier:
Main development:
REST API:
- tune more hyperparameters
- use class weights for training (we have imbalanced classes)
- add background noise to the instances
- use other forms of data augmentation, e.g. time shifting
- add a silence label
- consider other classifier models
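Three of the ideas above (class weights, background noise, time shifting) can be sketched in a few lines. This is only an illustration, not the project's implementation: the function names, the "balanced" weighting formula n_samples / (n_classes * class_count), and the default `noise_gain` are assumptions, and waveforms are assumed to be 1-D float arrays with values in [-1, 1].

```python
from collections import Counter

import numpy as np


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes get weights above 1, frequent classes below 1.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count)
            for label, count in counts.items()}


def time_shift(waveform, shift):
    """Shift a 1-D waveform by `shift` samples, filling the gap with zeros.

    Positive values delay the signal, negative values advance it.
    """
    shifted = np.zeros_like(waveform)
    if shift > 0:
        shifted[shift:] = waveform[:-shift]
    elif shift < 0:
        shifted[:shift] = waveform[-shift:]
    else:
        shifted[:] = waveform
    return shifted


def add_background_noise(waveform, noise, noise_gain=0.1):
    """Mix scaled noise into the waveform and clip the result to [-1, 1].

    Assumes `noise` is at least as long as `waveform`.
    """
    mixed = waveform + noise_gain * noise[:len(waveform)]
    return np.clip(mixed, -1.0, 1.0)


# Usage with toy data.
weights = balanced_class_weights(["yes", "yes", "yes", "no"])
delayed = time_shift(np.array([1.0, 2.0, 3.0, 4.0]), 1)
noisy = add_background_noise(np.array([0.5, -0.5]),
                             np.array([1.0, -1.0, 1.0]))
```

When training with Keras, the `class_weight` argument of `Model.fit` expects a dictionary keyed by integer class indices, so the string labels used here would first have to be mapped to their indices.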
Velardo, Valerio: Deep Learning (Audio) Application: From Design to Deployment. YouTube, 2020.