- Project Lead
- Dr. Uthayasanker Thayasivam
- Contributors
- Rajenthiran Jenarthanan
- Lakshikka Sithamparanathan
- Saranya Uthayakumar
Useful Links
- Pretrained Models
- LibriSpeech
- Talk Forum
Speech technology has been one of the most rapidly evolving and in-demand fields over the past few decades, driven by the progress of machine learning. The past decade in particular has brought tremendous advances, including the introduction of conversational agents. In this work we describe a multi-task deep metric learning system that learns a single unified audio embedding which can power multiple audio/speaker-specific tasks. The solution we present not only allows us to train for multiple application objectives in a single deep neural network architecture, but also takes advantage of correlated information in the combined training data from each application to generate a unified embedding that outperforms all specialized embeddings previously deployed for audio/speaker-specific tasks.
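As a rough illustration of the idea, the sketch below shows a shared convolutional trunk producing a single L2-normalized embedding, with two task-specific heads trained jointly on top of it. This is a minimal sketch only: the layer sizes, embedding dimension, task heads (speaker identification and gender classification, suggested by libri_speaker_gender.csv), and loss weights are assumptions, and the metric-learning objectives mentioned above are omitted for brevity. The actual architecture lives in conv_models.py and train.py.

```python
# Minimal sketch of a shared-embedding multi-task model (illustrative only;
# sizes, heads, and losses are assumptions, not the repository's exact setup).
import tensorflow as tf
from tensorflow.keras import layers, Model

NUM_FRAMES, NUM_MFCC = 160, 64   # assumed input shape: (frames, MFCC bins, 1)
NUM_SPEAKERS = 251               # e.g. LibriSpeech train-clean-100

inputs = layers.Input(shape=(NUM_FRAMES, NUM_MFCC, 1))

# Shared convolutional trunk that produces the unified embedding.
x = layers.Conv2D(64, 5, strides=2, padding="same", activation="relu")(inputs)
x = layers.Conv2D(128, 5, strides=2, padding="same", activation="relu")(x)
x = layers.GlobalAveragePooling2D()(x)
embedding = layers.Dense(512, name="unified_embedding")(x)
embedding = layers.Lambda(lambda e: tf.math.l2_normalize(e, axis=1))(embedding)

# Task-specific heads trained jointly on top of the shared embedding.
speaker_out = layers.Dense(NUM_SPEAKERS, activation="softmax", name="speaker")(embedding)
gender_out = layers.Dense(1, activation="sigmoid", name="gender")(embedding)

model = Model(inputs, [speaker_out, gender_out])
model.compile(
    optimizer="adam",
    loss={"speaker": "sparse_categorical_crossentropy", "gender": "binary_crossentropy"},
    loss_weights={"speaker": 1.0, "gender": 0.5},  # assumed weighting
)
model.summary()
```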
The files and directories of the repository are shown below:
aaivu-unified-voice-embedding-master
├── Architecture.png
├── docs
│ └── README.md
├── hive-mtl
│ ├── audio.py
│ ├── batcher.py
│ ├── cli.py
│ ├── constants.py
│ ├── conv_models.py
│ ├── download_librispeech.sh
│ ├── hive-mtl
│ ├── libri_speaker_gender.csv
│ ├── requirements.txt
│ ├── test_pretrained.py
│ ├── train.py
│ └── utils.py
├── LICENSE
├── README.md
└── src
└── README.md
- tensorflow>=2.0
- keras>=2.3.1
- python>=3.6
pip install -r requirements.txt
If you see this error: libsndfile not found, run: sudo apt-get install libsndfile-dev
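After installing the requirements, an optional sanity check (not part of the repository's own instructions) is to confirm that TensorFlow 2.x imports correctly and whether a GPU is visible:

```python
# Optional sanity check: report the TensorFlow version and any visible GPUs.
# tf.config.list_physical_devices is available in TensorFlow 2.1 and later.
import tensorflow as tf

print("TensorFlow version:", tf.__version__)
print("Visible GPUs:", tf.config.list_physical_devices("GPU"))
```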
The code for training is available in this repository.
sudo chmod -R 777 hive-mtl/              # Give write permission to hive-mtl
pip uninstall -y tensorflow && pip install tensorflow-gpu   # Use the GPU build of TensorFlow (if a GPU is available)
./hive-mtl download_librispeech          # Download the LibriSpeech dataset
./hive-mtl build_mfcc                    # Extract and cache MFCC features (see the sketch below)
./hive-mtl build_model_inputs            # Prepare the model inputs for training
./hive-mtl train_mtl                     # Train the multi-task network
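To give a sense of what the build_mfcc step involves, here is a minimal sketch of MFCC extraction for a single utterance. It uses librosa purely for illustration; the repository's own preprocessing lives in audio.py, and the sample rate and number of coefficients below are assumptions.

```python
# Minimal sketch of MFCC extraction for one utterance (illustrative only; the
# repository's preprocessing is in audio.py and its parameters may differ).
import librosa

SAMPLE_RATE = 16000   # LibriSpeech is distributed at 16 kHz
N_MFCC = 64           # assumed number of MFCC coefficients

def extract_mfcc(flac_path):
    # Load the FLAC file and resample to the target rate.
    signal, sr = librosa.load(flac_path, sr=SAMPLE_RATE)
    # Compute MFCC features: array of shape (N_MFCC, num_frames).
    return librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=N_MFCC)

# Example (hypothetical path following the LibriSpeech layout):
# features = extract_mfcc("LibriSpeech/dev-clean/84/121123/84-121123-0000.flac")
```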
NOTE: If you want to use your own dataset, make sure you follow the LibriSpeech directory structure. Audio files have to be in .flac format. If you have .wav files, you can use ffmpeg to do the conversion (a sketch is given below). Both formats are lossless (FLAC is compressed WAV).
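For batch conversion, a small helper along these lines works; this is a sketch that assumes ffmpeg is installed and on your PATH, and the directory name is just an example.

```python
# Sketch of a .wav -> .flac batch conversion using ffmpeg (assumes ffmpeg is
# installed and on PATH; adjust the input directory to your own layout).
import pathlib
import subprocess

def convert_wav_to_flac(root_dir):
    for wav_path in pathlib.Path(root_dir).rglob("*.wav"):
        flac_path = wav_path.with_suffix(".flac")
        # -y overwrites an existing output file without prompting.
        subprocess.run(["ffmpeg", "-y", "-i", str(wav_path), str(flac_path)], check=True)

# Example: convert_wav_to_flac("my_dataset/")
```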
- Deep Speaker: An End-to-End Neural Speaker Embedding System by Chao Li, Xiaokang Ma, Bing Jiang, Xiangang Li, et al.
- GitHub
- Ketharan Suntharam
- Sathiyakugan Balakirshnan
Apache License 2.0
Please read our code of conduct document here.