Speech Separation (Final Year Thesis)

Thesis project on speech separation using deep learning.

Installation & Dataset Setup

Installing Dependencies

pip install -r requirements.txt

Setting up MUSDB18 for training (optional)

Convert the dataset from STEMS format to .wav format:

musdbconvert path/to/musdb-stems-root path/to/new/musdb-wav-root

Download the LibriSpeech corpus, which is used for creating synthetic mixtures, from https://www.openslr.org/12

Setting up LibriMix for training

LibriMix is an open-source dataset for source separation in noisy environments. It is derived from LibriSpeech signals (clean subset) and WHAM noise. It offers a free alternative to the WHAM dataset, complements it, and enables cross-dataset experiments.

Generating LibriMix

Features

In LibriMix you can choose:

The number of sources in the mixtures.
The sample rate of the dataset, from 16 kHz down to any lower frequency.
The mode of the mixtures: min (the mixture ends when the shortest source ends) or max (the mixture ends with the longest source).
The type of mixture: mix_clean (utterances only), mix_both (utterances + noise) or mix_single (1 utterance + noise).
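The difference between the min and max modes can be illustrated with a minimal sketch. This is not LibriMix's own generation code: `mix_sources` is a hypothetical helper, and the sources are plain lists of samples assumed to share one sample rate.

```python
def mix_sources(sources, mode="min"):
    """Sum sources sample by sample.

    mode="min": the mixture ends when the shortest source ends.
    mode="max": the mixture ends with the longest source; shorter
                sources simply contribute nothing after they end.
    """
    if mode == "min":
        length = min(len(src) for src in sources)
    elif mode == "max":
        length = max(len(src) for src in sources)
    else:
        raise ValueError(f"unknown mode: {mode!r}")

    mixture = [0.0] * length
    for src in sources:
        # Slicing truncates longer sources in min mode and is a no-op in max mode.
        for i, sample in enumerate(src[:length]):
            mixture[i] += sample
    return mixture


s1 = [0.5, 0.25, -0.5, 0.125]
s2 = [0.25, -0.25]
print(mix_sources([s1, s2], mode="min"))  # [0.75, 0.0]
print(mix_sources([s1, s2], mode="max"))  # [0.75, 0.0, -0.5, 0.125]
```

In min mode every source is fully overlapped for the whole mixture, while max mode keeps all of the audio at the cost of stretches where only one speaker is active.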

By default, LibriMix will be generated for 2 and 3 speakers, at both 16 kHz and 8 kHz, for both min and max modes, and all mixture types will be saved (mix_clean, mix_both and mix_single). This represents around 430 GB of data for Libri2Mix and 332 GB for Libri3Mix. If you want a smaller subset, you can restrict these options when generating the dataset.
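To see why the default generation is so large, it can help to enumerate the combinations it covers. The sketch below only counts configurations; it does not call the LibriMix scripts, and the names `default_configs`, `N_SOURCES`, `SAMPLE_RATES`, `MODES` and `MIX_TYPES` are hypothetical.

```python
from itertools import product

# Default choices as described in the paragraph above.
N_SOURCES = (2, 3)
SAMPLE_RATES = ("16k", "8k")
MODES = ("min", "max")
MIX_TYPES = ("mix_clean", "mix_both", "mix_single")


def default_configs(n_sources=N_SOURCES, sample_rates=SAMPLE_RATES,
                    modes=MODES, mix_types=MIX_TYPES):
    """Return every (n_sources, sample_rate, mode, mix_type) combination."""
    return list(product(n_sources, sample_rates, modes, mix_types))


all_configs = default_configs()
print(len(all_configs))  # 24

# A much smaller subset: 2 speakers, 8 kHz, min mode, clean mixtures only.
small = default_configs(n_sources=(2,), sample_rates=("8k",),
                        modes=("min",), mix_types=("mix_clean",))
print(small)  # [(2, '8k', 'min', 'mix_clean')]
```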

Creating Synthetic Audio for Training our Model

Each entry in the LibriSpeech corpus corresponds to a speaker, and each speaker folder contains multiple recordings together with their annotations. We take recordings of individual speakers from these folders and overlap them using pydub to create synthetic audio mixtures, which we then use to train our model.
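A minimal sketch of this mixing step is shown below. It is not necessarily what mix_audio.py does: `speaker_pairs` and `mix_pair` are hypothetical helpers, the output file names follow the data format used in this README, and reading LibriSpeech's FLAC files through pydub is assumed to require ffmpeg.

```python
from itertools import combinations
from pathlib import Path


def speaker_pairs(speaker_ids):
    """Return every unordered pair of distinct speakers, in sorted order."""
    return list(combinations(sorted(speaker_ids), 2))


def mix_pair(file1, file2, out_dir):
    """Overlay two recordings and write sound1.wav, sound2.wav and mixed.wav."""
    # Imported here so the pairing helper above works without pydub installed.
    from pydub import AudioSegment

    sound1 = AudioSegment.from_file(str(file1))
    sound2 = AudioSegment.from_file(str(file2))
    # overlay() keeps the length of sound1; a longer sound2 is cut off in the mix.
    mixed = sound1.overlay(sound2)

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    sound1.export(str(out_dir / "sound1.wav"), format="wav")
    sound2.export(str(out_dir / "sound2.wav"), format="wav")
    mixed.export(str(out_dir / "mixed.wav"), format="wav")


# Speaker IDs are the folder names of the corpus; these three are examples.
for spk_a, spk_b in speaker_pairs(["19", "26", "103"]):
    print(f"{spk_a}_{spk_b}")
```

Each pair would then be passed to `mix_pair` with one recording per speaker and an output folder named after the pair.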

Synthetic Audio Data Format:

+ data
    |
    + spk1_spk2
    |      |
    |      + sound1.wav
    |      + sound2.wav
    |      + mixed.wav
    + spk1_spk3
    |      |
    |      + sound1.wav
    |      + sound2.wav
    |      + mixed.wav
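
Given this layout, the training examples can be collected by walking the data folder. The sketch below assumes exactly the file names shown in the tree above; `list_examples` is a hypothetical helper and not part of train.py.

```python
from pathlib import Path

EXPECTED_FILES = ("mixed.wav", "sound1.wav", "sound2.wav")


def list_examples(data_root):
    """Return (mixed, sound1, sound2) path tuples, one per complete pair folder."""
    examples = []
    for pair_dir in sorted(Path(data_root).iterdir()):
        files = tuple(pair_dir / name for name in EXPECTED_FILES)
        # Skip stray files and folders that are missing any of the three files.
        if pair_dir.is_dir() and all(f.is_file() for f in files):
            examples.append(files)
    return examples
```

Calling `list_examples("data")` on the tree above would return one tuple for spk1_spk2 and one for spk1_spk3.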

Using MiniLibriMix

MiniLibriMix is a small version of LibriMix, made for demonstration purposes. It contains a train set of 800 mixtures and a validation set of 200 mixtures.

In each set, you will find:

mix_clean: a folder containing clean mixtures of 2 speakers.
mix_both: a folder containing mixtures of 2 speakers plus noise.
s1, s2, noise: three folders containing the raw signals that make up each mixture.
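
A mixture can be matched to its source signals by file name. The sketch below assumes that a mixture and its sources share the same file name across the folders, which is an assumption about the layout rather than something stated here; `pair_mixtures` is a hypothetical helper.

```python
from pathlib import Path


def pair_mixtures(set_dir, mix_folder="mix_clean", source_folders=("s1", "s2")):
    """Return (mixture_path, [source_paths]) for every mixture with all sources present."""
    set_dir = Path(set_dir)
    pairs = []
    for mix_path in sorted((set_dir / mix_folder).glob("*.wav")):
        # Assumed convention: same file name in mix_clean, s1 and s2.
        sources = [set_dir / folder / mix_path.name for folder in source_folders]
        if all(src.is_file() for src in sources):
            pairs.append((mix_path, sources))
    return pairs
```

For the noisy mixtures, the same helper could be called with `mix_folder="mix_both"` and `source_folders=("s1", "s2", "noise")`.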

Results

Waveplot of Mixed/Original/Estimated Audio

Mel Spectrogram of Mixed/Original/Estimated Audio

All Speech Separation Metrics from Asteroid

{'input_pesq': 3.934750556945801,
 'input_sar': 28.28840552880433,
 'input_sdr': 7.4975376739032145,
 'input_si_sdr': 6.865206956863403,
 'input_sir': 7.546190904711902,
 'input_stoi': 0.9072806256745396,
 'pesq': 4.548638343811035,
 'sar': 286.0524142270863,
 'sdr': 297.9890902500691,
 'si_sdr': 90.5447006225586,
 'sir': 286.52094481387064,
 'stoi': 0.9999999999999994}
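
For reference, the si_sdr entries follow the usual scale-invariant SDR definition: the estimate is projected onto the reference, and the energy of that projection is compared with the energy of the residual. The sketch below is a plain-Python illustration of that definition, not Asteroid's implementation; `si_sdr` here is a hypothetical helper operating on lists of samples.

```python
import math


def si_sdr(reference, estimate):
    """Scale-invariant signal-to-distortion ratio in dB for two equal-length signals."""
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    n = len(reference)

    # Remove the mean so a constant offset does not affect the result.
    ref_mean = sum(reference) / n
    est_mean = sum(estimate) / n
    ref = [r - ref_mean for r in reference]
    est = [e - est_mean for e in estimate]

    # Project the estimate onto the reference to get the target component.
    scale = sum(e * r for e, r in zip(est, ref)) / sum(r * r for r in ref)
    target = [scale * r for r in ref]

    target_energy = sum(t * t for t in target)
    # Zero for a perfect estimate, in which case the ratio is undefined.
    noise_energy = sum((e - t) ** 2 for e, t in zip(est, target))
    return 10 * math.log10(target_energy / noise_energy)


reference = [1.0, 0.0, -1.0, 0.0]
estimate = [1.0, 0.1, -1.0, -0.1]  # reference plus a small orthogonal error
print(round(si_sdr(reference, estimate), 3))  # 20.0
```

Rescaling the estimate leaves the value unchanged, which is what distinguishes SI-SDR from plain SDR. The input_ entries in the dictionary are the same metrics computed on the unprocessed mixture instead of the model output.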
