Sangramsingkayte / Audio-Feature-Extraction Public


In sound processing, the mel-frequency cepstrum (MFC) is a representation of the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency. Mel-frequency cepstral coefficients (MFCCs) are coefficients that collectively make up an MFC.
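The definition above can be illustrated with a minimal sketch. It assumes the widely used `2595 * log10(1 + f / 700)` mel formula and an unnormalized DCT-II as the "linear cosine transform"; the function names are hypothetical and are not taken from this repository's code. It uses only the standard library so that it runs without dependencies; real code would typically be vectorized with NumPy.

```python
import math


def hz_to_mel(hz):
    """Convert a frequency in Hz to mels (assumed 2595*log10(1 + f/700) variant)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def dct_ii(values):
    """Unnormalized DCT-II: X[k] = sum_i x[i] * cos(pi * k * (i + 0.5) / N)."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(n)
    ]


def cepstrum_from_mel_energies(mel_energies, num_coeffs):
    """Take the log of mel filter-bank energies, apply the DCT, keep the first coefficients."""
    # A small floor avoids log(0) for silent bands.
    log_energies = [math.log(e + 1e-10) for e in mel_energies]
    return dct_ii(log_energies)[:num_coeffs]


if __name__ == "__main__":
    # Made-up filter-bank energies for one frame, for illustration only.
    energies = [0.9, 1.4, 2.1, 1.7, 0.8, 0.3]
    print(hz_to_mel(1000.0))
    print(cepstrum_from_mel_energies(energies, num_coeffs=3))
```

The mel scale is roughly linear below 1 kHz and logarithmic above it, which is why filter banks spaced evenly in mels become wider at higher frequencies.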


# Audio Feature Extraction

This repository describes feature extraction methods for audio signals.

## Free speech datasets

- OpenSLR: a site devoted to hosting speech and language resources, such as training corpora for speech recognition and software related to speech recognition.
- VoxForge: now mirrors the LT and Telecooperation group Open Speech Data Corpus for German, with 35 hours of speech from about 180 speakers.
- TIMIT: the DARPA TIMIT Acoustic-Phonetic Continuous Speech Corpus.
- Mozilla Speech: Mozilla released the world's second-largest public voice data set on November 29, 2017.
- Open Data for Deep Learning

## File description

- feature_extraction_functions.py: a set of feature extraction functions from RDShi-SpeakerCount.
- MFCC: mel-frequency cepstral coefficient (MFCC) calculation.
  - MFCC.py, MFCCTest.py: compute the MFCC features.
  - FeatureExtraction.ipynb: speech preprocessing, including data loading, pre-emphasis, framing, windowing, Fourier transform, power spectrum, filter banks, MFCCs, and mean normalization.
- Volume: volume calculation.
- ZeroCR: zero-crossing rate calculation.
- Pitch: pitch calculation and pitch tracking.
- Timbre: spectrogram drawing.
- VAD: voice activity detection (VAD), also known as end-point detection (EPD) or speech detection.
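
Several of the steps listed above (pre-emphasis, framing, windowing, power spectrum, volume, and zero-crossing rate) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the code from MFCC.py or the notebook: the function names, the 0.97 pre-emphasis coefficient, the Hamming window, and the decibel definition of volume are all assumptions chosen because they are common defaults. It uses only the standard library, so the DFT is a naive O(n²) loop; a real implementation would use an FFT.

```python
import math


def pre_emphasis(signal, coeff=0.97):
    """Boost high frequencies: y[n] = x[n] - coeff * x[n-1] (first sample kept as-is)."""
    return [signal[0]] + [signal[i] - coeff * signal[i - 1] for i in range(1, len(signal))]


def frame_signal(signal, frame_len, hop_len):
    """Split into overlapping frames; trailing samples that do not fill a frame are dropped."""
    return [signal[s:s + frame_len] for s in range(0, len(signal) - frame_len + 1, hop_len)]


def hamming(frame):
    """Apply a Hamming window to one frame."""
    n = len(frame)
    if n == 1:
        return list(frame)
    return [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1))) for i, x in enumerate(frame)]


def volume_db(frame):
    """Volume in decibels: 10*log10 of the energy after removing the frame mean (assumed definition)."""
    mean = sum(frame) / len(frame)
    energy = sum((x - mean) ** 2 for x in frame)
    return 10.0 * math.log10(energy + 1e-12)  # small floor avoids log10(0)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def power_spectrum(frame):
    """|X[k]|^2 / N for k = 0..N//2, computed with a naive DFT."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(frame))
        spectrum.append((re * re + im * im) / n)
    return spectrum


if __name__ == "__main__":
    sample_rate = 8000  # hypothetical sample rate
    tone = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(800)]
    frames = frame_signal(pre_emphasis(tone), frame_len=200, hop_len=80)
    first = hamming(frames[0])
    print(len(frames), volume_db(first), zero_crossing_rate(frames[0]))
    spec = power_spectrum(first)
    print(spec.index(max(spec)) * sample_rate / 200)  # frequency of the strongest bin
```

Frame-level volume and zero-crossing rate are also the usual inputs to simple threshold-based end-point detection, which is one reason they are often computed together.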

## Requirements

- Anaconda (Python)
