I am no longer working on this project. Anyone interested in voice activity detection should have a look at more modern approaches based on deep learning, such as:
- https://www.mathworks.com/help/audio/examples/voice-activity-detection-in-noise-using-deep-learning.html
- https://medium.com/vivolab/vivovad-a-voice-activity-detection-tool-based-on-recurrent-neural-networks-32356526321c
- https://github.com/hcmlab/vadnet
Python code to apply a voice activity detector to a wave file. The detector is based on the ratio between the energy in the speech band and the total energy.
Requirements:
- numpy
- scipy
- matplotlib
- tkinter (sudo apt install python3-tk)
The input audio data is processed as follows:
- Convert stereo to mono.
- Move a 20 ms window along the audio data.
- For each window, calculate the ratio between the energy in the speech band and the total energy.
- If the ratio exceeds the threshold (0.6 by default), label the window as speech.
- Apply a median filter with a length of 0.5 s to smooth the detected speech regions.
- Represent the speech regions as time intervals.
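The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in the vad module: the function names, the 300-3000 Hz speech band, and the use of non-overlapping windows are assumptions made for the example. The median filter is written with plain numpy here; scipy.signal.medfilt could be used for the same purpose.

```python
import numpy as np


def median_smooth(flags, kernel):
    """Median filter with zero padding at both ends (kernel must be odd)."""
    padded = np.pad(flags, kernel // 2)
    windows = np.lib.stride_tricks.sliding_window_view(padded, kernel)
    return np.median(windows, axis=1)


def detect_speech_intervals(data, rate, window_s=0.02, threshold=0.6,
                            band=(300.0, 3000.0), smooth_s=0.5):
    """Return a list of (start_s, end_s) speech intervals.

    band is an assumed speech band in Hz; it is not taken from the README.
    """
    data = np.asarray(data, dtype=float)
    if data.ndim == 2:
        data = data.mean(axis=1)            # stereo to mono: average channels

    win = int(rate * window_s)              # samples per window (20 ms)
    n_windows = len(data) // win
    if n_windows == 0:
        return []

    freqs = np.fft.rfftfreq(win, d=1.0 / rate)
    in_band = (freqs >= band[0]) & (freqs <= band[1])

    flags = np.zeros(n_windows)
    for i in range(n_windows):
        frame = data[i * win:(i + 1) * win]
        energy = np.abs(np.fft.rfft(frame)) ** 2
        total = energy.sum()
        # Speech if the in-band share of the energy exceeds the threshold.
        if total > 0 and energy[in_band].sum() / total > threshold:
            flags[i] = 1.0

    # Smooth over roughly smooth_s seconds; force an odd kernel length.
    kernel = int(round(smooth_s / window_s))
    if kernel % 2 == 0:
        kernel += 1
    flags = median_smooth(flags, kernel)

    # Collapse runs of speech windows into time intervals.
    intervals, start = [], None
    for i, flag in enumerate(flags):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            intervals.append((start * window_s, i * window_s))
            start = None
    if start is not None:
        intervals.append((start * window_s, n_windows * window_s))
    return intervals
```

With data and rate as returned by scipy.io.wavfile.read (which returns rate first, then data), calling detect_speech_intervals(data, rate) yields the intervals in seconds.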
Create object:
- Import the vad module.
- Create an instance of the VoiceActivityDetector class with the full path to the wave file.
- Run the method that detects speech regions.
- Optionally, plot the original wave data and the detected speech regions.
Example Python script that saves the speech intervals to a JSON file:
./detectVoiceInWave.py ./wav-sample.wav ./results.json
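The layout of the JSON file written by the script is not described here. As a rough sketch of how a list of speech intervals could be saved, the helper below writes one object per interval. The function name save_intervals and the field names speech_begin and speech_end are hypothetical and chosen only for illustration.

```python
import json


def save_intervals(intervals, path):
    """Write (start_s, end_s) pairs to a JSON file as a list of objects.

    The field names are illustrative assumptions, not the script's format.
    """
    records = [
        {"speech_begin": round(begin, 3), "speech_end": round(end, 3)}
        for begin, end in intervals
    ]
    with open(path, "w") as f:
        json.dump(records, f, indent=2)
    return records
```

For example, save_intervals([(1.0, 2.0)], "results.json") writes a single-element list to results.json.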
Example Python code to plot the detected speech regions:
from vad import VoiceActivityDetector
filename = '/Users/user/wav-sample.wav'
v = VoiceActivityDetector(filename)
v.plot_detected_speech_regions()
Alexander USOLTSEV 2015 (c) MIT License