An algorithm using ImageBind that classifies the KWS (keyword spotting) test dataset (Google Speech Commands v2, 35 classes) in a zero-shot manner.
Author: Sean Red Mendoza | 2020-01751 | [email protected]
- Randomly pick an audio clip from the test split and classify it (audio player in the UI; the zero-shot recipe is sketched after this list)
- Let the user record their own voice for testing (audio recorder in the UI, powered by Gradio; see the recorder sketch at the end of this README)
- Show summary statistics during evaluation over n samples (number of data points, accuracy)
- Comparison table of SOTA model scores
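The classification itself follows the usual zero-shot recipe: embed each of the 35 keyword labels as a text prompt and each test clip as audio with ImageBind, then predict the label whose text embedding is closest to the audio embedding. Below is a minimal sketch of that idea; the import paths assume the facebookresearch/ImageBind package layout, and the prompt template and `classify` helper are illustrative rather than the notebook's exact code.

```python
# Minimal zero-shot KWS sketch with ImageBind (illustrative, not the notebook's exact code).
import torch
from imagebind import data
from imagebind.models import imagebind_model
from imagebind.models.imagebind_model import ModalityType

device = "cuda" if torch.cuda.is_available() else "cpu"

# First 10 of the 35 Speech Commands v2 keywords shown; extend with the full label list.
labels = ["yes", "no", "up", "down", "left", "right", "on", "off", "stop", "go"]
prompts = [f"a recording of someone saying {w}" for w in labels]  # assumed prompt template

model = imagebind_model.imagebind_huge(pretrained=True).to(device).eval()

def classify(wav_path: str) -> str:
    """Embed the audio clip and every text prompt, then return the closest label."""
    inputs = {
        ModalityType.TEXT: data.load_and_transform_text(prompts, device),
        ModalityType.AUDIO: data.load_and_transform_audio_data([wav_path], device),
    }
    with torch.no_grad():
        emb = model(inputs)
    # Cosine similarity between the single audio embedding and all text embeddings
    audio = torch.nn.functional.normalize(emb[ModalityType.AUDIO], dim=-1)
    text = torch.nn.functional.normalize(emb[ModalityType.TEXT], dim=-1)
    scores = (audio @ text.T).squeeze(0)
    return labels[int(scores.argmax())]
```

Evaluation accuracy over n samples then reduces to counting how often `classify` matches the ground-truth label of each sampled test clip.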
- Clone this repository into a working directory
git clone https://github.com/reddiedev/197z-kws
cd 197z-kws
- Prepare the environment for running the notebook
conda create --name kws
conda activate kws
sudo apt install ffmpeg
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install jupyter jupyterlab ipywidgets==7.6.5 numpy ipython gradio ipywebrtc notebook
jupyter labextension install jupyter-webrtc
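Optionally, sanity-check the environment before launching Jupyter; the short snippet below only verifies that the installed packages import and that the CUDA build of PyTorch sees a GPU.

```python
# Quick environment check (run inside the activated `kws` environment)
import torch, torchaudio, gradio
print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("torchaudio", torchaudio.__version__, "| gradio", gradio.__version__)
```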
- Run the demo.ipynb notebook
jupyter notebook
- View the SOTA model comparison in comparison.md
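For reference, this is roughly how the Gradio recorder from the feature list can be wired to the zero-shot classifier. It reuses the illustrative `classify(wav_path)` helper from the sketch above, and the exact `gr.Audio` arguments depend on the installed Gradio version (newer releases use `sources=["microphone"]`, older ones `source="microphone"`).

```python
# Sketch of the microphone-recording demo, assuming the `classify(wav_path)` helper above.
import gradio as gr

demo = gr.Interface(
    fn=classify,                                        # returns the predicted keyword
    inputs=gr.Audio(sources=["microphone"], type="filepath"),
    outputs=gr.Label(label="Predicted keyword"),
    title="ImageBind zero-shot KWS demo",
)
demo.launch()
```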