Other Models

Other models besides VAD

Number Detector
Language Classifier
Language Classifier 95

Number Detector

Number Detector detects spoken numbers (e.g. "thirty-five") in 4 languages: English, German, Russian, and Spanish.

In some cases it is crucial to be able to anonymize large-scale spoken corpora (i.e. remove personal data). Typically, data is considered private or sensitive if it contains a name or some private ID. Name recognition is highly subjective and depends on the locale and the business case, whereas VAD and number detection are fairly general tasks.

How to use Number Detector:

It is recommended to split long audio into short chunks (< 15 s) and apply the model to each of them.
Number Detector can classify whether the whole audio contains a number, or whether each audio frame contains a number.
The audio is split into frames in a fixed way, so the per-frame output can be used to reconstruct the time boundaries of numbers with an accuracy of about 0.2 s.
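The reconstruction of time boundaries from per-frame output can be illustrated with a minimal sketch. It assumes the per-frame predictions are already available as a list of booleans and that every frame covers a fixed 200 ms. The function name `frames_to_intervals` and the frame length are assumptions made for illustration and are not part of the Silero API; the example further down obtains timestamps directly from `get_number_ts`.

```python
from typing import Dict, List


def frames_to_intervals(frame_flags: List[bool], frame_ms: int = 200) -> List[Dict[str, int]]:
    """Merge runs of consecutive flagged frames into {'start', 'end'} dicts (milliseconds).

    frame_ms is a hypothetical fixed frame length, not a documented model parameter.
    """
    intervals = []
    run_start = None  # index of the first frame in the current run, if any
    for idx, flag in enumerate(frame_flags):
        if flag and run_start is None:
            run_start = idx
        elif not flag and run_start is not None:
            intervals.append({'start': run_start * frame_ms, 'end': idx * frame_ms})
            run_start = None
    if run_start is not None:
        # the last run extends to the end of the audio
        intervals.append({'start': run_start * frame_ms, 'end': len(frame_flags) * frame_ms})
    return intervals


# made-up per-frame predictions: numbers in frames 1-2 and frame 5
print(frames_to_intervals([False, True, True, False, False, True]))
# [{'start': 200, 'end': 600}, {'start': 1000, 'end': 1200}]
```

With a fixed frame length, the boundary error is bounded by one frame, which is consistent with the roughly 0.2 s accuracy stated above.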

Example

#@title Install and Import Dependencies

# this assumes that you have a relevant version of PyTorch installed
!pip install -q torchaudio

SAMPLING_RATE = 16000

import torch
torch.set_num_threads(1)

from IPython.display import Audio
from pprint import pprint
# download example
torch.hub.download_url_to_file('https://models.silero.ai/vad_models/en_num.wav', 'en_number_example.wav')

USE_ONNX = True  # set this to False to use the PyTorch model instead of the ONNX model
if USE_ONNX:
    !pip install -q onnxruntime
  
model, utils = torch.hub.load(repo_or_dir='snakers4/silero-vad',
                              model='silero_number_detector',
                              force_reload=True,
                              onnx=USE_ONNX)
(get_number_ts,
 save_audio,
 read_audio,
 collect_chunks,
 drop_chunks) = utils

wav = read_audio('en_number_example.wav', sampling_rate=SAMPLING_RATE)
# get number timestamps from full audio file
number_timestamps = get_number_ts(wav, model)
pprint(number_timestamps)

# convert ms in timestamps to samples
for timestamp in number_timestamps:
    timestamp['start'] = int(timestamp['start'] * SAMPLING_RATE / 1000)
    timestamp['end'] = int(timestamp['end'] * SAMPLING_RATE / 1000)

# merge all number chunks to one audio
save_audio('only_numbers.wav',
           collect_chunks(number_timestamps, wav), SAMPLING_RATE) 
Audio('only_numbers.wav')

# drop all number chunks from audio
save_audio('no_numbers.wav',
           drop_chunks(number_timestamps, wav), SAMPLING_RATE) 
Audio('no_numbers.wav')
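For audio longer than the recommended 15 s, one possible approach is to split the waveform into fixed-size chunks, run the detector on each chunk, and shift the resulting timestamps by the chunk offset. The sketch below shows only the splitting and shifting steps. The helper names `split_into_chunks` and `shift_timestamps` and the 10 s chunk length are hypothetical; it also assumes, as the example above does, that timestamps are reported in milliseconds.

```python
SAMPLING_RATE = 16000
MAX_CHUNK_SEC = 10  # assumption: any value below the recommended 15 s limit


def split_into_chunks(wav, sampling_rate=SAMPLING_RATE, max_sec=MAX_CHUNK_SEC):
    """Yield (offset_ms, chunk) pairs covering the whole waveform.

    Works on anything that supports len() and slicing (a list or a 1-D tensor).
    """
    chunk_len = int(sampling_rate * max_sec)
    for start in range(0, len(wav), chunk_len):
        yield start * 1000 // sampling_rate, wav[start:start + chunk_len]


def shift_timestamps(timestamps, offset_ms):
    """Return copies of {'start', 'end'} dicts moved from chunk time to file time."""
    return [{'start': t['start'] + offset_ms, 'end': t['end'] + offset_ms}
            for t in timestamps]


# Intended use with the objects from the example above (not executed here):
# all_timestamps = []
# for offset_ms, chunk in split_into_chunks(wav):
#     all_timestamps += shift_timestamps(get_number_ts(chunk, model), offset_ms)
```

Note that naive fixed-size splitting can cut a spoken number in half at a chunk boundary; splitting at pauses would avoid this.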

Language Classifier

99% validation accuracy.
The language classifier was trained on audio samples in 4 languages: Russian, English, Spanish, and German.
Audio of arbitrary length can be used, although the network was trained on audio shorter than 15 seconds.
A 95-language version is also available (see Language Classifier 95).

Example

#@title Install and Import Dependencies

# this assumes that you have a relevant version of PyTorch installed
!pip install -q torchaudio

SAMPLING_RATE = 16000

import torch
torch.set_num_threads(1)

from IPython.display import Audio
from pprint import pprint
# download example
torch.hub.download_url_to_file('https://models.silero.ai/vad_models/en.wav', 'en_example.wav')

USE_ONNX = True  # set this to False to use the PyTorch model instead of the ONNX model
if USE_ONNX:
    !pip install -q onnxruntime
  
model, utils = torch.hub.load(repo_or_dir='snakers4/silero-vad',
                              model='silero_lang_detector',
                              force_reload=True,
                              onnx=USE_ONNX)

get_language, read_audio = utils

wav = read_audio('en_example.wav', sampling_rate=SAMPLING_RATE)
lang = get_language(wav, model)
print(lang)

Language Classifier 95

85% validation accuracy among 95 languages; 90% validation accuracy among 58 language groups.
Language Classifier 95 was trained on audio samples in 95 languages.
Audio of arbitrary length can be used, although the network was trained on audio shorter than 20 seconds.

Example

#@title Install and Import Dependencies

# this assumes that you have a relevant version of PyTorch installed
!pip install -q torchaudio

SAMPLING_RATE = 16000

import torch
torch.set_num_threads(1)

from IPython.display import Audio
from pprint import pprint
# download example
torch.hub.download_url_to_file('https://models.silero.ai/vad_models/de.wav', 'de_example.wav')

USE_ONNX = True  # set this to False to use the PyTorch model instead of the ONNX model
if USE_ONNX:
    !pip install -q onnxruntime
  
model, lang_dict, lang_group_dict,  utils = torch.hub.load(repo_or_dir='snakers4/silero-vad',
                                                           model='silero_lang_detector_95',
                                                           force_reload=True,
                                                           onnx=USE_ONNX)

get_language_and_group, read_audio = utils

# read the file under the name it was saved as by the download step above
wav = read_audio('de_example.wav', sampling_rate=SAMPLING_RATE)
languages, language_groups = get_language_and_group(wav, model, lang_dict, lang_group_dict, top_n=2)

for i in languages:
  pprint(f'Language: {i[0]} with prob {i[-1]}')

for i in language_groups:
  pprint(f'Language group: {i[0]} with prob {i[-1]}')
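Because accuracy is higher at the language-group level than at the language level, one way to consume the two outputs is to accept the top language only when its probability is high enough and fall back to the top language group otherwise. The sketch below assumes each entry is a sequence whose first element is the label and whose last element is the probability, as the loops above suggest. The function name `pick_label`, the 0.5 threshold, and the sample data are hypothetical.

```python
def pick_label(languages, language_groups, min_prob=0.5):
    """Return ('language', label), ('group', label) or ('unknown', None).

    min_prob is an illustrative threshold, not a value recommended by the model authors.
    """
    if languages:
        best = max(languages, key=lambda item: item[-1])
        if best[-1] >= min_prob:
            return 'language', best[0]
    if language_groups:
        best = max(language_groups, key=lambda item: item[-1])
        if best[-1] >= min_prob:
            return 'group', best[0]
    return 'unknown', None


# made-up predictions, shaped like the top_n=2 output printed above
sample_languages = [('de, German', 0.41), ('nl, Dutch', 0.38)]
sample_groups = [('de-nl group', 0.79), ('other group', 0.05)]
print(pick_label(sample_languages, sample_groups))
# ('group', 'de-nl group')
```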
