Other Models
Dimitrii Voronin edited this page Dec 10, 2021 · 7 revisions
Number Detector
The Number Detector detects spoken numbers (e.g. "thirty five") in 4 languages: English, German, Russian, and Spanish.
In some cases it is crucial to be able to anonymize large-scale spoken corpora, i.e. to remove personal data from them. Personal data is typically considered private or sensitive if it contains a name or a private ID, such as a phone or card number. Name recognition is highly subjective and depends on the locale and the business case, whereas VAD and number detection are fairly general tasks.
How to use Number Detector:
- It is recommended to split long audio into short chunks (< 15 s) and apply the model to each of them.
- Number Detector can classify whether the whole audio contains a number, or whether each audio frame contains a number.
- Because the audio is split into frames in a fixed way, the per-frame output can be used to reconstruct the time boundaries of numbers with an accuracy of about 0.2 s.
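The per-frame output described above can be turned into time boundaries by merging runs of consecutive positive frames. The sketch below assumes a frame length of 0.2 s (inferred from the stated boundary accuracy, not from the model's actual frame size) and a list of per-frame booleans; `split_into_chunks` and `frames_to_segments` are hypothetical helpers, not part of the silero-vad utils.

```python
from typing import List, Tuple


def split_into_chunks(num_samples: int, sample_rate: int,
                      max_chunk_sec: float = 15.0) -> List[Tuple[int, int]]:
    """Return (start, end) sample ranges, each at most max_chunk_sec long."""
    chunk_len = int(max_chunk_sec * sample_rate)
    return [(start, min(start + chunk_len, num_samples))
            for start in range(0, num_samples, chunk_len)]


def frames_to_segments(frame_flags: List[bool], frame_sec: float = 0.2,
                       offset_sec: float = 0.0) -> List[Tuple[float, float]]:
    """Merge runs of consecutive positive frames into (start_sec, end_sec) pairs.

    frame_sec is an ASSUMED frame length; offset_sec shifts the result
    onto the timeline of the original (unsplit) recording.
    """
    segments = []
    run_start = None
    for i, is_number in enumerate(frame_flags):
        if is_number and run_start is None:
            run_start = i                      # a run of "number" frames begins
        elif not is_number and run_start is not None:
            segments.append((run_start, i))    # the run ended before frame i
            run_start = None
    if run_start is not None:                  # the run lasts until the last frame
        segments.append((run_start, len(frame_flags)))
    # convert frame indices to seconds, rounded to hide float noise
    return [(round(offset_sec + s * frame_sec, 3), round(offset_sec + e * frame_sec, 3))
            for s, e in segments]


print(frames_to_segments([False, True, True, False, True]))
# [(0.2, 0.6), (0.8, 1.0)]
```

For a 40 s recording at 16 kHz, `split_into_chunks(40 * 16000, 16000)` yields three ranges ending at samples 240000, 480000 and 640000; passing each chunk's start time as `offset_sec` keeps the merged segments aligned with the full file.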
JIT example
```python
import torch
from pprint import pprint

torch.set_num_threads(1)

model, utils = torch.hub.load(repo_or_dir='snakers4/silero-vad',
                              model='silero_number_detector',
                              force_reload=True)
(get_number_ts,
 _, read_audio,
 _, _, _) = utils

files_dir = torch.hub.get_dir() + '/snakers4_silero-vad_master/files'
wav = read_audio(f'{files_dir}/en_num.wav')

# get number timestamps from the full audio file
number_timestamps = get_number_ts(wav, model)
pprint(number_timestamps)
```
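For anonymization, the detected regions can simply be silenced. This minimal sketch assumes each timestamp is a dict with `start` and `end` sample indices (check the actual output of `get_number_ts` before relying on this); `mute_segments` and its `pad` parameter are hypothetical. It works on a plain list of samples so it runs without PyTorch; with a tensor, `wav[start:end] = 0` does the same for each segment.

```python
from typing import Dict, List, Sequence


def mute_segments(samples: Sequence[float],
                  timestamps: List[Dict[str, int]],
                  pad: int = 0) -> List[float]:
    """Return a copy of samples with every [start - pad, end + pad) range zeroed.

    ASSUMPTION: each timestamp is {'start': int, 'end': int} in samples.
    """
    out = list(samples)                          # never modify the caller's data
    for ts in timestamps:
        start = max(0, ts['start'] - pad)        # clamp to the signal bounds
        end = min(len(out), ts['end'] + pad)
        for i in range(start, end):
            out[i] = 0.0
    return out


print(mute_segments([0.5] * 8, [{'start': 2, 'end': 5}]))
# [0.5, 0.5, 0.0, 0.0, 0.0, 0.5, 0.5, 0.5]
```

A small `pad` gives some margin around each detected number, which may help given the roughly 0.2 s boundary accuracy.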
ONNX example
```python
import torch
import onnxruntime
from pprint import pprint

model, utils = torch.hub.load(repo_or_dir='snakers4/silero-vad',
                              model='silero_number_detector',
                              force_reload=True)
(get_number_ts,
 _, read_audio,
 _, _, download_onnx_model) = utils

download_onnx_model('number_detector')
files_dir = torch.hub.get_dir() + '/snakers4_silero-vad_master/files'


def init_onnx_model(model_path: str):
    return onnxruntime.InferenceSession(model_path, providers=['CPUExecutionProvider'])


def validate_onnx(model, inputs):
    # run the ONNX session and convert its outputs back to torch tensors
    with torch.no_grad():
        ort_inputs = {'input': inputs.cpu().numpy()}
        outs = model.run(None, ort_inputs)
        outs = [torch.Tensor(x) for x in outs]
    return outs


model = init_onnx_model('number_detector.onnx')
wav = read_audio(f'{files_dir}/en_num.wav')
# get number timestamps from the full audio file
number_timestamps = get_number_ts(wav, model, run_function=validate_onnx)
pprint(number_timestamps)
```
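The `run_function` argument lets one helper work with either backend: the backend-specific call is passed in, and the helper only sees a list of outputs. The sketch below illustrates that pattern with a hypothetical `score` helper and a fake session class that mimics the `run(output_names, input_feed)` call of `onnxruntime.InferenceSession`; it is not the actual implementation of `get_number_ts`.

```python
from typing import Callable, Dict, List, Sequence


class FakeOnnxSession:
    """Hypothetical stand-in for onnxruntime.InferenceSession (only .run is mimicked)."""

    def run(self, output_names, input_feed: Dict[str, Sequence[float]]) -> List[List[float]]:
        x = input_feed['input']
        return [[2.0 * v for v in x]]  # a single output "tensor"


def call_direct(model: Callable, inputs: Sequence[float]) -> List:
    # TorchScript-style backend: the model object is called directly
    return [model(inputs)]


def call_onnx(session: FakeOnnxSession, inputs: Sequence[float]) -> List:
    # ONNX-style backend: inputs are fed by the graph's input name
    return session.run(None, {'input': inputs})


def score(inputs: Sequence[float], model, run_function: Callable = call_direct):
    """Hypothetical helper: backend-agnostic because it only calls run_function."""
    outs = run_function(model, inputs)
    return outs[0]


def jit_like(x: Sequence[float]) -> List[float]:
    return [2.0 * v for v in x]


print(score([1.0, 2.0], jit_like))                                   # [2.0, 4.0]
print(score([1.0, 2.0], FakeOnnxSession(), run_function=call_onnx))  # [2.0, 4.0]
```

Returning a list from both adapters mirrors why `validate_onnx` wraps every ONNX output in a tensor: the caller can then treat both backends identically.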
Language Classifier
- 99% validation accuracy.
- The language classifier was trained on audio samples in 4 languages: Russian, English, Spanish, and German.
- Audio of arbitrary length can be used, although the network was trained on audio shorter than 15 seconds.
- A 95-language version is also available.
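Since the network was trained on clips shorter than 15 seconds, one option for long recordings is to classify each chunk of at most 15 s separately and average the per-language probabilities. This is a sketch of that idea, assuming you have already collected one `{language: probability}` dict per chunk; the dict format and the helper names are assumptions, not the output format of `get_language`.

```python
from typing import Dict, List


def average_language_probs(chunk_probs: List[Dict[str, float]]) -> Dict[str, float]:
    """Average {language: probability} dicts over chunks (missing languages count as 0)."""
    if not chunk_probs:
        raise ValueError("need at least one chunk")
    totals: Dict[str, float] = {}
    for probs in chunk_probs:
        for lang, p in probs.items():
            totals[lang] = totals.get(lang, 0.0) + p
    n = len(chunk_probs)
    return {lang: total / n for lang, total in totals.items()}


def pick_language(chunk_probs: List[Dict[str, float]]) -> str:
    """Return the language with the highest average probability."""
    avg = average_language_probs(chunk_probs)
    return max(avg, key=avg.get)


chunks = [{'de': 0.7, 'en': 0.3},
          {'de': 0.4, 'en': 0.6},
          {'de': 0.8, 'en': 0.2}]
print(pick_language(chunks))  # de
```

Averaging probabilities lets a confident chunk outweigh an uncertain one; a simple majority vote over per-chunk top-1 labels is an alternative when only labels are available.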
JIT example
```python
import torch
from pprint import pprint

torch.set_num_threads(1)

model, utils = torch.hub.load(repo_or_dir='snakers4/silero-vad',
                              model='silero_lang_detector',
                              force_reload=True)
get_language, read_audio, _ = utils

files_dir = torch.hub.get_dir() + '/snakers4_silero-vad_master/files'
wav = read_audio(f'{files_dir}/de.wav')
language = get_language(wav, model)
pprint(language)
```
ONNX example
```python
import torch
import onnxruntime
from pprint import pprint

model, utils = torch.hub.load(repo_or_dir='snakers4/silero-vad',
                              model='silero_lang_detector',
                              force_reload=True)
get_language, read_audio, download_onnx_model = utils

# fetch the ONNX weights of the language classifier (not the number detector)
download_onnx_model('lang_classifier')
files_dir = torch.hub.get_dir() + '/snakers4_silero-vad_master/files'


def init_onnx_model(model_path: str):
    return onnxruntime.InferenceSession(model_path, providers=['CPUExecutionProvider'])


def validate_onnx(model, inputs):
    # run the ONNX session and convert its outputs back to torch tensors
    with torch.no_grad():
        ort_inputs = {'input': inputs.cpu().numpy()}
        outs = model.run(None, ort_inputs)
        outs = [torch.Tensor(x) for x in outs]
    return outs


model = init_onnx_model('lang_classifier.onnx')
wav = read_audio(f'{files_dir}/de.wav')
language = get_language(wav, model, run_function=validate_onnx)
pprint(language)
```
Language Classifier 95
- 85% validation accuracy across 95 languages; 90% validation accuracy across 58 language groups.
- Language Classifier 95 was trained on audio samples in 95 languages.
- Audio of arbitrary length can be used, although the network was trained on audio shorter than 20 seconds.
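The gap between language-level (85%) and group-level (90%) accuracy is what one would expect if many errors are confusions between closely related languages in the same group. The sketch below shows one way to derive both views from a single probability vector: rank languages for a `top_n` list, and sum member-language probabilities per group. The summation rule, the language-to-group mapping and the group names are illustrative assumptions, not necessarily what `get_language_and_group` does internally.

```python
from typing import Dict, List, Tuple


def top_n(probs: Dict[str, float], n: int = 2) -> List[Tuple[str, float]]:
    """Return the n most probable (label, probability) pairs, highest first."""
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:n]


def group_probs(probs: Dict[str, float], lang_to_group: Dict[str, str]) -> Dict[str, float]:
    """Sum language probabilities into their (hypothetical) language groups."""
    groups: Dict[str, float] = {}
    for lang, p in probs.items():
        group = lang_to_group[lang]
        groups[group] = groups.get(group, 0.0) + p
    return groups


probs = {'uk': 0.40, 'ru': 0.35, 'de': 0.25}
lang_to_group = {'uk': 'East Slavic', 'ru': 'East Slavic', 'de': 'Germanic'}
print(top_n(probs))  # [('uk', 0.4), ('ru', 0.35)]
print(top_n(group_probs(probs, lang_to_group)))
```

If the true language here were `ru`, the top-1 language would be wrong while the top-1 group would still be correct, which is how group accuracy can exceed language accuracy.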
JIT example
```python
import torch

torch.set_num_threads(1)

model, lang_dict, lang_group_dict, utils = torch.hub.load(
    repo_or_dir='snakers4/silero-vad',
    model='silero_lang_detector_95',
    force_reload=True)
get_language_and_group, read_audio, _ = utils

files_dir = torch.hub.get_dir() + '/snakers4_silero-vad_master/files'
wav = read_audio(f'{files_dir}/de.wav')
languages, language_groups = get_language_and_group(wav, model, lang_dict, lang_group_dict, top_n=2)

for i in languages:
    print(f'Language: {i[0]} with prob {i[-1]}')
for i in language_groups:
    print(f'Language group: {i[0]} with prob {i[-1]}')
```
ONNX example
```python
import torch
import onnxruntime

model, lang_dict, lang_group_dict, utils = torch.hub.load(
    repo_or_dir='snakers4/silero-vad',
    model='silero_lang_detector_95',
    force_reload=True)
get_language_and_group, read_audio, download_onnx_model = utils

download_onnx_model('lang_classifier_95')
files_dir = torch.hub.get_dir() + '/snakers4_silero-vad_master/files'


def init_onnx_model(model_path: str):
    return onnxruntime.InferenceSession(model_path, providers=['CPUExecutionProvider'])


def validate_onnx(model, inputs):
    # run the ONNX session and convert its outputs back to torch tensors
    with torch.no_grad():
        ort_inputs = {'input': inputs.cpu().numpy()}
        outs = model.run(None, ort_inputs)
        outs = [torch.Tensor(x) for x in outs]
    return outs


model = init_onnx_model('lang_classifier_95.onnx')
wav = read_audio(f'{files_dir}/de.wav')
languages, language_groups = get_language_and_group(wav, model, lang_dict, lang_group_dict,
                                                    top_n=2, run_function=validate_onnx)

for i in languages:
    print(f'Language: {i[0]} with prob {i[-1]}')
for i in language_groups:
    print(f'Language group: {i[0]} with prob {i[-1]}')
```