Differences in Model Behavior for 8kHz and 16kHz Audio Inputs #575
Hello, I know that Silero VAD supports both 8kHz and 16kHz audio data. I am curious about the differences in model behavior when processing these two sampling rates. Specifically, I would like to understand:
Thank you for any insights or clarifications!
Hi,
For previous model versions there was some difference in quality. There was a chart on old wiki pages, but it's gone now.
The model was trained on audio resampled to either 16 kHz or 8 kHz. Internally, higher sample rates that are a multiple of 16 kHz, such as 32 kHz or 48 kHz, are resampled naively, i.e. every n-th sample is taken.
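To make the "every n-th sample" part concrete, here is a minimal sketch of that kind of naive downsampling. The function name `naive_downsample` and its parameters are made up for illustration and are not the library's API; it only assumes the input rate is an integer multiple of the target rate.

```python
def naive_downsample(samples, sample_rate, target_rate=16000):
    """Hypothetical helper: keep every n-th sample, with no filtering.

    Only works when sample_rate is an integer multiple of target_rate.
    Returns the decimated samples together with the new sample rate.
    """
    if sample_rate == target_rate:
        return list(samples), sample_rate
    if sample_rate % target_rate != 0:
        raise ValueError(
            f"{sample_rate} Hz is not an integer multiple of {target_rate} Hz"
        )
    step = sample_rate // target_rate  # e.g. 48000 // 16000 == 3
    return list(samples[::step]), target_rate


# One second of fake 48 kHz audio: the sample values are just their indices.
one_second_48k = list(range(48000))
decimated, rate = naive_downsample(one_second_48k, 48000)
print(len(decimated), rate, decimated[:4])  # 16000 16000 [0, 3, 6, 9]
```

Since there is no low-pass filter before the samples are dropped, anything above the new Nyquist frequency (8 kHz for a 16 kHz target) aliases into the result. If that matters for your audio, resampling properly before passing it to the model is the safer option.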
On the quality difference in previous model versions, see #2 (comment).
For the latest model, quality is roughly the same at both sample rates.
If I am not mistaken, both the JIT and the ONNX models contain two actual models, one for 8 kHz and one for 16 kHz.
Hence the problem with ONNX export with opset < 16.
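As a rough picture of what "two actual models in one file" means, here is a purely illustrative sketch of a wrapper that picks a sub-model by sample rate. The class `DualRateVAD` and the stand-in lambdas are hypothetical; this is not how the Silero VAD checkpoint is actually structured, only an illustration of the dispatch idea.

```python
class DualRateVAD:
    """Hypothetical wrapper that routes audio to one of two sub-models."""

    def __init__(self, model_8k, model_16k):
        # One callable per supported sample rate.
        self._models = {8000: model_8k, 16000: model_16k}

    def __call__(self, chunk, sample_rate):
        try:
            model = self._models[sample_rate]
        except KeyError:
            raise ValueError(f"unsupported sample rate: {sample_rate}") from None
        return model(chunk)


# Stand-in "models" that only report which branch was taken.
vad = DualRateVAD(lambda chunk: "8k branch", lambda chunk: "16k branch")
print(vad([0.0] * 4, 8000))   # 8k branch
print(vad([0.0] * 4, 16000))  # 16k branch
```

A branch like this, once baked into an exported graph, becomes a conditional inside the model itself, which fits the remark above that export is a problem with older opsets.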