Differences in Model Behavior for 8kHz and 16kHz Audio Inputs #575

tamarabanovac · 2024-11-20T13:05:04Z

Hello,

I know that Silero VAD supports both 8kHz and 16kHz audio data. I am curious about the differences in model behavior when processing these two sampling rates. Specifically, I would like to understand:

- Are there any concrete differences in how the model operates when given audio data at these two sampling rates, or is there just an upsampling to 16kHz happening internally?
- Was the model trained on both 16kHz and 8kHz audio data, or is it specifically optimized for one of the rates (such as 16kHz)?

Thank you for any insights or clarifications!

snakers4 (Maintainer) · 2024-11-20T14:24:39Z

Hi,

> Are there any concrete differences in how the model operates when given audio data at these two sampling rates, or is there just an upsampling to 16kHz happening internally?

In previous model versions there was some difference in quality between the two rates; see #2 (comment).

There was a chart on the old wiki pages, but it is gone now.
For the latest model, quality is roughly the same at both rates.
If I am not mistaken, both the JIT and the ONNX models contain two separate models, one for 8 kHz and one for 16 kHz.
Hence the problem with ONNX export with opset < 16.
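
To picture what "two models in one file" means, here is a toy sketch of a wrapper that picks a sub-model by sample rate. It is not the library's code: the class name `DualRateModel`, the placeholder sub-models, and the chunk lengths are all hypothetical and only illustrate the dispatch idea.

```python
class DualRateModel:
    """Toy stand-in for one exported file that holds a sub-model per rate."""

    def __init__(self, model_8k, model_16k):
        # Hypothetical layout: one callable per supported sample rate.
        self._models = {8000: model_8k, 16000: model_16k}

    def __call__(self, chunk, sr):
        try:
            model = self._models[sr]
        except KeyError:
            raise ValueError(f"unsupported sample rate: {sr}") from None
        # The chunk goes to the matching sub-model unchanged (no resampling).
        return model(chunk)


# Placeholder sub-models: they only report which branch handled the chunk.
vad = DualRateModel(
    model_8k=lambda chunk: ("8k", len(chunk)),
    model_16k=lambda chunk: ("16k", len(chunk)),
)

print(vad([0.0] * 256, 8000))
print(vad([0.0] * 512, 16000))
```

In an exported graph, a branch like this has to be expressed as a conditional over the sample-rate input, which is presumably where the opset requirement mentioned above comes from.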

> Was the model trained on both 16kHz and 8kHz audio data, or is it specifically optimized for one of the rates (such as 16kHz)?

The model was trained on audio resampled to either 16 kHz or 8 kHz.

Also, higher sample rates that are a multiple of 16 kHz, such as 32 kHz or 48 kHz, are naively downsampled internally, i.e. every n-th sample is kept.
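
A minimal sketch of that kind of naive downsampling, assuming plain Python sequences. The function name `naive_downsample` and its signature are made up for illustration and are not the library's API.

```python
def naive_downsample(samples, sr, target_sr=16000):
    """Keep every n-th sample when sr is an integer multiple of target_sr.

    Illustrative only: no low-pass filtering is applied before dropping samples.
    """
    if sr == target_sr:
        return list(samples), sr
    if sr % target_sr != 0:
        raise ValueError(f"{sr} is not an integer multiple of {target_sr}")
    step = sr // target_sr  # 2 for 32 kHz, 3 for 48 kHz
    return list(samples[::step]), target_sr


# 48 kHz input: every 3rd sample survives.
print(naive_downsample(list(range(12)), 48000))  # ([0, 3, 6, 9], 16000)
```

Because nothing is filtered before samples are dropped, content above 8 kHz in the original signal can alias into the result. If that matters for your data, resampling with a proper resampler before calling the model is an option.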

1 reply

tamarabanovac Nov 22, 2024
Author

Thanks for the explanation!
