Help with debugging incorrect speech timestamps #574

sha-roze · 2024-11-19T08:40:47Z

Hi, I'm having trouble getting correct speech timestamps for this video.

Here is my code and output:

import torch

torch.set_num_threads(1)

model, utils = torch.hub.load(repo_or_dir="snakers4/silero-vad", model="silero_vad")
(get_speech_timestamps, _, read_audio, _, _) = utils

wav = read_audio(
    r"C:\Users\Sha Roze\Downloads\Videos\httpswww.youtube.comwatchv=oM5hNuAmWs0.wav"
)
speech_timestamps = get_speech_timestamps(
    wav,
    model,
    return_seconds=True,  # Return speech timestamps in seconds (default is samples)
)

print(speech_timestamps)

Using cache found in C:\Users\Sha Roze/.cache\torch\hub\snakers4_silero-vad_master
[{'start': 3.2, 'end': 4.5}]

snakers4 (Maintainer) · 2024-11-19T09:33:57Z

v5 has problems with singing, but v3.1 works:

import torch
from pprint import pprint

SAMPLING_RATE = 16000  # assumed value; not stated in this answer
USE_ONNX = False       # assumed value; not stated in this answer

model, utils = torch.hub.load(repo_or_dir='snakers4/silero-vad:v3.1',
                              model='silero_vad',
                              force_reload=True,
                              onnx=USE_ONNX)

(get_speech_timestamps,
 save_audio,
 read_audio,
 VADIterator,
 collect_chunks) = utils

wav = read_audio('prayer.wav', sampling_rate=SAMPLING_RATE)
# get speech timestamps from the full audio file, with a low threshold (0.1)
speech_timestamps = get_speech_timestamps(wav, model,
                                          sampling_rate=SAMPLING_RATE,
                                          visualize_probs=True,
                                          threshold=0.1, return_seconds=True)
pprint(speech_timestamps)
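
When comparing model versions or thresholds, it can help to reduce the returned list to a single number. The following is a minimal sketch, assuming the timestamps were produced with return_seconds=True so that each entry looks like {'start': 3.2, 'end': 4.5}. The helper names total_speech_seconds and speech_coverage are hypothetical and not part of silero-vad.

```python
def total_speech_seconds(timestamps):
    """Sum the durations of segments shaped like {'start': s, 'end': e}, in seconds."""
    return sum(seg["end"] - seg["start"] for seg in timestamps)


def speech_coverage(timestamps, audio_seconds):
    """Fraction of the audio marked as speech; audio_seconds is the clip length in seconds."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return total_speech_seconds(timestamps) / audio_seconds


# The single segment reported in the question covers about 1.3 seconds.
question_output = [{"start": 3.2, "end": 4.5}]
print(round(total_speech_seconds(question_output), 2))
```

For a tensor loaded with read_audio, the clip length could be taken as len(wav) / SAMPLING_RATE. A coverage value that is far lower than what you hear in the recording suggests the model or threshold is missing speech.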

1 reply

sha-roze Nov 19, 2024
Author

Thank you for answering, this worked for me.
