I'm not a developer but I do find Gentle very useful.
Since OpenAI released their Whisper models last week, I've been wondering if anyone with development skills would be interested in enabling an option to utilize Whisper instead of Kaldi when running Gentle.
I know that language support for spoken languages beyond English has been a long-standing request for Gentle.
Whisper appears to be explicitly multilingual, so perhaps this would make support for languages beyond English more easily achievable for Gentle?
Anyway, please let me know how large an undertaking this would be.
Thanks in advance.
Not sure this is possible at the phoneme level, because the Whisper model is end-to-end trained to predict BPE tokens directly, which are often a full word or a subword consisting of a few graphemes.
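To illustrate the distinction: a BPE-style tokenizer emits subword units, not phonemes, so there is no per-phoneme timing to recover directly. Below is a toy greedy longest-match segmenter (not Whisper's actual tokenizer, and the vocabulary is made up) that shows the kind of units such a model predicts:

```python
# Toy illustration only -- NOT Whisper's real tokenizer or vocabulary.
# Greedy longest-match segmentation into subword units, to contrast with
# the phoneme-level output an aligner like Gentle/Kaldi works with.

def segment(word, vocab):
    """Greedily split `word` into the longest subwords found in `vocab`,
    falling back to single characters when no vocab entry matches."""
    pieces = []
    i = 0
    while i < len(word):
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab or j == i + 1:
                pieces.append(piece)
                i = j
                break
    return pieces

vocab = {"align", "ment", "al", "ign"}
print(segment("alignment", vocab))  # → ['align', 'ment']
```

A phonemic transcription of the same word would instead be a sequence like AH-L-AY-N-M-AH-N-T, which is the granularity Gentle aligns at; the two unit inventories don't line up one-to-one.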
Another option for word-level timestamps is faster-whisper.
I've been using it lately and it produces relatively good word-level timestamps. It does tend to have some recurrent errors, though, like missing the last syllable in the last word of each segment.
And, of course, it inherits several of the issues of vanilla whisper (e.g., "hallucinations", very bad alignment in sections with laughter, songs with vocals, etc.)
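Since faster-whisper's word timestamps are usable but imperfect, one practical mitigation is a sanity pass over the output. The sketch below assumes a faster-whisper-style result shape (segments containing words with `start`/`end` times; the dict field names here are my assumption, not the library's exact objects) and flags segment-final words whose duration looks suspiciously short, as a heuristic for the "missing last syllable" artifact described above:

```python
# Hedged post-processing sketch: flag segment-final words whose duration
# is anomalously short. The data layout (dicts with "word"/"start"/"end")
# is an assumed stand-in for faster-whisper's word objects.

def flag_truncated_finals(segments, min_dur=0.12):
    """Return (segment_index, word) pairs whose final word looks clipped."""
    flagged = []
    for i, seg in enumerate(segments):
        words = seg["words"]
        if not words:
            continue
        last = words[-1]
        if last["end"] - last["start"] < min_dur:
            flagged.append((i, last["word"]))
    return flagged

segments = [
    {"words": [{"word": "hello", "start": 0.00, "end": 0.40},
               {"word": "world", "start": 0.45, "end": 0.50}]},  # very short
    {"words": [{"word": "again", "start": 1.00, "end": 1.35}]},
]
print(flag_truncated_finals(segments))  # → [(0, 'world')]
```

The 120 ms threshold is arbitrary; tune it per language and speaking rate, and treat flagged words as candidates for re-alignment rather than automatic fixes.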