Support to Wav2vec2 #131

loretoparisi · 2024-03-22T10:57:20Z

Add support for Wav2Vec2 / Connectionist Temporal Classification (CTC) phoneme models (the Wav2Vec2ForCTC Hugging Face CTC model class).

Motivation
DistilWhisperLargeV2 shows impressive results, as far as I can see from the provided Space with the Next.js web app. The perfect companion to the Whisper transcription model is the Wav2Vec2 phoneme model. One example of running Whisper together with Wav2Vec2 is WhisperX, which enables fast automatic speech recognition with word-level timestamps plus speaker diarization.

Other solutions
The wav2vec2-service project provides a Wav2Vec2 implementation for fast CPU inference via ONNX.


FL33TW00D · 2024-03-22T11:14:30Z

Thank you for the well put together issue!

This doesn't seem exceptionally difficult, although we would need to add GroupNorm support to Ratchet first! I'll open that as a separate issue.

AmineDiro · 2024-04-16T10:24:50Z

Hi, looking at the wav2vec2 params, I think a LayerNorm can cut it for the implementation.
In the model config, the GroupNorm is used in the following manner:
nn.GroupNorm(num_groups=self.out_conv_dim, num_channels=self.out_conv_dim..., where out_conv_dim==in_conv_dim==512, which means one channel per group.
I think a permutation of dims and a LayerNorm can help. I am working on #132, but this hack could work for now 🤔
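The idea in this comment can be checked with a small dependency-free sketch. It is not Ratchet or PyTorch code: group_norm and layer_norm_last_dim are hypothetical helper names written for illustration, the input is assumed to be a single sample laid out as [channels][time], and both helpers leave out the learned affine parameters. Under those assumptions, a GroupNorm with num_groups equal to num_channels normalizes each channel over time on its own, which is what a LayerNorm over the time dimension computes.

```python
import math


def group_norm(x, num_groups, eps=1e-5):
    """GroupNorm without affine for one sample x laid out as [channels][time]."""
    channels = len(x)
    assert channels % num_groups == 0, "channels must divide evenly into groups"
    per_group = channels // num_groups
    out = []
    for g in range(num_groups):
        group = x[g * per_group:(g + 1) * per_group]
        # Statistics are shared by every value in the group (all channels, all steps).
        values = [v for row in group for v in row]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)  # biased variance
        scale = 1.0 / math.sqrt(var + eps)
        out.extend([[(v - mean) * scale for v in row] for row in group])
    return out


def layer_norm_last_dim(x, eps=1e-5):
    """LayerNorm without affine over the last (time) dimension of each row."""
    out = []
    for row in x:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)  # biased variance
        scale = 1.0 / math.sqrt(var + eps)
        out.append([(v - mean) * scale for v in row])
    return out


# Toy input: 3 channels, 4 time steps (illustrative values only).
x = [
    [1.0, 2.0, 4.0, 7.0],
    [0.5, -1.5, 3.0, 2.0],
    [10.0, 10.5, 9.0, 12.0],
]

per_channel = group_norm(x, num_groups=len(x))  # num_groups == num_channels
layer = layer_norm_last_dim(x)                  # normalize each channel over time
single_group = group_norm(x, num_groups=1)      # one group spanning all channels

max_diff = max(
    abs(a - b) for ra, rb in zip(per_channel, layer) for a, b in zip(ra, rb)
)
differs_from_single_group = any(
    abs(a - b) > 1e-6
    for ra, rb in zip(per_channel, single_group)
    for a, b in zip(ra, rb)
)
print("max |GroupNorm - LayerNorm| =", max_diff)
print("differs from single-group GroupNorm:", differs_from_single_group)
```

Two caveats for a real implementation: if the tensor is stored as [time][channels], a permutation is needed first so that time is the normalized dimension, and GroupNorm's learned weight and bias are per channel, so they would have to be applied as a separate per-channel scale and shift rather than through LayerNorm's own element-wise affine.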

FL33TW00D · 2024-05-06T20:25:35Z

GroupNorm was completed in #192 by @AmineDiro

FL33TW00D mentioned this issue Mar 22, 2024: Add GroupNorm support #132 (Closed)

FL33TW00D added the models label Mar 22, 2024
