Support to Wav2vec2 #131

Open
loretoparisi opened this issue Mar 22, 2024 · 3 comments

loretoparisi commented Mar 22, 2024

Add support for Wav2vec2 / Connectionist Temporal Classification (CTC) phoneme models (the Wav2Vec2ForCTC model class from Hugging Face).

Motivation
DistilWhisperLargeV2 shows impressive results, as far as I can see from the provided Space with the Next.js web app. The perfect companion to the Whisper transcription model is the Wav2Vec2 phoneme model. One example of running Whisper together with Wav2vec2 is WhisperX, which provides fast automatic speech recognition with word-level timestamps plus speaker diarization.

Other solutions
The wav2vec2-service provides a wav2vec2 implementation for fast CPU inference via ONNX.

@FL33TW00D
Collaborator

Thank you for the well-put-together issue!

This doesn't seem exceptionally difficult, although we would need to add GroupNorm support to Ratchet first! I'll open that as a separate issue.

AmineDiro commented Apr 16, 2024

Hi, looking at the wav2vec2 params, I think a LayerNorm could be enough for the implementation.
In the model config, GroupNorm is used in the following manner:
nn.GroupNorm(num_groups=self.out_conv_dim, num_channels=self.out_conv_dim..., where out_conv_dim == in_conv_dim == 512, which means one channel per group (512 groups), so each channel is normalized on its own.
I think a permutation of dims plus a LayerNorm can help. I am working on #132, but this hack could work for now 🤔
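
To make the LayerNorm idea concrete, here is a minimal sketch in plain Python (no PyTorch, no Ratchet) of why it can work. It assumes that num_groups == num_channels puts exactly one channel in each group, so the statistics are computed per channel over the time axis. The helper names group_norm, layer_norm_last_axis and max_abs_diff are hypothetical and written only for this illustration, and the learned scale and shift are left out.

```python
import math


def group_norm(x, num_groups, eps=1e-5):
    """Normalize x, a list of C channels each holding T samples, group by group.

    Hypothetical helper for illustration; affine parameters are omitted.
    """
    channels = len(x)
    assert channels % num_groups == 0, "channels must divide evenly into groups"
    per_group = channels // num_groups
    out = []
    for g in range(num_groups):
        group = x[g * per_group:(g + 1) * per_group]
        # Statistics are shared by every value in the group.
        values = [v for ch in group for v in ch]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)  # biased variance
        inv_std = 1.0 / math.sqrt(var + eps)
        out.extend([[(v - mean) * inv_std for v in ch] for ch in group])
    return out


def layer_norm_last_axis(x, eps=1e-5):
    """Normalize every row of x over its last (time) axis, without affine."""
    out = []
    for row in x:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)  # biased variance
        inv_std = 1.0 / math.sqrt(var + eps)
        out.append([(v - mean) * inv_std for v in row])
    return out


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two nested lists."""
    return max(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


# Toy input: 3 channels, 4 time steps (stand-in for the 512-channel conv output).
x = [
    [1.0, 2.0, 4.0, 8.0],
    [0.5, -0.5, 1.5, -1.5],
    [3.0, 3.5, 2.5, 3.0],
]

per_channel = group_norm(x, num_groups=len(x))  # one channel per group
single_group = group_norm(x, num_groups=1)      # all channels in one group
layer = layer_norm_last_axis(x)

print("per-channel groups vs LayerNorm over time:", max_abs_diff(per_channel, layer))
print("single group vs LayerNorm over time:", max_abs_diff(single_group, layer))
```

In this sketch the per-channel case matches a LayerNorm over the time axis, while the single-group case does not. One caveat if this is carried over to a real model: GroupNorm's learned weight and bias are per channel, whereas a LayerNorm over the time axis would carry per-time-step parameters, so the affine step would likely need to be applied separately as a per-channel multiply and add.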

@FL33TW00D
Collaborator

GroupNorm was completed in #192 by @AmineDiro


3 participants