Garbage output on multi channel audio and audio above 24khz #9

Quackdoc · 2023-08-18T04:20:18Z

Seems like audio decode is picky on what gets input to it

Audio mediainfo

General
Complete name                            : C:\Users\Quack\code\whisper-burn\slap.wav
Format                                   : Wave
File size                                : 788 KiB
Duration                                 : 4 s 203 ms
Overall bit rate mode                    : Constant
Overall bit rate                         : 1 536 kb/s
Writing application                      : Lavf58.29.100

Audio
Format                                   : PCM
Format settings                          : Little / Signed
Codec ID                                 : 1
Duration                                 : 4 s 203 ms
Bit rate mode                            : Constant
Bit rate                                 : 1 536 kb/s
Channel(s)                               : 2 channels
Sampling rate                            : 48.0 kHz
Bit depth                                : 16 bits
Stream size                              : 788 KiB (100%)

Audio file: https://cdn.discordapp.com/attachments/615105639567589376/1141946730485665893/slap.wav

target\release\whisper.exe .\slap.wav small_en        08/18/2023 12:07:51 AM
Loading waveform...
Loading model...
Chunk 0:  (screaming)

Chunk 1:  (screeching)

Transcribed text:  (screeching)

whisper-ctranslate2:

whisper-ctranslate2.exe slap.wav --model tiny.en      08/18/2023 12:10:23 AM
Detected language 'English' with probability 1.000000
[00:00.000 --> 00:04.000]  Also, it's not always useful.
Transcription results written to 'C:\Users\Quack\code\whisper-burn' directory

EDIT: Transcoding the audio file using ffmpeg -i .\slap.wav -ar SAMPLE_RATE -ac 1 slap-edit.wav seems to make it work. It needs to be both single channel and 41 kHz or less.

At 41 kHz the output was:

Chunk 0:  Oh, son, it's not all you are.

Transcribed text:  Oh, son, it's not all you are.

At 24 kHz and below it is:

Chunk 0:  also it's not always useful.

Transcribed text:  also it's not always useful


jbrough · 2023-08-18T23:19:02Z

The Whisper model itself expects 16 kHz mono.

Quackdoc · 2023-08-22T19:39:49Z

Ah, that makes sense. I would assume burn doesn't do any downsampling of the sample rate or any channel downmixing.
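For reference, the channel downmixing half of the problem can be sketched with the standard library alone. This is a minimal sketch, not code from whisper-burn: the function names `pcm16_to_f32` and `downmix_to_mono` are hypothetical, and it assumes the decoder hands back interleaved 16-bit PCM like the stereo file described above.

```rust
/// Hypothetical helper: convert 16-bit signed PCM to f32 in [-1.0, 1.0).
pub fn pcm16_to_f32(samples: &[i16]) -> Vec<f32> {
    samples.iter().map(|&s| f32::from(s) / 32768.0).collect()
}

/// Hypothetical helper: average each interleaved frame down to one channel.
/// `samples` is interleaved (L, R, L, R, ... for stereo) and `channels` is
/// the channel count. A trailing partial frame is ignored.
pub fn downmix_to_mono(samples: &[f32], channels: usize) -> Vec<f32> {
    assert!(channels > 0, "channel count must be non-zero");
    samples
        .chunks_exact(channels)
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect()
}
```

Averaging the channels is only one possible downmix. It keeps the result within the original range, but material that is out of phase between left and right will partially cancel.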

Quackdoc · 2023-08-29T01:02:04Z

This is partially addressed by 4080a33, but if I get the time I plan on looking into resampling and channel downmixing. I do have some work done; however, I was using dasp, which has proven itself to be rather unusable, so I'm looking into different crates.

I looked into fon and it seems like it may work, but I don't like that it hasn't been active since Feb '22.

Currently looking into other crates.

jbrough · 2023-09-06T16:22:11Z

@Quackdoc have a look at https://github.com/HEnquist/rubato

It does what you need. I've had no success with the sync FFT methods yet, but SincFixedIn, which is in their main example, works well.

Here's how I'm using it - I have a pop at the end but the main downsampling is very good:

https://github.com/wavey-ai/soundkit/blob/75bf99c0e220bcfa380c6ae72e626257fb4790e0/src/audio_pipeline.rs#L67

(I had a feeling the synchronous FFT resampling method might be better for wasm, but I haven't tested it and may have misunderstood what it's designed for, as the output is terribly distorted. Still investigating. Hopefully SincInterpolationType::Linear is good enough for real-time use cases.)
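As a baseline to compare a real resampler against, a naive linear-interpolation resampler can be written with the standard library alone. This is a rough sketch under stated assumptions, not rubato's API and not its sinc algorithm: `resample_linear` is a hypothetical name, the input is assumed to be mono f32 samples, and there is no low-pass filter, so downsampling from 48 kHz to 16 kHz this way will alias.

```rust
/// Hypothetical helper: resample mono audio by linear interpolation.
/// No anti-aliasing filter is applied, so this is only a rough baseline.
pub fn resample_linear(input: &[f32], from_hz: u32, to_hz: u32) -> Vec<f32> {
    assert!(from_hz > 0 && to_hz > 0, "sample rates must be non-zero");
    if input.is_empty() {
        return Vec::new();
    }
    // Number of output samples, rounded down.
    let out_len = (input.len() as u64 * to_hz as u64 / from_hz as u64) as usize;
    // Distance between output samples, measured in input samples.
    let step = from_hz as f64 / to_hz as f64;
    let last = input.len() - 1;
    (0..out_len)
        .map(|i| {
            let pos = i as f64 * step;
            let idx = (pos.floor() as usize).min(last);
            let frac = (pos - idx as f64) as f32;
            // Clamp the right-hand neighbour at the end of the buffer.
            let a = input[idx];
            let b = input[(idx + 1).min(last)];
            a + (b - a) * frac
        })
        .collect()
}
```

Calling `resample_linear(&mono, 48_000, 16_000)` on the downmixed signal would produce input at the rate the model expects. A band-limited resampler such as the sinc-based one discussed above should sound noticeably better.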

Quackdoc changed the title ~~Garbage text on custom file~~ Garbage output on multi channel audio and audio above 24khz Aug 18, 2023
