Hallucinations #7
Comments
The WhisperHallu option
You then use WhisperTimeSync to put the good timestamps over the good text.
Hi, thanks for the response. You said "one without cut to get a proper SRT with good timestamps, but possibly with hallucinations"; the assumption there is that timestamp quality is not affected. The issue is that for some hallucinations, which just repeat themselves across lines, the timestamp intervals vary between 5 and 30 seconds. When those timestamps are synced with the correct subtitles, each line ends up with an extremely long chunk of subtitle text, which is inaccurate and defeats the purpose of using WhisperHallu. Is there a way, even with hallucinations, to get accurate timestamps from Whisper or Faster Whisper?
I have never seen such a timestamp shift due to hallucinations: even if timestamps are not always fully accurate, I never had the impression that this inaccuracy was caused by hallucinations.
Here is an example of a timestamp in Vietnamese through Faster Whisper:
If you see here, the timestamps are okay and are usually between 2 and 5 seconds apart. The moment it starts hallucinating (I'm still saving the timestamps so I can integrate them later with WhisperHallu and WhisperTimeSync), the timestamps suddenly go up to 30-second intervals, which doesn't help for subtitles. My parameters on Faster Whisper are as follows:
In my own experiments, using the original sound file was more effective for getting proper timestamps.
The issue with combining WhisperTimeSync and WhisperHallu is this: if you need WhisperHallu, it means there are long silences and noise that prevent an accurate transcription. So you use WhisperHallu to cut the audio for easier transcription, but you can't sync the result with WhisperTimeSync, because WhisperTimeSync, like the original Whisper, doesn't recognize the correct timestamps in the first place.
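For anyone hitting the same symptom described in this thread (repeated lines whose segments stretch to around 30 seconds), here is a minimal sketch of how the suspect segments could be located in an SRT file before syncing. It is not part of WhisperHallu or WhisperTimeSync. The names `parse_srt` and `flag_suspect` are hypothetical, and the 10-second threshold is an assumption chosen only because normal segments here were reported as 2 to 5 seconds long.

```python
import re
from dataclasses import dataclass

# Matches an SRT timing line such as "00:01:02,500 --> 00:01:05,000".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class Segment:
    start: float  # seconds
    end: float    # seconds
    text: str


def parse_srt(srt_text):
    """Parse SRT text into a list of Segment objects (hypothetical helper)."""
    segments = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                h1, m1, s1, ms1, h2, m2, s2, ms2 = (int(g) for g in match.groups())
                start = h1 * 3600 + m1 * 60 + s1 + ms1 / 1000
                end = h2 * 3600 + m2 * 60 + s2 + ms2 / 1000
                # Everything after the timing line is the subtitle text.
                text = " ".join(part.strip() for part in lines[i + 1:])
                segments.append(Segment(start, end, text))
                break
    return segments


def flag_suspect(segments, max_duration=10.0):
    """Return indices of segments that are too long or repeat the previous text.

    max_duration is an assumed threshold in seconds, not a value from any tool.
    """
    suspects = []
    previous_text = None
    for index, seg in enumerate(segments):
        normalized = seg.text.strip().lower()
        too_long = (seg.end - seg.start) > max_duration
        repeated = normalized != "" and normalized == previous_text
        if too_long or repeated:
            suspects.append(index)
        previous_text = normalized
    return suspects
```

The flagged indices only mark where the timestamps deserve a second look; they do not repair anything. Whether a long segment is a real hallucination still has to be checked against the audio.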