Hallucinations #7

Open
joseph2mi opened this issue Jun 26, 2023 · 5 comments

Comments

@joseph2mi

The issue with using WhisperTimeSync together with WhisperHallu is this: if you need WhisperHallu at all, it means the audio has long silences and noise that prevent an accurate transcription. So you use WhisperHallu to cut the audio for easier transcription, but then you can't sync the result with WhisperTimeSync, because the timestamps it relies on come from the original Whisper, which doesn't get them right in the first place.
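Why cutting breaks the timestamps can be made concrete: a time measured in the cut audio only maps back to the original recording if you know exactly which spans were kept. The sketch below assumes a hypothetical sorted list of kept `(start, end)` intervals in original-audio seconds; it is not claimed that WhisperHallu exposes such a list.

```python
from bisect import bisect_right
from itertools import accumulate


def cut_to_original(t_cut, kept):
    """Map a time (seconds) in the cut audio back to the original timeline.

    `kept` is a sorted list of (start, end) intervals, in original-audio
    seconds, that survived the cut; everything between them was removed.
    """
    lengths = [end - start for start, end in kept]
    # offsets[i] is where kept interval i begins inside the cut audio
    offsets = [0.0] + list(accumulate(lengths))
    if not 0 <= t_cut <= offsets[-1]:
        raise ValueError("time lies outside the cut audio")
    # Find the kept interval containing t_cut (clamped for the very end)
    i = min(bisect_right(offsets, t_cut) - 1, len(kept) - 1)
    return kept[i][0] + (t_cut - offsets[i])
```

For example, with `kept = [(0.0, 10.0), (25.0, 40.0)]`, second 12 of the cut audio maps back to second 27 of the original. Without that interval list, the cut transcript's timestamps cannot be recovered.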

@EtienneAb3d
Owner

@joseph2mi

The WhisperHallu option addSRT produces two outputs:

  • one with noise and silence filtering to get a transcription without hallucinations.
  • one without cuts, to get a proper SRT with good timestamps but possibly with hallucinations (which should not damage the timestamp quality).

You then use WhisperTimeSync to put the good timestamps over the good text.
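The idea of putting the good timestamps over the good text can be illustrated with a word-level alignment. This is not WhisperTimeSync's actual algorithm: it swaps in Python's standard `difflib.SequenceMatcher`, and it assumes both transcripts are already split into words, with hypothetical `(word, start, end)` tuples for the uncut transcription.

```python
from difflib import SequenceMatcher


def transfer_timestamps(timed_words, clean_words):
    """Give each word of the clean transcript a (start, end) time.

    timed_words: list of (word, start, end) from the uncut transcription,
                 which may contain hallucinated words.
    clean_words: list of words from the filtered, hallucination-free text.
    Returns a list of (word, start, end); unmatched words get None times.
    """
    a = [w.lower() for w, _, _ in timed_words]
    b = [w.lower() for w in clean_words]
    result = [(w, None, None) for w in clean_words]
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    # Each matching block is a run of identical words in both sequences
    for block in matcher.get_matching_blocks():
        for k in range(block.size):
            _, start, end = timed_words[block.a + k]
            result[block.b + k] = (clean_words[block.b + k], start, end)
    return result
```

Hallucinated words exist only in the timed list, so they are skipped. A clean word can still only inherit whatever timing the uncut pass gave it: if that pass stretched a segment over 20 seconds, the alignment carries the error over.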

@joseph2mi
Author

Hi, thanks for the response. You said "one without cut to get a proper SRT with good timestamps, but possibly with hallucinations"; the assumption there is that the timestamp quality is not affected.

The issue is that with some hallucinations, where the same line just repeats over and over, each line's timestamps span anywhere from 5 to 30 seconds.

So when those timestamps are synced with the correct subtitles, you get extremely long chunks of subtitle text on each line, which is inaccurate and defeats the purpose of using WhisperHallu. I was wondering if there is a way, even with hallucinations, to get accurate timestamps from Whisper or Faster Whisper.

@EtienneAb3d
Owner

I have never seen such timestamp shifts caused by hallucinations: even though timestamps are not always fully accurate, I never had the impression that this inaccuracy was due to hallucinations.

@joseph2mi
Author

Here is an example of timestamps from a Vietnamese transcription made with Faster Whisper:

600
00:30:21,250 --> 00:30:24,500
Mà tôi không

601
00:30:24,500 --> 00:30:26,500
Trách ông Cát Mát

602
00:30:26,500 --> 00:30:28,500
Vì Cát Mát

603
00:30:28,500 --> 00:30:30,500
Là người đưa ra

604
00:30:30,500 --> 00:30:52,540
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!

605
00:31:05,010 --> 00:31:30,350
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!

606
00:31:42,370 --> 00:32:04,700
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!

607
00:32:16,050 --> 00:32:37,360
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!

608
00:32:48,830 --> 00:33:11,550
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!

609
00:33:11,550 --> 00:33:34,560
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!

610
00:33:34,560 --> 00:33:56,830
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!

611
00:33:56,830 --> 00:34:18,720
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!

612
00:34:28,940 --> 00:34:48,940
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!

613
00:34:48,940 --> 00:35:12,620
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!

614
00:35:12,620 --> 00:35:35,150
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!

615
00:35:35,150 --> 00:35:55,340
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!

616
00:35:55,340 --> 00:36:16,720
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!

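As you can see, the timestamps are fine at first, with segments usually lasting 2 to 5 seconds. The moment it starts hallucinating, repeating "Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!" ("Thank you for watching, please subscribe to support my channel!"), the segments suddenly stretch to 20 to 30 seconds each, which is useless for subtitles. (I'm still saving these timestamps so I can combine them later with WhisperHallu and WhisperTimeSync.)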
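To quantify the jump, here is a minimal standard-library sketch (not part of WhisperHallu or Faster Whisper) that parses SRT cues and flags any cue longer than a hypothetical 10-second threshold or repeating the previous cue's text. The sample is cues 603 to 605 from the output above.

```python
import re

TIME = r"(\d{2}):(\d{2}):(\d{2}),(\d{3})"
CUE = re.compile(TIME + r" --> " + TIME)


def to_seconds(h, m, s, ms):
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(text):
    """Return a list of (index, start, end, text) tuples from SRT content."""
    cues = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 3:
            continue  # skip malformed or empty cues
        m = CUE.match(lines[1])
        if not m:
            continue
        g = m.groups()
        cues.append((int(lines[0]), to_seconds(*g[:4]), to_seconds(*g[4:]),
                     " ".join(lines[2:])))
    return cues


def suspicious(cues, max_duration=10.0):
    """Flag cues that last too long or repeat the previous cue's text."""
    flagged = []
    prev_text = None
    for idx, start, end, text in cues:
        if end - start > max_duration or text == prev_text:
            flagged.append(idx)
        prev_text = text
    return flagged


SAMPLE = """\
603
00:30:28,500 --> 00:30:30,500
Là người đưa ra

604
00:30:30,500 --> 00:30:52,540
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!

605
00:31:05,010 --> 00:31:30,350
Cảm ơn các bạn đã theo dõi, hãy đăng ký kênh để ủng hộ kênh của mình nhé!
"""

cues = parse_srt(SAMPLE)
for idx, start, end, text in cues:
    print(idx, round(end - start, 2))  # duration of each cue in seconds
print(suspicious(cues))  # [604, 605]
```

Flagged cues could then be dropped or re-timed before the SRT is handed to WhisperTimeSync; the threshold is a guess to tune per recording.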

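My parameters in Faster Whisper are as follows:

from faster_whisper import WhisperModel

model = WhisperModel("large-v2", device="cuda", compute_type="float32")
segments, info = model.transcribe(
    "audio.wav",  # placeholder: the input file is not named in this post
    beam_size=7,
    vad_filter=True,
    vad_parameters=dict(min_silence_duration_ms=50),
    language="vi",
    max_initial_timestamp=2.0,
    condition_on_previous_text=True,
    length_penalty=1.5,
)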
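Since these timestamps are being saved for later use with WhisperHallu and WhisperTimeSync, here is a sketch of writing Faster Whisper segments out as SRT. It assumes only that each segment has `start` and `end` (in seconds) and `text` attributes, as faster_whisper's segments do; the helper names are hypothetical.

```python
from types import SimpleNamespace


def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Build SRT text from objects with .start, .end (seconds) and .text,
    e.g. the segments yielded by faster_whisper's model.transcribe()."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{seg.text.strip()}\n"
        )
    return "\n".join(blocks)


# Stand-in segment taken from cue 603 of the output above; with a real model
# you would pass the `segments` generator returned by model.transcribe(...).
demo = [SimpleNamespace(start=1828.5, end=1830.5, text=" Là người đưa ra")]
print(segments_to_srt(demo))
```

Note that faster_whisper returns `segments` as a generator, so it can only be iterated once; convert it with `list(segments)` first if you also need the segments elsewhere.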

@EtienneAb3d
Owner

In my own experiments, using the original sound file worked better for getting proper timestamps.
In your case, perhaps you could try adapting WhisperHallu to a configuration that uses all the filters (especially blank and noise removal) but without cutting.
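What "filtering without cutting" preserves can be shown with a simple energy-based noise gate. Quiet frames are replaced with silence rather than removed, so the audio keeps its length and every timestamp still refers to the original timeline. This is a hypothetical plain-Python illustration, not one of WhisperHallu's actual filters, and the frame size and threshold are arbitrary assumptions.

```python
import math


def mute_quiet_frames(samples, sample_rate, frame_ms=30, threshold=0.01):
    """Zero out frames whose RMS energy is below `threshold`.

    Unlike cutting, the output has exactly as many samples as the input,
    so timestamps computed on it remain valid for the original audio.
    `samples` are floats in [-1.0, 1.0]; frame_ms and threshold are
    illustrative defaults, not WhisperHallu settings.
    """
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    out = list(samples)
    for start in range(0, len(out), frame_len):
        frame = out[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms < threshold:
            # Replace the quiet frame with silence of the same length
            out[start:start + frame_len] = [0.0] * len(frame)
    return out
```

Whether gated audio actually reduces hallucinations is not established here; the point is only that the timeline survives the filtering.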
