
Experimental Result? #3

Open
QuangDiy opened this issue May 30, 2024 · 5 comments

@QuangDiy

Hi @phineas-pta,

Recently, I experimented with fine-tuning Whisper using QLoRA. I tried the Large V3 model, fine-tuning it on four datasets: CMV-17, VIVOS, Fleurs, and 100 hours of VinAI data. The WER for the fine-tuned Large V3 model exceeded 100%. I'm curious whether you encountered a similar issue (I reran the experiments three times and the results were the same). When I switched to V2, however, the results were reasonable.

| Model | WER Fleurs | WER CMV-Vi | WER VIVOS | CER Fleurs | CER CMV | CER VIVOS |
| --- | --- | --- | --- | --- | --- | --- |
| Whisper Large V3 | 8.66 | 15.49 | 12.41 | 5.36 | 10.07 | 8.10 |
| FT Whisper Large V3 (25h VinAI data) | 10.48 | 15.19 | 9.40 | 5.64 | 8.51 | 4.69 |
| FT Whisper Large V3 (100h VinAI data) | >100 | >100 | >100 | >100 | >100 | >100 |
| Whisper Large V2 | 11.08 | 18.46 | 11.56 | 6.96 | 10.73 | 6.35 |
| FT Whisper Large V2 (100h VinAI data) | 11.36 | 17.34 | 10.98 | 6.38 | 9.70 | 5.67 |

Despite fine-tuning with over 100 hours of data, the improvements seem minimal. If you have run into a similar situation, I hope you can share your solution.

Thank you very much.
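(For readers wondering how a WER above 100% is even possible: WER = (S + D + I) / N over the N reference words, so when the model emits long runs of inserted tokens, the insertion count alone can exceed the reference length. A minimal pure-Python sketch, with hypothetical example strings:)

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Word-level Levenshtein distance via a rolling DP row.
    d = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        prev, d[0] = d[0], i
        for j in range(1, len(hyp) + 1):
            cur = d[j]
            d[j] = min(
                d[j] + 1,                               # deletion
                d[j - 1] + 1,                           # insertion
                prev + (ref[i - 1] != hyp[j - 1]),      # substitution / match
            )
            prev = cur
    return d[-1] / len(ref)

# A looping hypothesis: 1 substitution + 7 insertions against 4 reference words.
print(wer("hãy nghĩ hành tinh", "hãy nghĩ hành tr tr tr tr tr tr tr tr"))  # → 2.0
```

So a >100% WER is a strong hint that the hypotheses are much longer than the references, i.e. degenerate repetition rather than ordinary mistranscription.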

@phineas-pta
Owner

Double-check the transcriptions from your benchmark run; chances are the output isn't actually Vietnamese.

One known Whisper problem is that after fine-tuning, language detection degrades quite badly. I ran into the same thing, which is why I always force lang="vi".

@QuangDiy
Author

I did configure it like this when calling the model. Training V3 on about 50h of data was fine, but at 125h it behaves like this and I'm not sure why.

```python
model.generation_config.language = "<|vi|>"
model.generation_config.task = "transcribe"
```

but inference still outputs:

hãy nghĩ hành trục tuyết tương như một hành tr tr tr tr tr tr tr tr … (the tokens "tr", "th", "gi", and "w" repeat for the rest of the output)

@phineas-pta
Owner

Isn't the inference syntax different, though? 🤔 Or is it hallucination? 🤔

@QuangDiy
Author

Ah, at inference I still force language="vi". With the same settings, switching to V2 works fine.

```python
with torch.cuda.amp.autocast():
    with torch.no_grad():
        generated_tokens = (
            model.generate(
                input_features=batch["input_features"].to("cuda"),
                max_new_tokens=200,
                language="vi",
                task="transcribe",
            )
            .cpu()
            .numpy()
        )
```
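(Side note for anyone hitting the same looped output: a quick sanity check over the benchmark hypotheses can flag degenerate transcripts before they poison the WER score. A pure-Python sketch with hypothetical example strings; on the Hugging Face side, `model.generate` also accepts `no_repeat_ngram_size` and `repetition_penalty` to suppress such loops.)

```python
from collections import Counter

def repetition_ratio(text: str) -> float:
    """Fraction of the words taken up by the single most frequent word.
    Values near 1.0 indicate degenerate, looping output."""
    words = text.split()
    if not words:
        return 0.0
    return Counter(words).most_common(1)[0][1] / len(words)

clean = "hãy nghĩ hành tinh xanh của chúng ta"
degenerate = "hãy nghĩ hành " + "tr " * 80

print(repetition_ratio(clean))       # low: varied vocabulary
print(repetition_ratio(degenerate))  # near 1.0: one looped token
```

Transcripts above some threshold (say 0.5) can then be inspected or excluded separately, so the metric reflects genuine transcription quality.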

@phineas-pta
Owner

Then it's probably hallucination; V3 has this problem quite often.

Also, check the loss curve from training just to be sure.
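(A small sketch of that check, assuming the per-step losses are already collected in a list, e.g. pulled from `trainer.state.log_history` when training with the Hugging Face `Trainer`; the example loss values are made up:)

```python
def loss_diverged(losses: list[float], window: int = 5) -> bool:
    """Flag divergence: the mean of the last `window` losses is higher
    than the mean of the window before it."""
    if len(losses) < 2 * window:
        return False
    recent = sum(losses[-window:]) / window
    earlier = sum(losses[-2 * window:-window]) / window
    return recent > earlier

healthy = [2.1, 1.7, 1.4, 1.2, 1.0, 0.9, 0.8, 0.75, 0.7, 0.68]
diverging = [2.1, 1.7, 1.4, 1.2, 1.0, 1.3, 1.6, 2.0, 2.4, 2.9]

print(loss_diverged(healthy))    # False: loss keeps decreasing
print(loss_diverged(diverging))  # True: loss climbed back up
```

A curve that drops and then climbs back up late in training would fit the pattern of the model degrading between 50h and 125h of data.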
