[Feature request] Adaptation to the Whisper's JSON output #15

Open
sensboston opened this issue Feb 22, 2024 · 8 comments

Comments

@sensboston

Hello, is it possible to adapt your project to Whisper's JSON output? I'm working on a karaoke program for Windows and need every word in the lyrics to be timestamped.
I'd be glad to submit a PR for this feature, but unfortunately I'm not proficient in Python (I mostly use C# and C++).

@EtienneAb3d
Owner

@sensboston

I will have a look at it as soon as possible.
To be sure of what you expect, can you provide me with an example (JSON+TXT)?

WhisperTimeSync is not written in Python but in Java.
;-)

@sensboston
Author

sensboston commented Feb 23, 2024

Here we go: samples.zip
There are two directories: English (Smokie, "Living Next Door to Alice") and Russian (Bit-quartet Secret, "Alice"), each with JSONs and the original lyrics (my daughter's name is Alice 😉). Whisper's English output is kinda acceptable, but the Russian one is a complete mess.

@EtienneAb3d
Owner

@sensboston
Hmmm... The problem I see with this JSON format is that each word has a mandatory description including its timestamp. It will be very hard to decide what to do with non-matching words.
🤔

@sensboston
Author

Yeah, it's an issue, I agree. But I haven't looked at your implementation (or the Java code you've ported) yet. Theoretically it's possible even without involving AI, for example by using the "Soundex" algorithm. I thought about this, but first I wanted to check whether someone had already done it.
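
For readers unfamiliar with the Soundex idea mentioned here, a minimal sketch of American Soundex follows. It is written in Python purely for illustration and is not code from WhisperTimeSync or from the karaoke project; the function name `soundex` is a hypothetical choice. Two words that sound alike in English tend to get the same four-character key, which could be used to treat a misheard word as a match.

```python
def soundex(word: str) -> str:
    """Return the American Soundex key: first letter plus three digits."""
    # Consonant groups that share a digit.
    codes = {}
    for group, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in group:
            codes[ch] = digit

    # Keep ASCII letters only; anything else is ignored.
    letters = [ch for ch in word.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""

    result = letters[0]
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        digit = codes.get(ch, "")  # vowels and Y have no code
        if digit and digit != prev:
            result += digit
        prev = digit  # a vowel resets prev, so a repeated code is kept
    return (result + "000")[:4]  # pad or truncate to four characters
```

With this sketch, `soundex("Robert")` and `soundex("Rupert")` both give `R163`. Note that it only handles ASCII letters, so it returns an empty string for Cyrillic text; the Russian sample would need a different phonetic key.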

@EtienneAb3d
Owner

@sensboston
The problem is not matching word by word; that is what WhisperTimeSync already does.
The problem is knowing what to do with unmatched words in this specific JSON description.
I may adapt an algorithm I already have for similar cases, but this is quite a lot of work.
Do you have a budget for this?
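
To make the unmatched-word problem concrete, here is one possible policy, sketched in Python. It is not the algorithm WhisperTimeSync uses and not the one the owner refers to. It assumes the JSON has a `segments` list whose items carry a `words` list of `{"word", "start", "end"}` entries (the layout Whisper usually emits with word timestamps enabled; the sample files are not shown in this thread). The helper names `norm`, `flatten_words` and `retime` are hypothetical. Matched lyric words reuse Whisper's timestamps; unmatched lyric words share the time window of the Whisper words they replace, or the gap between neighbours; Whisper words absent from the lyrics are dropped.

```python
import re
from difflib import SequenceMatcher


def norm(token: str) -> str:
    """Lowercase and strip spaces/punctuation so ' Alice,' matches 'alice'."""
    return re.sub(r"[^\w']", "", token.lower())


def flatten_words(whisper_json: dict) -> list[dict]:
    """Collect word entries from every segment (assumed JSON layout)."""
    return [w for seg in whisper_json.get("segments", [])
            for w in seg.get("words", [])]


def retime(lyrics: str, words: list[dict]) -> list[dict]:
    """Attach a start/end time to every word of the reference lyrics."""
    tokens = lyrics.split()
    a = [norm(w["word"]) for w in words]
    b = [norm(t) for t in tokens]
    out = []
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Same word on both sides: copy Whisper's timing.
            for i, j in zip(range(i1, i2), range(j1, j2)):
                out.append({"word": tokens[j], "start": words[i]["start"],
                            "end": words[i]["end"], "matched": True})
        elif j2 > j1:
            # "replace" or "insert": lyric words with no direct match.
            if i2 > i1:  # reuse the window of the replaced Whisper words
                lo, hi = words[i1]["start"], words[i2 - 1]["end"]
            else:        # squeeze into the gap between the neighbours
                lo = words[i1 - 1]["end"] if i1 > 0 else 0.0
                hi = words[i1]["start"] if i1 < len(words) else lo
            step = (hi - lo) / (j2 - j1)
            for k, j in enumerate(range(j1, j2)):
                out.append({"word": tokens[j], "start": lo + k * step,
                            "end": lo + (k + 1) * step, "matched": False})
        # tag == "delete": Whisper words missing from the lyrics are dropped.
    return out
```

The `matched` flag is kept so that a caller can treat interpolated timings differently, for example by highlighting them for manual review.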

@sensboston
Author

sensboston commented Feb 23, 2024

No budget at all. I do development just for fun and will publish it as open source here when it's done.

P.S. If you want, I'll add you to this private repo (but you'll need a Windows PC, at least for testing).

@sensboston
Author

sensboston commented Feb 29, 2024

Any progress? Or do you have no idea how to implement this? Please let me know; I don't want to waste time.

@EtienneAb3d
Owner

@sensboston
I understood you were working on the subject on your side.
On my side, without a budget, I have to find time for this in my free time.
