[YouTube] Add support for extracting auto-translated captions #997
base: dev
...in/java/org/schabi/newpipe/extractor/services/youtube/extractors/YoutubeStreamExtractor.java
extractor/src/main/java/org/schabi/newpipe/extractor/stream/SubtitlesStream.java
.build());
if (i == 0 && caption.getBoolean("isTranslatable")
I would not base the extraction on the index, but rather on whether the subtitles are auto-generated:
- if (i == 0 && caption.getBoolean("isTranslatable")
+ if (isAutoGenerated && caption.getBoolean("isTranslatable")
Also, this PR doesn't add support for translating uploaded subtitles. For instance, see https://www.youtube.com/watch?v=_cMxraX_5RE: you can translate from German to French and from English to French, and the translations are different.
We may need another property in SubtitlesStream for this.
I don't understand why we should use isAutoGenerated here. For better quality, it should be !isAutoGenerated. Manually added captions should be exact.
I was also wondering whether we should provide the auto-translated captions by default. Extracting the data for and generating ~100 SubtitlesStream objects takes some time. I definitely wouldn't recommend doing this for all available languages by default. On the other hand, we could provide a method which does this when needed.
I decided to extract all available subtitles, but made sure to speed up the process. It's up to the frontends to filter the subtitles.
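Filtering on the frontend side could look roughly like the sketch below. It is only an illustration: `CaptionInfo` is a hypothetical stand-in for SubtitlesStream with made-up field names, not the extractor's real class. The only thing taken from this PR is the idea that each stream carries an auto-translated flag.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class CaptionFilterSketch {
    // Hypothetical stand-in for SubtitlesStream; field names are assumptions.
    static final class CaptionInfo {
        final String languageTag;
        final boolean autoGenerated;
        final boolean autoTranslated;

        CaptionInfo(String languageTag, boolean autoGenerated, boolean autoTranslated) {
            this.languageTag = languageTag;
            this.autoGenerated = autoGenerated;
            this.autoTranslated = autoTranslated;
        }
    }

    // Keep only the captions that were not produced by machine translation.
    static List<CaptionInfo> withoutAutoTranslated(List<CaptionInfo> all) {
        return all.stream()
                .filter(c -> !c.autoTranslated)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<CaptionInfo> all = Arrays.asList(
                new CaptionInfo("de", false, false), // uploaded by the user
                new CaptionInfo("en", true, false),  // speech-to-text
                new CaptionInfo("fr", false, true)); // machine-translated
        for (CaptionInfo c : withoutAutoTranslated(all)) {
            System.out.println(c.languageTag);
        }
    }
}
```

A frontend that wants to offer translations only on request could apply the inverse predicate when the user opens a "translate" menu.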
...in/java/org/schabi/newpipe/extractor/services/youtube/extractors/YoutubeStreamExtractor.java
2bcc0a9 to efce384
What happened to this?
Closes #977. Based on and addresses TeamNewPipe/NewPipe#8023.
Faster and ordered: captions provided by the user are at the beginning of the list, auto-translated captions are at the end
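The ordering described in this commit message can be illustrated with a small sorting sketch. This is not the PR's code, which may simply build the list in that order. `CaptionInfo` is a hypothetical stand-in for SubtitlesStream, and placing auto-generated captions between the two groups is an assumption, since the message only says where user-provided and auto-translated captions end up.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class CaptionOrderSketch {
    // Hypothetical stand-in for SubtitlesStream; field names are assumptions.
    static final class CaptionInfo {
        final String languageTag;
        final boolean autoGenerated;
        final boolean autoTranslated;

        CaptionInfo(String languageTag, boolean autoGenerated, boolean autoTranslated) {
            this.languageTag = languageTag;
            this.autoGenerated = autoGenerated;
            this.autoTranslated = autoTranslated;
        }
    }

    // 0 = user-provided, 1 = auto-generated (assumed position), 2 = auto-translated.
    static int rank(CaptionInfo c) {
        if (c.autoTranslated) {
            return 2;
        }
        return c.autoGenerated ? 1 : 0;
    }

    // Returns a sorted copy; List.sort is stable, so the original order
    // is preserved within each group.
    static List<CaptionInfo> ordered(List<CaptionInfo> captions) {
        List<CaptionInfo> copy = new ArrayList<>(captions);
        copy.sort(Comparator.comparingInt(CaptionOrderSketch::rank));
        return copy;
    }
}
```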
efce384 to 9730de2
Extract auto-translated captions for YouTube videos.
API changes 🟢
SubtitlesStream
This adds isAutoTranslated() next to isAutoGenerated() to distinguish between auto-generated subtitles, which use speech-to-text, and auto-translated captions, which are based on Google Translate.
Additionally, getBaseLocale(), getDisplayBaseLanguageName() and getBaseLanguageTag() were added to access info on the language which was used for auto-translations.
Issues closed by this PR
Closes #977
Based on and addresses TeamNewPipe/NewPipe#8023