Integrate transcription in PeerTube #3

@Chocobozzz (Collaborator) commented Jun 19, 2024

Changes in your packages:

  • Made previously optional options required, to tighten constraints and force implementers (server, runner) to specify important options (logger, default transcoding
    directory...)
  • Removed ...Sync methods in favour of async ones
  • Use option objects instead of argument lists (easier to add other options, and the purpose of each argument is clearer)
  • Prefer fs-extra functions over fs for consistency (for example move instead of rename, which can have issues when moving a file between different devices)
  • Use a type for the engine name
  • Deleted unused methods and whisper-timestamped
  • Added the ability to install whisper engines on the fly so PeerTube instance admins don't have to install them manually
  • Added a custom engine path arg so the PeerTube server can install whisper engines locally and specify their path instead of relying on the global path
  • Put custom models to download in fixtures directory (in gitignore but we have a CI cache) so we don't have to download them manually each time we run the tests
  • Reduced fixture sizes for the transcription tests (just decreased video quality while keeping the audio stream as is)
  • Moved the transcript directory option from the constructor to the transcribe method, so the transcriber can be instantiated once and the transcribe method called several times with a different directory each time
  • Created a transcription-devtools package that includes the benchmark, jiwer and test tools, so packages that just use the transcription don't include these files
  • Removed requirements.txt file but added the pip install command in the test documentation
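
To illustrate the option-object change and the move of the transcript directory from the constructor to the method, here is a minimal TypeScript sketch. Every name in it (`SketchTranscriber`, `TranscriberOptions`, `TranscribeOptions`, `transcriptPathFor`, the engine name values, the `.vtt` file name) is hypothetical; only `mediaFilePath` and `model` appear in the reviewed code. It is not the actual package API and it does not run any engine.

```typescript
// Illustrative sketch only: every type, class and value here is hypothetical.

// A union type for the engine name instead of a free-form string (example values).
type EngineName = 'openai-whisper' | 'whisper-ctranslate2'

interface Logger {
  info (message: string): void
}

// Constructor options: all required, so the implementer has to decide on each one.
interface TranscriberOptions {
  engineName: EngineName
  logger: Logger
}

// Per-call options: the transcript directory lives here, not in the constructor.
interface TranscribeOptions {
  mediaFilePath: string
  model: string
  transcriptDirectory: string
}

// Builds the output path for one call (POSIX-style separators assumed).
function transcriptPathFor (options: TranscribeOptions): string {
  const fileName = options.mediaFilePath.split('/').pop() ?? ''
  const dot = fileName.lastIndexOf('.')
  const stem = dot > 0 ? fileName.slice(0, dot) : fileName

  return `${options.transcriptDirectory}/${stem}.vtt`
}

class SketchTranscriber {
  private readonly options: TranscriberOptions

  constructor (options: TranscriberOptions) {
    this.options = options
  }

  // A real implementation would run the engine here; this one only logs and returns the path.
  async transcribe (options: TranscribeOptions): Promise<string> {
    this.options.logger.info(`${this.options.engineName}: ${options.mediaFilePath} (model ${options.model})`)

    return transcriptPathFor(options)
  }
}

// One instance, several calls, a different directory each time.
const messages: string[] = []
const transcriber = new SketchTranscriber({
  engineName: 'whisper-ctranslate2',
  logger: { info: message => { messages.push(message) } }
})

void transcriber.transcribe({ mediaFilePath: '/videos/a.mp4', model: 'tiny', transcriptDirectory: '/tmp/job-1' })
void transcriber.transcribe({ mediaFilePath: '/videos/b.mp4', model: 'tiny', transcriptDirectory: '/tmp/job-2' })
```

With named fields, adding another option later does not change the position of existing arguments, and leaving out a required field such as `logger` is a compile-time error.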

Added:

  • PIP/Hugging Face models cache in GitHub Actions
  • Transcription support in PeerTube runner (uses whisper engines installed globally)
  • Transcription support in PeerTube server (uses whisper engines installed on the fly in the storage/bin/pip directory)
  • Config to enable/disable video transcription
  • Notification for video owner when the transcription is finished
  • Display auto-transcription info in upload/import page and "features found on this instance" in about page
  • Ability to select the auto engine/model, while admins can also specify custom engine and model paths
  • Server and runner transcription tests

@Chocobozzz Chocobozzz changed the base branch from transcription-backend-workbench to transcription-backend-workbench-v2 June 19, 2024 08:18
@lutangar lutangar self-requested a review June 19, 2024 13:39
@lutangar (Member) left a comment

Hey @Chocobozzz , since I made a few comments already I might as well submit them right now. But I'll dig deeper tomorrow and add some more comments...


const transcriptFile = await transcriber.transcribe({
mediaFilePath: inputPath,
model: config.modelPath
Member

custom constructor to use or ad

Collaborator Author

I'm not sure I understand your comment.

Member

Sorry, I left this comment in a hurry as a side note to myself.

I tried to implement a custom constructor for these use cases in either TranscriptionModel or WhisperBuiltinModel (since this implementation is tied to Whisper as it is), but I failed to do so... in fact this should be achievable with the default constructor 🤔

But the current constructor may lead to an invalid state (with a path which doesn't exist), and I'm not sure how to deal with this, since there is a sync/async dilemma and there is no such thing as an async constructor...

Member

Btw the invalid object state is also possible with the TranscriptFile default constructor
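
One common way around the missing async constructor is a private constructor combined with a static async factory, so that an instance can only exist once its path has been checked. A minimal sketch, assuming hypothetical names (`SketchModel`, `PathExists`) and an injected existence check rather than the real TranscriptionModel or TranscriptFile code:

```typescript
// Illustrative sketch only: SketchModel and PathExists are hypothetical names.

// The existence check is injected so the sketch stays free of file system imports.
type PathExists = (path: string) => Promise<boolean>

class SketchModel {
  readonly path: string

  // Private: callers cannot build an instance around an unchecked path.
  private constructor (path: string) {
    this.path = path
  }

  // The async validation happens here, before any instance is created.
  static async fromPath (path: string, pathExists: PathExists): Promise<SketchModel> {
    if (!(await pathExists(path))) {
      throw new Error(`Model path ${path} does not exist`)
    }

    return new SketchModel(path)
  }
}

// Usage with a fake check that only knows one path.
const knownPaths = new Set([ '/models/tiny' ])
const fakePathExists: PathExists = async path => knownPaths.has(path)

const modelPromise = SketchModel.fromPath('/models/tiny', fakePathExists)
```

fs-extra's `pathExists` has the same `(path) => Promise<boolean>` shape and could be passed as the second argument. The check only holds at creation time: the file can still disappear afterwards, so this narrows the invalid-state window rather than closing it.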

config/default.yaml (outdated, resolved)
config/production.yaml.example (outdated, resolved)
packages/jiwer/requirements.txt (outdated, resolved)
server/core/initializers/config.ts (resolved)
Chocobozzz and others added 24 commits June 19, 2024 17:37
Currently translated at 87.3% (2111 of 2418 strings)

Translation: PeerTube/angular
Translate-URL: https://weblate.framasoft.org/projects/peertube/angular/es/
Currently translated at 100.0% (274 of 274 strings)

Translation: PeerTube/server
Translate-URL: https://weblate.framasoft.org/projects/peertube/server/es/
Currently translated at 100.0% (274 of 274 strings)

Translation: PeerTube/server
Translate-URL: https://weblate.framasoft.org/projects/peertube/server/es/
Currently translated at 98.7% (2388 of 2418 strings)

Translation: PeerTube/angular
Translate-URL: https://weblate.framasoft.org/projects/peertube/angular/ru/
Currently translated at 100.0% (144 of 144 strings)

Translation: PeerTube/player
Translate-URL: https://weblate.framasoft.org/projects/peertube/player/es/
Currently translated at 87.3% (2111 of 2418 strings)

Translation: PeerTube/angular
Translate-URL: https://weblate.framasoft.org/projects/peertube/angular/es/
Currently translated at 100.0% (2418 of 2418 strings)

Translation: PeerTube/angular
Translate-URL: https://weblate.framasoft.org/projects/peertube/angular/gl/
Currently translated at 100.0% (2418 of 2418 strings)

Translation: PeerTube/angular
Translate-URL: https://weblate.framasoft.org/projects/peertube/angular/zh_Hant/
Currently translated at 98.1% (2374 of 2418 strings)

Translation: PeerTube/angular
Translate-URL: https://weblate.framasoft.org/projects/peertube/angular/hr/
Currently translated at 98.2% (2375 of 2418 strings)

Translation: PeerTube/angular
Translate-URL: https://weblate.framasoft.org/projects/peertube/angular/hr/
Currently translated at 98.3% (2377 of 2418 strings)

Translation: PeerTube/angular
Translate-URL: https://weblate.framasoft.org/projects/peertube/angular/hr/
Currently translated at 98.5% (2383 of 2418 strings)

Translation: PeerTube/angular
Translate-URL: https://weblate.framasoft.org/projects/peertube/angular/hr/
Currently translated at 98.5% (2383 of 2418 strings)

Translation: PeerTube/angular
Translate-URL: https://weblate.framasoft.org/projects/peertube/angular/hr/
Currently translated at 100.0% (143 of 143 strings)

Translation: PeerTube/player
Translate-URL: https://weblate.framasoft.org/projects/peertube/player/hr/
Currently translated at 98.5% (2383 of 2418 strings)

Translation: PeerTube/angular
Translate-URL: https://weblate.framasoft.org/projects/peertube/angular/hr/
Currently translated at 98.6% (2385 of 2418 strings)

Translation: PeerTube/angular
Translate-URL: https://weblate.framasoft.org/projects/peertube/angular/hr/
Currently translated at 98.6% (2386 of 2418 strings)

Translation: PeerTube/angular
Translate-URL: https://weblate.framasoft.org/projects/peertube/angular/hr/

packages/transcription/src/abstract-transcriber.ts (outdated, resolved)
const transcriptFile = await transcriber.transcribe({
mediaFilePath: videoInputPath,

model: CONFIG.VIDEO_TRANSCRIPTION.MODEL_PATH
Member

Might be another use case for the previously mentioned custom constructor.

support/doc/development/tests.md (outdated, resolved)
packages/transcription-devtools/package.json (outdated, resolved)
scripts/ci.sh (outdated, resolved)
scripts/ci.sh (outdated, resolved)
Chocobozzz and others added 12 commits June 26, 2024 08:33
CI fails, our projects generates too many chunks unfortunately
chore: fiddling around some more

chore: add ctranslate2 and timestamped

chore: add performance markers

chore: refactor test

chore: change worflow name

chore: ensure Python3

chore(duration): convert to chai/mocha syntahx

chore(transcription): add individual tests for others transcribers

chore(transcription): implement formats test of all implementations

Also compare result of other implementation to the reference implementation

chore(transcription): add more test case with other language and models size and local model

chore(test): wip ctranslate 2 adapat

chore(transcription): wip transcript file and benchmark

chore(test): clean a bit

chore(test): clean a bit

chore(test): refacto timestamed spec

chore(test): update workflow

chore(test): fix glob expansion with sh

chore(test): extract some hw info

chore(test): fix async tests

chore(benchmark): add model info

feat(transcription): allow use of a local mode in timestamped-whisper

feat(transcription): extract run and profiling info in own value object

feat(transcription): extract run concept in own class an run more bench

chore(transcription): somplify run object only a uuid is now needed and add more benchmark scenario

docs(transcription): creates own package readme

docs(transcription): add local model usage

docs(transcription): update README

fix(transcription): use fr video for better comparison

chore(transcription): make openai comparison passed

docs(timestamped): clea

chore(transcription): change transcribers transcribe method signature

Introduce whisper builtin model.

fix(transcription): activate language detection

Forbid transcript creation without a language.
Add `languageDetection` flag to an engine and some assertions.

Fix an issue in `whisper-ctranslate2` :
Softcatala/whisper-ctranslate2#93

chore(transcription): use PeerTube time helpers instead of custom ones

Update existing time function to output an integer number of seconds and add a ms human-readable time formatter with hints of tests.

chore(transcription): use PeerTube UUID helpers

chore(transcription): enable CER evaluation

Thanks to this recent fix in Jiwer <3
https://github.com/jitsi/jiwer/issues/873

chore(jiwer): creates JiWer package

I'm not very happy with the TranscriptFileEvaluator constructor... suggestions ?

chore(JiWer): add usage in README

docs(jiwer): update JiWer readme

chore(transcription): use FunMOOC video in fixtures

chore(transcription): add proper english video fixture

chore(transcription): use os tmp directory where relevant

chore(transcription): fix jiwer cli test reference.txt

chore(transcription): move benchmark out of tests

chore(transcription): remove transcription workflow

docs(transcription): add benchmark info

fix(transcription): use ms precision in other transcribers

chore(transcription): simplify most of the tests

chore(transcription): remove slashes when building path with join

chore(transcription): make fromPath method async

chore(transcription): assert path to model is a directory for CTranslate2 transcriber

chore(transcription): ctranslate2 assertion

chore(transcription): ctranslate2 assertion

chore(transcription): add preinstall script for Python dependencies

chore(transcription): add download and unzip utils functions

chore(transcription): add download and unzip utils functions

chore(transcription): download & unzip models fixtures

chore(transcription): zip

chore(transcription): raise download file test timeout

chore(transcription): simplify download file test

chore(transcription): add transcriptions test to CI

chore(transcription): raise test preconditions timeout

chore(transcription): run preinstall scripts before running ci

chore(transcription): create dedicated tmp folder for transcriber tests

chore(transcription): raise timeout some more

chore(transcription): raise timeout some more

chore(transcription): raise timeout some more

chore(transcription): raise timeout some more

chore(transcription): raise timeout some more

chore(transcription): raise timeout some more

chore(transcription): raise timeout some more

chore(transcription): raise timeout some more

chore(transcription): use short video for local model test

chore(transcription): raise timeout some more

chore(transcription): raise timeout some more

chore(transcription): raise timeout some more

chore(transcription): setup verbosity based on NODE_ENV value
Can be specified on-demand using NODE_DEBUG=execa env variable
@Chocobozzz Chocobozzz force-pushed the feature/transcription branch from e654342 to 0b30e58 Compare June 28, 2024 07:06
@Chocobozzz
Collaborator Author

Merged manually in upstream develop branch: Chocobozzz@1bfb791

Thanks again!

@Chocobozzz Chocobozzz closed this Jun 28, 2024
@Chocobozzz Chocobozzz deleted the feature/transcription branch June 28, 2024 07:17
10 participants