-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore(transcription): change transcribers transcribe method signature #2
chore(transcription): change transcribers transcribe method signature #2
Conversation
Introduce whisper builtin model.
Forbid transcript creation without a language. Add `languageDetection` flag to an engine and some assertions. Fix an issue in `whisper-ctranslate2` : Softcatala/whisper-ctranslate2#93
eb36949
to
5fffa52
Compare
Update existing time function to output an integer number of seconds and add a ms human-readable time formatter with hints of tests.
Thanks to this recent fix in Jiwer <3 https://github.com/jitsi/jiwer/issues/873
3453b94
to
75f644f
Compare
.github/workflows/transcription.yml
Outdated
@@ -29,6 +32,7 @@ jobs: | |||
- name: Run transcription tests | |||
run: | | |||
npm run mocha -- --exit --bail "packages/tests/src/transcription/**/*.spec.ts" | |||
yarn run benchmark-transcription |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can remove the benchmark from the CI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might as well remove the .github/workflows/transcription.yml
file altogether. What do you think ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for the Python dependencies I'll add "preinstall": "pip install -r requirements.txt"
to the transcription package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might as well remove the .github/workflows/transcription.yml file altogether. What do you think ?
Yes please!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure about the "test" workflow :
- which test suite will run the transcription package test ?
- shall I execute the preinstall script manually somewhere ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can create another test case here: https://github.com/Chocobozzz/PeerTube/blob/develop/scripts/ci.sh where you install your dependencies
Then just add the it to the matrix: https://github.com/Chocobozzz/PeerTube/blob/develop/.github/workflows/test.yml#L42
@@ -0,0 +1,29 @@ | |||
import { millisecondsToTime, secondsToTime } from '@peertube/peertube-core-utils' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also add a barrel file index.ts
in core-utils
directory?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, but what's this barrel file should export ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
export * from './date.js'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Humm... I'm not sure to understand the purpose of these in the tests folder. In some of the tests folder (for example in external-plugins
) I'm seeing imports
instead of exports
in the barrel file which might make more sense 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oup's sorry my bad, you're right!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I go with a barrel file with imports
or with no barrel file ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use imports
for consistency, even if I plan to remove them in the future (I try to progressively remove barrel files from the project)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well If it's a progressive work, we could begin with not adding this barrel file if it's not necessary !
@@ -0,0 +1,48 @@ | |||
/* eslint-disable max-len */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove spec
from filename for consistency
packages/tests/src/transcription/transcript/transcript-file.spec.ts
Outdated
Show resolved
Hide resolved
packages/tests/src/transcription/whisper/transcriber/openai-transcriber.spec.ts
Outdated
Show resolved
Hide resolved
}) | ||
|
||
void (async () => { | ||
const logger = createLogger() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you don't need colors or specific formatter I think you can directly use console.log
packages/transcription/src/whisper/transcriber/ctranslate2-transcriber.ts
Outdated
Show resolved
Hide resolved
961724a
to
2dce11c
Compare
I'm not very happy with the TranscriptFileEvaluator constructor... suggestions ? chore(JiWer): add usage in README docs(jiwer): update JiWer readme
chore(transcription): remove transcription workflow docs(transcription): add benchmark info
…ate2 transcriber chore(transcription): ctranslate2 assertion chore(transcription): ctranslate2 assertion
chore(transcription): add download and unzip utils functions chore(transcription): download & unzip models fixtures
52de9ab
to
31fd92a
Compare
Introduce whisper builtin model.
Description
Related issues
Has this been tested?
Screenshots