Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(transcription): change transcribers transcribe method signature #2

Merged

Conversation

lutangar
Copy link
Member

@lutangar lutangar commented May 6, 2024

Introduce whisper builtin model.

Description

Related issues

Has this been tested?

  • 👍 yes, I added tests to the test suite
  • 💭 no, because this PR is a draft and still needs work
  • 🙅 no, because this PR does not update server code
  • 🙋 no, because I need help

Screenshots

@lutangar lutangar requested a review from Chocobozzz May 6, 2024 14:17
Forbid transcript creation without a language.
Add `languageDetection` flag to an engine and some assertions.

Fix an issue in `whisper-ctranslate2` :
Softcatala/whisper-ctranslate2#93
@lutangar lutangar force-pushed the transcription-first-feedbacks branch from eb36949 to 5fffa52 Compare May 7, 2024 09:49
lutangar added 3 commits May 7, 2024 15:07
Update existing time function to output an integer number of seconds and add a ms human-readable time formatter with hints of tests.
@lutangar lutangar force-pushed the transcription-first-feedbacks branch from 3453b94 to 75f644f Compare May 7, 2024 14:49
@@ -29,6 +32,7 @@ jobs:
- name: Run transcription tests
run: |
npm run mocha -- --exit --bail "packages/tests/src/transcription/**/*.spec.ts"
yarn run benchmark-transcription
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can remove the benchmark from the CI

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might as well remove the .github/workflows/transcription.yml file altogether. What do you think ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for the Python dependencies I'll add "preinstall": "pip install -r requirements.txt" to the transcription package.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might as well remove the .github/workflows/transcription.yml file altogether. What do you think ?

Yes please!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about the "test" workflow :

  • which test suite will run the transcription package test ?
  • shall I execute the preinstall script manually somewhere ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can create another test case here: https://github.com/Chocobozzz/PeerTube/blob/develop/scripts/ci.sh where you install your dependencies

Then just add the it to the matrix: https://github.com/Chocobozzz/PeerTube/blob/develop/.github/workflows/test.yml#L42

packages/jiwer/README.md Outdated Show resolved Hide resolved
@@ -0,0 +1,29 @@
import { millisecondsToTime, secondsToTime } from '@peertube/peertube-core-utils'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also add a barrel file index.ts in core-utils directory?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, but what's this barrel file should export ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

export * from './date.js'

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Humm... I'm not sure to understand the purpose of these in the tests folder. In some of the tests folder (for example in external-plugins) I'm seeing imports instead of exports in the barrel file which might make more sense 🤔

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oup's sorry my bad, you're right!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I go with a barrel file with imports or with no barrel file ?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use imports for consistency, even if I plan to remove them in the future (I try to progressively remove barrel files from the project)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well If it's a progressive work, we could begin with not adding this barrel file if it's not necessary !

@@ -0,0 +1,48 @@
/* eslint-disable max-len */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove spec from filename for consistency

packages/tests/src/transcription/subtitle.spec.ts Outdated Show resolved Hide resolved
})

void (async () => {
const logger = createLogger()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you don't need colors or specific formatter I think you can directly use console.log

packages/transcription/src/transcription-model.ts Outdated Show resolved Hide resolved
@lutangar lutangar force-pushed the transcription-first-feedbacks branch 4 times, most recently from 961724a to 2dce11c Compare May 30, 2024 15:39
@lutangar lutangar force-pushed the transcription-first-feedbacks branch from 52de9ab to 31fd92a Compare May 31, 2024 13:56
@lutangar lutangar merged commit 4745d30 into transcription-backend-workbench May 31, 2024
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants