Commit
chore(test): wip ctranslate 2 adapat
lutangar committed Apr 23, 2024
1 parent cafd9a8 commit 73edc8f
Showing 2 changed files with 98 additions and 0 deletions.
packages/tests/src/transcription/whisper/transcriber/whisper-ctranslate2.spec.ts
@@ -11,6 +11,7 @@ config.truncateThreshold = 0
describe('Whisper CTranslate2 transcriber', function () {
const transcriptDirectory = join(root(), 'test-transcript')
const shortVideoPath = buildAbsoluteFixturePath('video_short.mp4')
const frVideoPath = buildAbsoluteFixturePath('transcription/communiquer-lors-dune-classe-transplantee.mp4')
const transcriber = new Ctranslate2Transcriber(
{
name: 'anyNameShouldBeFineReally',
@@ -80,6 +81,51 @@ You
`)
})

it('May transcribe a media file using a local CTranslate2 model', async function () {
const transcript = await transcriber.transcribe(
shortVideoPath,
{ name: 'myLocalModel', path: buildAbsoluteFixturePath('transcription/tiny-ctranslate2.bin') },
'en',
'txt'
)
expect(transcript).to.deep.equals({
path: join(transcriptDirectory, 'video_short.txt'),
language: 'en',
format: 'txt'
})

// eslint-disable-next-line @typescript-eslint/no-unused-expressions
expect(existsSync(transcript.path), `Transcript file ${transcript.path} doesn't exist.`).to.be.true
expect(await readFile(transcript.path, 'utf8')).to.equal(`You
`)
})

it('May transcribe a media file in French', async function () {
this.timeout(45000)
const transcript = await transcriber.transcribe(frVideoPath, { name: 'tiny' }, 'fr', 'txt')
expect(transcript).to.deep.equals({
path: join(transcriptDirectory, 'communiquer-lors-dune-classe-transplantee.txt'),
language: 'fr',
format: 'txt'
})

// eslint-disable-next-line @typescript-eslint/no-unused-expressions
expect(existsSync(transcript.path), `Transcript file ${transcript.path} doesn't exist.`).to.be.true
expect(await readFile(transcript.path, 'utf8')).to.equal(
`Communiquez lors d'une classe transplante. Utilisez les photos prises lors de cette classe pour raconter quotidiennement le séjour vécu.
C'est le scénario P-Dagujic présenté par monsieur Navoli, professeur ainsi que le 3 sur une école alimentaire de Montpellier.
La première application a utilisé ce ralame déatec. L'enseignant va alors transférer les différentes photos réalisés lors de la classe transplante.
Dans un dossier, spécifique pour que les élèves puissent le retrouver plus facilement. Il téléverse donc ses photos dans le dossier, dans le venté, dans la médiatèque de la classe.
Pour terminer, il s'assure que le dossier soit bien ouvert aux utilisateurs afin que tout le monde puisse l'utiliser.
Les élèves par la suite utilisera le blog. A partir de leurs nantes, il pourront se loi de parposte rédigeant un article d'un reinté.
Ils illustront ses articles à l'aide des photos de que mon numérique mise à n'accélier dans le venté.
Pour se faire, il pourront utiliser les diteurs avancés qui les renvèrent directement dans la médiatèque de la classe où il pourront retrouver le dossier créé par leurs enseignants.
Une fois leur article terminée, les élèves soumétront se lui-ci au professeur qui pourra soit la noté pour correction ou le public.
Ensuite, il pourront lire et commenter ce de leurs camarades ou répondre aux commentaires de la veille.
`
)
})

it('Should produce the same transcript text as openai-whisper given the same parameters', async function () {
const transcribeArguments: Parameters<typeof transcriber.transcribe> = [
shortVideoPath,
packages/transcription/src/whisper/transcriber/ctranslate2-transcriber.ts
@@ -1,5 +1,57 @@
import { OpenaiTranscriber } from './openai-transcriber.js'
import { TranscriptionModel } from '../../transcription-model.js'
import { Transcript, TranscriptFormat } from '../../transcript.js'
import { $ } from 'execa'
import { getFileInfo } from '../../file-utils.js'
import { join } from 'path'
import { copyFile, rm } from 'node:fs/promises'
import { dirname, basename } from 'node:path'

export class Ctranslate2Transcriber extends OpenaiTranscriber {
public static readonly MODEL_FILENAME = 'model.bin'

async transcribe (
mediaFilePath: string,
model: TranscriptionModel = { name: 'tiny' },
language: string = 'en',
format: TranscriptFormat = 'vtt'
): Promise<Transcript> {
this.createPerformanceMark()
// Shall we run the command with `{ shell: true }` to get the same error as in sh ?
// ex: ENOENT => Command not found
const $$ = $({ verbose: true })
const { baseName } = getFileInfo(mediaFilePath)

let modelFilepath = model.path
const shouldCreateModelCopy = (model.path && basename(model.path) !== Ctranslate2Transcriber.MODEL_FILENAME)
if (shouldCreateModelCopy) {
modelFilepath = join(dirname(model.path), Ctranslate2Transcriber.MODEL_FILENAME)
await copyFile(model.path, modelFilepath)
}

const modelArgs = model.path ? [ '--model_directory', dirname(model.path) ] : [ '--model', model.name ]

await $$`${this.engine.binary} ${[
mediaFilePath,
...modelArgs,
'--output_format',
format,
'--output_dir',
this.transcriptDirectory,
'--language',
language
]}`

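if (shouldCreateModelCopy) {
// Remove the temporary `model.bin` copy once the command has finished (this also uses the `rm` import)
await rm(modelFilepath)
}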

this.measurePerformanceMark()

return {
language,
path: join(this.transcriptDirectory, `${baseName}.${format}`),
format
}
}
}
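
Review note: in `transcribe` above, the `model.bin` copy is created before the command runs, so it would be left behind if the command throws. Below is a minimal sketch of one way to separate the argument planning from the copy and cleanup. The names `ModelRef`, `ModelPlan`, `planModel` and `withModelCopy` are hypothetical and not part of this commit. `ModelRef` is an assumed stand-in for `TranscriptionModel` that keeps only the `name` and `path` fields used in the diff.

```typescript
import { copyFile, rm } from 'node:fs/promises'
import { basename, dirname, join } from 'node:path'

// Mirrors Ctranslate2Transcriber.MODEL_FILENAME from the diff.
const MODEL_FILENAME = 'model.bin'

// Assumed stand-in for TranscriptionModel (only the fields used in the diff).
interface ModelRef { name: string, path?: string }

interface ModelPlan {
  // CLI arguments that select the model.
  args: string[]
  // Present only when the model file has to be copied to MODEL_FILENAME first.
  copy?: { from: string, to: string }
}

// Hypothetical helper: pure, so it can be unit-tested without touching the disk.
export function planModel (model: ModelRef): ModelPlan {
  if (!model.path) return { args: [ '--model', model.name ] }

  const directory = dirname(model.path)
  const args = [ '--model_directory', directory ]

  // The file already has the expected name: no copy needed.
  if (basename(model.path) === MODEL_FILENAME) return { args }

  return { args, copy: { from: model.path, to: join(directory, MODEL_FILENAME) } }
}

// Hypothetical helper: runs `task` with the copy in place and removes the copy
// afterwards, even when `task` throws.
export async function withModelCopy<T> (
  plan: ModelPlan,
  task: (args: string[]) => Promise<T>
): Promise<T> {
  if (plan.copy) await copyFile(plan.copy.from, plan.copy.to)

  try {
    return await task(plan.args)
  } finally {
    if (plan.copy) await rm(plan.copy.to, { force: true })
  }
}
```

With helpers like these, `transcribe` could pass the command invocation as `task` and spread the received `args` in place of `modelArgs`. The `try`/`finally` is a suggestion only, and the commit itself leaves the cleanup commented out.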
