Extract audio lambda #4

Open
cjjenkinson opened this issue Jul 28, 2021 · 2 comments

Comments

@cjjenkinson
Owner

cjjenkinson commented Jul 28, 2021

The extract audio lambda should have access to the following resources:

  • ffmpeg lambda layer
  • S3 video input bucket
  • S3 extracted audio bucket

Lambda business logic

The purpose of the extract audio lambda is to extract the audio from uploaded videos using the ffmpeg tool and store the resulting audio file in the extracted audio bucket.

  1. Receive and parse the S3 put event to get the location of the video, then create a new video row in the DynamoDB table with the following attributes:
     • ID (generated using the uuid package)
     • state (starts as pending)
     • videoBucketKey (location of the uploaded video on S3)
     • extractedAudioKey (location of the extracted audio, null initially)
     • transcriptionState (starts as pending)
     • transcriptionKey (location of the transcription SRT file from Assembly.AI, null initially)
  2. Stream the video blob from S3 and write it to the tmp disk so that the ffmpeg process can run on it
  3. Run the ffmpeg command line operation that extracts the audio: ffmpeg -i sample.mp4 -q:a 0 -map a sample.mp3
  4. ffmpeg writes the extracted audio to the tmp disk; store it in the S3 extracted audio bucket and update the extractedAudioKey property on the video row
  5. Clean up the temporary disk space
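The first step (parsing the S3 put event and building the video row) could look roughly like the sketch below. It is not the final implementation: the function names and the sample event are hypothetical, the DynamoDB write itself is left out, and `crypto.randomUUID()` from Node's standard library stands in for the uuid package so the snippet runs without dependencies.

```javascript
const { randomUUID } = require("crypto");

// Extract bucket and key from the first record of an S3 put event.
// S3 URL-encodes object keys in event notifications and uses "+" for spaces.
function parseS3PutEvent(event) {
  const record = event.Records[0];
  const bucket = record.s3.bucket.name;
  const key = decodeURIComponent(record.s3.object.key.replace(/\+/g, " "));
  return { bucket, key };
}

// Build the initial video row with the attributes listed in the issue.
// randomUUID() is a stand-in for the uuid package (assumption).
function buildVideoRow(videoBucketKey, id = randomUUID()) {
  return {
    ID: id,
    state: "pending",
    videoBucketKey,
    extractedAudioKey: null,
    transcriptionState: "pending",
    transcriptionKey: null,
  };
}

// Hypothetical sample event, trimmed to the fields used here.
const sampleEvent = {
  Records: [
    {
      s3: {
        bucket: { name: "video-input" },
        object: { key: "uploads/my+clip.mp4", size: 1024 },
      },
    },
  ],
};

const { bucket, key } = parseS3PutEvent(sampleEvent);
const row = buildVideoRow(key);
console.log(bucket, row);
```

Keeping the row construction in a pure function makes it easy to unit test before wiring in the DynamoDB client.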

Concerns:

The tmp disk space of a lambda is limited to roughly 500 MB (512 MB by default), so the execution will fail if the video and the extracted audio do not fit on it together. This needs to be considered when reading videos, but for the MVP we can limit video uploads to 400 MB.
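One way to enforce the MVP limit inside the lambda is to check the object size that S3 reports in the put event before downloading anything. A minimal sketch, assuming the 400 MB limit from above; the constant and function names are hypothetical:

```javascript
// Assumed MVP limit from the discussion above: 400 MB.
const MAX_VIDEO_BYTES = 400 * 1024 * 1024;

// Returns true when the uploaded object is small enough to process.
// `record` is one entry of event.Records from an S3 put event;
// S3 reports the object size in bytes under s3.object.size.
function isWithinUploadLimit(record, maxBytes = MAX_VIDEO_BYTES) {
  return record.s3.object.size <= maxBytes;
}

// Hypothetical records, trimmed to the fields used here.
const small = { s3: { object: { key: "a.mp4", size: 50 * 1024 * 1024 } } };
const large = { s3: { object: { key: "b.mp4", size: 450 * 1024 * 1024 } } };

console.log(isWithinUploadLimit(small), isWithinUploadLimit(large)); // true false
```

A record that fails the check could be marked as failed on the video row instead of letting the execution crash part-way through the download.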

One option to investigate is whether the file can be streamed from S3 using createReadStream and piped directly into the ffmpeg command, so that the video never has to be written to disk.
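A rough sketch of that idea, untested against real uploads: ffmpeg can read its input from stdin when the input is given as `pipe:0`, so any readable stream can be piped into the child process. The function names are hypothetical, and `/opt/ffmpeg` is assumed to be where the layer exposes the binary.

```javascript
const { spawn } = require("child_process");

// Arguments for reading the video from stdin ("pipe:0") instead of a file on disk.
function buildStreamingArgs(outputPath) {
  return ["-i", "pipe:0", "-q:a", "0", "-map", "a", outputPath];
}

// Pipe any readable stream (for example an S3 object body) into ffmpeg's stdin.
// Resolves with the output path when ffmpeg exits with code 0, rejects otherwise.
function extractAudioFromStream(inputStream, outputPath, ffmpegPath = "/opt/ffmpeg") {
  return new Promise((resolve, reject) => {
    const ffmpeg = spawn(ffmpegPath, buildStreamingArgs(outputPath), {
      stdio: ["pipe", "ignore", "inherit"],
    });
    ffmpeg.on("error", reject); // e.g. the binary was not found
    ffmpeg.on("close", (code) =>
      code === 0
        ? resolve(outputPath)
        : reject(new Error(`ffmpeg exited with code ${code}`))
    );
    if (!ffmpeg.stdin) return; // spawn failed; the "error" handler rejects
    // ffmpeg may exit before consuming all input; ignore the resulting EPIPE.
    ffmpeg.stdin.on("error", () => {});
    inputStream.on("error", reject);
    inputStream.pipe(ffmpeg.stdin);
  });
}

// Usage inside the handler, assuming the AWS SDK v2 client:
// await extractAudioFromStream(
//   s3.getObject({ Bucket: bucket, Key: key }).createReadStream(),
//   "/tmp/sample.mp3"
// );

console.log(buildStreamingArgs("/tmp/sample.mp3").join(" "));
```

The extracted audio is still written to the tmp disk here, but it is normally much smaller than the video. One caveat to verify: MP4 files that keep their index (the moov atom) at the end of the file may not be readable from a non-seekable pipe, so this needs testing with real uploads.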

@cjjenkinson
Owner Author

Running ffmpeg from the lambda layer once the lambda has been given access to it:

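import childProcess from "child_process";
import os from "os";

const handler = () => {
  // Every flag and every value must be its own array element for execFileSync.
  // Paths are absolute because /tmp is the writable disk space in a lambda.
  const args = [
    "-i",
    "/tmp/sample.mp4",
    "-q:a",
    "0",
    "-map",
    "a",
    "/tmp/sample.mp3",
  ];

  // ... prepare the files locally before running the executable

  // The ffmpeg binary is provided by the lambda layer under /opt.
  const stdout = childProcess.execFileSync("/opt/ffmpeg", args, {});

  // ... rest of the logic to clear up locally written files from running the executable
};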
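For the clean-up part, one possible pattern is to do all the work inside a scratch directory and remove it in a `finally` block, so the files are deleted even when ffmpeg throws. This is only a sketch: `withScratchDir` and the `extract-audio-` prefix are hypothetical names, and a placeholder file stands in for the ffmpeg output.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Run `work` with a fresh scratch directory and always remove it afterwards,
// even when `work` throws. In a lambda, os.tmpdir() is expected to be /tmp.
function withScratchDir(work) {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), "extract-audio-"));
  try {
    return work(dir);
  } finally {
    fs.rmSync(dir, { recursive: true, force: true });
  }
}

let usedDir;
const result = withScratchDir((dir) => {
  usedDir = dir;
  // Placeholder for the real work: download the video, run execFileSync, upload.
  fs.writeFileSync(path.join(dir, "sample.mp3"), "placeholder");
  return fs.existsSync(path.join(dir, "sample.mp3"));
});

console.log(result, fs.existsSync(usedDir)); // true false
```

This version only fits synchronous work such as `execFileSync`; an async handler would need an `async` variant that awaits `work(dir)` before removing the directory.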

@h3dg3-Wytch
Contributor
