This repository implements MusicGen, a state-of-the-art text-to-music model, as a scalable online endpoint on AWS SageMaker. It includes a Lambda function that exposes the endpoint publicly, as well as a Max4Live device that lets you perform inference right within Ableton Live, seamlessly integrating the model into any music production workflow.
Check out the demo below! Audio on! 🔊🔽
demo_video.mp4
1. Login to AWS:

   ```bash
   aws sso login
   ```

2. Create the `dev` environment and activate it:

   ```bash
   conda env create -n dev -f envs/dev.yaml
   conda activate dev
   ```

3. Download the model artifacts:

   ```bash
   cd aws/endpoint/src/artifacts/
   python download_artifacts.py
   ```

4. Create the model tar.gz:

   ```bash
   cd aws/endpoint/model/
   bash create_tar.sh
   ```

5. Build and publish the custom Docker image for the endpoint:

   ```bash
   cd aws/endpoint/container/
   bash build_and_publish.sh
   ```
6. Update the deployment notebook `aws/endpoint/notebooks/deployment.ipynb` to reflect the URL of the image published in step 5, and use it to register the model (a registration sketch follows this list).

7. Change the configuration in `aws/terraform/provision_ec2/src/config-dev.yaml` and `aws/terraform/provision_ec2/src/config-prod.yaml`. Also update the `main.tf` backend as needed, especially the Terraform state `key`. Finally, change `locals.tf` as needed.
8. Change the configuration of the workflow files in `.github/workflows/` as needed.

9. Change the configuration of the Chalice application, located in `aws/endpoint/lambda/public_endpoint/.chalice/config.json`.

10. Use the GitHub Actions workflows defined in `.github/workflows/` to execute the CD pipeline and provision the endpoint as well as the Lambda function, which will return a URL where the REST API is exposed.

11. Test that everything is working by sending a GET request to the URL given by the workflow `.github/workflows/lambda-provision.yaml` (a health-check sketch follows this list). You should get a JSON response like:

    ```json
    { "status": "online" }
    ```
12. Open Ableton Live and import the Max4Live device located under `m4l/Musicgen.amxd`.

13. Set the `API Endpoint URL` parameter to the URL returned in step 10. Then, write a prompt and press `Generate`.
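For step 6, registering the model essentially means creating a SageMaker model that points at the image from step 5 and the model archive from step 4. Below is a minimal sketch using `boto3`; the image URI, S3 path, role ARN, region, and model name are placeholders, and the actual notebook may use the SageMaker Python SDK instead.

```python
import boto3

# Placeholders: substitute the outputs of steps 4-5 and your own account details.
IMAGE_URI = "<ACCOUNT_ID>.dkr.ecr.<REGION>.amazonaws.com/<REPOSITORY>:<TAG>"  # image from step 5
MODEL_DATA_URL = "s3://<BUCKET>/<PREFIX>/model.tar.gz"                        # archive from step 4
EXECUTION_ROLE_ARN = "arn:aws:iam::<ACCOUNT_ID>:role/<SAGEMAKER_EXECUTION_ROLE>"

sm_client = boto3.client("sagemaker", region_name="<REGION>")

# Register the model so the endpoint can later be provisioned from it.
sm_client.create_model(
    ModelName="musicgen",  # hypothetical name
    PrimaryContainer={
        "Image": IMAGE_URI,
        "ModelDataUrl": MODEL_DATA_URL,
    },
    ExecutionRoleArn=EXECUTION_ROLE_ARN,
)
```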
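For step 11, a quick sanity check from Python might look like the snippet below. The URL is a placeholder for the one printed by the `lambda-provision` workflow, and the `requests` library is assumed to be installed.

```python
import requests

# Placeholder: the URL printed by .github/workflows/lambda-provision.yaml (step 10).
API_URL = "https://<API-ID>.execute-api.<REGION>.amazonaws.com/api"

response = requests.get(API_URL, timeout=30)
response.raise_for_status()
print(response.json())  # expected: {"status": "online"}
```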
In Ableton Live, you can edit the Max For Live device. Feel free to do so, and submit a pull request with new features! This is how the patch is currently implemented:
- **Request schema**

  The API endpoint accepts requests with content type `application/json`. Their schema is as follows:

  ```python
  prompt: str
  duration: Optional[float] = 8.0
  temperature: Optional[float] = 1.0
  top_p: Optional[float] = 0.0
  top_k: Optional[int] = 250
  cfg_coefficient: Optional[float] = 3.0
  ```
- **Response schema**

  The API endpoint equally returns `application/json`, in the format:

  ```json
  {
    "result": {
      "prediction": "<PREDICTION-IN-BASE64>",
      "processing_time_ms": "<PROCESSING-TIME-IN-MS>"
    }
  }
  ```

  Where `prediction` is an .mp3 audio file encoded in base64.
- **Example request**

  For example, a valid JSON request would be:

  ```json
  {
    "prompt": "calm piano music",
    "duration": 4.0,
    "temperature": 0.8
  }
  ```

  And the response would look like:

  ```json
  {
    "result": {
      "prediction": "//voxAAAOvInHjW8gAeKw6gjO7AAAM+ze...",
      "processing_time_ms": 15124
    }
  }
  ```

  A minimal Python client that sends this request and decodes the result is sketched below.
For any bugs or problems you might encounter, feel free to open an issue; I would be very happy to help out as much as I can!
For any contribution, feel free to submit a PR. Thank you!