
extract-jukebox-embeddings

Link to Colab Notebook.

A notebook for extracting embeddings from OpenAI's Jukebox model, following the approach described in Castellon et al. (2021) with some modifications adopted in Spotify's LLark paper:

  • Source: activations from the 36th layer of the Jukebox 5B top-level prior (the transformer language model that runs on the encoder's codes)
  • Raw Jukebox activations: 4800-dimensional vectors at ~345 Hz
  • Audio is chunked into 25-second clips because that is the maximum input length Jukebox accepts; clips shorter than 25 seconds are padded before being passed through Jukebox
  • Approach: mean-pooling within 100 ms frames, resulting in:
    • Downsampled frame rate: 10 Hz
    • Embedding size: about 1.2 × 10^6 values for a 25 s audio clip
    • For a 25 s audio clip, the resulting 2D array has shape [240, 4800]
  • This keeps temporal information at 100 ms resolution while shrinking the sequence length by roughly 34.5× (345 Hz → 10 Hz)
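The chunking and pooling steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the notebook's code: the function and constant names (`chunk_audio`, `mean_pool`, `SAMPLE_RATE`, etc.) are hypothetical, the 44.1 kHz sample rate and zero-padding are assumed, each activation frame is assigned to the 100 ms window containing its timestamp, and the Jukebox forward pass is replaced by a random placeholder array.

```python
import numpy as np

# Assumed values (hypothetical constant names, not taken from the notebook):
SAMPLE_RATE = 44_100   # Jukebox works on 44.1 kHz audio
CLIP_SECONDS = 25      # clip length used in this README
EMBED_HZ = 345.0       # approximate frame rate of the layer-36 activations
POOLED_HZ = 10.0       # 100 ms pooling windows
EMBED_DIM = 4800       # activation width


def chunk_audio(waveform, sample_rate=SAMPLE_RATE, clip_seconds=CLIP_SECONDS):
    """Split a mono waveform into fixed-length clips, zero-padding the last clip."""
    clip_len = int(round(sample_rate * clip_seconds))
    n_clips = max(1, int(np.ceil(len(waveform) / clip_len)))
    padded = np.zeros(n_clips * clip_len, dtype=waveform.dtype)
    padded[: len(waveform)] = waveform
    return padded.reshape(n_clips, clip_len)


def mean_pool(activations, src_hz=EMBED_HZ, out_hz=POOLED_HZ):
    """Average a [T, D] activation array within consecutive 1/out_hz windows.

    Assumes out_hz <= src_hz, so every window contains at least one frame.
    """
    n_frames = activations.shape[0]
    # Window index of each input frame: floor(timestamp * out_hz),
    # where timestamp = i / src_hz.
    bins = np.floor(np.arange(n_frames) * out_hz / src_hz).astype(np.int64)
    # First frame of each window (bins are sorted and step by at most 1).
    starts = np.flatnonzero(np.diff(bins, prepend=-1))
    # Sum each window in float64 to limit rounding error, then divide by its size.
    sums = np.add.reduceat(activations, starts, axis=0, dtype=np.float64)
    counts = np.diff(np.append(starts, n_frames))
    return (sums / counts[:, None]).astype(activations.dtype)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    audio = rng.standard_normal(SAMPLE_RATE * 60).astype(np.float32)  # 60 s of noise
    clips = chunk_audio(audio)
    print(clips.shape)  # (3, 1102500)

    # Placeholder for one clip's layer-36 activations (Jukebox itself is not run here).
    n_frames = int(round(CLIP_SECONDS * EMBED_HZ))
    activations = rng.standard_normal((n_frames, EMBED_DIM), dtype=np.float32)
    pooled = mean_pool(activations)
    print(pooled.shape)  # (250, 4800)
```

At exactly 345 Hz, a full 25 s clip maps to 250 pooled frames in this sketch; the [240, 4800] shape listed above may instead reflect the number of activation frames Jukebox actually produces per clip, which this sketch does not model. `np.add.reduceat` is used rather than a Python loop so pooling a 4800-wide array stays fast.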

Having a Colab notebook gives us an easily reproducible environment and lets us take advantage of the cheap T4 GPUs that Colab offers.

Extended from this repo: https://github.com/Broccaloo/jukebox