Support for audio data based projects #2852

Open
imflash217 opened this issue Jul 28, 2023 · 7 comments
Labels
enhancement New feature or request 🍏 primitives Relating to Rerun primitives 📺 re_viewer affects re_viewer itself user-request This is a pressing issue for one of our users

Comments

@imflash217

Is your feature request related to a problem? Please describe.

I primarily work with audio data, and it is particularly challenging to visualize the different stages of audio data, such as waveforms or spectrograms. It becomes more challenging when the data is multi-channel or very long. Currently I have to use a Jupyter notebook to display and play my audio, and the context switching is very tiring. It is also hard to relate the audio waveform at a particular timestamp to its corresponding spectrogram. This gets worse when working on multimodal models such as Automatic Speech Recognition (ASR) systems, which require visualizing text alongside its corresponding audio.

Describe the solution you'd like

I am very impressed with the video support provided by the Rerun API. I would like to see similar first-class support for audio-based projects, with the following features:

  1. [important] play my audio as a time-series data
  2. [important] plot and visualize the changing spectrograms as the audio is playing to precisely pinpoint the timestamp and its corresponding extracted features. Support for various power-spectrums like MFCC would be extremely helpful.
  3. [important] ability to play individual channels separately or play multiple channels combined. This is essential for various tasks such as source separation and denoising.
  4. [important] For various tasks like Automatic Speech Recognition (ASR) we would want to see a correlation between the timestamp-window and the respective text produced by the ASR model. This would be scalable across waveform, power-spectrums and ASR text-output so we can comprehend everything at once.
  5. [nice-to-have] ability to apply various types of windows (e.g. Hanning, Hamming) and filters (e.g. low-pass, high-pass, band-pass) to an audio clip or a batch, to quickly experiment on the fly.
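For reference, the two windows named in item 5 can be sketched in a few lines of plain Python. This is only an illustration of the standard symmetric Hann and Hamming formulas, not anything Rerun provides; the function names are hypothetical.

```python
import math


def hann(n_points: int) -> list[float]:
    """Symmetric Hann window: w[n] = 0.5 - 0.5 * cos(2*pi*n / (N - 1))."""
    if n_points < 2:
        return [1.0] * n_points
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / (n_points - 1))
            for n in range(n_points)]


def hamming(n_points: int) -> list[float]:
    """Symmetric Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    if n_points < 2:
        return [1.0] * n_points
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1))
            for n in range(n_points)]


def apply_window(frame: list[float], window: list[float]) -> list[float]:
    """Multiply one frame of samples element-wise by a window of equal length."""
    if len(frame) != len(window):
        raise ValueError("frame and window must have the same length")
    return [s * w for s, w in zip(frame, window)]
```

The Hann window tapers to zero at both ends, while the Hamming window keeps a small non-zero value (0.08) at the edges.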

Describe alternatives you've considered

As far as I know, there is not yet a comprehensive tool that supports these features. I have to use Jupyter notebooks and librosa for most of my experimentation, and the biggest challenge is making sure that the timestamps in the audio exactly match those in the power spectrums.
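The alignment problem comes down to mapping a spectrogram frame index back to a time in the waveform. A minimal sketch of that arithmetic, assuming a fixed hop length and a known framing convention (the function names and the `centered` flag are illustrative, not taken from any library):

```python
def frame_to_time(frame_index: int, hop_length: int, sample_rate: int,
                  n_fft: int, centered: bool = True) -> float:
    """Time in seconds of the centre of an STFT frame.

    centered=True assumes the signal was padded so that frame i is centred
    on sample i * hop_length. centered=False assumes frame i starts at
    sample i * hop_length, so its centre lies n_fft / 2 samples later.
    """
    center_sample = frame_index * hop_length
    if not centered:
        center_sample += n_fft / 2
    return center_sample / sample_rate


def time_to_frame(t: float, hop_length: int, sample_rate: int,
                  n_fft: int, centered: bool = True) -> int:
    """Index of the frame whose centre is closest to time t (in seconds)."""
    center_sample = t * sample_rate
    if not centered:
        center_sample -= n_fft / 2
    return max(0, round(center_sample / hop_length))
```

With a hop of 512 samples at 16 kHz, each frame advances the timeline by 32 ms. Mixing up the two framing conventions shifts every frame by half a window, which is an easy way to end up with a spectrogram that looks slightly out of sync with the waveform.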

Additional context

@imflash217 imflash217 added enhancement New feature or request 👀 needs triage This issue needs to be triaged by the Rerun team labels Jul 28, 2023
@Wumpf Wumpf added 📺 re_viewer affects re_viewer itself 🍏 primitives Relating to Rerun primitives and removed 👀 needs triage This issue needs to be triaged by the Rerun team labels Jul 28, 2023
@emilk
Member

emilk commented Aug 8, 2023

One fundamental thing we need to implement before we start working on this is support for log events with a duration. Currently each log event is associated with a single instant in time (a video is just a set of frames, each logged individually). This won't work for audio: you'd like to log e.g. a two-second sound in one log call. We will also need this functionality when implementing proper video codecs.
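To make the idea concrete, here is a rough sketch of what an event with a duration could look like as a data structure. It is purely illustrative: `AudioChunk` and its fields are hypothetical names, not part of the Rerun SDK or its data model.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class AudioChunk:
    """A mono audio event that spans an interval instead of a single instant."""
    start_s: float              # timeline position of the first sample
    sample_rate: int            # samples per second
    samples: tuple[float, ...]  # the audio data itself

    @property
    def duration_s(self) -> float:
        return len(self.samples) / self.sample_rate

    @property
    def end_s(self) -> float:
        return self.start_s + self.duration_s

    def covers(self, t: float) -> bool:
        """True if timeline position t falls inside [start_s, end_s)."""
        return self.start_s <= t < self.end_s

    def sample_at(self, t: float) -> float:
        """Sample that is playing at timeline position t."""
        if not self.covers(t):
            raise ValueError(f"time {t} is outside this chunk")
        index = int((t - self.start_s) * self.sample_rate)
        return self.samples[min(index, len(self.samples) - 1)]
```

The duration is derived from the sample count and rate, so a two-second sound is a single event and a time cursor can ask any chunk whether it covers the current position.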

@lunixbochs

lunixbochs commented Oct 6, 2023

I'm very interested in logging and labeling realtime audio when tracing Talon with Rerun!

I'll note that Talon's audio is realtime/continuous/infinite, but it might make more sense efficiency-wise to log it in larger chunks than in, say, 30ms intervals. If we did that, I would want an easy way to backdate a longer chunk of streamed audio to the actual timestep/frame in which it originated during logging.
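The backdating arithmetic itself is simple. A sketch, assuming the logging time roughly equals the capture time of the chunk's last sample (capture latency is ignored, and both helper names are hypothetical):

```python
def backdate_chunk(arrival_time_s: float, num_samples: int,
                   sample_rate: int) -> float:
    """Start time of a chunk whose last sample was captured at arrival_time_s."""
    return arrival_time_s - num_samples / sample_rate


def frame_start_times(chunk_start_s: float, num_samples: int,
                      sample_rate: int, frame_ms: float = 30.0) -> list[float]:
    """Start times of the complete fixed-size frames inside a larger chunk."""
    frame_len = int(sample_rate * frame_ms / 1000)
    if frame_len <= 0:
        raise ValueError("frame_ms is too small for this sample rate")
    num_frames = num_samples // frame_len
    return [chunk_start_s + i * frame_len / sample_rate
            for i in range(num_frames)]
```

A one-second chunk at 16 kHz logged at t = 100 s would be placed at t = 99 s, and the 30 ms frames inside it can still be addressed individually.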

I think plotting and visualizing audio features is very useful, but I don't want Rerun to calculate the features (spectrogram, windowing, filters, etc) for me. Those are labels / data processing I can ship with the audio signal and they're in my domain of expertise to make sure the data I'm sending you to render is exactly what I want.

Audio Timeline Space

I think I want a kind of "audio timeline" space, which looks sort of like an Audacity track and maybe supports several audio channels (vertically stacked), and maybe supports other views of the same audio like spectrograms (which I'm happy to embed in the trace myself).

  • It would have a scrubber synchronized with the global rerun timeline.
  • You can single-click in the audio track to change the global timestep.
  • You can click+drag or click+shift-click to select a region of audio in the audio timeline.
    • If you "play" the audio during this time, playback stops when you get to the end of the selected region (can have a clickable option to loop the region as well).
    • It would be nice to be able to export the selected audio to a file for further debugging (e.g. wav or flac).
  • You can mute individual audio tracks. I'd probably want audio to be muted by default so I'm not blasting audio in public, and because I might have a number of audio streams that would sound terrible if you played them overlapping. (You could add a blueprint setting to unmute individual tracks if users wanted to change this default?)
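As an illustration of the region selection and export bullets above, the following sketch cuts a time range out of a mono signal and encodes it as a 16-bit PCM WAV using only the Python standard library. The helper names are hypothetical, and it assumes float samples in [-1, 1].

```python
import io
import struct
import wave


def select_region(samples: list[float], sample_rate: int,
                  start_s: float, end_s: float) -> list[float]:
    """Samples between start_s and end_s, clamped to the signal bounds."""
    if end_s < start_s:
        raise ValueError("end_s must not be before start_s")
    lo = max(0, round(start_s * sample_rate))
    hi = min(len(samples), round(end_s * sample_rate))
    return samples[lo:hi]


def write_wav(dest, samples: list[float], sample_rate: int) -> None:
    """Write mono float samples as 16-bit PCM; dest is a path or binary file object."""
    pcm = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    with wave.open(dest, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)


def region_to_wav_bytes(samples: list[float], sample_rate: int,
                        start_s: float, end_s: float) -> bytes:
    """Encode the selected region as an in-memory WAV file."""
    buffer = io.BytesIO()
    write_wav(buffer, select_region(samples, sample_rate, start_s, end_s),
              sample_rate)
    return buffer.getvalue()
```

FLAC export would need an external encoder; the standard library only covers WAV.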

Annotations

  • You can attach labels to either a zero-width section of the audio timeline, or to a span of the audio timeline.
  • In some cases, annotations could feasibly be their own space or their own track within the audio space. Timeline annotations might make enough sense outside of audio to consider the general use case for them.
  • Some annotations are audio track specific, and some may be more global.
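The three bullets above could be modelled roughly as follows. This is a sketch of one possible shape for the data, with hypothetical names: an annotation is either zero-width or a span, and is either tied to one track or global.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Annotation:
    label: str
    start_s: float
    end_s: Optional[float] = None  # None means a zero-width annotation
    track: Optional[str] = None    # None means the annotation is global

    def is_active(self, t: float, tolerance_s: float = 0.0) -> bool:
        """True if time t hits this annotation."""
        if self.end_s is None:
            # Zero-width: only matches within a small tolerance of start_s.
            return abs(t - self.start_s) <= tolerance_s
        return self.start_s <= t <= self.end_s


def annotations_at(annotations: list[Annotation], t: float,
                   track: Optional[str] = None,
                   tolerance_s: float = 0.0) -> list[Annotation]:
    """Annotations active at time t that are global or belong to `track`."""
    return [
        a for a in annotations
        if a.is_active(t, tolerance_s) and (a.track is None or a.track == track)
    ]
```

Because nothing here is audio-specific apart from the optional track name, the same structure would also work for general timeline annotations.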

Here's an extreme example of what duration annotations might look like in Audacity:
[Image: screenshot_2023-08-08_at_3 48 43_pm]

Spatial audio

I think about spatial audio as well, e.g. several audio tracks with distinct 3d positions that can change over time. I wouldn't worry about playing the audio back spatially at first, but being able to select an audio track and see it highlighted + move around in the 3d scene might be really useful.

@emilk
Member

emilk commented Oct 19, 2023

This looks like a nice, simple audio library for rust:

@CatalinVoss

Very interested in audio support as well. Would also love to be able to visualize it alongside 2D matrices where each row covers a fixed time window (may be a probability vector over an alphabet, a spectrogram entry, or similar).

+1 to text as well.

@abey79 abey79 added the user-request This is a pressing issue for one of our users label Dec 18, 2023
@cboulay

cboulay commented May 2, 2024

+1 for spectrogram. I'm hoping to use a spectrogram to visualize streaming (unbounded / realtime) brain signals, not audio, but I think the solution will work equally well for either.

I don't think rerun should be responsible for doing the spectral transformation. This is too personal and domain specific. (Pre-Filtering? Windowing? Log-transform? FFT or Wavelets? Multi-taper? Frequency resolution? Window duration? Window step size?). It should be up to the user to do their spectral transformation then log their spectrum / spectra.

  • The SDK client app logs a tensor (e.g., "channels" x "frequencies") for a single frame, or a batch of frames ("times" x "channels" x "frequencies").
  • Somewhere under the hood, the history of tensors gets concatenated along the "time" axis.
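A toy version of those two steps, using nested Python lists in place of real tensors, might look like this. `SpectrumHistory` is a hypothetical name and says nothing about how Rerun would actually store the data; it only shows single frames and batches ending up on one shared time axis.

```python
class SpectrumHistory:
    """Accumulates (channels x frequencies) frames along a time axis."""

    def __init__(self, num_channels: int, num_frequencies: int) -> None:
        self.num_channels = num_channels
        self.num_frequencies = num_frequencies
        self.frames: list[list[list[float]]] = []  # [time][channel][frequency]

    @staticmethod
    def _depth(data) -> int:
        """Nesting depth of a non-empty nested list (2 = frame, 3 = batch)."""
        depth = 0
        while isinstance(data, (list, tuple)):
            if not data:
                raise ValueError("empty input")
            data = data[0]
            depth += 1
        return depth

    def log(self, data) -> None:
        """Append one frame, or a batch shaped times x channels x frequencies."""
        depth = self._depth(data)
        if depth == 2:
            batch = [data]
        elif depth == 3:
            batch = data
        else:
            raise ValueError("expected a 2D frame or a 3D batch")
        for frame in batch:
            if len(frame) != self.num_channels or any(
                len(row) != self.num_frequencies for row in frame
            ):
                raise ValueError("frame shape does not match the history")
            self.frames.append([list(row) for row in frame])

    @property
    def shape(self) -> tuple[int, int, int]:
        return (len(self.frames), self.num_channels, self.num_frequencies)

    def image(self, channel: int) -> list[list[float]]:
        """Rows are frequencies and columns are time, for a single channel."""
        return [
            [frame[channel][f] for frame in self.frames]
            for f in range(self.num_frequencies)
        ]
```

The `image` method returns the layout a spectrogram view would draw for one channel: time on the x-axis, frequency on the y-axis.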

The Space view should be something like a mix of the Tensor view and TimeSeries view:

  • representation is an image, like the generic tensor view
  • x-dimension is always "time" and can be scrolled and scrubbed over, just like a TimeSeries view.
  • user can choose whether the y-axis is "channel" or "frequency" and the index into each of the non-rendered dimensions, just like the generic tensor view

Until something like this is implemented, I might try plotting a scalar for every time x frequency, for only a single channel, and then coloring each scalar independently, probably with a SeriesPoint and square markers.

@emilk
Member

emilk commented Oct 31, 2024

For decoding audio (that is not simple PCM), we should be able to use ffmpeg over CLI, like we do for video (see #7962)
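For illustration, decoding an arbitrary audio file to raw PCM through the ffmpeg CLI can be sketched as below. This is not how Rerun invokes ffmpeg for video (that code is not shown in this thread); the function names and default parameters are assumptions, and an `ffmpeg` binary must be on the `PATH` for `decode_to_pcm` to work.

```python
import subprocess


def ffmpeg_pcm_command(path: str, sample_rate: int = 16000,
                       channels: int = 1) -> list[str]:
    """Command line that decodes `path` to signed 16-bit little-endian PCM on stdout."""
    return [
        "ffmpeg",
        "-nostdin",                # never wait for interactive input
        "-loglevel", "error",      # keep stderr quiet unless something fails
        "-i", path,
        "-vn",                     # ignore any video stream
        "-f", "s16le",             # raw output, no container
        "-acodec", "pcm_s16le",
        "-ar", str(sample_rate),   # resample to the requested rate
        "-ac", str(channels),      # mix to the requested channel count
        "pipe:1",                  # write to stdout
    ]


def decode_to_pcm(path: str, sample_rate: int = 16000,
                  channels: int = 1) -> bytes:
    """Run ffmpeg and return the decoded PCM bytes (interleaved if multi-channel)."""
    result = subprocess.run(
        ffmpeg_pcm_command(path, sample_rate, channels),
        check=True,
        capture_output=True,
    )
    return result.stdout
```

Each sample in the returned buffer is two bytes, so its length divided by `2 * channels * sample_rate` gives the duration in seconds.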

@sud335

sud335 commented Dec 12, 2024

+1 for spectrogram for audio visualization.


8 participants