Commit 3319ecf: whole paper excerpt for figures
yiitozer committed Mar 11, 2024 · 1 parent 16e8f15

Showing 2 changed files with 84 additions and 3 deletions.

paper/paper.md (52 additions, 3 deletions)

Moreover, the Python package libfmp [@MuellerZ21_libfmp_JOSS] includes a function
(\texttt{libfmp.b.sonify\_chromagram\_with\_signal}) for sonifying time--chroma representations.
Testing these methods, our experiments have revealed that current implementations frequently rely on inefficient
event-based looping, resulting in excessively long runtimes. For instance, generating a click soundtrack for beat
annotations of 10-minute recordings can require impractically long processing times.

In our Python toolbox, libsoni, we offer implementations of various sonification methods, including those
mentioned above. These implementations feature a coherent API and are based on straightforward methods that are
[...]
this could be a potential future extension. Hence, libsoni may not only be beneficial for MIR researchers but also for
educators, students, composers, sound designers, and individuals exploring new musical concepts.


# Core Functionalities
In the following, we briefly describe some of the main modules included in the Python toolbox libsoni.
For an illustration of some core functionalities, we also refer to Figure. Comprehensive API
documentation of libsoni is publicly accessible through GitHub[^1]. Furthermore, the core
functionalities are illustrated by educational Jupyter notebooks, which form an integral part of libsoni and provide
illustrative code examples within concrete MIR scenarios.

## Triggered Sound Events (\texttt{libsoni.tse})
The Triggered Sound Events (TSE) module of libsoni contains various functions for the sonification of temporal events.
In this scenario, one typically has a music recording and a list of time positions that indicate the presence of
certain musical events. These events could include onset positions of specific notes, beat positions, or structural
boundaries between musical parts.
The TSE module allows for generating succinct acoustic stimuli at each of the time positions, providing the listener
with precise temporal feedback. Ideally, these stimuli should be perceivable even when overlaid with the original
music recording. Often, the time positions are further classified into different categories (e.g., downbeat and
upbeat positions). To make such categories audible, the TSE module, similar to librosa [@McFeeRLEMBN15_librosa_Python],
allows for generating distinguishable stimuli with different ``colorations'' that can be easily associated with the
different categories. Additionally, the TSE module enables the playback of pre-recorded stimuli at different relative
time positions and, if specified by suitable parameter settings, with time-scale modifications and pitch shifting.
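
To give a concrete impression of the underlying idea, the following minimal NumPy sketch (function name and parameters are illustrative and not part of libsoni's actual API) synthesizes one short, exponentially decaying tone as a click template, uses its pitch as the ``coloration'', and adds the template at every annotated time position:

```python
import numpy as np

def sonify_clicks(times_sec, duration_sec, fs=22050,
                  pitch=69, click_ms=50, amplitude=1.0):
    """Place short, exponentially decaying tones at given time positions."""
    freq = 440.0 * 2 ** ((pitch - 69) / 12)      # MIDI pitch -> frequency in Hz
    n_click = int(fs * click_ms / 1000)
    t = np.arange(n_click) / fs
    click = amplitude * np.sin(2 * np.pi * freq * t) * np.exp(-t / 0.01)

    out = np.zeros(int(fs * duration_sec) + n_click)
    for pos in times_sec:                         # one template, added per event
        start = int(pos * fs)
        out[start:start + n_click] += click
    return out[:int(fs * duration_sec)]

# Different pitches yield distinguishable stimuli for different categories,
# e.g., downbeats (higher click) versus the remaining beats (lower click).
beats = sonify_clicks([0.5, 1.5, 2.5], duration_sec=3.0, pitch=69)
downbeats = sonify_clicks([0.0, 2.0], duration_sec=3.0, pitch=81)
mix = beats + downbeats
```

Since the click template is synthesized only once and merely added into the output buffer, such a scheme avoids the inefficient per-event synthesis mentioned above.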

## Fundamental Frequency (\texttt{libsoni.f0})
When describing a specific song, we often have the ability to sing or hum the main melody, which can be loosely
defined as a linear succession of musical tones expressing a particular musical idea. In the context of a music
recording (rather than a musical score), the melody corresponds to a sequence of fundamental frequency values
(also called F0 values) representing the pitches of the tones. In real performances, these sequences often form
complex time--frequency patterns known as frequency trajectories, which may include continuous frequency glides
(glissando) or frequency modulations (vibrato). In libsoni, the F0 module allows for sonifying a sequence of frame-wise
frequency values that correspond to manually annotated or estimated F0 values (see also Figure b). This module offers a
variety of adjustable parameters, allowing for the inclusion of additional partials to tonally enrich the sonification,
thereby generating sounds of different timbre. Moreover, users have the option to adjust the amplitude of each predicted
F0 value based on its confidence level, as provided by an F0 estimator. This allows for insights into the reliability
of the predictions.
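
The following sketch (again illustrative, not libsoni's actual API) shows one common way to realize such a sonification: integrating the frame-wise frequency values into a phase function avoids phase discontinuities, so glissando and vibrato sound continuous; additional partials enrich the timbre, and confidence values scale the amplitude:

```python
import numpy as np

def sonify_f0(f0_hz, fs=22050, hop=256, partials=(1, 2, 3),
              partial_amps=(1.0, 0.5, 0.25), confidence=None):
    """Sonify a frame-wise F0 trajectory via phase accumulation.
    Unvoiced frames are assumed to be encoded as f0 = 0."""
    f0_per_sample = np.repeat(f0_hz, hop)              # frame rate -> sample rate
    if confidence is None:
        amp = np.ones_like(f0_per_sample)
    else:
        amp = np.repeat(confidence, hop)               # scale by estimator confidence
    amp = amp * (f0_per_sample > 0)                    # mute unvoiced frames
    phase = 2 * np.pi * np.cumsum(f0_per_sample) / fs  # integrated instantaneous frequency
    out = np.zeros(len(f0_per_sample))
    for k, a in zip(partials, partial_amps):           # partials enrich the timbre
        out += a * np.sin(k * phase)
    return amp * out / sum(partial_amps)

# Example: a glissando from 220 Hz to 440 Hz whose confidence grows over time
f0 = np.linspace(220, 440, 100)
signal = sonify_f0(f0, confidence=np.linspace(0.5, 1.0, 100))
```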

## Piano-Roll Representations (\texttt{libsoni.pianoroll})
A symbolic score-based representation describes each note by parameters such as start time, duration, pitch, and other
attributes. This representation is closely related to MIDI encodings and is often visualized in the form of
two-dimensional piano-roll representations (see also Figure c). In these representations, time is
encoded on the horizontal axis, pitch on the vertical axis, and each note is represented by an axis-parallel rectangle
indicating onset, pitch, and duration. This representation is widely used in several MIR tasks, including automatic
music transcription [@BenetosDDE19_MusicTranscription_SPM] and music score--audio synchronization
[@Mueller15_FMP_SPRINGER]. The simplest method in libsoni to sonify piano-roll representations is based on
straightforward sinusoidal models (potentially enriched by harmonics). When the score information is synchronized
with a music recording (e.g., using alignment methods provided by the Sync Toolbox [@MuellerOKPD21_SyncToolbox_JOSS]),
libsoni enables the creation of a stereo signal with the sonification in one channel and the original recording in the
other channel. This setup provides an intuitive way of assessing accuracy for a range of music analysis and
transcription tasks. Furthermore, these sonifications may be superimposed with further onset-based stimuli provided by
the TSE module.
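
A minimal sketch of such a sinusoidal sonification (illustrative, not libsoni's actual API) may look as follows; each note becomes a harmonically enriched tone with short fades to avoid clicks, and the result can be stacked with an aligned recording into a two-channel signal:

```python
import numpy as np

def sonify_notes(notes, fs=22050, partial_amps=(1.0, 0.5, 0.25)):
    """Sonify (start_sec, duration_sec, midi_pitch, velocity) tuples
    with sinusoids enriched by a few harmonics."""
    total = max(s + d for s, d, _, _ in notes)
    out = np.zeros(int(fs * total) + 1)
    for start, dur, pitch, vel in notes:
        freq = 440.0 * 2 ** ((pitch - 69) / 12)
        t = np.arange(int(fs * dur)) / fs
        tone = sum(a * np.sin(2 * np.pi * (k + 1) * freq * t)
                   for k, a in enumerate(partial_amps))
        fade = np.minimum(1, np.minimum(t, t[::-1]) / 0.01)  # 10 ms fade in/out
        i = int(start * fs)
        out[i:i + len(t)] += vel * fade * tone
    return out

# Stereo setup: sonification in one channel, aligned recording in the other
notes = [(0.0, 0.5, 60, 0.8), (0.5, 0.5, 64, 0.8), (1.0, 1.0, 67, 0.8)]
son = sonify_notes(notes)
recording = np.zeros_like(son)        # placeholder for an aligned recording
stereo = np.stack([son, recording])   # shape (2, num_samples)
```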


## Chromagram Representations (\texttt{libsoni.chroma})
Humans perceive pitch in a periodic manner, meaning that pitches separated by an octave are perceived as having a
similar quality or acoustic color, known as chroma. This concept motivates the use of time--chroma representations
or chromagrams, where pitch bands that differ spectrally by one or several octaves are combined to form a single chroma
[...]
extracted from music recordings. This facilitates deeper insights for listeners into the
harmony-related tonal information contained within an audio signal.
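
One natural way to realize an octave-invariant sonification, sketched below purely for illustration (not necessarily the exact method used in libsoni), is to render each chroma class as a stack of octave-spaced sinusoids with a bell-shaped weighting over log-frequency, yielding a Shepard-tone-like sound:

```python
import numpy as np

def shepard_like_tone(chroma_idx, dur_sec=1.0, fs=22050,
                      f_ref=440.0, octaves=range(-3, 4)):
    """Stack octave-spaced sinusoids for one chroma class (0 = C, ..., 9 = A).
    A bell-shaped weighting over log-frequency keeps the register ambiguous,
    reflecting the octave equivalence of chroma."""
    t = np.arange(int(fs * dur_sec)) / fs
    base = f_ref * 2 ** ((chroma_idx - 9) / 12)        # chroma class -> Hz (A = 440)
    out = np.zeros_like(t)
    for o in octaves:
        f = base * 2.0 ** o
        w = np.exp(-0.5 * np.log2(f / f_ref) ** 2)     # bell-shaped weight
        out += w * np.sin(2 * np.pi * f * t)
    return out / np.max(np.abs(out))

tone_c = shepard_like_tone(0)   # an octave-ambiguous "C"
```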


## Spectrogram Representations (\texttt{libsoni.spectrogram})
Similar to chromagrams, pitch-based feature representations can be derived directly from music recordings using
transforms such as the constant-Q transform (CQT) [@SchoerkhuberK10_ConstantQTransform_SMC].
These representations are a special type of log-frequency spectrogram, where the frequency axis is logarithmically
spaced. [...]
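
As an aside, such a log-frequency representation can, for instance, be computed with librosa (the file name below is a placeholder):

```python
import numpy as np
import librosa

# Constant-Q magnitude spectrogram: 84 bins, 12 bins per octave
y, sr = librosa.load("recording.wav")   # placeholder file name
C = np.abs(librosa.cqt(y, sr=sr, n_bins=84, bins_per_octave=12))
```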
paper/references.bib (32 additions)
@inproceedings{RosenzweigSM22_libf0_ISMIR-LBD,
author = {Sebastian Rosenzweig and Simon Schw{\"a}r and Meinard M{\"u}ller},
title = {{libf0}: {A} {P}ython Library for Fundamental Frequency Estimation},
booktitle = {Demos and Late Breaking News of the International Society for Music Information Retrieval Conference ({ISMIR})},
address = {Bengaluru, India},
year = {2022},
url-demo = {https://github.com/groupmm/libf0},
url-pdf = {2022_RosenzweigSM_libf0_ISMIR-LBD.pdf}
}

@article{MuellerOKPD21_SyncToolbox_JOSS,
author = {Meinard M{\"u}ller and Yigitcan {\"O}zer and Michael Krause and Thomas Pr{\"a}tzlich and Jonathan Driedger},
title = {{S}ync {T}oolbox: {A} {P}ython Package for Efficient, Robust, and Accurate Music Synchronization},
journal = {Journal of Open Source Software ({JOSS})},
volume = {6},
number = {64},
year = {2021},
pages = {3434:1--4},
doi = {10.21105/joss.03434}
}

@article{MuellerZ21_libfmp_JOSS,
author = {Meinard M{\"u}ller and Frank Zalkow},
title = {{libfmp}: {A} {P}ython Package for Fundamentals of Music Processing},
journal = {Journal of Open Source Software ({JOSS})},
volume = {6},
number = {63},
year = {2021},
doi = {10.21105/joss.03326},
url-demo = {https://github.com/meinardmueller/libfmp}
}

@article{BenetosDDE19_MusicTranscription_SPM,
author = {Emmanouil Benetos and Simon Dixon and Zhiyao Duan and Sebastian Ewert},
title = {Automatic Music Transcription: {A}n Overview},
journal = {{IEEE} Signal Processing Magazine},
volume = {36},
number = {1},
pages = {20--30},
year = {2019},
doi = {10.1109/MSP.2018.2869928}
}

@article{McFeeKCSBB19_OpenSourcePractices_IEEE-SPM,
author = {Brian McFee and Jong Wook Kim and Mark Cartwright and Justin Salamon and Rachel M. Bittner and Juan Pablo Bello},
title = {Open-Source Practices for Music Signal Processing Research: Recommendations for Transparent, Sustainable, and Reproducible Audio Research},
journal = {{IEEE} Signal Processing Magazine},
volume = {36},
number = {1},
pages = {128--137},
year = {2019}
}
