Merge remote-tracking branch 'origin/master' into faster_python_import
dbogdanov committed Dec 14, 2023
2 parents b9615f7 + 95c996e commit 8ed5045
Showing 32 changed files with 913 additions and 169 deletions.
2 changes: 2 additions & 0 deletions FAQ.md
@@ -153,6 +153,8 @@ A lightweight version of Essentia for iOS can be compiled using the ```--cross-c
You can also compile it for iOS simulator (so that you can test on your desktop) using ```--cross-compile-ios-sim``` flag.
Please note that TensorFlow-based Essentia algorithms are not supported on iOS at the moment because we do not currently offer a TensorFlowLite wrapper.
Compiling Essentia to ASM.js or WebAssembly using Emscripten
------------------------------------------------------------
13 changes: 12 additions & 1 deletion doc/sphinxdoc/_templates/applications.html
@@ -296,7 +296,18 @@ <h1>Applications</h1>
<a href="https://github.com/leozimmerman/ofxAudioAnalyzer">ofxAudioAnalyzer</a> is an openFrameworks wrapper for Essentia. It provides audio analysis algorithms modified to process signals in real-time.
</dd>
</div>

<div class="row essnt-apps-page__container">
<dt class="col-xs-2 col-sm-3 col-md-2 essnt-apps-page__logo">
<a href="https://github.com/p3zo/gifsync" title="Go to GIF Sync">
<span class="essnt-apps-page__logo-text">
GIF Sync
</span>
</a>
</dt>
<dd class="col-xs-10 col-sm-9 col-md-10 essnt-apps-page__description">
<a href="https://github.com/p3zo/gifsync">GIF Sync</a> reassembles the frames of a GIF to sync its animation to the beat of an audio file.
</dd>
</div>
</dl>

{% endblock %}
11 changes: 9 additions & 2 deletions doc/sphinxdoc/demos.rst
@@ -9,14 +9,21 @@ Examples of music audio analysis with Essentia algorithms using Essentia.js
https://mtg.github.io/essentia.js/examples/


Tempo estimation
----------------

Tempo BPM estimation with Essentia: https://replicate.com/mtg/essentia-bpm


Essentia TensorFlow models
--------------------------

Examples of inference with the pre-trained TensorFlow models for music auto-tagging and classification tasks:

- Music classification by genre, mood, danceability, instrumentation: https://replicate.com/mtg/music-classifiers
- Music style classification with the Discogs taxonomy (400 styles). Overall track-level predictions: https://replicate.com/mtg/effnet-discogs
- Music style classification with the Discogs taxonomy (400 styles). Segment-level real-time predictions with Essentia.js: https://essentia.upf.edu/essentiajs-discogs
- Music style classification with the Discogs taxonomy (400 styles, MAEST model). Overall track-level predictions: https://replicate.com/mtg/maest
- Music style classification with the Discogs taxonomy (400 styles, Effnet-Discogs model). Overall track-level predictions: https://replicate.com/mtg/effnet-discogs
- Music style classification with the Discogs taxonomy (400 styles, Effnet-Discogs model). Segment-level real-time predictions with Essentia.js: https://essentia.upf.edu/essentiajs-discogs
- Real-time music autotagging (50 tags) in the browser with Essentia.js: https://mtg.github.io/essentia.js/examples/demos/autotagging-rt/
- Mood classification in the browser with Essentia.js: https://mtg.github.io/essentia.js/examples/demos/mood-classifiers/
- Music emotion arousal/valence regression: https://replicate.com/mtg/music-arousal-valence
106 changes: 101 additions & 5 deletions doc/sphinxdoc/models.rst
@@ -25,7 +25,7 @@ If you use any of the models in your research, please cite the following paper::
booktitle={International Conference on Acoustics, Speech and Signal Processing ({ICASSP})},
year={2020}
}

.. highlight:: default


@@ -137,6 +137,105 @@ Models:
*Note: We provide models operating with a fixed batch size of 64 samples since it was not possible to port the version with dynamic batch size from ONNX to TensorFlow. Additionally, an ONNX version of the model with* `dynamic batch <https://essentia.upf.edu/models/feature-extractors/discogs-effnet/discogs-effnet-bsdynamic-1.onnx>`_ *size is provided.*


MAEST
^^^^^

Music Audio Efficient Spectrogram Transformer (`MAEST <https://github.com/palonso/MAEST/>`_) trained to predict music style labels using an in-house dataset annotated with Discogs metadata.
We offer versions of MAEST trained with sequence lengths ranging from 5 to 30 seconds (``5s``, ``10s``, ``20s``, and ``30s``), and trained starting from different initial weights: from random initialization (``fs``), from `DeiT <https://doi.org/10.48550/arXiv.2012.12877>`_ pre-trained weights (``dw``), and from `PaSST <https://doi.org/10.48550/arXiv.2106.07139>`_ pre-trained weights (``pw``). Additionally, we offer a version of MAEST trained following a teacher-student setup (``ts``).
According to our study, ``discogs-maest-30s-pw`` achieved the most competitive performance in most downstream tasks (refer to the `paper <http://hdl.handle.net/10230/58023>`_ for details).


Models:

.. collapse:: ⬇️ <a class="reference external">discogs-maest-30s-pw</a>

|
[`weights <https://essentia.upf.edu/models/feature-extractors/maest/discogs-maest-30s-pw-1.pb>`_, `metadata <https://essentia.upf.edu/models/feature-extractors/maest/discogs-maest-30s-pw-1.json>`_]

Model trained with a multi-label classification objective targeting 400 Discogs styles.

Python code for embedding extraction:

.. literalinclude:: ../../src/examples/python/models/scripts/feature-extractors/maest/discogs-maest-30s-pw-1_embeddings.py

.. collapse:: ⬇️ <a class="reference external">discogs-maest-30s-pw-ts</a>

|
[`weights <https://essentia.upf.edu/models/feature-extractors/maest/discogs-maest-30s-pw-ts-1.pb>`_, `metadata <https://essentia.upf.edu/models/feature-extractors/maest/discogs-maest-30s-pw-ts-1.json>`_]

Model trained with a multi-label classification objective targeting 400 Discogs styles.

Python code for embedding extraction:

.. literalinclude:: ../../src/examples/python/models/scripts/feature-extractors/maest/discogs-maest-30s-pw-ts-1_embeddings.py

.. collapse:: ⬇️ <a class="reference external">discogs-maest-20s-pw</a>

|
[`weights <https://essentia.upf.edu/models/feature-extractors/maest/discogs-maest-20s-pw-1.pb>`_, `metadata <https://essentia.upf.edu/models/feature-extractors/maest/discogs-maest-20s-pw-1.json>`_]

Model trained with a multi-label classification objective targeting 400 Discogs styles.

Python code for embedding extraction:

.. literalinclude:: ../../src/examples/python/models/scripts/feature-extractors/maest/discogs-maest-20s-pw-1_embeddings.py

.. collapse:: ⬇️ <a class="reference external">discogs-maest-10s-pw</a>

|
[`weights <https://essentia.upf.edu/models/feature-extractors/maest/discogs-maest-10s-pw-1.pb>`_, `metadata <https://essentia.upf.edu/models/feature-extractors/maest/discogs-maest-10s-pw-1.json>`_]

Model trained with a multi-label classification objective targeting 400 Discogs styles.

Python code for embedding extraction:

.. literalinclude:: ../../src/examples/python/models/scripts/feature-extractors/maest/discogs-maest-10s-pw-1_embeddings.py

.. collapse:: ⬇️ <a class="reference external">discogs-maest-10s-fs</a>

|
[`weights <https://essentia.upf.edu/models/feature-extractors/maest/discogs-maest-10s-fs-1.pb>`_, `metadata <https://essentia.upf.edu/models/feature-extractors/maest/discogs-maest-10s-fs-1.json>`_]

Model trained with a multi-label classification objective targeting 400 Discogs styles.

Python code for embedding extraction:

.. literalinclude:: ../../src/examples/python/models/scripts/feature-extractors/maest/discogs-maest-10s-fs-1_embeddings.py

.. collapse:: ⬇️ <a class="reference external">discogs-maest-10s-dw</a>

|
[`weights <https://essentia.upf.edu/models/feature-extractors/maest/discogs-maest-10s-dw-1.pb>`_, `metadata <https://essentia.upf.edu/models/feature-extractors/maest/discogs-maest-10s-dw-1.json>`_]

Model trained with a multi-label classification objective targeting 400 Discogs styles.

Python code for embedding extraction:

.. literalinclude:: ../../src/examples/python/models/scripts/feature-extractors/maest/discogs-maest-10s-dw-1_embeddings.py

.. collapse:: ⬇️ <a class="reference external">discogs-maest-5s-pw</a>

|
[`weights <https://essentia.upf.edu/models/feature-extractors/maest/discogs-maest-5s-pw-1.pb>`_, `metadata <https://essentia.upf.edu/models/feature-extractors/maest/discogs-maest-5s-pw-1.json>`_]

Model trained with a multi-label classification objective targeting 400 Discogs styles.

Python code for embedding extraction:

.. literalinclude:: ../../src/examples/python/models/scripts/feature-extractors/maest/discogs-maest-5s-pw-1_embeddings.py


*Note: It is possible to retrieve the output of each attention layer by setting* ``output=StatefulPartitionedCall:n`` *, where* ``n`` *is the index of the layer (starting from 1).*
*The output from the attention layers should be interpreted as* ``[batch_index, 1, token_number, embeddings_size]``
*, where the first and second tokens (i.e.,* ``[0, 0, :2, :]`` *) correspond to the* ``CLS`` *and* ``DIST`` *tokens respectively, and the following ones to the input signal.*
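
To make the layout in the note concrete, here is a minimal C++ sketch. It assumes the attention output has already been copied into a flat, row-major buffer with shape ``[batch_index, 1, token_number, embeddings_size]``. ``AttentionOutput`` and its members are hypothetical names used for illustration only; they are not Essentia types.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical container (not an Essentia type). Assumes a row-major flat
// buffer with shape [batch, 1, tokens, embSize].
struct AttentionOutput {
  std::size_t batch, tokens, embSize;
  std::vector<float> data;

  // Returns a copy of the embedding vector of token `t` in batch item `b`.
  std::vector<float> token(std::size_t b, std::size_t t) const {
    assert(b < batch && t < tokens);
    // The singleton second axis does not contribute to the offset.
    std::size_t offset = (b * tokens + t) * embSize;
    return std::vector<float>(data.begin() + offset,
                              data.begin() + offset + embSize);
  }

  // Token 0 is CLS and token 1 is DIST; tokens from index 2 onwards
  // correspond to the input signal.
  std::vector<float> cls(std::size_t b) const { return token(b, 0); }
  std::vector<float> dist(std::size_t b) const { return token(b, 1); }
};
```

With this layout, ``cls(0)`` and ``dist(0)`` select the same values as the slice ``[0, 0, :2, :]`` mentioned in the note.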

OpenL3
^^^^^^

@@ -240,7 +339,7 @@ The name of these models is a combination of the classification/regression task
*Note: TensorflowPredict2D has to be configured with the correct output layer name for each classifier. Check the attached JSON file to find the name of the output layer in each case.*


Music genre and style
Music genre and style
^^^^^^^^^^^^^^^^^^^^^


@@ -2071,6 +2170,3 @@ Models:
Python code for predictions:

.. literalinclude :: ../../src/examples/python/models/scripts/tempo/tempocnn/deeptemp-k16-3_predictions.py
2 changes: 2 additions & 0 deletions doc/sphinxdoc/research_papers.md
@@ -79,6 +79,8 @@ Indexing music by mood: design and integration of an automatic content-based ann

## Emotion detection

- Azuaje, G., Liew, K., Epure, E., Yada, S., Wakamiya, S., & Aramaki, E. (2023). Visualyre: multimodal album art generation for independent musicians. Personal and Ubiquitous Computing, 1-12.

- S. Chowdhury, and G. Widmer. On perceived emotion in expressive piano performance: Further experimental evidence for the relevance of mid-level perceptual features. In International Society for Music Information Retrieval (ISMIR 2021), 2021.

- Byun, S. W., Lee, S. P. A Study on a Speech Emotion Recognition System with Effective Acoustic Features Using Deep Learning Algorithms. Applied Sciences, 11(4), 1890, 2021.
2 changes: 1 addition & 1 deletion pyproject-tensorflow.toml
@@ -8,7 +8,7 @@ manylinux-x86_64-image = "mtgupf/essentia-builds:manylinux2014_x86_64"

# Only support x86_64 for essentia-tensorflow
build = "cp**-manylinux_x86_64"
skip = ["pp*", "*-musllinux*"]
skip = ["pp*", "*-musllinux*", "*i686"]

environment = { PROJECT_NAME="essentia-tensorflow", ESSENTIA_PROJECT_NAME="${PROJECT_NAME}", ESSENTIA_WHEEL_SKIP_3RDPARTY=1, ESSENTIA_WHEEL_ONLY_PYTHON=1 }

2 changes: 1 addition & 1 deletion pyproject.toml
@@ -6,7 +6,7 @@ build-verbosity = 3
manylinux-x86_64-image = "mtgupf/essentia-builds:manylinux2014_x86_64"
manylinux-i686-image = "mtgupf/essentia-builds:manylinux2014_i686"

skip = ["pp*", "*-musllinux*"]
skip = ["pp*", "*-musllinux*", "*i686"]

environment = { PROJECT_NAME="essentia", ESSENTIA_PROJECT_NAME="${PROJECT_NAME}", ESSENTIA_WHEEL_SKIP_3RDPARTY=1, ESSENTIA_WHEEL_ONLY_PYTHON=1 }

2 changes: 1 addition & 1 deletion src/algorithms/filters/iir.cpp
@@ -26,7 +26,7 @@ using namespace standard;

const char* IIR::name = "IIR";
const char* IIR::category = "Filters";
const char* IIR::description = DOC("This algorithm implements a standard IIR filter. It filters the data in the input vector with the filter described by parameter vectors 'numerator' and 'denominator' to create the output filtered vector. In the litterature, the numerator is often referred to as the 'B' coefficients and the denominator as the 'A' coefficients.\n"
const char* IIR::description = DOC("This algorithm implements a standard IIR filter. It filters the data in the input vector with the filter described by parameter vectors 'numerator' and 'denominator' to create the output filtered vector. In the literature, the numerator is often referred to as the 'B' coefficients and the denominator as the 'A' coefficients.\n"
"\n"
"The filter is a Direct Form II Transposed implementation of the standard difference equation:\n"
" a(0)*y(n) = b(0)*x(n) + b(1)*x(n-1) + ... + b(nb-1)*x(n-nb+1) - a(1)*y(n-1) - ... - a(na-1)*y(n-na+1)\n"
17 changes: 6 additions & 11 deletions src/algorithms/machinelearning/tensorflowpredict.cpp
@@ -366,6 +366,7 @@ const Tensor<Real> TensorflowPredict::TFToTensor(
TF_Output TensorflowPredict::graphOperationByName(const string nodeName) {
int index = 0;
const char* name = nodeName.c_str();
string newNodeName;

// TensorFlow operations (or nodes from the graph perspective) return tensors named <nodeName:n>, where n goes
// from 0 to the number of outputs. The first output tensor of a node can be extracted implicitly (nodeName)
@@ -374,22 +375,16 @@ TF_Output TensorflowPredict::graphOperationByName(const string nodeName) {
string::size_type n = nodeName.find(':');
if (n != string::npos) {
try {
string::size_type next_char;
index = stoi(nodeName.substr(n + 1), &next_char);

if (n + next_char + 1 != nodeName.size()) {
throw EssentiaException("TensorflowPredict: `" + nodeName + "` is not a valid node name, the index cannot "
"be followed by other characters. Make sure that all your inputs and outputs follow "
"the pattern `nodeName:n`, where `n` is an integer that goes from 0 to the number "
"of outputs of the node - 1.");
}
newNodeName = nodeName.substr(0, n);
name = newNodeName.c_str();
index = stoi(nodeName.substr(n + 1, nodeName.size()));

} catch (const invalid_argument& ) {
throw EssentiaException("TensorflowPredict: `" + nodeName + "` is not a valid node name. Make sure that all "
"your inputs and outputs follow the pattern `nodeName:n`, where `n` is an integer that "
"goes from 0 to the number of outputs of the node - 1.");
}
name = nodeName.substr(0, n).c_str();
}

}

TF_Operation* oper = TF_GraphOperationByName(_graph, name);
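
This hunk appears to address a lifetime issue, assuming the `newNodeName` lines are the added ones. The expression `nodeName.substr(0, n).c_str()` returns a pointer into a temporary `std::string` that is destroyed at the end of the statement, so `name` would dangle by the time it reaches `TF_GraphOperationByName`. Storing the substring in a named variable keeps the buffer alive. The sketch below illustrates the same `nodeName:n` parsing rule in isolation. `splitNodeName` is a hypothetical helper, not part of Essentia, and it returns an owning string rather than a raw pointer.

```cpp
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical helper (not part of Essentia): splits "nodeName:n" into the
// operation name and the output index. Without ':' the index defaults to 0,
// i.e. the implicit first output of the node.
std::pair<std::string, int> splitNodeName(const std::string& nodeName) {
  std::string::size_type n = nodeName.find(':');
  if (n == std::string::npos) {
    return {nodeName, 0};
  }
  // std::stoi throws std::invalid_argument when no digits follow ':'.
  // It stops at the first non-digit, so trailing characters are ignored.
  int index = std::stoi(nodeName.substr(n + 1));
  // Returning an owning std::string avoids handing out a pointer into a
  // temporary, which is what substr(...).c_str() would do.
  return {nodeName.substr(0, n), index};
}
```

Because `std::stoi` stops at the first non-digit, an input such as `node:7abc` parses as index 7 in this sketch. A stricter variant could use the second argument of `std::stoi` to check that the whole suffix was consumed.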