Muzeeglot

In this repository, we present Muzeeglot, a prototype that illustrates how multilingual music genre embedding spaces can be leveraged to generate cross-lingual music genre annotations for DBpedia music entities (artists, albums, tracks, etc.).

Muzeeglot includes a web interface to visualize these multilingual music genre embeddings.

How it works

Based on annotations from one or several source languages, our system automatically predicts the corresponding annotations in a target language.
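
As a rough illustration of the idea, the sketch below averages the embeddings of the source-language tags and ranks the target-language tags by cosine similarity to that average. The toy vectors, the `lang:tag` key format, and the `predict_tags` helper are assumptions made for this example; they do not describe Muzeeglot's actual model or data format.

```python
import math

# Toy multilingual embedding space (hypothetical values): tags from every
# language share one vector space and are keyed here as "lang:tag".
EMBEDDINGS = {
    "en:rock music": [0.9, 0.1, 0.0],
    "en:jazz": [0.1, 0.9, 0.1],
    "fr:rock": [0.85, 0.15, 0.05],
    "fr:jazz": [0.05, 0.95, 0.1],
    "es:rock": [0.88, 0.12, 0.02],
    "es:jazz": [0.1, 0.9, 0.05],
}


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict_tags(source_tags, target_lang, top_k=1):
    """Rank target-language tags by similarity to the mean source embedding."""
    vectors = [EMBEDDINGS[tag] for tag in source_tags]
    centroid = [sum(column) / len(vectors) for column in zip(*vectors)]
    candidates = [tag for tag in EMBEDDINGS if tag.startswith(target_lang + ":")]
    ranked = sorted(
        candidates,
        key=lambda tag: cosine(centroid, EMBEDDINGS[tag]),
        reverse=True,
    )
    return ranked[:top_k]


# Two source languages (English and Spanish), one target language (French).
print(predict_tags(["en:rock music", "es:rock"], "fr"))
```

With these toy vectors, the English and Spanish rock tags map to the French rock tag rather than the French jazz tag.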

Languages supported:

  • 🇫🇷 French
  • 🇬🇧 English
  • 🇪🇸 Spanish
  • 🇳🇱 Dutch
  • 🇨🇿 Czech
  • 🇯🇵 Japanese

You will find more information about application usage here.

Architecture

Muzeeglot is based on a classic N-tier architecture that includes:

  • A Redis instance as storage engine.
  • A REST API developed in Python with FastAPI.
  • A frontend developed with VueJS, as a SPA (Single Page Application).

The overall stack is load-balanced by an Nginx web server.

Data such as entities, tags, and languages are stored in the Redis instance. Additionally, a text search index based on Whoosh is maintained, using n-gram tokenization on entity names.
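
To illustrate why n-gram tokenization suits entity-name search, the sketch below builds a tiny in-memory inverted index in plain Python. It uses neither Whoosh nor Redis; the `ngrams`, `build_index`, and `search` helpers, the n-gram sizes, and the sample entity names are all assumptions made for this example.

```python
from collections import defaultdict


def ngrams(text, minsize=2, maxsize=4):
    """Return the set of character n-grams of each word in `text`."""
    grams = set()
    for word in text.lower().split():
        for size in range(minsize, maxsize + 1):
            for start in range(len(word) - size + 1):
                grams.add(word[start:start + size])
    return grams


def build_index(names):
    """Map every n-gram to the set of entity names that contain it."""
    index = defaultdict(set)
    for name in names:
        for gram in ngrams(name):
            index[gram].add(name)
    return index


def search(index, query):
    """Return, sorted, the names whose n-grams include every query n-gram."""
    grams = ngrams(query)
    if not grams:
        return []
    matches = set.intersection(*(index.get(gram, set()) for gram in grams))
    return sorted(matches)


index = build_index(["Daft Punk", "Pink Floyd", "Punk Rock Album"])
# A partial query matches any name that contains it as a substring of a word.
print(search(index, "pun"))
```

Because names are indexed by their character fragments, a partial query such as "pun" can match "Daft Punk" without the user typing a complete word.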

Deployment

Deploying Muzeeglot requires Docker, docker-compose, and GNU Make to be installed.

You can then clone this repository and start Muzeeglot¹:

git clone https://github.com/deezer/muzeeglot
cd muzeeglot
make start

Behind the scenes, this builds the required Docker images and runs a compose file with everything needed locally, in daemon mode.

¹ The first deployment takes a long time because it requires data ingestion and indexing.

SSL support

If you want to deploy Muzeeglot with SSL using LetsEncrypt, you first need to create a certificate using the provided bot challenge. Start by editing the following configuration files to add your target domain:

  • frontend/nginx/certificate-builder.conf
  • frontend/nginx/muzeeglot-ssl.conf

Once you have done so, run the following command to generate the SSL certificates:

make letsencrypt DOMAIN=mydomain.tld

This creates a Docker volume and provisions it with the certificate. You can then run Muzeeglot as follows:

make ssl start

Development

The project can be managed with GNU Make through the following goals:

Goal         Description
api          Build the api image
frontend     Build the frontend image
run          Start the entire stack using docker-compose
start        Start the entire stack in daemon mode
stop         Stop the entire stack using docker-compose
logs         Display stack logs when running in daemon mode
clean        Clean the Docker volume for storage and indexes
letsencrypt  Generate the certificate volume

Additional goals can be used to provide extra parameters:

Goal         Description
no-cache     Build images using the --no-cache flag
ssl          Enable SSL support

If you want to use your own data, place the following files in the api/data directory²:

  • embeddings.csv: tag embeddings, such as music genre embeddings.
  • embeddings_reduced.csv: reduced embeddings used for display.
  • languages.csv: supported languages.
  • entites.csv: indexed entities.
  • corpus.csv: test corpus.

² You need to clean the data storage and index (see the clean goal) to force data ingestion when you redeploy.
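
Before redeploying with custom data, it can help to confirm that every expected file is in place. The following is a minimal sketch, not part of the repository: the `missing_data_files` helper is hypothetical, it only checks that the files listed above exist, and it assumes it is run from the repository root. It does not validate the CSV contents, whose columns are not described here.

```python
from pathlib import Path

# File names as given in the list above.
REQUIRED_FILES = [
    "embeddings.csv",
    "embeddings_reduced.csv",
    "languages.csv",
    "entites.csv",
    "corpus.csv",
]


def missing_data_files(data_dir="api/data"):
    """Return the required file names that are absent from `data_dir`."""
    directory = Path(data_dir)
    return [name for name in REQUIRED_FILES if not (directory / name).is_file()]


if __name__ == "__main__":
    missing = missing_data_files()
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("All data files are present.")
```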

Cite

@inproceedings{epure2020muzeeglot,
  title={Muzeeglot: annotation multilingue et multi-sources d'entit{\'e}s musicales {\`a} partir de repr{\'e}sentations de genres musicaux},
  author={Epure, Elena V and Salha, Guillaume and Voituret, F{\'e}lix and Baranes, Marion and Hennequin, Romain},
  booktitle={Actes de la 6e conf{\'e}rence conjointe Journ{\'e}es d'{\'E}tudes sur la Parole (JEP, 31e {\'e}dition), Traitement Automatique des Langues Naturelles (TALN, 27e {\'e}dition), Rencontre des {\'E}tudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (R{\'E}CITAL, 22e {\'e}dition). Volume 4: D{\'e}monstrations et r{\'e}sum{\'e}s d'articles internationaux},
  pages={18--21},
  year={2020},
  organization={ATALA}
}

For more details on how we learn multilingual music genre embeddings, see:

@inproceedings{epure2020modeling,
  title={Modeling the Music Genre Perception across Language-Bound Cultures},
  author={Epure, Elena V and Salha, Guillaume and Moussallam, Manuel and Hennequin, Romain},
  booktitle={The 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020)},
  month = nov,
  year={2020},
  publisher = {Association for Computational Linguistics},
}