This application is based on the EasyMLServe project from KIT; we used it as a case study to evaluate how easily machine learning models can be used in a musical environment.
It consists of a service, which is the core of the app, and a user interface (website) used for testing and accessing the service.
The service hosts multiple machine learning models on a server, which can be accessed via a REST API.
The goal is that many different applications can access these ML models through the API.
How to use the service
The service is reachable on port 8000.
Simply send a JSON POST request to /process:
{
  "model_to_use": 2, // 0 = "Librosa - GTZAN", 1 = "Librosa - FMA", 2 = "JLibrosa - GTZAN", 3 = "JLibrosa - FMA"
  "music_array": [
    mfcc_01_mean, mfcc_01_std, ..., mfcc_20_mean, mfcc_20_std
  ]
}
The service will answer with a response like this:
{
  "genre": "Folk",
  "confidences": {
    "Electronic": 0.03255925909317552,
    "Experimental": 0.25040334119246555,
    "Folk": 0.41365239093414485,
    "HipHop": 0.06541446814588152,
    "Instrumental": 0.06807809042033906,
    "International": 0.07369756385779534,
    "Pop": 0.07780601926004657,
    "Rock": 0.018388872916917674
  }
}
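The request/response exchange above can be sketched as a small Python client. This is a stdlib-only illustration, not part of the project: the helper names (`build_payload`, `classify`) and the placeholder feature vector are ours; only the endpoint, port, and JSON shape come from the description above.

```python
import json
import urllib.request

def build_payload(features, model=2):
    """Build the /process request body.
    model: 0 = "Librosa - GTZAN", 1 = "Librosa - FMA",
           2 = "JLibrosa - GTZAN", 3 = "JLibrosa - FMA"."""
    if len(features) != 40:
        raise ValueError("expected 40 values: mean and std of 20 MFCC coefficients")
    return {"model_to_use": model, "music_array": list(features)}

def classify(features, model=2, url="http://localhost:8000/process"):
    """POST the features to the service and return the parsed JSON response,
    e.g. {"genre": "Folk", "confidences": {...}}."""
    req = urllib.request.Request(
        url,
        data=json.dumps(build_payload(features, model)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Usage (with the service running on port 8000):
#   result = classify(mfcc_features)
#   print(result["genre"], result["confidences"])
```

The real feature values would come from Librosa/JLibrosa MFCC extraction as described in the models section.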
How to use the website
The website is reachable on port 8080.
The different models
Four models are available; they differ in the following points:
- The library used to generate the MFCC values:
  - Librosa: the popular Python library for working with audio files.
  - JLibrosa: the Java counterpart to the Python library, used because it generates slightly different values than Librosa.
- The dataset from which the MFCC values are generated (we split the sound files described below into 5-second parts and added noise, doubling the number of snippets):
  - GTZAN: 10 genres, 100 audio files per genre, each 30 seconds long.
  - Free Music Archive (FMA): 8 genres, 1000 audio files per genre, each 30 seconds long.
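The dataset preparation described above (5-second splits plus a noisy copy of each snippet) can be sketched roughly like this. This is a pure-Python toy version under our own assumptions: the real pipeline works on Librosa/JLibrosa sample arrays, and the actual noise type and scale are not specified in this document.

```python
import random

def split_into_snippets(samples, sr, seconds=5):
    """Cut a sample array into non-overlapping fixed-length snippets,
    dropping any trailing remainder shorter than one snippet."""
    n = sr * seconds
    return [samples[i:i + n] for i in range(0, len(samples) - n + 1, n)]

def add_noise(snippet, scale=0.005, rng=random):
    """Return a noisy copy of a snippet; keeping the clean copy as well
    doubles the number of training examples (as described above)."""
    return [s + rng.gauss(0.0, scale) for s in snippet]

# A 30-second clip at a toy sample rate of 100 Hz yields six 5-second
# snippets; adding a noisy twin of each gives twelve examples in total.
clip = [0.0] * (30 * 100)
snippets = split_into_snippets(clip, sr=100)
augmented = snippets + [add_noise(s) for s in snippets]
```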
- Install packages
pip install -r requirements.txt
pip install -e .
Install ffmpeg for working with music files:
- Install on Windows for development (use the full build, not the essentials bundle), or install with choco:
choco install ffmpeg-full
- Download FFmpeg
- Start service
python3.9.exe .\genre_detection\service.py
- Start user interface
python3.9.exe .\genre_detection\ui.py
Version: Ubuntu 22.04.2 LTS (GNU/Linux 5.15.0-72-generic x86_64)
- Clone from Git repository
- Go into repository folder (EasyMLServe)
- Create virtual environment
sudo apt install python3-virtualenv
virtualenv --python python3 venv
- Activate venv
source venv/bin/activate
- Install packages
pip install -r requirements.txt
- Install missing packages on the VM (may vary from system to system)
sudo apt install libglu1-mesa
sudo apt install libxkbcommon-x11-0
sudo apt install libgl1
sudo apt install libegl1-mesa
sudo apt install ffmpeg
The service and the website are plain Python files (service.py & ui.py) and can therefore simply be started from a terminal.
Accessing multiple terminals simultaneously can be achieved with tmux.
- Install tmux
sudo apt install tmux
- Create session
tmux new -s beatbot
- Split session
Press Ctrl+b, then %
- Navigate in both windows to the EasyMLServe directory
- Activate venv in both windows
source venv/bin/activate
- Start service in first window
python genre_detection/service.py
- Start website in second window
python genre_detection/ui.py
- Detach from session
Press Ctrl+b, then d
- Connect to session
tmux attach -t beatbot
- Stop service and UI: press Ctrl+C in both windows
- Start them again
- Detach from session
Here is the original project this code is based on. We adapted a few things to match our needs.
This code is licensed under the MIT License.