Commit
Merge branch 'main' of gitlab.science.ru.nl:lnguyen/tiny-voxceleb-skeleton-2023
anilsson committed Mar 8, 2023
2 parents 2ab1cc8 + 20c8791 commit 6e7f22e
Showing 19 changed files with 219 additions and 110 deletions.
5 changes: 2 additions & 3 deletions .env.example
@@ -1,7 +1,6 @@
# In this file, we store environment variables which are used in
# scripts throughout this project

SCIENCE_USERNAME=put_your_science_username_here

-# '/home/$USERNAME/tiny-voxceleb-skeleton' is just a guess, can be some other value
-DATA_FOLDER=/home/$SCIENCE_USERNAME/tiny-voxceleb-skeleton/data
+# this path is just a guess, can be some other value
+DATA_FOLDER=/home/$SCIENCE_USERNAME/mlip/tiny-voxceleb-skeleton-2023/data
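For context, these variables are meant to be picked up by the shell scripts under `scripts/`. A minimal sketch of how a script can load them, assuming a plain `source` of the file (the repository's own scripts may load it differently):

```bash
#!/usr/bin/env bash
# Load the variables defined in .env into this script's environment.
set -a           # export every variable assigned while sourcing
source .env      # assumes .env.example has been copied to .env and filled in
set +a

echo "expecting the audio data under: $DATA_FOLDER"
```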
6 changes: 3 additions & 3 deletions cli_evaluate.py
@@ -25,7 +25,7 @@
from skeleton.models.prototype import PrototypeSpeakerRecognitionModule

from skeleton.evaluation.evaluation import (
-evaluate_speaker_trails,
+evaluate_speaker_trials,
EmbeddingSample,
load_evaluation_pairs,
)
@@ -69,7 +69,7 @@
parser.add_argument(
"--use-gpu",
type=lambda x: x.lower() in ("yes", "true", "t", "1"),
-default=False,
+default=True,
help="whether to evaluate on a GPU device",
)

@@ -161,7 +161,7 @@ def main(

# for each trial, compute scores based on cosine similarity between
# speaker embeddings
-results = evaluate_speaker_trails(pairs, embeddings, skip_eer=True)
+results = evaluate_speaker_trials(pairs, embeddings, skip_eer=True)
scores = results['scores']

# write each trial (with computed score) to a file
10 changes: 5 additions & 5 deletions doc/bootstrap.md
@@ -1,12 +1,12 @@
## Bootstrapping the first challenge in MLIP

- Form a team of three or four students
-- Make a [fork](#forking-the-repository-on-scienceru-gitlab) of the repository on Science Gitlab to one of your team member's science account, and add the other team members
-- Configure the Science [VPN](https://wiki.cncz.science.ru.nl/Vpn)
-- Log in to the [compute clusters](cluster.md) machine `slurm22.science.ru.nl`
-- Set up an [ssh private/public key pair](clone.md#etting-up-an-ssh-key-in-order-to-clone-your-copy-of-the-repo) to access this cloned repository from the science cluster
+- Configure the Science [VPN](https://wiki.cncz.science.ru.nl/Vpn) and connect to it
+- Make a [fork](./clone.md#forking-the-repository-on-scienceru-gitlab) of the repository on Science Gitlab to one of your team member's science account, and add the other team members
+- Log in to the [compute clusters](cluster.md) machine `cn84.science.ru.nl`
+- Set up an [ssh private/public key pair](clone.md#setting-up-an-ssh-key-in-order-to-clone-your-copy-of-the-repo) to access this cloned repository from the science cluster
- [Clone](clone.md#cloning) your private Gitlab repository to the cluster
-- [Set up](clone.md##setting-up-links-and-virtual-environments-in-the-cluster) the environment on the cluster
+- [Set up](clone.md#setting-up-links-and-virtual-environments-in-the-cluster) the environment on the cluster
- Submit your first [SLURM job](cluster.md#queuing-slurm-jobs)
- Study the code in the [skeleton](skeleton.md), perhaps make some trivial changes
- Submit your first speaker recognition [training](skeleton.md#training-the-basic-network) SLURM job with [sbatch](cluster.md#more-advanced-slurm-scripts)
18 changes: 12 additions & 6 deletions doc/clone.md
@@ -1,4 +1,4 @@
-## Forking and cloning this repository
+## Setting up the cluster environment

It is handy to have a central repository for the code that your team is working on. You can easily do this by _forking_ the repo directly from science.ru gitlab.

@@ -20,12 +20,20 @@ Each member, before they can clone this forked repo to their local computer or t
### Setting up an SSH key in order to clone your copy of the repo

First, if you have never done so for the machine you're working on (your local computer or the cluster), generate a public/private key pair:

```
$ ssh-keygen
## and hit <return> a few times
```

Also put the newly generated key in `~/.ssh/authorized_keys`.

```
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
```

Then, print your public key in the terminal and copy it to the clipboard:

```
$ cat ~/.ssh/id_rsa.pub
```
@@ -55,7 +63,7 @@ You can repeat this process of adding an ssh-key for each computer from which yo

### Cloning

-Now, if you want to clone this repo to the cluster, log on to the cluster node `slurm22` (through VPN or via `lilo`). If you want to clone to a local computer, open a local shell.
+Now, if you want to clone this repo to the cluster, log on to the cluster node `cn84` (through VPN or via `lilo`). If you want to clone to a local computer, open a local shell.

You can copy the exact URL for cloning by clicking the _Clone_ button on your own repository:

@@ -90,21 +98,19 @@ while the remote `upstream` points to the original, skeleton code you forked. Yo
git remote -v
```


### Setting up links and virtual environments in the cluster

If everything is all right, you have a reasonably clean set of files and directories upon first checkout (we will from now on drop the command prompt `$` in the example code):
```bash
ls
```
Now run the script for setting up the virtual environment and links to various places where data is / will be stored. This script will take several minutes to complete:

```bash
-scripts/prepare_cluster.sh
+./scripts/prepare_cluster.sh
ls -l
```
You will see the soft links made to
- `data`, where the audio data is stored,
- `logs`, where results and log outputs of your scripts are stored
- `venv`, the python virtual environment that has been copied to local discs on cluster nodes `cn47` and `cn48`.


28 changes: 16 additions & 12 deletions doc/cluster.md
@@ -2,7 +2,7 @@

The data science group has a small compute cluster for educational use. We are going to use this for the Speaker Recognition Challenge of the course [MLiP 2023](https://brightspace.ru.nl/d2l/home/333310).

-The cluster consists of two _compute nodes_, lovingly named `cn47` and `cn48`, and a so-called _head node_, `slurm22`. All these machines live in the domain `science.ru.nl`, so the head node's fully qualified name is `slurm22.science.ru.nl`.
+The cluster consists of two _compute nodes_, lovingly named `cn47` and `cn48`, and a so-called _head node_, `cn84`. All these machines live in the domain `science.ru.nl`, so the head node's fully qualified name is `cn84.science.ru.nl`.

Both compute nodes have the following specifications:
- 8 Nvidia RTX 2080 Ti GPUs, with 11 GB memory
@@ -14,23 +14,25 @@ The head node has the same OS installed as the compute nodes, but does not have
- simple editing and file manipulation
- submitting jobs to the compute nodes and controlling these jobs

### accessing the cluster

You need a [science account](https://wiki.cncz.science.ru.nl/Nieuwe_studenten#.5BScience_login_.28vachternaam.29_.5D.5BScience_login_.28isurname.29.5D) in order to be able to log into the cluster.

These nodes are not directly accessible from the internet; in order to reach these machines you need to either
- use the science.ru [VPN](https://wiki.cncz.science.ru.nl/Vpn)
-- you have direct access to `slurm22`, this is somewhat easier with copying through `scp` and `rsync`, remote editing, etc.
+- you have direct access to `cn84`, this is somewhat easier with copying through `scp` and `rsync`, remote editing, etc.
- ```
-local+vpn$ ssh slurm22
+local+vpn$ ssh [email protected]
```
-- login through the machine `lilo.science.ru.nl`.
+- login through the machine `lilo.science.ru.nl`
- The preferred way is to use the `ProxyJump` option of ssh:
```
local$ ssh -J [email protected] [email protected]
```
- Alternatively, you can login in two steps. In case you have to transport files, please be reminded only your (small) home filesystem `~` is available on `lilo`.
```
local$ ssh -J lilo.science.ru.nl cn99
```
- Alternatively, you can login in two steps. In case you have to transport files, please be reminded only your (small) home filesystem `~` is available on `lilo`.
- ```
-local$ ssh lilo.science.ru.nl
-lilo7$ ssh slurm22
+local$ ssh [email protected]
+lilo7$ ssh cn84
```
Either way, you will be working through a secure-shell connection, so you must have a `ssh` client on your local laptop/computer.
@@ -54,7 +56,7 @@ The limitations on the home filesystem, `~` (a.k.a. `$HOME`) are pretty tight---
### Forking and cloning the repository
-Before you can carry out the instructions below properly, you need to fork this repository on Gitlab, and check out a clone on your home directory on the cluster You can follow the [instructions here](./clone.md).
+Before you can carry out the instructions below properly, you need to fork this repository on Gitlab, check out a clone on your home directory on the cluster, and set up the environment. You can follow the [instructions here](./clone.md).
## SLURM
@@ -69,9 +71,10 @@ It is possible to ask for an interactive shell to one of the compute nodes. Thi
srun --pty --partition csedu --gres gpu:1 /bin/bash
hostname ## we're on cn47 or cn48
nvidia-smi ## it appears there is 1 GPU available in this machine
-exit ## make the slot available again, exit to slurm22 again
+exit ## make the slot available again, exit to cn84 again
```
In general, we would advise not to use the interactive shell option, as described here, with a GPU and all, unless you need to just do a quick check in a situation where a GPU is required.
### Queuing slurm jobs
The normal way of working on the cluster is by submitting a batch job. This consists of several components:
@@ -111,6 +114,7 @@ The following `#SBATCH` options are in this example:
- `--output=./logs/slurm/%J.out`: The place where the stdout is collected. `%J` refers to the job ID.
- `--error=./logs/slurm/%J.err`: This is where stderr is collected
- `--mail-type=BEGIN,END,FAIL`: specify that we want a mail message sent to our science account email at the start and finish, and in case of a failed job.
- `--qos=csedu-normal`: This specifies that your job can run for at most 12 hours. If you want to run a job which can run for at most 48 hours, you can use `qos=csedu-large`, but you will have decreased priority.

When you are ready for it, you can run your first [skeleton speaker recognition](./skeleton.md) training job. The options in the command-line training script are explained [here](./skeleton.md), here we will show you how to submit the job in slurm. Beware: completing the training takes several hours, even with this [minimalistic neural network](../skeleton/models/prototype.py#L124-126).

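To tie the `#SBATCH` options above together, here is a minimal job-script sketch; the partition, account and qos values mirror `experiments/experiment_1_cluster.sh` from this commit, and the final command is a hypothetical placeholder for your actual training entry point:

```bash
#!/usr/bin/env bash
#SBATCH --partition=csedu
#SBATCH --account=cseduimc030
#SBATCH --qos=csedu-normal
#SBATCH --gres=gpu:1
#SBATCH --mem=10G
#SBATCH --cpus-per-task=6
#SBATCH --time=12:00:00
#SBATCH --output=./logs/slurm/%J.out
#SBATCH --error=./logs/slurm/%J.err
#SBATCH --mail-type=BEGIN,END,FAIL

# the actual work goes here; replace the placeholder with your real training command
python your_training_script.py
```

Save it as e.g. `experiments/my_job.sh`, submit it with `sbatch experiments/my_job.sh`, and check the state of your queued and running jobs with `squeue`.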
14 changes: 7 additions & 7 deletions doc/honour-code.md
@@ -15,23 +15,23 @@ In order to keep this project fun but also educational, we ask you to respect th
We want every group to be able to use GPU resources provided in the CSEDU compute cluster. Therefore, we ask everyone to honour these rules:

* Do **not** `ssh` into `cn47` and/or `cn48` directly to run any experiments. This leads to other programs crashing due to e.g. out of memory errors, and this way the resources on the cluster cannot be allocated (fairly).
-* Your group should only have one running job at a time.
+* Your group should only have one long-running job at a time.
* If you submit an [array job](https://slurm.schedmd.com/job_array.html), you must use a parallelism of `1`, by using `%1` in e.g. `#SBATCH --array=0-4%1`.
-* Your jobs can use a maximum of 6 CPUs, 16 GB memory, and 1 GPU.
+* Your jobs can use a maximum of 6 CPUs, 15 GB memory, and 1 GPU.
* This can be controlled with the `SBATCH` parameters below
```
#SBATCH --gres=gpu:1 # this value may not exceed 1
-#SBATCH --mem=10G # this value may not exceed 16
+#SBATCH --mem=10G # this value may not exceed 15
#SBATCH --cpus-per-task=6 # this value may not exceed 6
```
-* Your jobs time-out after at most 24 hours. However, we ask everyone to **aim** for a maximum of 12 hours for most jobs.
+* Your jobs time-out after at most 48 hours. However, we ask everyone to **aim** for a maximum of 12 hours for most jobs.
* This can be controlled with `#SBATCH --time=12:00:00`
-* If you have evidence that you need to train for longer than 12 hours, be fair, and restrict your usage afterwards.
+* If you train for longer than 12 hours, make sure that you can argue why this was necessary.
* Use sharded data loading (as implemented in [TinyVoxcelebDataModule](../skeleton/data/tiny_voxceleb.py)), rather than individual file access, wherever you can, to prevent high i/o loads on the network file system.
-* Do not run any long-running foreground tasks on the `slurm22` head node.
-* The `slurm22` node should only be used to schedule SLURM jobs
-* An example of short-running foreground tasks with are OK to run on `slurm22`: manipulation of file-system with `rsync` or `cp`, using `git`, using `srun` or `sbatch`.
+* Do not run any long-running foreground tasks on the `cn84` head node.
+* The `cn84` node should only be used to schedule SLURM jobs
+* An example of short-running foreground tasks that are OK to run on `cn84`: manipulation of file-system with `nano`, `rsync` or `cp`, using `git`, using `tmux`, using `srun` or `sbatch`.
* Examples of tasks that should be submitted as a job: offline data augmentation, compiling a large software project.
* Whenever you're using the cluster, use your judgement to make sure that everyone can have access.
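As a reference for the array-job rule above, a minimal sketch of a job script that keeps the parallelism at 1 (the `python` command and its `--run-index` flag are hypothetical placeholders):

```bash
#!/usr/bin/env bash
#SBATCH --partition=csedu
#SBATCH --account=cseduimc030
#SBATCH --gres=gpu:1
#SBATCH --array=0-4%1   # five tasks in total, at most one running at a time

# SLURM_ARRAY_TASK_ID distinguishes the tasks, e.g. for a small hyper-parameter sweep
python your_training_script.py --run-index "${SLURM_ARRAY_TASK_ID}"
```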
5 changes: 4 additions & 1 deletion doc/skeleton.md
@@ -1,6 +1,7 @@
## The skeleton code for tiny-voxceleb

-We have provided some skeleton code to get you started training and evaluating on the tiny-voxceleb data.
+We have provided some skeleton code to get you started training and evaluating on the tiny-voxceleb data. Note that the instructions below are intended for running the code on
+your own machine. For running the code on the cluster, read [these instructions](cluster.md).

### setting up the environment variables

@@ -217,6 +218,8 @@ You can look at `experiments/experiment_1_local.sh` for an example script which

### Evaluating a network

**PLEASE NOTE** For running on the CSEDU compute cluster, you need to submit the commands below as batch jobs, see [the cluster documentation](./cluster.md) for further details. This section describes the working of the python command line evaluation script.

We have an evaluation script which you can use to compute score lists on the dev and eval test set:

```
1 change: 1 addition & 0 deletions doc/speaker-recognition.md
@@ -36,6 +36,7 @@ Apart from working on basic disciriminability of speakers, a lot of performance
- Neural embeddings from raw waveforms instead of MFCCs: [Wav2Spk: A Simple DNN Architecture for Learning Speaker Embeddings from Waveforms](./papers/wav2spk.pdf), Weiwei Lin and Man-Wai Mak, Proc. Interspeech 2020, 3211-3215, doi: 10.21437/Interspeech.2020-1287
- Recent state-of-the-art model: [ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification](./papers/ecapa_tdnn.pdf), Brecht Desplanques, Jenthe Thienpondt, Kris Demuynck, Proc. Interspeech 2020, 3830-3834, doi: 10.21437/Interspeech.2020-2650
- Recent comparison of loss functions: [In Defence of Metric Learning for Speaker Recognition](./papers/metric_learning.pdf), Joon Son Chung et al, Proc. Interspeech 2020, 2977-2981, doi: 10.21437/Interspeech.2020-1064
- Speaker recognition with small datasets: [Training speaker recognition systems with limited data](./papers/training_with_limited_data.pdf) Nik Vaessen and David A. van Leeuwen, 2022, Proc. Interspeech 2022, 4760-4764
- I-Vectors: the state of the art for many years, a form of embeddings avant-la-lettre: [Front-End Factor Analysis for Speaker Verification](./papers/najim-ivector-taslp-2009.pdf) Dehak, N. and Kenny, P. J. and Dehak, R. and Dumouchel, P. and Ouellet, P., [IEEE Trans. on Audio, Speech and Language Processing](http://ieeexplore.ieee.org/document/5545402), vol. 19, no. 4, pp. 788-798, May 2011, doi: 10.1109/TASL.2010.2064307.
- An introduction to calibration: [An Introduction to Application-Independent Evaluation of Speaker Recognition Systems](./papers/appindepeval-lnai-2007.pdf) David A. van Leeuwen and Niko Brümmer, 2007, In: Müller C. (eds) [Speaker Classification I.](https://link.springer.com/chapter/10.1007%2F978-3-540-74200-5_19) Lecture Notes in Computer Science, vol 4343. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74200-5_19
- An older overview: [A Tutorial on Text-Independent Speaker Verification](./papers/bimbot-overview.pdf), Bimbot, F., Bonastre, JF., Fredouille, C. et al. [EURASIP J. Adv. Signal Process. 2004](https://asp-eurasipjournals.springeropen.com/articles/10.1155/S1110865704310024), 101962 (2004).
2 changes: 2 additions & 0 deletions experiments/experiment_1_cluster.sh
@@ -1,5 +1,7 @@
#!/usr/bin/env bash
#SBATCH --partition=csedu
#SBATCH --account=cseduimc030
#SBATCH --qos=csedu-normal
#SBATCH --gres=gpu:1
#SBATCH --mem=10G
#SBATCH --cpus-per-task=6
7 changes: 4 additions & 3 deletions requirements.txt
@@ -1,14 +1,15 @@
# libraries for the heavy-duty neural network stuff
-f https://download.pytorch.org/whl/torch_stable.html
-torch==1.13.1+cu116
-torchaudio==0.13.1+cu116
+torch==1.13.1+cu117
+torchaudio==0.13.1+cu117
torchdata==0.5.1
pytorch-lightning==1.9.0
torchmetrics==0.11.0
tensorboard==2.11.2
torchvision==0.14.1

# always useful in any data science project (but not required)
-numpy
+numpy==1.21.5
scikit-learn
matplotlib
jupyterlab
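For a local setup, these pinned requirements are typically installed inside a virtual environment; a minimal sketch (on the cluster, `scripts/prepare_cluster.sh` takes care of this instead):

```bash
# create and activate a project-local virtual environment, then install the pinned dependencies
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
```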
2 changes: 1 addition & 1 deletion scripts/download_data.sh
@@ -16,7 +16,7 @@ mkdir -p "$DATA_FOLDER"

# rsync data from cluster to the local data folder
USERNAME=your_username
rsync -P "$SCIENCE_USERNAME"@slurm22.science.ru.nl:/ceph/csedu-scratch/course/IMC030_MLIP/data/data.zip "$DATA_FOLDER"/data.zip
rsync -P "$SCIENCE_USERNAME"@cn84.science.ru.nl:/ceph/csedu-scratch/course/IMC030_MLIP/data/data.zip "$DATA_FOLDER"/data.zip

# now you can unzip, by doing:
# unzip "$DATA_FOLDER"/data.zip $DATA_FOLDER
2 changes: 1 addition & 1 deletion scripts/download_remote_logs.sh
@@ -15,4 +15,4 @@ fi
mkdir -p "$DATA_FOLDER"

# rsync remote logs
rsync -azP "$SCIENCE_USERNAME"@slurm22.science.ru.nl:/ceph/csedu-scratch/course/IMC030_MLIP/users/"$SCIENCE_USERNAME"/ "$DATA_FOLDER"
rsync -azP "$SCIENCE_USERNAME"@cn84.science.ru.nl:/ceph/csedu-scratch/course/IMC030_MLIP/users/"$SCIENCE_USERNAME"/ "$DATA_FOLDER"