diff --git a/cli_evaluate.py b/cli_evaluate.py
index fc41a25..82e39d3 100755
--- a/cli_evaluate.py
+++ b/cli_evaluate.py
@@ -69,7 +69,7 @@ parser.add_argument(
     "--use-gpu",
     type=lambda x: x.lower() in ("yes", "true", "t", "1"),
-    default=False,
+    default=True,
     help="whether to evaluate on a GPU device",
 )
diff --git a/doc/bootstrap.md b/doc/bootstrap.md
index 1a106b5..c1578c5 100644
--- a/doc/bootstrap.md
+++ b/doc/bootstrap.md
@@ -3,7 +3,7 @@
 - Form a team of three or four students
 - Configure the Science [VPN](https://wiki.cncz.science.ru.nl/Vpn) and connect to it
 - Make a [fork](./clone.md#forking-the-repository-on-scienceru-gitlab) of the repository on Science Gitlab to one of your team member's science account, and add the other team members
-- Log in to the [compute clusters](cluster.md) machine `slurm22.science.ru.nl`
+- Log in to the [compute clusters](cluster.md) machine `cn84.science.ru.nl`
 - Set up an [ssh private/public key pair](clone.md#setting-up-an-ssh-key-in-order-to-clone-your-copy-of-the-repo) to access this cloned repository from the science cluster
 - [Clone](clone.md#cloning) your private Gitlab repository to the cluster
 - [Set up](clone.md#setting-up-links-and-virtual-environments-in-the-cluster) the environment on the cluster
diff --git a/doc/clone.md b/doc/clone.md
index 772d00e..8b02e90 100644
--- a/doc/clone.md
+++ b/doc/clone.md
@@ -63,7 +63,7 @@ You can repeat this process of adding an ssh-key for each computer from which yo
 
 ### Cloning
 
-Now, if you want to clone this repo to the cluster, log on to the cluster node `slurm22` (through VPN or via `lilo`). If you want to clone to a local computer, open a local shell.
+Now, if you want to clone this repo to the cluster, log on to the cluster node `cn84` (through VPN or via `lilo`). If you want to clone to a local computer, open a local shell.
 
 You can copy the exact URL for cloning by clicking the _Clone_ button on your own repository:
diff --git a/doc/cluster.md b/doc/cluster.md
index 5f5039d..4438005 100644
--- a/doc/cluster.md
+++ b/doc/cluster.md
@@ -2,7 +2,7 @@
 The data science group has a small compute cluster for educational use. We are going to use this for the Speaker Recognition Challenge of the course [MLiP 2023](https://brightspace.ru.nl/d2l/home/333310).
 
-The cluster consists of two _compute nodes_, lovingly named `cn47` and `cn48`, and a so-called _head node_, `slurm22`. All these machines live in the domain `science.ru.nl`, so the head node's fully qualified name is `slurm22.science.ru.nl`.
+The cluster consists of two _compute nodes_, lovingly named `cn47` and `cn48`, and a so-called _head node_, `cn84`. All these machines live in the domain `science.ru.nl`, so the head node's fully qualified name is `cn84.science.ru.nl`.
 
 Both compute nodes have the following specifications:
 - 8 Nvidia RTX 2080 Ti GPUs, with 11 GB memory
@@ -20,19 +20,19 @@ You need a [science account](https://wiki.cncz.science.ru.nl/Nieuwe_studenten#.5
 These nodes are not directly accessible from the internet, in on order to reach these machines you need to either
 - use the science.ru [VPN](https://wiki.cncz.science.ru.nl/Vpn)
-  - you have direct access to `slurm22`, this is somewhat easier with copying through `scp` and `rsync`, remote editing, etc.
+  - you have direct access to `cn84`, which makes copying through `scp` and `rsync`, remote editing, etc. somewhat easier
   ```
-  local+vpn$ ssh $SCIENCE_USERNAME@slurm22.science.ru.nl
+  local+vpn$ ssh $SCIENCE_USERNAME@cn84.science.ru.nl
   ```
 - login through the machine `lilo.science.ru.nl`
   - The preferred way is to use the `ProxyJump` option of ssh:
   ```
-  local$ ssh -J $SCIENCE_USERNAME@lilo.science.ru.nl $SCIENCE_USERNAME@slurm22.science.ru.nl
+  local$ ssh -J $SCIENCE_USERNAME@lilo.science.ru.nl $SCIENCE_USERNAME@cn84.science.ru.nl
   ```
   - Alternatively, you can login in two steps. In case you have to transport files, please be reminded only your (small) home filesystem `~` is available on `lilo`.
   ```
   local$ ssh $SCIENCE_USERNAME@lilo.science.ru.nl
-  lilo7$ ssh slurm22
+  lilo7$ ssh cn84
   ```
 
 Either way, you will be working through a secure-shell connection, so you must have a `ssh` client on your local laptop/computer.
@@ -71,7 +71,7 @@ It is possible to ask for an interactive shell to one of the compute nodes. Thi
 srun --pty --partition csedu --gres gpu:1 /bin/bash
 hostname ## we're on cn47 or cn48
 nvidia-smi ## it appears there is 1 GPU available in this machine
-exit ## make the slot available again, exit to slurm22 again
+exit ## make the slot available again, exit to cn84 again
 ```
 
 In general, we would advice not to use the interactive shell option, as described here, with a GPU and all, unless you need to just do a quick check in a situation where a GPU is required.
diff --git a/doc/honour-code.md b/doc/honour-code.md
index 44bb8f4..aa3fcc5 100644
--- a/doc/honour-code.md
+++ b/doc/honour-code.md
@@ -29,11 +29,10 @@ We want every group to be able to use GPU resources provided in the CSEDU comput
 * If you have evidence that you need to train for longer than 12 hours, be fair, and restrict your usage afterwards.
 * If you train for longer than 12 hours, make sure that you can argue why this was necessary.
 * Use sharded data loading (as implemented in [TinyVoxcelebDataModule](../skeleton/data/tiny_voxceleb.py)), rather than individual file access, wherever you can, to prevent high i/o loads on the network file system.
-* Do not run any long-running foreground tasks on the `slurm22` head node.
-  * The `slurm22` node should only be used to schedule SLURM jobs
-  * An example of short-running foreground tasks with are OK to run on `slurm22`: manipulation of file-system with `rsync` or `cp`, using `git`, using `srun` or `sbatch`.
+* Do not run any long-running foreground tasks on the `cn84` head node.
+  * The `cn84` node should only be used to schedule SLURM jobs
+  * Examples of short-running foreground tasks which are OK to run on `cn84`: manipulation of the file-system with `nano`, `rsync` or `cp`, using `git`, using `tmux`, using `srun` or `sbatch`.
   * Example of tasks with which should be submitted as a job: offline data augmentation, compiling a large software project.
-  * Do not connect to `slurm22` with remote-development features in IDE's like Visual Studio Code and Pycharm.
 * Whenever you're using the cluster, use your judgement to make sure that everyone can have access.
 
 ## Other rules related to proper evaluation
diff --git a/scripts/download_data.sh b/scripts/download_data.sh
index 8f1c0c6..e5bad27 100755
--- a/scripts/download_data.sh
+++ b/scripts/download_data.sh
@@ -16,7 +16,7 @@ mkdir -p "$DATA_FOLDER"
 
 # rsync data from cluster to the local data folder
 USERNAME=your_username
-rsync -P "$SCIENCE_USERNAME"@slurm22.science.ru.nl:/ceph/csedu-scratch/course/IMC030_MLIP/data/data.zip "$DATA_FOLDER"/data.zip
+rsync -P "$SCIENCE_USERNAME"@cn84.science.ru.nl:/ceph/csedu-scratch/course/IMC030_MLIP/data/data.zip "$DATA_FOLDER"/data.zip
 
 # now you can unzip, by doing:
 # unzip "$DATA_FOLDER"/data.zip $DATA_FOLDER
diff --git a/scripts/download_remote_logs.sh b/scripts/download_remote_logs.sh
index 64ec61b..a234e39 100755
--- a/scripts/download_remote_logs.sh
+++ b/scripts/download_remote_logs.sh
@@ -15,4 +15,4 @@ fi
 mkdir -p "$DATA_FOLDER"
 
 # rsync remote logs
-rsync -azP "$SCIENCE_USERNAME"@slurm22.science.ru.nl:/ceph/csedu-scratch/course/IMC030_MLIP/users/"$SCIENCE_USERNAME"/ "$DATA_FOLDER"
\ No newline at end of file
+rsync -azP "$SCIENCE_USERNAME"@cn84.science.ru.nl:/ceph/csedu-scratch/course/IMC030_MLIP/users/"$SCIENCE_USERNAME"/ "$DATA_FOLDER"
\ No newline at end of file
diff --git a/scripts/prepare_cluster.sh b/scripts/prepare_cluster.sh
index e905031..a28dcb0 100755
--- a/scripts/prepare_cluster.sh
+++ b/scripts/prepare_cluster.sh
@@ -1,6 +1,11 @@
 #! /usr/bin/env bash
 set -e
 
+# only run this script on cn84
+if [[ "$HOSTNAME" != "cn84" ]]; then
+  echo "prepare_cluster.sh should only be run on cn84"
+  exit 1
+fi
+
 # set variable to path where this script is
 SCRIPT_DIR="$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"
 cd "$SCRIPT_DIR" || exit 1
@@ -12,68 +17,38 @@ mkdir -p "$CEPH_USER_DIR"/slurm
 chmod 700 "$CEPH_USER_DIR" # only you can access
 ln -sfn "$CEPH_USER_DIR" "$SCRIPT_DIR"/../logs
 
-# place `~/.cache`, and optionally `~/.local`, in the ceph user directory in order to
-# save disk space in $HOME folder
-function setup_link {
-  dest_path=$1
-  link_path=$2
-
-  if [ -L "${link_path}" ] ; then
-    # link_path exists as a link
-    if [ -e "${link_path}" ] ; then
-      # and works
-      echo "link at $link_path is already setup"
-    else
-      # but is broken
-      echo "link $link_path is broken... Does $dest_path exists?"
-      return 1
-    fi
-  elif [ -e "${link_path}" ] ; then
-    # link_path exists, but is not a link
-    mkdir -p "$dest_path"
-    echo "moving all data in $link_path to $dest_path"
-    mv "$link_path"/* "$dest_path"/
-    rmdir "$link_path"
-    ln -s "$dest_path" "$link_path"
-    echo "created link $link_path to $dest_path"
-  else
-    # link_path does not exist
-    mkdir -p "$dest_path"
-    ln -s "$dest_path" "$link_path"
-
-    echo "created link $link_path to $dest_path"
-  fi
-
-  return 0
-}
-
-# .local is probably not necessary
-# setup_link "$CEPH_USER_DIR"/.local ~/.local
-setup_link "$CEPH_USER_DIR"/.cache ~/.cache
-
 # make a symlink to the data in order to directly access it from the root of the project
 ln -sfn /ceph/csedu-scratch/course/IMC030_MLIP/data "$SCRIPT_DIR"/../data
 
+# if .cache or .local is a symlink, remove it
+# this is temporary for students of MLIP 2023
+if [[ -L "$HOME/.cache" ]]; then
+  rm "$HOME"/.cache
+fi
+if [[ -L "$HOME/.local" ]]; then
+  rm "$HOME"/.local
+fi
+
+# make sure pip doesn't cache results
+if ! grep -q "export PIP_NO_CACHE_DIR=" ~/.profile ; then
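+# (the grep guard above means the export is appended to ~/.profile at most once,
+# so re-running this script does not accumulate duplicate lines)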
+{
+echo ""
+echo "### disable pip caching downloads"
+echo "export PIP_NO_CACHE_DIR=off"
+} >> ~/.profile
+fi
+
 # set up a virtual environment located at
 # /scratch/$USER/virtual_environments/tiny-voxceleb-venv
 # and make a symlink to the virtual environment
 # at the root directory of this project called "venv"
-# uncommented this if you need the venv on the head node (but no space on slurm22)
-# echo "### SETTING UP VIRTUAL ENVIRONMENT ON SLURM22 ###"
+# uncomment this if you need a virtual environment on cn84
 # ./setup_virtual_environment.sh
 
 # make sure that there's also a virtual environment
 # on the GPU nodes
 echo "### SETTING UP VIRTUAL ENVIRONMENT ON CN47 ###"
-ssh cn47 "
-  source .profile
-  cd $PWD;
-  ./setup_virtual_environment.sh
-"
+srun -p csedu-prio -A cseduimc030 -q csedu-small -w cn47 ./setup_virtual_environment.sh
 
 echo "### SETTING UP VIRTUAL ENVIRONMENT ON CN48 ###"
-ssh cn48 "
-  source .profile
-  cd $PWD;
-  ./setup_virtual_environment.sh
-"
+srun -p csedu-prio -A cseduimc030 -q csedu-small -w cn48 ./setup_virtual_environment.sh
diff --git a/scripts/setup_virtual_environment.sh b/scripts/setup_virtual_environment.sh
index 40551af..116eecc 100755
--- a/scripts/setup_virtual_environment.sh
+++ b/scripts/setup_virtual_environment.sh
@@ -12,10 +12,12 @@ PROJECT_DIR="$(dirname "$SCRIPT_DIR")"
 # We make sure a valid directory to store virtual environments exists
 # under the path /scratch/YOUR_USERNAME/virtual_environments
 #
-# If you call this script on your local computer (e.g, hostname != slurm22, cn47 or cn48)
+# If you call this script on your local computer (e.g., hostname != cn84, cn47 or cn48)
 # the virtual environment will just be created in the root directory of this project.
-
-if [[ "$HOSTNAME" != "slurm"* && "$HOSTNAME" != "cn"* ]]; then
+if [[ "$HOSTNAME" == "slurm"* ]]; then
+  echo "don't run this script on slurm22"
+  return 1 2> /dev/null || exit 1
+elif [[ "$HOSTNAME" != "cn"* ]]; then
   VENV_DIR=$PROJECT_DIR/venv
 else
   VENV_DIR=/scratch/$USER/virtual_environments/tiny-voxceleb-venv
@@ -35,5 +37,6 @@ fi
 
 # install the dependencies
 source "$VENV_DIR"/bin/activate
-python3 -m pip install --upgrade pip
-python3 -m pip install -r "$PROJECT_DIR"/requirements.txt
+PIP_NO_CACHE_DIR=off python3 -m pip install --upgrade pip
+PIP_NO_CACHE_DIR=off python3 -m pip install wheel
+PIP_NO_CACHE_DIR=off python3 -m pip install -r "$PROJECT_DIR"/requirements.txt
diff --git a/skeleton/evaluation/evaluation.py b/skeleton/evaluation/evaluation.py
index 4733750..53797c1 100644
--- a/skeleton/evaluation/evaluation.py
+++ b/skeleton/evaluation/evaluation.py
@@ -83,7 +83,7 @@ def load_evaluation_pairs(file_path: pathlib.Path):
 
 
 # implementation of evalauting a trial list with cosine-distance
-def evaluate_speaker_trails(
+def evaluate_speaker_trials(
     trials: List[EvaluationPair],
     embeddings: List[EmbeddingSample],
     skip_eer: bool = False,
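
With `evaluate_speaker_trails` renamed to `evaluate_speaker_trials`, any call sites still using the old spelling will no longer resolve. A quick check that can be run from the root of a clone after applying this patch (illustrative only, not part of the diff):

```
git grep -n "evaluate_speaker_trails"   # should print no matches once all call sites are updated
git grep -n "evaluate_speaker_trials"   # lists the renamed definition and its callers
```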