Commit

Merge branch 'cn84' into 'main'
changes to work on cn84

See merge request imc030/tiny-voxceleb-skeleton-2023!3
David van Leeuwen committed Feb 17, 2023
2 parents be7581e + 729f1b2 commit 037c402
Showing 10 changed files with 49 additions and 72 deletions.
2 changes: 1 addition & 1 deletion cli_evaluate.py
@@ -69,7 +69,7 @@
parser.add_argument(
"--use-gpu",
type=lambda x: x.lower() in ("yes", "true", "t", "1"),
default=False,
default=True,
help="whether to evaluate on a GPU device",
)

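The `type=lambda` above maps any of `yes`/`true`/`t`/`1` (case-insensitively) to `True` and everything else to `False` — unlike a plain `type=bool`, for which any non-empty string such as `"False"` would be truthy. A sketch of the same truthiness rule as a hypothetical shell helper:

```shell
# parse_bool mimics the argparse lambda: yes/true/t/1 (any case) -> True.
parse_bool() {
  case "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" in
    yes|true|t|1) echo True ;;
    *)            echo False ;;
  esac
}

parse_bool TRUE    # prints True
parse_bool no      # prints False
```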
2 changes: 1 addition & 1 deletion doc/bootstrap.md
@@ -3,7 +3,7 @@
- Form a team of three or four students
- Configure the Science [VPN](https://wiki.cncz.science.ru.nl/Vpn) and connect to it
- Make a [fork](./clone.md#forking-the-repository-on-scienceru-gitlab) of the repository on Science Gitlab to one of your team member's science account, and add the other team members
- Log in to the [compute clusters](cluster.md) machine `slurm22.science.ru.nl`
- Log in to the [compute clusters](cluster.md) machine `cn84.science.ru.nl`
- Set up an [ssh private/public key pair](clone.md#setting-up-an-ssh-key-in-order-to-clone-your-copy-of-the-repo) to access this cloned repository from the science cluster
- [Clone](clone.md#cloning) your private Gitlab repository to the cluster
- [Set up](clone.md#setting-up-links-and-virtual-environments-in-the-cluster) the environment on the cluster
2 changes: 1 addition & 1 deletion doc/clone.md
@@ -63,7 +63,7 @@ You can repeat this process of adding an ssh-key for each computer from which you

### Cloning

Now, if you want to clone this repo to the cluster, log on to the cluster node `slurm22` (through VPN or via `lilo`). If you want to clone to a local computer, open a local shell.
Now, if you want to clone this repo to the cluster, log on to the cluster node `cn84` (through VPN or via `lilo`). If you want to clone to a local computer, open a local shell.

You can copy the exact URL for cloning by clicking the _Clone_ button on your own repository:

12 changes: 6 additions & 6 deletions doc/cluster.md
@@ -2,7 +2,7 @@

The data science group has a small compute cluster for educational use. We are going to use this for the Speaker Recognition Challenge of the course [MLiP 2023](https://brightspace.ru.nl/d2l/home/333310).

The cluster consists of two _compute nodes_, lovingly named `cn47` and `cn48`, and a so-called _head node_, `slurm22`. All these machines live in the domain `science.ru.nl`, so the head node's fully qualified name is `slurm22.science.ru.nl`.
The cluster consists of two _compute nodes_, lovingly named `cn47` and `cn48`, and a so-called _head node_, `cn84`. All these machines live in the domain `science.ru.nl`, so the head node's fully qualified name is `cn84.science.ru.nl`.

Both compute nodes have the following specifications:
- 8 Nvidia RTX 2080 Ti GPUs, with 11 GB memory
@@ -20,19 +20,19 @@ You need a [science account](https://wiki.cncz.science.ru.nl/Nieuwe_studenten#.5

These nodes are not directly accessible from the internet; in order to reach these machines you need to either
- use the science.ru [VPN](https://wiki.cncz.science.ru.nl/Vpn)
- you then have direct access to `slurm22`, which makes copying with `scp` and `rsync`, remote editing, etc. somewhat easier
- you then have direct access to `cn84`, which makes copying with `scp` and `rsync`, remote editing, etc. somewhat easier
- ```
local+vpn$ ssh $SCIENCE_USERNAME@slurm22.science.ru.nl
local+vpn$ ssh $SCIENCE_USERNAME@cn84.science.ru.nl
```
- login through the machine `lilo.science.ru.nl`
- The preferred way is to use the `ProxyJump` option of ssh:
```
local$ ssh -J [email protected] $SCIENCE_USERNAME@slurm22.science.ru.nl
local$ ssh -J [email protected] $SCIENCE_USERNAME@cn84.science.ru.nl
```
- Alternatively, you can log in in two steps. In case you have to transfer files, please be aware that only your (small) home filesystem `~` is available on `lilo`.
```
local$ ssh [email protected]
lilo7$ ssh slurm22
lilo7$ ssh cn84
```
Either way, you will be working through a secure-shell connection, so you must have an `ssh` client on your local laptop/computer.
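The `ProxyJump` invocation above can be made permanent with an entry in `~/.ssh/config`, after which a bare `ssh cn84` jumps via `lilo` automatically. A sketch (host alias and username are illustrative; written to a scratch file here so no real config is touched):

```shell
# Example ~/.ssh/config entry; substitute your own science username.
cat > ssh_config_example <<'EOF'
Host cn84
    HostName cn84.science.ru.nl
    User your_science_username
    ProxyJump your_science_username@lilo.science.ru.nl
EOF
```

Appending these lines to your actual `~/.ssh/config` also makes `scp`/`rsync` to `cn84:` work through the jump host without extra flags.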
@@ -71,7 +71,7 @@ It is possible to ask for an interactive shell to one of the compute nodes. This
srun --pty --partition csedu --gres gpu:1 /bin/bash
hostname ## we're on cn47 or cn48
nvidia-smi ## it appears there is 1 GPU available in this machine
exit ## make the slot available again, exit to slurm22 again
exit ## make the slot available again, exit to cn84 again
```
In general, we would advise against using the interactive shell option described here, with a GPU and all, unless you just need to do a quick check in a situation where a GPU is required.
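For anything longer than a quick check, the same resources can be requested non-interactively with a batch script. A minimal hypothetical sketch (the partition and GPU flags mirror the `srun` line above; the 12-hour limit matches the honour-code guideline in this repo):

```shell
# Write a minimal SLURM batch script; submit it later with: sbatch train_job.sh
cat > train_job.sh <<'EOF'
#!/usr/bin/env bash
#SBATCH --partition=csedu
#SBATCH --gres=gpu:1
#SBATCH --time=12:00:00
#SBATCH --output=job_%j.out
hostname        # we land on cn47 or cn48
nvidia-smi      # confirm the allocated GPU
EOF
```

Unlike an interactive shell, a queued job releases its GPU slot the moment it finishes.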
7 changes: 3 additions & 4 deletions doc/honour-code.md
@@ -29,11 +29,10 @@ We want every group to be able to use GPU resources provided in the CSEDU compute
* If you have evidence that you need to train for longer than 12 hours, be fair, and restrict your usage afterwards.
* If you train for longer than 12 hours, make sure that you can argue why this was necessary.
* Use sharded data loading (as implemented in [TinyVoxcelebDataModule](../skeleton/data/tiny_voxceleb.py)), rather than individual file access, wherever you can, to prevent high i/o loads on the network file system.
* Do not run any long-running foreground tasks on the `slurm22` head node.
* The `slurm22` node should only be used to schedule SLURM jobs
* Examples of short-running foreground tasks that are OK to run on `slurm22`: manipulation of the file system with `rsync` or `cp`, using `git`, using `srun` or `sbatch`.
* Do not run any long-running foreground tasks on the `cn84` head node.
* The `cn84` node should only be used to schedule SLURM jobs
* Examples of short-running foreground tasks that are OK to run on `cn84`: manipulation of the file system with `nano`, `rsync` or `cp`, using `git`, using `tmux`, using `srun` or `sbatch`.
* Examples of tasks that should be submitted as a job: offline data augmentation, compiling a large software project.
* Do not connect to `slurm22` with remote-development features in IDE's like Visual Studio Code and Pycharm.
* Whenever you're using the cluster, use your judgement to make sure that everyone can have access.
## Other rules related to proper evaluation
2 changes: 1 addition & 1 deletion scripts/download_data.sh
@@ -16,7 +16,7 @@ mkdir -p "$DATA_FOLDER"

# rsync data from cluster to the local data folder
USERNAME=your_username
rsync -P "$SCIENCE_USERNAME"@slurm22.science.ru.nl:/ceph/csedu-scratch/course/IMC030_MLIP/data/data.zip "$DATA_FOLDER"/data.zip
rsync -P "$SCIENCE_USERNAME"@cn84.science.ru.nl:/ceph/csedu-scratch/course/IMC030_MLIP/data/data.zip "$DATA_FOLDER"/data.zip

# now you can unzip, by doing:
# unzip "$DATA_FOLDER"/data.zip -d "$DATA_FOLDER"
2 changes: 1 addition & 1 deletion scripts/download_remote_logs.sh
@@ -15,4 +15,4 @@ fi
mkdir -p "$DATA_FOLDER"

# rsync remote logs
rsync -azP "$SCIENCE_USERNAME"@slurm22.science.ru.nl:/ceph/csedu-scratch/course/IMC030_MLIP/users/"$SCIENCE_USERNAME"/ "$DATA_FOLDER"
rsync -azP "$SCIENCE_USERNAME"@cn84.science.ru.nl:/ceph/csedu-scratch/course/IMC030_MLIP/users/"$SCIENCE_USERNAME"/ "$DATA_FOLDER"
77 changes: 26 additions & 51 deletions scripts/prepare_cluster.sh
@@ -1,6 +1,11 @@
#! /usr/bin/env bash
set -e

# only run this script on cn84
if [[ "$HOSTNAME" != "cn84" ]]; then
    echo "prepare_cluster.sh should only be run on cn84"
    exit 1
fi

# set variable to path where this script is
SCRIPT_DIR="$( cd -- "$( dirname -- "${BASH_SOURCE[0]}" )" &> /dev/null && pwd )"
cd "$SCRIPT_DIR" || exit 1
@@ -12,68 +12,38 @@ mkdir -p "$CEPH_USER_DIR"/slurm
chmod 700 "$CEPH_USER_DIR" # only you can access
ln -sfn "$CEPH_USER_DIR" "$SCRIPT_DIR"/../logs

# place `~/.cache`, and optionally `~/.local`, in the ceph user directory in order to
# save disk space in $HOME folder
function setup_link {
dest_path=$1
link_path=$2

if [ -L "${link_path}" ] ; then
# link_path exists as a link
if [ -e "${link_path}" ] ; then
# and works
echo "link at $link_path is already setup"
else
# but is broken
echo "link $link_path is broken... Does $dest_path exist?"
return 1
fi
elif [ -e "${link_path}" ] ; then
# link_path exists, but is not a link
mkdir -p "$dest_path"
echo "moving all data in $link_path to $dest_path"
mv "$link_path"/* "$dest_path"/
rmdir "$link_path"
ln -s "$dest_path" "$link_path"
echo "created link $link_path to $dest_path"
else
# link_path does not exist
mkdir -p "$dest_path"
ln -s "$dest_path" "$link_path"

echo "created link $link_path to $dest_path"
fi

return 0
}

# .local is probably not necessary
# setup_link "$CEPH_USER_DIR"/.local ~/.local
setup_link "$CEPH_USER_DIR"/.cache ~/.cache

# make a symlink to the data in order to directly access it from the root of the project
ln -sfn /ceph/csedu-scratch/course/IMC030_MLIP/data "$SCRIPT_DIR"/../data

# if .cache or .local is a symlink, remove it
# this is temporary for students of MLIP 2023
if [[ -L "$HOME/.cache" ]]; then
rm "$HOME"/.cache
fi
if [[ -L "$HOME/.local" ]]; then
rm "$HOME"/.local
fi

# make sure pip doesn't cache results
if ! grep -q "export PIP_NO_CACHE_DIR=" ~/.profile ; then
{
echo ""
echo "### disable pip caching downloads"
echo "export PIP_NO_CACHE_DIR=off"
} >> ~/.profile
fi

# set up a virtual environment located at
# /scratch/$USER/virtual_environments/tiny-voxceleb-venv
# and make a symlink to the virtual environment
# at the root directory of this project called "venv"
# uncomment this if you need the venv on the head node (but there is no space on slurm22)
# echo "### SETTING UP VIRTUAL ENVIRONMENT ON SLURM22 ###"
# uncomment this if you need a virtual environment on cn84
# ./setup_virtual_environment.sh

# make sure that there's also a virtual environment
# on the GPU nodes
echo "### SETTING UP VIRTUAL ENVIRONMENT ON CN47 ###"
ssh cn47 "
source .profile
cd $PWD;
./setup_virtual_environment.sh
"
srun -p csedu-prio -A cseduimc030 -q csedu-small -w cn47 ./setup_virtual_environment.sh

echo "### SETTING UP VIRTUAL ENVIRONMENT ON CN48 ###"
ssh cn48 "
source .profile
cd $PWD;
./setup_virtual_environment.sh
"
srun -p csedu-prio -A cseduimc030 -q csedu-small -w cn48 ./setup_virtual_environment.sh
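The `ln -sfn` idiom used twice in this script (`logs` and `data` links) force-replaces a symlink in place: `-s` makes it symbolic, `-f` overwrites an existing link, and `-n` treats an existing link to a directory as the link itself rather than descending into it. A throwaway illustration:

```shell
# Demonstrate ln -sfn re-pointing a link between two directories.
mkdir -p target_a target_b
ln -sfn target_a datalink    # datalink -> target_a
ln -sfn target_b datalink    # -f/-n swap the link in place
readlink datalink            # prints target_b
```

Without `-n`, the second call would create `target_a/datalink` instead of re-pointing the existing link.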
13 changes: 8 additions & 5 deletions scripts/setup_virtual_environment.sh
@@ -12,10 +12,12 @@ PROJECT_DIR="$(dirname "$SCRIPT_DIR")"
# We make sure a valid directory to store virtual environments exists
# under the path /scratch/YOUR_USERNAME/virtual_environments
#
# If you call this script on your local computer (e.g., hostname != slurm22, cn47 or cn48)
# If you call this script on your local computer (e.g., hostname != cn84, cn47 or cn48)
# the virtual environment will just be created in the root directory of this project.

if [[ "$HOSTNAME" != "slurm"* && "$HOSTNAME" != "cn"* ]]; then
if [[ "$HOSTNAME" == "slurm"* ]]; then
echo "don't run this script on slurm22"
return 1 2> /dev/null || exit 1
elif [[ "$HOSTNAME" != "cn"* ]]; then
VENV_DIR=$PROJECT_DIR/venv
else
VENV_DIR=/scratch/$USER/virtual_environments/tiny-voxceleb-venv
@@ -35,5 +37,6 @@ fi

# install the dependencies
source "$VENV_DIR"/bin/activate
python3 -m pip install --upgrade pip
python3 -m pip install -r "$PROJECT_DIR"/requirements.txt
PIP_NO_CACHE_DIR=off python3 -m pip install --upgrade pip
PIP_NO_CACHE_DIR=off python3 -m pip install wheel
PIP_NO_CACHE_DIR=off python3 -m pip install -r "$PROJECT_DIR"/requirements.txt
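Stripped of the hostname branching, the script's core is just: create a venv, activate it, and install with pip caching off. A standalone sketch (the directory name here is a throwaway; the real script uses `/scratch/$USER/virtual_environments/tiny-voxceleb-venv` on cluster nodes):

```shell
# Create and activate a throwaway virtual environment.
python3 -m venv demo_venv
source demo_venv/bin/activate
python3 -c 'import sys; print(sys.prefix)'   # path now ends in demo_venv
deactivate
```

`PIP_NO_CACHE_DIR=off` (confusingly, `off` is parsed as a true-ish value by pip, so it disables the cache) keeps pip from filling `~/.cache` with downloaded wheels.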
2 changes: 1 addition & 1 deletion skeleton/evaluation/evaluation.py
@@ -83,7 +83,7 @@ def load_evaluation_pairs(file_path: pathlib.Path):
# implementation of evaluating a trial list with cosine-distance


def evaluate_speaker_trails(
def evaluate_speaker_trials(
trials: List[EvaluationPair],
embeddings: List[EmbeddingSample],
skip_eer: bool = False,
