
Getting Started

Kashu Yamazaki edited this page Mar 11, 2024 · 14 revisions

SSH Configuration

1. Generate private and public keys

ssh-keygen

id_rsa is the private key and id_rsa.pub is the public key.

2. Configure SSH

Open ~/.ssh/config and add Host entries like the following. From outside the campus network, all lab servers can be reached via turing. Change [uark_name], [server_name], and [user_name] accordingly.

Host turing
  HostName turing.csce.uark.edu
  User [uark_name]
  IdentityFile ~/.ssh/id_rsa

Host [server_name]
  HostName [server_name].ddns.uark.edu
  User [user_name]
  IdentityFile ~/.ssh/id_rsa
  ProxyCommand ssh turing -W %h:%p
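
On OpenSSH 7.3 or newer, the ProxyCommand line can be replaced by the simpler ProxyJump directive. A sketch of the equivalent entry (same placeholders as above):

```
Host [server_name]
  HostName [server_name].ddns.uark.edu
  User [user_name]
  IdentityFile ~/.ssh/id_rsa
  ProxyJump turing
```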

3. Copy the public key to the remote server

  • Linux shell
ssh-copy-id [user_name]@[server_name]
  • Windows Powershell
type $env:USERPROFILE\.ssh\id_rsa.pub | ssh host "cat >> .ssh/authorized_keys"
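
Once the key is in place, logging in should no longer prompt for a password, and the hop through turing happens automatically (this assumes the Host entries from step 2):

```shell
# Connect to an inner lab server directly; the ProxyCommand routes via turing
ssh [server_name]

# File transfers take the same route
scp local_file [server_name]:~/
```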

Configure the Environment

Preparing Docker Image (on local machine)

1. Write a Dockerfile

Create a Dockerfile for your project. Below is an example.

# Specify the base image: you can explore docker hub.
# For pytorch: https://hub.docker.com/r/pytorch/pytorch/tags
FROM pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime

ARG DEBIAN_FRONTEND=noninteractive
ENV TZ=US/Pacific

# Install dependencies and command-line tools.
RUN apt-get update && apt-get install -y \
    build-essential \
    cmake \
    curl \
    g++ \
    wget \
    bzip2 \
    git \
    vim \
    tmux \
    htop \
    zip \
    unzip \
    ca-certificates \
    libosmesa6-dev \
    libgl1-mesa-glx \
    libglfw3 \
    patchelf \
    libglu1-mesa \
    libxext6 \
    libxtst6 \
    libxrender1 \
    libxi6 \
    libjpeg-dev \
    libpng-dev \
    libopenblas-dev \
    libopencv-dev \
    libyaml-dev \
    libavformat-dev \
    libavcodec-dev \
    libswscale-dev \
    libavutil-dev \
    libavfilter-dev \
    libavdevice-dev \
    libswresample-dev \
    less \
    groff \
    mpich \
    ninja-build

# Then clean up apt-get cache.
RUN apt-get clean && rm -rf /var/lib/apt/lists/*

# Install git lfs
RUN curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | bash
RUN apt-get install -y git-lfs
RUN git lfs install

# Set timezone
RUN ln -sf /usr/share/zoneinfo/US/Pacific /etc/localtime

# Set CUDA_HOME (use ENV so it persists across layers; a RUN export is lost when the layer ends)
ENV CUDA_HOME=/usr/local/cuda

# Set the working directory.
WORKDIR /workspace
ENV HOME=/workspace

# Pip install python packages.
RUN pip install \
    timm \
    opencv-python
# Pip install from github.
RUN pip install git+https://github.com/openai/CLIP.git
# Pip install from local repository.
# If you have a directory tree like below:
# .
# |─ docker
# │   └── Dockerfile
# └── third_party
#     └── detectron2
ADD third_party/detectron2 /workspace/third_party/detectron2
# move to the directory and install it.
WORKDIR /workspace/third_party/detectron2
RUN pip install -e .

# Set the working directory.
WORKDIR /workspace

You can refer to the common Dockerfile instructions below:

Description                                Instruction  Example
Base image                                 FROM         FROM name/web:ver1.0
Maintainer (deprecated; use LABEL)         MAINTAINER   MAINTAINER name
Environment variables                      ENV          ENV KEY=VALUE
Execute a command at build time            RUN          RUN yum -y install httpd
Add a file to the image                    ADD          ADD index.html /var/www/html/index.html
Expose a port number                       EXPOSE       EXPOSE 3306
Command to run when the container starts   CMD          CMD ["service","httpd","start"]
Set the current working directory          WORKDIR      WORKDIR /var/www/html
Declare a volume                           VOLUME       VOLUME /var/log/httpd

2. Build Docker Image

Generate Docker image from Dockerfile ($REGISTRY_UNAME will be the Docker Hub username):

docker build -t $REGISTRY_UNAME/$IMAGE_NAME -f /path/to/Dockerfile .

Create and run a container from the image to verify it works:

docker run -it -v $PWD:/share --rm --name $CONTAINER_NAME $REGISTRY_UNAME/$IMAGE_NAME:$TAG
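
While that container is running, you can open a second shell in it or run a one-off command; $CONTAINER_NAME is the --name given above (standard Docker CLI, sketched here for illustration):

```shell
# Open another interactive shell in the running container
docker exec -it $CONTAINER_NAME bash

# Or run a single command without attaching, e.g. check the PyTorch version
docker exec $CONTAINER_NAME python -c "import torch; print(torch.__version__)"
```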

3. Sharing Docker Image

You can save and load the docker image locally:

docker save image_name | gzip > output.tar.gz

This generates output.tar.gz, which can be loaded on another machine:

docker load < output.tar.gz

4. Register the image on DockerHub

Docker Hub is a service provided by Docker for finding and sharing container images.

You can push your image to Docker Hub:

docker login && docker push $REGISTRY_UNAME/$IMAGE_NAME
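
Images are pushed under a name and tag; if you built without an explicit tag, docker tag can add one before pushing. A sketch, reusing the variables above (v1.0 is a hypothetical version):

```shell
# Give the local image an explicit version tag, then push that tag
docker tag $REGISTRY_UNAME/$IMAGE_NAME $REGISTRY_UNAME/$IMAGE_NAME:v1.0
docker push $REGISTRY_UNAME/$IMAGE_NAME:v1.0
```

Pushing a versioned tag (rather than only :latest) makes it easier to pin a known-good environment on the server side.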

5. Managing Docker Images

You can list all the images on your local machine by:

docker images

To save disk space, you can remove images that you no longer need:

docker rmi $IMAGE_ID
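
Rebuilds also leave behind untagged "dangling" layers; these standard Docker commands show usage and clean them up in bulk:

```shell
# Show how much space images, containers, and build cache are using
docker system df

# Remove all dangling (untagged) images without prompting
docker image prune -f
```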

Using Singularity (on server)

Set up your code from the GitHub repository:

git clone $REMOTE_REPO
cd $REPO_NAME

Create a bash script (e.g. setup_and_run.sh) to run your code:

#!/bin/bash
pip install --no-cache-dir -e . # install the code repository inside the container
python train.py # run your code

Then, you can run it with Singularity as follows. Note that a trailing \ must be the last character on its line, so comments cannot follow it:

# --bind $PWD:$PWD : make the current directory visible inside the container
# --bind /data/your_datasets:$PWD/datasets : mount the data directory at ./datasets
# --nv : enable NVIDIA GPU support; docker:// pulls the environment from Docker Hub
singularity run --bind $PWD:$PWD \
    --bind /data/your_datasets:$PWD/datasets \
    --nv docker://$REGISTRY_UNAME/$IMAGE_NAME:$TAG \
    bash setup_and_run.sh

Here is an example train.sh that you can modify for your own project:

#!/bin/bash

# Check if a config path is provided as an argument
if [ $# -eq 0 ]; then
  echo "[*] Error: No config path provided."
  echo "[*] Usage: $0 <config_path> [options]: --bs 32"
  exit 1
fi

# Extract the config path from the arguments
config_path="$1"
# Check if the config file exists
if [ ! -f "$config_path" ]; then
  echo "[*] Error: Config file not found: $config_path"
  exit 1
fi

echo "Config Path: $config_path"

# get the value of the CUDA_VISIBLE_DEVICES environment variable
devices=$CUDA_VISIBLE_DEVICES
# if CUDA_VISIBLE_DEVICES is not set, ask the user to set it
if [ -z "$devices" ]; then
    echo "[*] CUDA_VISIBLE_DEVICES is not set."
    read -p "[*] Please enter a comma-separated list of GPU IDs to use (e.g., 0,1,2,3): " devices
    export CUDA_VISIBLE_DEVICES=$devices
fi

# count the number of GPUs
gpu_count=$(echo "$devices" | tr ',' '\n' | wc -l)
echo "CUDA_VISIBLE_DEVICES: $CUDA_VISIBLE_DEVICES"
echo "Number of GPUs: $gpu_count"

data_dir=$DATADIR
if [ -z "$data_dir" ]; then
    echo "[*] DATADIR is not set."
    read -p "[*] Please enter a path to datasets: " input
    if [ -z "$input" ]; then
      input=$PWD/datasets
    fi
    # expand ~ and variables in the input (note: eval on untrusted input is unsafe)
    eval data_dir="$input"
fi

# Check if the dataset dir exists
if [ ! -d "$data_dir" ]; then
  echo "[*] Error: dataset dir not found: $data_dir"
  exit 1
fi

echo "DATADIR: $data_dir"

# run environment
# NOTE: change $REGISTRY_UNAME/$IMAGE_NAME:$TAG to your own image;
# comments must not follow a trailing \
singularity run -e --bind $PWD:$PWD \
    --bind $data_dir:$PWD/datasets \
    --nv docker://$REGISTRY_UNAME/$IMAGE_NAME:$TAG \
    bash ./tools/setup_and_run.sh \
    --config-file $config_path \
    --num-gpus $gpu_count --amp --wandb "${@:2}"
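
The GPU-counting line in the script above can be checked in isolation; the sketch below uses a hypothetical device list:

```shell
# Count comma-separated GPU IDs the same way the script does
devices="0,2,5"   # hypothetical CUDA_VISIBLE_DEVICES value
gpu_count=$(echo "$devices" | tr ',' '\n' | wc -l)
echo "$gpu_count"   # prints 3
```

One caveat: an empty $devices string still yields a count of 1 with this method, which is why the script prompts the user when the variable is unset.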

Other ways to use Singularity:

# Interactive Shells
singularity shell <Container-Name>

After you finish your training/evaluation, please clean your cache by:

singularity cache clean

Copying Data to Server

DO NOT COPY THE DATA TO HOME DIRECTORY! MAKE SURE YOU COPY TO THE ALLOCATED DATA DIR

You can use the rsync command to copy the data:

rsync -av --progress --partial --append ./src [server1]:/dst

The following options are useful. Note that --append resumes an interrupted transfer by appending to the partial file, so use it only when the partially transferred files have not changed on the source.

# Archive options
-a, --archive    # archive (-rlptgoD)

-r, --recursive
-l, --links      # copy symlinks as links
-p, --perms      # preserve permissions
-t, --times      # preserve times
-g, --group      # preserve group
-o, --owner      # preserve owner
-D               # same as --devices --specials

# Transfer options
-z, --compress
-n, --dry-run
    --partial   # allows resuming of aborted syncs
    --bwlimit=RATE    # limit socket I/O bandwidth

# Display options
-v, --verbose
-h, --human-readable
    --progress  # show progress during transfer
-P              # same as --partial --progress
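
Before a large transfer, it is worth previewing with -n (--dry-run). The sketch below uses throwaway local directories rather than a real server:

```shell
# Set up throwaway source/destination directories (illustration only)
mkdir -p /tmp/rsync_demo/src /tmp/rsync_demo/dst
touch /tmp/rsync_demo/src/a.txt /tmp/rsync_demo/src/b.txt

# -n lists what would be transferred without writing anything
rsync -avn /tmp/rsync_demo/src/ /tmp/rsync_demo/dst/

# The destination is still empty: nothing was copied
ls -A /tmp/rsync_demo/dst
```

The same -n flag works unchanged with a remote destination like [server1]:/dst.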