Getting Started
Generate an SSH key pair on your local machine, e.g. with ssh-keygen -t rsa. ~/.ssh/id_rsa is the private key and ~/.ssh/id_rsa.pub is the public key.
Open ~/.ssh/config and add Host entries like the following. From outside the campus network, all servers in the lab can be accessed via turing. Replace [uark_name], [server_name], and [user_name] with your own values:
Host turing
HostName turing.csce.uark.edu
User [uark_name]
IdentityFile ~/.ssh/id_rsa
Host [server_name]
HostName [server_name].ddns.uark.edu
User [user_name]
IdentityFile ~/.ssh/id_rsa
ProxyCommand ssh turing -W %h:%p
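If you use several lab servers, adding one Host block per server by hand gets repetitive. Below is a minimal Bash sketch that appends a block following the same pattern as above. It assumes the Host turing entry is already in place and that every server is reachable as [server_name].ddns.uark.edu; the function name add_lab_host is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: append a Host entry for one lab server to an SSH config file,
# skipping it if a "Host <server>" line already exists. Assumes "Host turing" is configured.
add_lab_host() {
  local server="$1" user="$2" config="${3:-$HOME/.ssh/config}"
  # Skip servers that are already configured (exact "Host <server>" line match).
  if [ -f "$config" ] && grep -qxF "Host $server" "$config"; then
    echo "Host $server is already in $config"
    return 0
  fi
  mkdir -p "$(dirname "$config")"
  # Unquoted EOF: $server and $user are expanded; ~ and %h:%p are written literally.
  cat >> "$config" <<EOF
Host $server
    HostName $server.ddns.uark.edu
    User $user
    IdentityFile ~/.ssh/id_rsa
    ProxyCommand ssh turing -W %h:%p
EOF
  # ssh rejects a config file that other users can write to, so keep it private.
  chmod 600 "$config"
}
```

Usage: add_lab_host [server_name] [user_name]. Running it twice for the same server leaves a single entry.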
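Then copy the public key to the server so that you can log in without typing a password:
- Linux shell
ssh-copy-id user_name@host
- Windows PowerShell
type $env:USERPROFILE\.ssh\id_rsa.pub | ssh host "mkdir -p .ssh && cat >> .ssh/authorized_keys"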
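ssh-copy-id skips keys that are already installed on the server, whereas the PowerShell one-liner appends unconditionally, so running it twice leaves a duplicate line. A minimal sketch of an idempotent server-side step is shown below; the helper name append_key_once is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper, meant to run on the server: add one public key line to
# authorized_keys only if that exact line is not already present.
append_key_once() {
  local pubkey="$1" auth="${2:-$HOME/.ssh/authorized_keys}"
  local dir
  dir="$(dirname "$auth")"
  mkdir -p "$dir"
  chmod 700 "$dir"     # sshd (StrictModes) rejects group/world-writable key directories
  touch "$auth"
  chmod 600 "$auth"
  if grep -qxF -- "$pubkey" "$auth"; then
    return 0           # key already installed; nothing to do
  fi
  printf '%s\n' "$pubkey" >> "$auth"
}
```

From a Linux or macOS machine, and assuming the remote login shell is Bash, the function can be shipped along with the key: ssh [server_name] "$(declare -f append_key_once); append_key_once '$(cat ~/.ssh/id_rsa.pub)'".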
Create a Dockerfile for your project. Below is an example.
# Specify the base image; you can browse Docker Hub for other images.
# For pytorch: https://hub.docker.com/r/pytorch/pytorch/tags
FROM pytorch/pytorch:2.0.1-cuda11.7-cudnn8-runtime
ARG DEBIAN_FRONTEND=noninteractive
ENV TZ=US/Pacific
# Install dependencies and command-line tools.
RUN apt-get update && apt-get install -y \
build-essential \
cmake \
curl \
g++ \
wget \
bzip2 \
git \
vim \
tmux \
htop \
zip \
unzip \
ca-certificates \
libosmesa6-dev \
libgl1-mesa-glx \
libglfw3 \
patchelf \
libglu1-mesa \
libxext6 \
libxtst6 \
libxrender1 \
libxi6 \
libjpeg-dev \
libpng-dev \
libopenblas-dev \
libopencv-dev \
libyaml-dev \
libavformat-dev \
libavcodec-dev \
libswscale-dev \
libavutil-dev \
libavfilter-dev \
libavdevice-dev \
libswresample-dev \
less \
groff \
mpich
# Then clean up the apt cache.
RUN apt-get clean && rm -rf /var/lib/apt/lists/*
# Install git lfs
RUN curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | bash
RUN apt-get install -y git-lfs
RUN git lfs install
# Set timezone
RUN ln -sf /usr/share/zoneinfo/US/Pacific /etc/localtime
# Set CUDA_HOME for all later build steps and the running container (RUN export would not persist).
ENV CUDA_HOME=/usr/local/cuda
# Set the working directory.
WORKDIR /workspace
ENV HOME=/workspace
# Pip install python packages.
RUN pip install \
timm
# Pip install from GitHub.
RUN pip install git+https://github.com/openai/CLIP.git
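# Pip install from a local repository.
# If you have a directory tree like below, build with the repository root as the
# build context (e.g. docker build -f docker/Dockerfile .) so that ADD can find third_party/:
# .
# ├── docker
# │   └── Dockerfile
# └── third_party
#     └── detectron2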
ADD third_party/detectron2 /workspace/third_party/detectron2
# Move to the directory and install it in editable mode.
WORKDIR /workspace/third_party/detectron2
RUN pip install -e .
# Set the working directory.
WORKDIR /workspace
You can refer to the Dockerfile instructions below:
| Description | Instruction | Example |
|---|---|---|
| Base image | FROM | FROM name/web:ver1.0 |
| Maintainer (deprecated; prefer LABEL maintainer="name") | MAINTAINER | MAINTAINER name |
| Environment variables | ENV | ENV KEY=VALUE |
| Execute a command while building the image | RUN | RUN yum -y install httpd |
| Add files to the image | ADD | ADD index.html /var/www/html/index.html |
| Document the port the container listens on | EXPOSE | EXPOSE 3306 |
| Command to run when the container starts | CMD | CMD ["service","httpd","start"] |
| Set the working directory | WORKDIR | WORKDIR /var/www/html |
| Declare a volume mount point | VOLUME | VOLUME /var/log/httpd |
Build a Docker image from the Dockerfile ($REGISTRY_UNAME is your Docker Hub username):
docker build -t $REGISTRY_UNAME/$IMAGE_NAME:$TAG -f /path/to/Dockerfile .
Create a new container from the image and run it interactively to verify that it works:
docker run -it -v $PWD:/share --rm --name $CONTAINER_NAME $REGISTRY_UNAME/$IMAGE_NAME:$TAG
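The build and run commands depend on the same three variables, so a few shell functions can keep the image reference and tag consistent between them. A sketch assuming Bash; the function names and the placeholder defaults (your_dockerhub_name, my_project, latest) are hypothetical.

```bash
#!/usr/bin/env bash
# Placeholder values: set these to your own Docker Hub username, image name, and tag.
REGISTRY_UNAME="${REGISTRY_UNAME:-your_dockerhub_name}"
IMAGE_NAME="${IMAGE_NAME:-my_project}"
TAG="${TAG:-latest}"

# Full image reference, e.g. your_dockerhub_name/my_project:latest
image_ref() {
  printf '%s/%s:%s\n' "$REGISTRY_UNAME" "$IMAGE_NAME" "$TAG"
}

# Build from a Dockerfile (first argument, default ./Dockerfile) with the current directory as context.
build_image() {
  docker build -t "$(image_ref)" -f "${1:-Dockerfile}" .
}

# Start a throwaway interactive container with the current directory mounted at /share.
run_image() {
  docker run -it --rm -v "$PWD:/share" --name "${CONTAINER_NAME:-${IMAGE_NAME}_test}" "$(image_ref)"
}
```

Usage: build_image docker/Dockerfile && run_image. Quoting the variables keeps paths with spaces intact.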
You can also save the image to a compressed archive and load it on another machine:
docker save image_name | gzip > output.tar.gz
The generated output.tar.gz can be copied to the other machine and loaded there with:
docker load < output.tar.gz
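When moving several images, deriving the archive name from the image reference avoids overwriting a single output.tar.gz. A sketch with hypothetical helper names (archive_name, save_image, load_image); docker load accepts gzip-compressed archives directly.

```bash
#!/usr/bin/env bash
# Hypothetical helper: turn an image reference into a file name,
# e.g. user/name:tag -> user_name_tag.tar.gz
archive_name() {
  local name="${1//\//_}"   # replace every "/" with "_"
  name="${name//:/_}"       # replace every ":" with "_"
  printf '%s.tar.gz\n' "$name"
}

# Save an image to a gzip archive and print the archive name.
save_image() {
  local out
  out="$(archive_name "$1")"
  # pipefail makes the subshell fail if docker save fails, not only if gzip fails
  ( set -o pipefail; docker save "$1" | gzip > "$out" ) && echo "$out"
}

# Load an image from a (possibly gzip-compressed) archive.
load_image() {
  docker load < "$1"
}
```

Usage: save_image alice/proj:v1 writes alice_proj_v1.tar.gz; after copying it, run load_image alice_proj_v1.tar.gz on the other machine.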
Docker Hub is a service provided by Docker for finding and sharing container images.
You can push your image to the hub with:
docker login && docker push $REGISTRY_UNAME/$IMAGE_NAME:$TAG
You can list all the images in your local machine by:
docker images
To free up disk space, remove images you no longer need:
docker rmi $IMAGE_ID
Set up your code by cloning the GitHub repository:
git clone $REMOTE_REPO
Create a bash script (e.g. setup_and_run.sh) to run your code:
pip install --no-cache-dir -e . # install the code repository inside the container
python train.py "$@" # run your code, forwarding any arguments passed to this script
Then you can run it with Singularity as follows:
# --bind $PWD:$PWD: bind the current directory to the same path in the container
# --bind /data/your_datasets:$PWD/datasets: bind the data directory to the container's datasets directory
# --nv docker://$REGISTRY_UNAME/$IMAGE_NAME:$TAG: enable NVIDIA GPU support and load the environment from the Docker image
# bash setup_and_run.sh: run the bash script inside the container
singularity run --bind $PWD:$PWD \
--bind /data/your_datasets:$PWD/datasets \
--nv docker://$REGISTRY_UNAME/$IMAGE_NAME:$TAG \
bash setup_and_run.sh
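A comment cannot follow a line-continuation backslash in a shell command: the backslash then escapes the space instead of the newline, and the command ends early. One way to keep a comment next to each flag is to build the arguments in a Bash array. A sketch under that assumption; the variable names DATA_DIR and singularity_args and the placeholder defaults are hypothetical.

```bash
#!/usr/bin/env bash
# Placeholder values: replace with your own image and dataset location.
REGISTRY_UNAME="${REGISTRY_UNAME:-your_dockerhub_name}"
IMAGE_NAME="${IMAGE_NAME:-my_project}"
TAG="${TAG:-latest}"
DATA_DIR="${DATA_DIR:-/data/your_datasets}"

# Each element is one argument; comments are allowed between array elements.
singularity_args=(
  run
  --bind "$PWD:$PWD"                           # current directory at the same path inside
  --bind "$DATA_DIR:$PWD/datasets"             # datasets under ./datasets inside
  --nv                                         # expose the host NVIDIA GPUs
  "docker://$REGISTRY_UNAME/$IMAGE_NAME:$TAG"  # environment from the Docker image
  bash setup_and_run.sh                        # command run inside the container
)

# Print the exact command (shell-quoted) so it can be checked before running it.
printf '%q ' singularity "${singularity_args[@]}"
echo
```

Once the printed command looks right, run it with singularity "${singularity_args[@]}".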
Here is an example train.sh that you can adapt for your own use:
#!/bin/bash
# Check if a config path is provided as an argument
if [ $# -eq 0 ]; then
    echo "[*] Error: No config path provided."
    echo "[*] Usage: $0 <config_path> [options], e.g. --bs 32"
    exit 1
fi
# Extract the config path from the arguments
config_path=$1
# Check if the config file exists
if [ ! -f "$config_path" ]; then
    echo "[*] Error: Config file not found: $config_path"
    exit 1
fi
echo "Config Path: $config_path"
# Get the value of the CUDA_VISIBLE_DEVICES environment variable;
# if CUDA_VISIBLE_DEVICES is not set, ask the user to set it
devices=$CUDA_VISIBLE_DEVICES
if [ -z "$devices" ]; then
    echo "[*] CUDA_VISIBLE_DEVICES is not set."
    read -r -p "[*] Please enter a comma-separated list of GPU IDs to use (e.g., 0,1,2,3): " devices
    export CUDA_VISIBLE_DEVICES=$devices
fi
# Count the number of GPUs
gpu_count=$(echo "$devices" | tr ',' '\n' | wc -l)
echo "Number of GPUs: $gpu_count"
# Get the dataset directory from the DATADIR environment variable, or ask for it
data_dir=$DATADIR
if [ -z "$data_dir" ]; then
    echo "[*] DATADIR is not set."
    read -r -p "[*] Please enter a path to datasets: " input
    if [ -z "$input" ]; then
        echo "[*] Error: No dataset path provided."
        exit 1
    fi
    # Expand a leading "~" without eval, which could execute arbitrary input
    data_dir="${input/#\~/$HOME}"
fi
# Check if the dataset dir exists
if [ ! -d "$data_dir" ]; then
    echo "[*] Error: dataset dir not found: $data_dir"
    exit 1
fi
echo "DATADIR: $data_dir"
# Run the environment (change the image name below to your own!)
singularity run -e --bind "$PWD:$PWD" \
    --bind "$data_dir:$PWD/datasets" \
    --nv "docker://$REGISTRY_UNAME/$IMAGE_NAME:$TAG" \
    bash ./tools/setup_and_run.sh \
    --config-file "$config_path" \
    --num-gpus "$gpu_count" --amp --wandb "${@:2}"
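The GPU count in train.sh counts lines, so a trailing comma (e.g. 0,1,) is counted as an extra GPU. A sketch of two small helpers that could replace the pipeline and the path handling; the names count_gpus and expand_path are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for the checks in train.sh.

# Count non-empty IDs in a comma-separated GPU list such as "0,1,2,3".
count_gpus() {
  local ids id n=0
  IFS=',' read -ra ids <<< "$1"
  for id in "${ids[@]}"; do
    if [ -n "$id" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}

# Replace a leading "~" with $HOME; unlike eval, nothing else in the input is executed.
expand_path() {
  printf '%s\n' "${1/#\~/$HOME}"
}
```

In train.sh these would be used as gpu_count=$(count_gpus "$devices") and data_dir=$(expand_path "$input").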
Other ways to use Singularity:
# Interactive Shells
singularity shell <Container-Name>
After you finish training or evaluation, please clean the Singularity cache with:
singularity cache clean
You can use the rsync command to copy data between machines:
rsync -av --progress --partial --append ./src [server1]:/dst
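Before a large transfer it can help to preview it with --dry-run and then rerun the same command for real. A sketch that keeps both commands in sync via Bash arrays; SRC, DST, and the host alias server1 are placeholders.

```bash
#!/usr/bin/env bash
# Placeholder source and destination: "server1" stands for a Host alias from ~/.ssh/config.
SRC="./src"
DST="server1:/dst"

# Shared flags: archive mode, verbose, keep partial files, show progress.
rsync_opts=(-av --partial --progress)

# 1) Preview: --dry-run lists what would be transferred without copying anything.
preview_cmd=(rsync "${rsync_opts[@]}" --dry-run "$SRC" "$DST")
# 2) Real transfer with the same flags.
transfer_cmd=(rsync "${rsync_opts[@]}" "$SRC" "$DST")

printf '%q ' "${preview_cmd[@]}"
echo
```

Run "${preview_cmd[@]}", check the file list, then run "${transfer_cmd[@]}". Note the trailing slash: ./src copies the directory itself (creating /dst/src), while ./src/ copies only its contents into /dst.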
The following options are useful:
# Archive options
-a, --archive # archive (-rlptgoD)
-r, --recursive
-l, --links # copy symlinks as links
-p, --perms # preserve permissions
-t, --times # preserve times
-g, --group # preserve group
-o, --owner # preserve owner
-D # same as --devices --specials
# Transfer options
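-z, --compress # compress file data during the transfer
-n, --dry-run # perform a trial run with no changes made
--partial # keep partially transferred files, so aborted transfers can resume
--bwlimit=RATE # limit socket I/O bandwidth
# Display options
-v, --verbose # increase verbosity
-h, --human-readable # output numbers in a human-readable format
--progress # show progress during transfer
-P # same as --partial --progress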