[manual][7.2] editorialize most sections.
This includes formatting, proofreading, factual things.
WIP the GPU section.
smokhov committed Aug 5, 2024
1 parent eab7ff7 commit 07cd7d8
Showing 5 changed files with 511 additions and 504 deletions.
41 changes: 17 additions & 24 deletions doc/scheduler-directives.tex
@@ -15,23 +15,18 @@
Directives that start with \verb|#SBATCH| set the options for the cluster's
SLURM job scheduler. The following provides an example of some essential directives:

%\begin{verbatim}
%#$ -N <jobname>
%#$ -cwd
%#$ -m bea
%#$ -pe smp <corecount>
%#$ -l h_vmem=<memory>G
%\end{verbatim}

\small
\begin{verbatim}
#SBATCH --job-name=<jobname> ## or -J. Give the job a name
#SBATCH --mail-type=<type> ## set type of email notifications
#SBATCH --chdir=<directory> ## or -D, set working directory for the job
#SBATCH --nodes=1 ## or -N, node count required for the job
#SBATCH --ntasks=1 ## or -n, number of tasks to be launched
#SBATCH --cpus-per-task=<corecount> ## or -c, core count requested, e.g. 8 cores
#SBATCH --mem=<memory> ## assign memory for this job, e.g., 32G memory per node
#SBATCH --mem=<memory> ## assign memory for this job,
## e.g., 32G memory per node
\end{verbatim}
\normalsize

\noindent Replace the following to adjust the job script for your project(s):
\begin{itemize}
@@ -43,28 +38,26 @@
Valid options are: NONE, BEGIN, END, FAIL, REQUEUE, ALL.
\item \verb+<corecount>+ with the degree of multithreaded parallelism (i.e., cores) allocated to your job. Up to 32 by default.
\item \verb+<memory>+ with the amount of memory, in GB, that you want to be allocated per node. Up to 500 depending on the node.\\
\textbf{Note}: All jobs MUST set a value for the \verb|--mem| option.
\textbf{Note}: All jobs MUST set a value for the \option{--mem} option.
\end{itemize}

\noindent Example with short option equivalents:
\small
\begin{verbatim}
#SBATCH -J myjob ## Job's name set to 'myjob'
#SBATCH --mail-type=ALL ## Receive all email type notifications
#SBATCH -D ./ ## Use current directory as working directory
#SBATCH -N 1 ## Node count required for the job
#SBATCH -n 1 ## Number of tasks to be launched
#SBATCH -c 8 ## Request 8 cores
#SBATCH --mem=32G ## Allocate 32G memory per node
#SBATCH -J myjob ## Job's name set to 'myjob'
#SBATCH --mail-type=ALL ## Receive all email type notifications
#SBATCH -D ./ ## Use current directory as working directory
#SBATCH -N 1 ## Node count required for the job
#SBATCH -n 1 ## Number of tasks to be launched
#SBATCH -c 8 ## Request 8 cores
#SBATCH --mem=32G ## Allocate 32G memory per node
\end{verbatim}
\normalsize
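
As a hedged sketch (not one of the manual's own examples), the short-option directives above could head a complete bash job script. The shebang path follows the \texttt{/encs/bin/bash} location mentioned in the FAQ, and exporting \texttt{OMP\_NUM\_THREADS} assumes an OpenMP-style multithreaded program. SLURM sets \texttt{SLURM\_CPUS\_PER\_TASK} only when \texttt{-c} is given, so the script falls back to 1 when run outside a job.

```shell
#!/encs/bin/bash
#SBATCH -J myjob            ## Job's name set to 'myjob'
#SBATCH --mail-type=ALL     ## Receive all email type notifications
#SBATCH -D ./               ## Use current directory as working directory
#SBATCH -N 1                ## Node count required for the job
#SBATCH -n 1                ## Number of tasks to be launched
#SBATCH -c 8                ## Request 8 cores
#SBATCH --mem=32G           ## Allocate 32G memory per node

# Match the thread count to the cores granted by -c.
# Assumption: the workload is an OpenMP-style program that reads OMP_NUM_THREADS.
threads="${SLURM_CPUS_PER_TASK:-1}"
export OMP_NUM_THREADS="$threads"
echo "Running with $OMP_NUM_THREADS thread(s)"
```

Tying the thread count to the scheduler's variable keeps the program from starting more threads than the cores requested with \texttt{-c}.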

\noindent \textbf{Tip:} If you are unsure about memory footprints, err on the side of requesting
generous memory for your job, so that it does not get prematurely terminated.
%(the value given to \api{h\_vmem} is a hard memory ceiling).
You can refine \option{--mem} values for future jobs by monitoring the size of a job's active
memory space on \texttt{speed-submit} with:
%\begin{verbatim}
%qstat -j <jobID> | grep maxvmem
%\end{verbatim}

\begin{verbatim}
sacct -j <jobID>
@@ -78,15 +71,15 @@
sstat -o jobid,maxvmsize,ntasks%7,tresusageouttot%25 -j <jobID>
\end{verbatim}

\noindent Memory-footprint values are also provided for completed jobs in the final
\noindent Memory-footprint efficiency values (\tool{seff}) are also provided for completed jobs in the final
email notification as ``maxvmsize''.
\emph{Jobs that request a low-memory footprint are more likely to load on a busy
cluster.}\\
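
When refining \texttt{--mem} from accounting output, it can help to convert the reported sizes into the same unit as the request. The helper below is a minimal sketch, assuming values carry the K/M/G/T suffixes that \texttt{sacct} typically prints (for example in its \texttt{MaxVMSize} column); the function name \texttt{to\_gib} is hypothetical and not part of SLURM.

```shell
# to_gib: convert a size such as 2097152K, 512M or 32G to GiB (two decimals).
# Assumption: suffixes are binary multiples; a value with no suffix is treated as bytes.
to_gib() {
  awk -v v="$1" 'BEGIN {
    unit = substr(v, length(v), 1)   # last character is the unit suffix, if any
    num = v + 0                      # leading numeric part of the value
    if (unit == "K") num /= 1024 * 1024
    else if (unit == "M") num /= 1024
    else if (unit == "G") num = num
    else if (unit == "T") num *= 1024
    else num /= 1024 * 1024 * 1024   # no recognised suffix: assume bytes
    printf "%.2f\n", num
  }'
}

# Example input, e.g. copied from: sacct -j <jobID> --format=JobID,MaxVMSize,ReqMem
to_gib 2097152K   # prints 2.00
```

A job that peaked at about 2 GiB could then request somewhat more than that, rather than a much larger guess.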

\noindent Other essential options are \option{--time}, or \verb|-t|, and \option{--account}, or \verb|-A|.
\noindent Other essential options are \option{--time}, or \option{-t}, and \option{--account}, or \option{-A}.
\begin{itemize}
\item \option{--time=<time>} -- is the estimate of wall clock time required for your job to run.
As preiviously mentioned, the maximum is 7 days for batch and 24 hours for interactive jobs.
As previously mentioned, the maximum is 7 days for batch and 24 hours for interactive jobs.
Jobs with a smaller \texttt{time} value have a higher priority and may therefore be scheduled sooner.

\item \option{--account=<name>} -- specifies which Account, aka project or association,
65 changes: 33 additions & 32 deletions doc/scheduler-faq.tex
@@ -1,65 +1,77 @@
% ------------------------------------------------------------------------------
% -----------------------------------------------------------------------------
% B Frequently Asked Questions
% ------------------------------------------------------------------------------
% -----------------------------------------------------------------------------
\section{Frequently Asked Questions}
\label{sect:faqs}

% B.1 Where do I learn about Linux?
% -------------------------------------------------------------
% -----------------------------------------------------------------------------
\subsection{Where do I learn about Linux?}
\label{sect:faqs-linux}

All Speed users are expected to have a basic understanding of Linux and its commonly used commands.
Here are some recommended resources:

% -----------------------------------------------------------------------------
\paragraph*{Software Carpentry}

Software Carpentry provides free resources to learn software, including a workshop on the Unix shell.
Visit \href{https://software-carpentry.org/lessons/}{Software Carpentry Lessons} to learn more.

% -----------------------------------------------------------------------------
\paragraph*{Udemy}

There are numerous Udemy courses, including free ones, that will help you learn Linux.
Active Concordia faculty, staff and students have access to Udemy courses.
A recommended starting point for beginners is the course ``Linux Mastery: Master the Linux Command Line in 11.5 Hours''.
Visit \href{https://www.concordia.ca/it/services/udemy.html}{Concordia's Udemy page} to learn how Concordians can access Udemy.

% B.2 How to bash shell on Speed?
% -------------------------------------------------------------
\subsection{How to use bash shell on \tool{Speed}?}
% -----------------------------------------------------------------------------
\subsection{How to use bash shell on Speed?}
\label{sect:faqs-bash}

This section provides comprehensive instructions on how to utilize the bash shell on the Speed cluster.

% B.2.1 How do I set bash as my login shell?
% -----------------------------------------------------------------------------
\subsubsection{How do I set bash as my login shell?}

To set your default login shell to bash on Speed, your login shell on all GCS servers must be changed to bash.
To make this change, create a ticket with the Service Desk (or email \texttt{help at concordia.ca}) to
request that bash become your default login shell for your ENCS user account on all GCS servers.

% B.2.2 How do I move into a bash shell on Speed?
\subsubsection{How do I move into a bash shell on \tool{Speed}?}
% -----------------------------------------------------------------------------
\subsubsection{How do I move into a bash shell on Speed?}

To move to the bash shell, type \textbf{bash} at the command prompt:
\begin{verbatim}
[speed-submit] [/home/a/a_user] > bash
bash-4.4$ echo $0
bash
\end{verbatim}

\noindent \textbf{Note} how the command prompt changes from
\noindent
\textbf{Note} how the command prompt changes from
``\verb![speed-submit] [/home/a/a_user] >!'' to ``\verb!bash-4.4$!'' after entering the bash shell.

% B.2.3 How do I use the bash shell in an interactive session on Speed?
\subsubsection{How do I use the bash shell in an interactive session on \tool{Speed}?}
% -----------------------------------------------------------------------------
\subsubsection{How do I use the bash shell in an interactive session on Speed?}

Below are examples of how to use \tool{bash} as a shell in your interactive job sessions
with both the \tool{salloc} and \tool{srun} commands.

\begin{itemize}
\item \texttt{salloc -ppt --mem=100G -N 1 -n 10 /encs/bin/bash}
\item \texttt{srun --mem=50G -n 5 --pty /encs/bin/bash}
\item \texttt{srun --mem=50G -n 5 --pty /encs/bin/bash}
\end{itemize}

\noindent\textbf{Note:} Make sure the interactive job requests memory, cores, etc.

% B.2.4 How do I run scripts written in bash on Speed?
% -----------------------------------------------------------------------------
\subsubsection{How do I run scripts written in bash on \tool{Speed}?}

To execute bash scripts on Speed:
@@ -74,8 +86,10 @@ \subsubsection{How do I run scripts written in bash on \tool{Speed}?}
% B.3 How to resolve “Disk quota exceeded” errors?
% -------------------------------------------------------------
\subsection{How to resolve ``Disk quota exceeded'' errors?}
\label{sect:quota-exceeded}

% B.3.1 Probable Cause
% -----------------------------------------------------------------------------
\subsubsection{Probable Cause}

The ``\texttt{Disk quota exceeded}'' error occurs when your application has
@@ -87,6 +101,7 @@ \subsubsection{Probable Cause}
\end{enumerate}

% B.3.2 Possible Solutions
% -----------------------------------------------------------------------------
\subsubsection{Possible Solutions}

\begin{enumerate}
@@ -119,6 +134,7 @@ \subsubsection{Possible Solutions}
\noindent In the above example, \verb!$USER! is an environment variable containing your ENCS username.

% B.3.3 Example of setting working directories for COMSOL
% -----------------------------------------------------------------------------
\subsubsection{Example of setting working directories for \tool{COMSOL}}

\begin{itemize}
@@ -137,6 +153,7 @@ \subsubsection{Example of setting working directories for \tool{COMSOL}}
\noindent In the above example, \verb!$USER! is an environment variable containing your ENCS username.

% B.3.4 Example of setting working directories for Python Modules
% -----------------------------------------------------------------------------
\subsubsection{Example of setting working directories for \tool{Python Modules}}

By default, when adding a Python module, the \texttt{/tmp} directory is set as the temporary repository for file downloads.
@@ -156,7 +173,7 @@ \subsubsection{Example of setting working directories for \tool{Python Modules}}
\noindent In the above example, \verb!$USER! is an environment variable containing your ENCS username.

% B.4 How do I check my job's status?
% -------------------------------------------------------------
% -----------------------------------------------------------------------------
\subsection{How do I check my job's status?}

%When a job with a job id of 1234 is running, the status of that job can be tracked using \verb!`qstat -j 1234`!.
@@ -181,33 +198,16 @@ \subsection{How do I check my job's status?}
\end{itemize}

% B.5 Why is my job pending when nodes are empty?
% -------------------------------------------------------------
% -----------------------------------------------------------------------------
\subsection{Why is my job pending when nodes are empty?}

% B.5.1 Disabled nodes
% -----------------------------------------------------------------------------
\subsubsection{Disabled nodes}

It is possible that one or more of the Speed nodes are disabled for maintenance.
To verify if Speed nodes are disabled, check if they are in a draining or drained state:

%\begin{verbatim}
%qstat -f -qs d
%queuename qtype resv/used/tot. load_avg arch states
%---------------------------------------------------------------------------------
%[email protected] BIP 0/0/32 0.27 lx-amd64 d
%---------------------------------------------------------------------------------
%[email protected] BIP 0/0/32 0.01 lx-amd64 d
%---------------------------------------------------------------------------------
%[email protected] BIP 0/0/32 0.01 lx-amd64 d
%---------------------------------------------------------------------------------
%[email protected] BIP 0/0/32 0.02 lx-amd64 d
%---------------------------------------------------------------------------------
%[email protected] BIP 0/0/32 0.03 lx-amd64 d
%---------------------------------------------------------------------------------
%[email protected] BIP 0/0/32 0.01 lx-amd64 d
%---------------------------------------------------------------------------------
%[email protected] BIP 0/0/32 0.03 lx-amd64 d
%\end{verbatim}

\small
\begin{verbatim}
[serguei@speed-submit src] % sinfo --long --Node
@@ -260,6 +260,7 @@ \subsubsection{Disabled nodes}
and the disabled nodes have a state of \textbf{idle}.

% B.5.2 Error in job submit request.
% -----------------------------------------------------------------------------
\subsubsection{Error in job submit request.}

It is possible that your job is pending because it requested resources that are not available within Speed.
@@ -268,5 +269,5 @@ \subsubsection{Error in job submit request.}
sacct -j 1234
\end{verbatim}

\noindent A summary of the reasons can be obtained via the \tool{squeue} command.
%and review the messages in the \textbf{scheduling info:} section.
\noindent
A summary of the reasons can be obtained via the \tool{squeue} command.
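
One possible way to pull that summary for a single job is sketched below. The wrapper name \texttt{job\_reason} is hypothetical; the \texttt{squeue} format specifiers \texttt{\%i}, \texttt{\%T} and \texttt{\%r} select the job ID, state and reason fields. The guard only makes the function fail cleanly on a machine without SLURM.

```shell
# job_reason: show the state and pending reason of one job.
# Only meaningful on a host where the SLURM client tools are installed.
job_reason() {
  if ! command -v squeue >/dev/null 2>&1; then
    echo "squeue not found; run this on the cluster's submit node" >&2
    return 127
  fi
  # %i = job ID, %T = job state, %r = reason the job is in that state
  squeue -j "$1" -o "%i %T %r"
}

# Usage on the cluster, with the example job ID used above:
#   job_reason 1234
```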