From 6ff4d32de6bbec7b1aee1be29519fef16cfd0285 Mon Sep 17 00:00:00 2001
From: Robert Frank
Date: Mon, 4 Mar 2024 08:43:11 +0000
Subject: [PATCH] Update FAQ.

---
 noether_faq.md | 59 ++++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 48 insertions(+), 11 deletions(-)

diff --git a/noether_faq.md b/noether_faq.md
index f9a4768..43870fb 100644
--- a/noether_faq.md
+++ b/noether_faq.md
@@ -4,48 +4,84 @@
 A1. **What is Noether?**

-Noether is the HEP Group's 'Tier3' end-user research computation cluster, consisting of several loosely-coupled physical and virtual machines. Shared home directory and data areas are mounted from a [Gluster](https://www.gluster.org/) farm, and by using the [HTCondor](https://htcondor.org/) scheduler, users can run batch or interactive sessions on the cluster.
+Noether is the HEP Group's 'Tier3' end-user research computation cluster,
+consisting of several loosely-coupled physical and virtual machines.
+Shared home directory and data areas are mounted from a [Gluster](https://www.gluster.org/) farm,
+and by using the [HTCondor](https://htcondor.org/) scheduler,
+users can run batch or interactive sessions on the cluster.

 A2. **How do I request an account on Noether?**

-Please send an email to [BLACKETT-SUPPORT] cc-ing an academic sponsor and indicating which experimental data your account will be allowed to access. We aim to action such requests within 24h.
+Accounts must be requested by your academic supervisor or line manager.
+Please ask them to send an email to [BLACKETT-SUPPORT] with you in CC, providing the following information about you:
+- Full name.
+- University of Manchester email address.
+- Role.
+- Primary affiliation (experiment or project).
+- Expiry date of the account.
+- Required access to experimental or project data.

 A3. **How many work-nodes does Noether have and what are their memory and CPU specifications?**

-There are presently 24 work-nodes: 8 having 96 cores / 384GB RAM (4GB/core Intel(R) Xeon(R) Gold 5220R CPU @ 2.20GHz) and the remaining standard 12 work-nodes having 16 cores / 64 GB RAM (4GB/core Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz). There are in addtion a number of GPU-enabled work-nodes (at the time of wrting 3, of which 1 is interactive and the remainder batch-only); instructions on how to submit jobs to these nodes is given below. The total is therefore currently 864 cores, though in practice this may vary somewhat as equipment is added, retired or placed under maintenance.
+There are presently 24 work-nodes: 8 having 96 cores / 384 GB RAM (4 GB/core, Intel(R) Xeon(R) Gold 5220R CPU @ 2.20GHz) and
+the remaining 12 standard work-nodes having 16 cores / 64 GB RAM (4 GB/core, Intel(R) Xeon(R) CPU E5-2620 v4 @ 2.10GHz).
+There are in addition a number of GPU-enabled work-nodes (at the time of writing 3, of which 1 is interactive and the remainder batch-only);
+instructions on how to submit jobs to these nodes are given below.
+The total is therefore currently 864 cores,
+though in practice this may vary somewhat as equipment is added,
+retired or placed under maintenance.

 ### Section B. Storage ###

 B1. **What storage is available on Noether and what is its visibility?**

-Globally visible data and home directory areas are available with a total capacity of 0.25 PB. Each work-node has a much smaller (and volatile) local 'scratch' area of around 7TB: you should only use 'scratch' as a staging area for ongoing batch work: all code and data you wish to retain on Noether must be copied to your home directory or shared data area.
+Globally visible data and home directory areas are available with a total capacity of 0.25 PB.
+Each work-node has a much smaller (and volatile) local 'scratch' area of around 7 TB:
+you should only use 'scratch' as a staging area for ongoing batch work;
+all code and data you wish to retain on Noether must be copied to your home directory or shared data area.

 B2. **Where should I place 'big data' on Noether?**

-We have organised per-experiment data diretories on Noether, mounted under `/gluster/data`; all large datasets should be placed here for optimal access in computations. Your initial access to this area will correspond to your experimental affiliation. Access to additional areas will be granted upon request subjecxt to stakeholder approval.
+We have organised per-experiment data directories on Noether, mounted under `/gluster/data`;
+all large datasets should be placed here for optimal access in computations.
+Your initial access to this area will correspond to your experimental affiliation.
+Access to additional areas will be granted upon request, subject to stakeholder approval.

 B3. **What are the quotas for the Home and Data areas on Noether?**

-All end-user home directories, which are to be found under `/gluster/home`, are assigned an initial capacity of 5GB, whereas the per-experiment data directories are assigned 10TB. This large datasets should not be kept in one's home directory. If you run over capacity you will be unable to create new files. You must either free up space or request in increase to your quota via the [BLACKETT-SUPPORT] list.
+All end-user home directories, which are to be found under `/gluster/home`,
+are assigned an initial capacity of 5 GB, whereas the per-experiment data directories are assigned 10 TB.
+Large datasets should therefore not be kept in one's home directory.
+If you run over capacity you will be unable to create new files.
+You must either free up space or request an increase to your quota via the [BLACKETT-SUPPORT] list.

 B4. **Are the Home and Data areas backed up?**

-*No they are not backed up* presently: it is therefore essential that users of Noether ensure that *all critical code and data* are regularly `rsync`-ed (or otherwise transferred) to a secure and resilient out-of-band location, such as an encrypted external hard-drive.
+*No, they are not backed up* at present: it is therefore essential that users of Noether ensure that *all critical code and data* are
+regularly `rsync`-ed (or otherwise transferred) to a secure and resilient out-of-band location,
+such as an encrypted external hard-drive.
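+
+A minimal sketch of such a transfer, run from your own machine rather than on Noether (the destination directory is a placeholder, and the exact path of your home directory, assumed here to be `/gluster/home/<username>`, may differ):
+
+```bash
+# Run on your own machine, not on Noether.
+# Pull a copy of your Noether home directory into a local backup area;
+# adjust <username> and the destination path to suit.
+rsync -av --progress \
+    <username>@noether.hep.manchester.ac.uk:/gluster/home/<username>/ \
+    ~/backups/noether-home/
+```
+Adding `--delete` turns the copy into an exact mirror (files removed on Noether are also removed from the backup), so use it with care.
+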
 ### Section C. Usage ###

 C1. **How do I connect to Noether?**

-You must use an ssh-client to connect to Noether. if you use Linux or Mac OS then you may simply `ssh @noether.hep.manchester.ac.uk` The proceedure for resetting your initially assigned password is given in the email sent to you when your account is created. If you use Windows then it it recommended either to use PuTTy or the Windows Subsystem for Linux.
+You must use an ssh client to connect to Noether.
+If you use Linux or Mac OS then you may simply `ssh <username>@noether.hep.manchester.ac.uk`.
+The procedure for resetting your initially assigned password is given in the email sent to you when your account is created.
+If you use Windows then it is recommended to use either PuTTY or the Windows Subsystem for Linux.

 C2. **What is a typical workflow on Noether?**

-Most users will wish to 'start small' in their home directory under an interactive htcondor session on one of the work-nodes. When code is running correctly, it is then matter of writing a shell-script and htcondor submission file to automate matters. In this way you may scale up your workflow, conduct parameterised 'sweeps' and so on.
+Most users will wish to 'start small' in their home directory under an interactive HTCondor session on one of the work-nodes.
+When code is running correctly, it is then a matter of writing a shell-script and an HTCondor submission file to automate matters.
+In this way you may scale up your workflow, conduct parameterised 'sweeps' and so on.

 C3. **How do I start an interactive session on a work-node?**

-In brief, one issues `condor_submit -i` or simply `qrsh` to be 'teleported' to a work-node of Noether (a subset of work-nodes has been set aside for interactive sessions). More details are given in [here](noether_basic_usage.md)
+In brief, one issues `condor_submit -i` or simply `qrsh` to be 'teleported' to a work-node of Noether
+(a subset of work-nodes has been set aside for interactive sessions).
+More details are given [here](noether_basic_usage.md).
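+
+Once your code runs correctly in an interactive session, a batch submission typically pairs your shell-script with an HTCondor submit description file along the following lines.
+This is an illustrative sketch only (the script name, file names and resource figures are placeholders); the canonical, site-specific recipe is given in [noether_basic_usage.md](noether_basic_usage.md).
+
+```bash
+# myjob.sh is your payload shell-script (placeholder name); make it executable first:
+#   chmod +x myjob.sh
+
+# Write a minimal HTCondor submit description file alongside it.
+cat > myjob.sub <<'EOF'
+executable      = myjob.sh
+output          = myjob.$(Cluster).$(Process).out
+error           = myjob.$(Cluster).$(Process).err
+log             = myjob.$(Cluster).log
+request_cpus    = 1
+request_memory  = 2GB
+# request_gpus  = 1   # uncomment only when targeting the GPU-enabled batch nodes (see C5)
+queue 1
+EOF
+
+# Submit the job to the scheduler and inspect the queue.
+condor_submit myjob.sub
+condor_q
+```
+Increasing the number after `queue` runs that many instances of the job, each seeing a different `$(Process)` value, which is a simple way to conduct the parameterised 'sweeps' mentioned in C2.
+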
 C4. **How do I submit a batch job into the HTCondor scheduler queue?**

@@ -53,7 +89,8 @@ A simple example of the use of HTCondor to run batch jobs is given [ibid](noethe

 C5. **How do I use the GPU-enabled work-nodes?**

-That is very straightforward: for interactive sessions simply add `qrsh request_gpus=1` to your `qrsh` invocation. For batch jobs you may add `request_gpus=[1-3]` to either your `qsub` invocation or the corresponding `.sub` file for the job (the current batch GPU nodes have three such processors).
+That is very straightforward: for interactive sessions simply add `request_gpus=1` to your `qrsh` invocation (i.e. `qrsh request_gpus=1`).
+For batch jobs you may add `request_gpus=[1-3]` to either your `qsub` invocation or the corresponding `.sub` file for the job (the current batch GPU nodes have three such GPUs).

 ### Section D. Resource Limits ###