diff --git a/_alerts/2024-04-30_slurm.md b/_alerts/2024-04-30_slurm.md
index 8bc7962..17b409b 100644
--- a/_alerts/2024-04-30_slurm.md
+++ b/_alerts/2024-04-30_slurm.md
@@ -1,9 +1,9 @@
 ---
-status: Ongoing
+status: Resolved
 type: Service Alert
-start_date: 2024-04-30
-end_date:
-scope: Cirrus compute nodes.
-impact: Suspect that jobs will not run successfully
+start_date: 2024-04-30 08:35
+end_date: 2024-04-30 10:00
+scope: Cirrus compute nodes and scratch (solid state RPOOL) file system
+impact: Jobs may have failed. Please contact the service desk if you think you require a refund
 reason: A switch which is connected to the slurm controller is down, this is causing lots of hangs on all nodes. Systems team are investigating but it may mean that all running jobs have failed.
 ---