It looks like a call to `sacct` can fail when there are too many job ids. The code should print out what it tried to do (I mean the actual `sacct` command line) so that the user can try it manually.
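A minimal sketch of that idea, assuming the tool is a Python script that shells out to `sacct` through `subprocess` (the real code may be structured differently). The helper names `build_sacct_command` and `run_sacct` are hypothetical. `shlex.join` turns the argument list into one shell-quoted line that can be pasted into a terminal, and the same line is repeated in the error message when `sacct` exits non-zero. The `runner` parameter exists only so the subprocess call can be replaced in tests.

```python
import shlex
import subprocess
import sys


def build_sacct_command(job_ids, fields=("JobID", "State")):
    """Return the sacct argv for the given job ids (hypothetical helper)."""
    return [
        "sacct",
        "--jobs", ",".join(str(j) for j in job_ids),
        "--format", ",".join(fields),
        "--parsable2",   # '|'-separated output
        "--noheader",
    ]


def run_sacct(cmd, runner=subprocess.run):
    """Echo the exact command line, run it, and return its stdout."""
    # shlex.join quotes each argument so the printed line is shell-ready.
    print("Running:", shlex.join(cmd), file=sys.stderr)
    result = runner(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        # Repeat the command so the user can reproduce the failure by hand.
        raise RuntimeError(
            f"sacct failed (exit {result.returncode}); "
            f"try it manually: {shlex.join(cmd)}\n{result.stderr}"
        )
    return result.stdout
```

Printing to stderr keeps the echoed command out of any output that downstream code parses.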
I have a feeling this is going to require a different approach (one that I'm pretty sure was already suggested): asking `sacct` for all job ids since a certain time and then parsing that output for just the job ids of interest. That has the downside that you may receive many thousands of job ids you have no interest in, but the query can probably be restricted to the user who started the jobs, or similar.
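A sketch of that approach, again assuming a Python wrapper around `sacct`. The `--user` and `--starttime` options limit the query to one user's jobs since a given time, `--allocations` drops job steps so there is one line per job, and `--parsable2 --noheader` produce '|'-separated rows that are easy to split. The function names and the choice of `JobID,State` as the only fields are illustrative assumptions. Array tasks are reported with ids such as `123_4`, so they only match if the ids of interest are stored in that form.

```python
import subprocess


def build_window_command(user, start_time):
    """sacct argv for all of one user's jobs since start_time (hypothetical)."""
    return [
        "sacct",
        "--user", user,
        "--starttime", start_time,   # e.g. "2024-01-01T00:00:00"
        "--allocations",             # one line per job, no job steps
        "--format", "JobID,State",
        "--parsable2",               # '|'-separated fields
        "--noheader",
    ]


def filter_jobs(sacct_stdout, wanted_ids):
    """Keep only rows whose JobID is in wanted_ids; return {job_id: state}."""
    wanted = {str(j) for j in wanted_ids}
    states = {}
    for line in sacct_stdout.splitlines():
        fields = line.split("|")
        # Blank lines yield a single empty field and are skipped here.
        if len(fields) >= 2 and fields[0] in wanted:
            states[fields[0]] = fields[1]
    return states


def query_jobs(user, start_time, wanted_ids, runner=subprocess.run):
    """Run one windowed sacct query and filter it down to wanted_ids."""
    cmd = build_window_command(user, start_time)
    result = runner(cmd, capture_output=True, text=True, check=True)
    return filter_jobs(result.stdout, wanted_ids)
```

One query per time window replaces one very long `--jobs` list, at the cost of transferring and discarding rows for unrelated jobs.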
SLURM consists of two pieces: the controller (slurmctld) and the database daemon (slurmdbd). Information about a job starts in the controller and is conveyed asynchronously to the database daemon, while sacct talks to the database daemon. This means that at busy times there may be a delay before job information reaches the database and can be queried by sacct. Eventually the information is transferred, and some time after the job finishes the controller forgets it. Note that job 11260826 is still pending, and so is known to the controller even if sacct cannot report it yet.
I suggest that it would be more reliable if your script started by using squeue to query the job (squeue talks to the controller and has its own format options), and fell back to sacct if squeue reports that there is no such job.
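A minimal sketch of that fallback, assuming a Python script that checks one job id at a time. The function name `job_state` is hypothetical. It treats either a non-zero exit from `squeue` or the absence of a matching row as "not known to the controller" (how `squeue` reports an unknown id may differ between Slurm versions), and only then asks `sacct`. In `squeue --format`, `%i` is the job id and `%T` the job state in its long form. If neither source knows the job, the sketch returns `(None, None)` and leaves any retry policy to the caller.

```python
import subprocess


def job_state(job_id, runner=subprocess.run):
    """Return (state, source): ask squeue first, then fall back to sacct."""
    jid = str(job_id)
    # squeue queries the controller (slurmctld).
    sq = runner(
        ["squeue", "--jobs", jid, "--noheader", "--format", "%i %T"],
        capture_output=True, text=True,
    )
    if sq.returncode == 0:
        for line in sq.stdout.splitlines():
            parts = line.split()
            if len(parts) == 2 and parts[0] == jid:
                return parts[1], "squeue"
    # Non-zero exit or no matching row: assume the controller no longer
    # knows the job and ask the accounting database via sacct.
    sa = runner(
        ["sacct", "--jobs", jid, "--allocations", "--noheader",
         "--parsable2", "--format", "JobID,State"],
        capture_output=True, text=True,
    )
    for line in sa.stdout.splitlines():
        fields = line.split("|")
        if len(fields) >= 2 and fields[0] == jid:
            return fields[1], "sacct"
    return None, None
```

Returning the source alongside the state makes it easy to log which daemon answered when diagnosing the delays described in the comment.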