More flexible Job listing/killing for Slurm #173
Conversation
Can you call squeue without the clusters argument?
No, squeue without the clusters argument will always return no jobs (the partition argument is optional and only used if a specific partition is actually set; there is a default partition but no default cluster). There are no duplicated job ids as far as I know. I can double check, but until now job.id has always been a unique identifier across all clusters.
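For concreteness, a hedged illustration of that behavior using standard squeue flags; "mpp2" is a made-up cluster name and the exact command batchtools issues may differ:

```r
# Hedged illustration, not the exact batchtools call: on a multi-cluster
# Slurm installation, listing jobs typically requires --clusters (-M).
ids <- system2("squeue",
               c("-h", "--format=%i", "-u", Sys.getenv("USER"),
                 "--clusters=mpp2"),
               stdout = TRUE)
# Without --clusters the same call returns nothing on such a site.
```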
How about this approach: […]
Ah wait, this does not work for …
@mllg why don't you simply expose the "args" from listJobs in the constructor, with your settings as the default? Then users can override this flexibly. Isn't that the normal trick? And it changes nothing for anybody else or for the internal code.
This here: […]
Just expose this as args.listjobsqueued (or whatever), with the string as a default?
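If I read the suggestion right, a minimal sketch could look like this (all names here, including args.listjobsqueued and the default flag string, are hypothetical, not the actual batchtools API):

```r
# Sketch: expose the squeue arguments in the constructor, defaulting to the
# currently hard-coded string, so users can override them per site.
makeClusterFunctionsSlurmSketch <- function(
    args.listjobsqueued = c("-h", "--format=%i", "-u", Sys.getenv("USER"),
                            "-t", "PD")) {
  listJobsQueued <- function(reg) {
    system2("squeue", args.listjobsqueued, stdout = TRUE)
  }
  # ... submitJob, killJob, listJobsRunning etc. stay as they are
  list(listJobsQueued = listJobsQueued)
}
```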
Can you please comment if …
I don't think this helps. I hope we don't need the clusters argument anymore once we get that to work. I think I'll take this rather ugly fix here and keep it as clusterFunctionsSlurmLRZ (or whatever) in the config repository for our project on the LRZ, since all cluster users link against my batchtools.conf file anyway. The perfect solution for us would be for clusters and partitions to be resources that can be set on the job level (which is already possible, I think), with the listing/killing calls taking the values from there.
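For what it's worth, the submission side of that already works; a hedged example (the resource names depend on what the template file reads, and "serial"/"serial_std" are made-up values):

```r
library(batchtools)
# Assumes the slurm template contains lines like
#   #SBATCH --clusters=<%= resources$clusters %>
#   #SBATCH --partition=<%= resources$partition %>
submitJobs(ids = 1:10,
           resources = list(clusters = "serial", partition = "serial_std"))
```

The missing half is exactly what this issue is about: listing and killing do not see those per-job values.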
#180?
I could do the same thing for partitions, but I really don't know what I'm doing. 😕
This does not solve the original cluster/partition issue, but @berndbischl's suggestion of exposing the arguments would solve a problem I am encountering where all of my SLURM jobs show up as expired until they are done. My computing cluster has its own version of squeue (see here), but it only recognizes …
OK, it looks like …
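With the exposed-arguments sketch above, a site with a nonstandard squeue could pass only the flags its wrapper understands, e.g.:

```r
# Hypothetical, building on the sketch above: keep only the flags the local
# squeue replacement actually recognizes.
cf <- makeClusterFunctionsSlurmSketch(
  args.listjobsqueued = c("--noheader", "--format=%i",
                          "-u", Sys.getenv("USER")))
```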
We frequently change clusters/partitions on our HPC, and setting the clusters in makeClusterFunctionSlurm() is not really practical (multiple users link to the same template/config file).
So we handle the clusters/partitions via resources. To make listing/killing of the jobs possible, we need to set the squeue arguments in the functions accordingly.
This is not really nice, but I can't think of another solution. As far as I know, you can't change the clusters argument of makeClusterFunctionSlurm after or while creating the registry.
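A rough sketch of how listing/killing could take the values from the resources (not working batchtools code; it assumes the cluster functions somehow get access to the resources a job was submitted with, here called "res"):

```r
# Build the Slurm selector arguments from a job's resources list instead of
# from constructor-time settings; c() silently drops the NULL branches.
slurmSelectorArgs <- function(res) {
  c(if (!is.null(res$clusters))  sprintf("--clusters=%s", res$clusters),
    if (!is.null(res$partition)) sprintf("--partition=%s", res$partition))
}

killJobSketch <- function(batch.id, res) {
  system2("scancel", c(slurmSelectorArgs(res), batch.id))
}
```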