MultiThreading and MUNGE #44
Comments
I'm not familiar with MUNGE, but my guess is that only one thread can submit a job at a time because of how MUNGE does validation. Have you tried using a lock around job submission? Alternatively, have you tried using a job array? That would be a single request that lets you submit multiple jobs. At the end of the day, I suspect this will require a conversation with your cluster's admins to understand how they have configured this security protocol and what qualifies as acceptable usage. I'd be interested to hear the results of that conversation and whether there is anything DRMAA could do to make this case easier.
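To make the lock suggestion concrete, here is a minimal sketch, not a confirmed fix. The helper names (submit_serialized, submit_all, drmaa_submit) are made up for illustration, and `session` is assumed to be a drmaa.Session that was already initialized elsewhere. Since the report says the first sign of trouble appears when jt.remoteCommand is set, the lock here covers the whole template setup as well as runJob.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# One process-wide lock so only one thread talks to DRMAA at a time.
_submit_lock = threading.Lock()


def submit_serialized(submit_fn, command):
    """Call submit_fn(command) while holding the global submission lock."""
    with _submit_lock:
        return submit_fn(command)


def submit_all(submit_fn, commands, max_workers=4):
    """Submit commands from a thread pool, one submission at a time.

    Returns the results in the same order as `commands`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda c: submit_serialized(submit_fn, c), commands))


def drmaa_submit(session, command):
    """Hypothetical helper: build a job template, submit it, clean up.

    Assumes `session` is an already-initialized drmaa.Session.
    """
    jt = session.createJobTemplate()
    try:
        jt.remoteCommand = command
        return session.runJob(jt)
    finally:
        session.deleteJobTemplate(jt)
```

With a real session this might be used as submit_all(lambda c: drmaa_submit(session, c), commands). Whether serializing the calls actually avoids the MUNGE error on a given cluster is untested here.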
Has there been any more discussion on this issue? I have an application that uses drmaa to submit jobs to an SGE cluster, and I am getting an "invalid credential format" error from MUNGE. I have isolated the issue to drmaa's submit job function, but I don't know what to do from here. The MUNGE developer said that this error indicates the MUNGE credential is getting truncated.
Not here. The only progress we made was to confirm that this isn't a problem with the Java DRMAA library: we can submit jobs from multiple threads in Java with no problem. Since all our application does is submit DRMAA jobs, we've moved away from a multi-threading paradigm and use a different structure to manage the different job streams to get around the problem.
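The comment above does not say which structure replaced the threads. One option that keeps worker threads for everything except submission is to funnel every DRMAA call through a single dedicated thread. The sketch below assumes that approach. The names submit_from_any_thread and _submitter are hypothetical, and submit_fn stands in for whatever function actually calls session.runJob.

```python
from concurrent.futures import ThreadPoolExecutor

# A pool with exactly one worker: every submission runs on that same thread,
# no matter which thread asked for it.
_submitter = ThreadPoolExecutor(max_workers=1)


def submit_from_any_thread(submit_fn, command):
    """Run submit_fn(command) on the dedicated submission thread.

    Blocks the calling thread until the submission finishes and returns
    its result (or re-raises its exception).
    """
    return _submitter.submit(submit_fn, command).result()
```

Unlike a lock, this also guarantees that all DRMAA calls come from one thread, which may matter if the underlying library keeps per-thread state. That is a guess about the cause, not something established in this thread.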
Hi,
I have happily used drmaa-python for many years with our SGE cluster. Just recently a new cluster was installed, and this time it is configured to use MUNGE security.
If I create and submit a simple job, everything works fine, but if I run the job submission as part of a thread pool I get an error about MUNGE security.
For example:
produces the following output
so the first sign of trouble is when jt.remoteCommand is set, but the script continues and gives an unhandled Python error when session.runJob is executed.