MultiThreading and MUNGE #44

Open
IanSudbery opened this issue Jan 15, 2017 · 3 comments

Comments

@IanSudbery

Hi,

I have happily used drmaa-python for many years with our SGE cluster. Just recently a new cluster was installed, and this time it is configured to use MUNGE security.

If I create and submit a simple job, everything works fine, but if I run the job submission as part of a thread pool I get an error about MUNGE security.

For example:

import drmaa
from multiprocessing.pool import ThreadPool
import tempfile
import os
import stat

pool = ThreadPool(2)

session = drmaa.Session()
session.initialize()

def pTask(n):

    smt = "ls . > test.out"
    script_file = tempfile.NamedTemporaryFile(mode="w", dir=os.getcwd(), delete=False)
    script_file.write(smt)
    script_file.close()
    print "Job is in file %s" % script_file.name
    os.chmod(script_file.name, stat.S_IRWXG | stat.S_IRWXU)
    jt = session.createJobTemplate()     
    print "jt created"
    jt.jobEnvironment = {'BASH_ENV': '~/.bashrc'}
    print "environment set"
    jt.remoteCommand = os.path.join(os.getcwd(),script_file.name)
    print "remote command set" 
    jobid = session.runJob(jt)
    print "Job submitted with id: %s, waiting ..." % jobid
    retval = session.wait(jobid, drmaa.Session.TIMEOUT_WAIT_FOREVER)

pool.map(pTask, (1,))

This produces the following output:

Job is in file /home/userid/tmpbRa0IO
jt created
environment set
error: getting configuration: MUNGE authentication failed: Invalid credential format
remote command set
Traceback (most recent call last):
  File "test_threads.py", line 31, in <module>
    pool.map(pTask, (1,))
  File "/home/mb1ims/.conda/envs/sharc/lib/python2.7/multiprocessing/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/home/mb1ims/.conda/envs/sharc/lib/python2.7/multiprocessing/pool.py", line 567, in get
    raise self._value
drmaa.errors.DeniedByDrmException: code 17: MUNGE authentication failed: Invalid credential format

So the first sign of trouble appears when jt.remoteCommand is set (the MUNGE message is printed between "environment set" and "remote command set"), but the script continues and only raises an unhandled Python exception when session.runJob is executed.

@jakirkham
Contributor

Not familiar with MUNGE, but guessing only one thread can submit a job at a time due to how MUNGE does validation. Have you tried using a lock around job submission?
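
A minimal sketch of the lock idea, assuming the session object is shared between threads as in the example above. The helper name `submit_serialized` is hypothetical and not part of drmaa-python; only `session.runJob` comes from the library. Whether this avoids the MUNGE error is not confirmed anywhere in this thread, and since the example above already fails with a single task in the pool, contention between threads may not be the cause.

```python
import threading

# One lock shared by every thread that submits jobs.
_submit_lock = threading.Lock()


def submit_serialized(session, job_template):
    """Submit a job while holding the lock.

    Only one thread can be inside session.runJob at any moment.
    `session` is expected to be an initialized drmaa.Session
    (assumption: any object with a runJob method works here).
    """
    with _submit_lock:
        return session.runJob(job_template)
```

In the example above this would replace the direct `session.runJob(jt)` call inside `pTask`; the wait can stay outside the lock so that jobs still run concurrently on the cluster.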

Alternatively, have you tried submitting a job array? That is a single request that submits multiple jobs.
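
A rough sketch of the job-array route, using `Session.runBulkJobs` from drmaa-python. The wrapper name `submit_array` and the choice of 1-based indices are assumptions for illustration. It has not been tested against a MUNGE-secured cluster, and it only helps if the call is made from a context where submission already works.

```python
def submit_array(session, job_template, n_tasks):
    """Submit n_tasks tasks as one array job and return their job ids.

    `session` is expected to be an initialized drmaa.Session.
    """
    # runBulkJobs(template, beginIndex, endIndex, step); both indices are inclusive.
    return session.runBulkJobs(job_template, 1, n_tasks, 1)
```

Under SGE, each task can read its own index from the `SGE_TASK_ID` environment variable, so a single script can pick its input based on that value.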

At the end of the day, I am guessing this will require a conversation with your cluster's admins to understand how they have configured this security protocol and what qualifies as acceptable usage. I would be interested to hear the results of that conversation and whether there is anything DRMAA could do to make this case easier.

@bring52405

Has there been any more discussion on this issue? I have an application that uses drmaa to submit jobs to an SGE cluster, and I am getting the "Invalid credential format" error from MUNGE. I have isolated the issue to drmaa's job submission function, but I don't know what to do from here. The MUNGE developer claimed that this error indicates the MUNGE credential is getting truncated.

@IanSudbery
Author

Not here. The only progress we made was to confirm that this isn't a problem with the Java DRMAA library: we can submit jobs from multiple threads in Java with no problem.

Since all our application does is submit drmaa jobs, we've moved away from a multi-threading paradigm and use a different structure to manage the different job streams to get around the problem.
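
For anyone who lands here with the same problem, one possible structure is sketched below. It is not necessarily the one described above, and it is untested against MUNGE. The idea is that worker threads only decide what to submit, while every DRMAA call is made from the thread that owns the session. The function name `run_streams` and the shape of its `streams` argument are hypothetical; the sketch is written for Python 3 (`queue` module).

```python
import queue
import threading


def run_streams(session, streams):
    """Funnel all submissions through the calling thread.

    `streams` maps a stream name to a list of job templates
    (hypothetical input shape). Returns {stream name: [job ids]}.
    """
    requests = queue.Queue()

    def worker(name, templates):
        # Workers never touch the session; they only enqueue requests.
        for jt in templates:
            requests.put((name, jt))
        requests.put((name, None))  # sentinel: this stream is finished

    threads = [threading.Thread(target=worker, args=item)
               for item in streams.items()]
    for t in threads:
        t.start()

    results = {name: [] for name in streams}
    remaining = len(streams)
    while remaining:
        name, jt = requests.get()
        if jt is None:
            remaining -= 1
        else:
            # The only place session.runJob is called: the calling thread.
            results[name].append(session.runJob(jt))

    for t in threads:
        t.join()
    return results
```

Job ids within a stream come back in the order that stream queued them; ordering across streams is not guaranteed.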
