You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
The behaviour of LocalFile is inconsistent between the use of wildcards and exact filenames in outputfiles. This issue is seen only when the localdir of outputfile is not empty. For wildcards, the outputfiles are not copied to the localdir, and the job status is failed.
Reproduction:
Assume the following code is placed in the ~/ganga directory, and run using the ganga batch mode from the same directory.
Code to reproduce:
j = Job()
j.application.exe = "touch"
j.application.args = ["a.txt", "b.txt"]
# After the job completes, the files a.txt and b.txt are placed in the ~/ganga directory, and the job status is completed
j.outputfiles = [LocalFile("./a.txt"), LocalFile("./b.txt")]
# If the following outputfiles list is used, no files are placed in the ~/ganga directory, and the job status is failed
# j.outputfiles = [LocalFile("./*.txt")]
j.submit()
The copying issue seems to be because of the way the put method in LocalFile.py is implemented.
def put(self):
"""
Copy the file to the destination (in the case of LocalFile the localDir)
"""
# This is useful for placing the LocalFile in a subdir at the end of a job
# FIXME this method should be written to work with some other parameter than localDir for job outputs but for now this 'works'
if self.localDir:
try:
job = self.getJobObject()
except AssertionError:
return
# Copy to 'desitnation'
if path.isfile(path.join(job.outputdir, self.namePattern)):
if not path.exists(path.join(job.outputdir, self.localDir)):
os.makedirs(path.join(job.outputdir, self.localDir))
shutil.copy(path.join(job.outputdir, self.namePattern),
path.join(job.outputdir, self.localDir, self.namePattern))
It tries to check whether the outputfiles are present in the job's temporary directory (job.outputdir in the code), and copies them to path.join(job.outputdir, self.localDir) (this should correspond to ~/ganga in the example at the top).
Please not the use of the ./ in the reproduction script at the beginning of the filename, which causes an absolute path to be used. The absolute path causes the job.outputdir to be ignored in the path.join call, and the file gets copied to self.localDir
Though the reproduction uses this weird quirk of path.join, this issue should still exist when self.localdir is a relative subdirectory of job.outputdir.
For wildcards, the namePattern is *.txt, while for exact file names, the namePattern would be a.txt
Since the code looks for an exact match, and no file is present that is named *.txt, no files are copied when wildcards are used.
Fix
If the put method is slightly modified to use pattern matching rather than looking for exact filenames, wildcard files are copied too.
def put(self):
"""
Copy the file to the destination (in the case of LocalFile the localDir)
"""
# This is useful for placing the LocalFile in a subdir at the end of a job
# FIXME this method should be written to work with some other parameter than localDir for job outputs but for now this 'works'
if self.localDir:
try:
job = self.getJobObject()
except AssertionError:
return
if not path.exists(path.join(job.outputdir, self.localDir)):
os.makedirs(path.join(job.outputdir, self.localDir))
for outputFile in glob.glob(path.join(job.outputdir, self.namePattern)):
shutil.copy(outputFile,
path.join(job.outputdir, self.localDir, path.basename(outputFile)))
This does not fix the issue with the job failing, I have yet to check what causes that particular problem.
The text was updated successfully, but these errors were encountered:
When we use LocalFile in the inputfiles attribute, we want to make sure that any relative PATH is expanded to an absolute one (this ensures that what is persisted in the Ganga repository doesn't depend on from which directory Ganga was started). However, when it is used in the outputfiles attribute, you could argue that we should only expand the relative filename when we need to use it.
If we have the same filename in multiple subdirectories (so ./a/file.txt and ./b/file.txt), then a wildcard ./*/file.txt would lead to two files with the same name copied back. For a LocalFile we might just replicate that directory structure but it is not obvious that this would work for DiracFile, GoogleFile etc.
The Issue:
The behaviour of LocalFile is inconsistent between the use of wildcards and exact filenames in outputfiles. This issue is seen only when the localdir of outputfile is not empty. For wildcards, the outputfiles are not copied to the localdir, and the job status is failed.
Reproduction:
Assume the following code is placed in the
~/ganga
directory, and run using the ganga batch mode from the same directory.Code to reproduce:
A recording of the reproduction is also available at https://asciinema.org/a/utqAYu97FVkC0MzxcGVnMiXkO
Analysis
The copying issue seems to be because of the way the
put
method inLocalFile.py
is implemented.It tries to check whether the outputfiles are present in the job's temporary directory (
job.outputdir
in the code), and copies them topath.join(job.outputdir, self.localDir)
(this should correspond to~/ganga
in the example at the top).Please not the use of the
./
in the reproduction script at the beginning of the filename, which causes an absolute path to be used. The absolute path causes thejob.outputdir
to be ignored in thepath.join
call, and the file gets copied toself.localDir
Though the reproduction uses this weird quirk of
path.join
, this issue should still exist when self.localdir is a relative subdirectory ofjob.outputdir
.For wildcards, the namePattern is
*.txt
, while for exact file names, the namePattern would bea.txt
Since the code looks for an exact match, and no file is present that is named
*.txt
, no files are copied when wildcards are used.Fix
If the
put
method is slightly modified to use pattern matching rather than looking for exact filenames, wildcard files are copied too.This does not fix the issue with the job failing, I have yet to check what causes that particular problem.
The text was updated successfully, but these errors were encountered: