-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cli: implement upload command #133
Conversation
Codecov ReportBase: 86.95% // Head: 84.76% // Decreases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## master #133 +/- ##
==========================================
- Coverage 86.95% 84.76% -2.20%
==========================================
Files 33 34 +1
Lines 2660 2802 +142
==========================================
+ Hits 2313 2375 +62
- Misses 273 351 +78
- Partials 74 76 +2
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
Can you also reproduce this issue when dealing with symlinks? Directory structure:
reana-client:
reana-client-go:
|
I was able to reproduce it. In fact after spending some time trying to fix the issue I think it would be better to drop the support of symlinks because it could easily lead to troubles. It actually appears that Go's As an experiment you could create a directory symlink which would point back to parent As a quick solution to the problem I modified the code to detect the symlinked files and ignore it. We can discuss how we want to proceed with it and probably fix it in another PR as it appears to be a bit more difficult to do it right. |
cmd/upload.go
Outdated
pathInfo, err := os.Stat(path) | ||
if err != nil { | ||
return nil, nil, err | ||
} | ||
if pathInfo.IsDir() { | ||
dirs = append(dirs, path) | ||
} else { | ||
files = append(files, path) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Symlinks cause some troubles also here, as they can be appended to files
.
Given that filepath.Walk
works also on files, I am wondering if we can avoid dividing paths into files
and dirs
. This would make the code less complicated, as we wouldn't need the filterDirectories
function. On the other hand, if a user specifies a file in directories
(or a directory in files
) in the specification we would need to check manually whether that is actually a directory or not to spot the error. What do you think?
(I also don't know if you would prefer to fix this issues later in a new PR)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea! I've changed the logic to use filepath.Walk
on both files and directories and it works fine with less code, thanks! In this case even if user specifies a file in directories
it works as expected.
On the other hand, if a user specifies a file in directories (or a directory in files) in the specification we would need to check manually whether that is actually a directory or not to spot the error. What do you think?
You mean we should report to a user if a file is found in directories
and viceversa?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You mean we should report to a user if a file is found in
directories
and viceversa?
Yes, that is what I was referring to. I think we can discuss this in a separate issue, given that it also affects the python client!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, listing directories under files
should be reported back to user, and vice versa. Since we have made that distinction, we should keep it.
(Alternatively, we can respect both, so if a user uses only files
and puts directories there, we can "silently" accept them. This is something that could pave the way towards using only one directive in the future.)
BTW, regarding listing files and directories "unnecessarily", I have another observation. If a uses specifies the same file twice, it'll be uploaded twice, for example:
$ git diff
diff --git a/reana.yaml b/reana.yaml
index 4c69293..75d70e8 100644
--- a/reana.yaml
+++ b/reana.yaml
@@ -1,6 +1,11 @@
version: 0.6.0
inputs:
files:
+ - code/gendata.C
+ - code/gendata.C
+ - code/gendata.C
+ - code/gendata.C
+ - code/gendata.C
- code/gendata.C
- code/fitdata.C
parameters:
leads to:
$ reana-client run -w test
...
==> Uploading files...
==> Detected .gitignore file. Some files might get ignored.
==> SUCCESS: File /code/gendata.C was successfully uploaded.
==> SUCCESS: File /code/gendata.C was successfully uploaded.
==> SUCCESS: File /code/gendata.C was successfully uploaded.
==> SUCCESS: File /code/gendata.C was successfully uploaded.
==> SUCCESS: File /code/gendata.C was successfully uploaded.
==> SUCCESS: File /code/gendata.C was successfully uploaded.
==> SUCCESS: File /code/fitdata.C was successfully uploaded.
==> Starting workflow...
It would be nice to first make a list of all user-desired files and directories and walk through them, collect all the uploadable objects (avoiding symlinks), and then make a set out of them, so that we don't do any unnecessary upload the same object many times.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
P.S. The above upload behaviour was observed with REANA 0.8 clusters. Using the same client with REANA master branch clusters , one gets:
==> Starting workflow...
==> ERROR: Cannot start workflow test.1:
Input path declared multiple times: code/gendata.C
However, instead of throwing an error, it may be better to accept such a configuration and simply "deduplicate" the desired files and directories to be uploaded inside the client?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(Especially because reana-client validate
doesn't complain about the same file being listed multiple times. So, if we accept such a configuration as valid, we should allow processing it... Fully agree that these concerns are something for a new issue.)
Yes, I don't think many people use the symlinks in sources, so as an MVP we could detect whether a file object is a symlink, and if it is, print a visible warning to the user that it'll be ignored (e.g. as we do for gitignore and friends) and continue with uploading the rest. |
Updated PR with the following changes:
|
cmd/upload.go
Outdated
return err | ||
} | ||
if pathInfo.IsDir() { | ||
return fmt.Errorf("directory is found in files directive: %s", file) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return fmt.Errorf("directory is found in files directive: %s", file) | |
return fmt.Errorf("found directory in `inputs.files`: %s", file) |
WDYT?
cmd/upload.go
Outdated
return err | ||
} | ||
if !pathInfo.IsDir() { | ||
return fmt.Errorf("file is found in directories directive: %s", dir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
return fmt.Errorf("file is found in directories directive: %s", dir) | |
return fmt.Errorf("found file in `inputs.directories`: %s", dir) |
WDYT?
closes #131
To test:
Please note that #134 will be tackled in a separate PR