Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cli: implement upload command #133

Merged
merged 1 commit into from
Oct 3, 2022
Merged

Conversation

audrium
Copy link
Member

@audrium audrium commented Sep 23, 2022

closes #131

To test:

$ go build
$ cd ../reana-demo-helloworld 
$ reana-client create -w upload-test
$ ../reana-client-go/reana-client-go upload -w upload-test reana.yaml -l DEBUG # check https://localhost:30443 for the reana.yaml file to be present in the workspace
$ ../reana-client-go/reana-client-go upload -w upload-test -l DEBUG # check https://localhost:30443 for all the workspace input files to be present in the workspace
# modify reana.yaml to specify some input directories and test that all files are uploaded

Please note that #134 will be tackled in a separate PR

@codecov-commenter
Copy link

codecov-commenter commented Sep 23, 2022

Codecov Report

Base: 86.95% // Head: 84.76% // Decreases project coverage by -2.19% ⚠️

Coverage data is based on head (d7b972e) compared to base (a16c6b0).
Patch coverage: 41.95% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #133      +/-   ##
==========================================
- Coverage   86.95%   84.76%   -2.20%     
==========================================
  Files          33       34       +1     
  Lines        2660     2802     +142     
==========================================
+ Hits         2313     2375      +62     
- Misses        273      351      +78     
- Partials       74       76       +2     
Impacted Files Coverage Δ
pkg/workflows/manager.go 5.26% <0.00%> (-5.85%) ⬇️
cmd/upload.go 57.84% <57.84%> (ø)
cmd/root.go 80.11% <100.00%> (+1.70%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@mdonadoni
Copy link
Member

Can you also reproduce this issue when dealing with symlinks?

Directory structure:

test
├── dir
│   └── inside.txt
├── dir_symlink -> dir
├── file_symlink.txt -> outside.txt
└── outside.txt

reana-client:

==> Detected .gitignore file. Some files might get ignored.
==> SUCCESS: File /test/outside.txt was successfully uploaded.
==> SUCCESS: Symlink resolved to /test/outside.txt. Uploaded hard copy.
==> SUCCESS: File /test/dir/inside.txt was successfully uploaded.

reana-client-go:

==> SUCCESS: File test/dir/inside.txt was successfully uploaded.
==> ERROR: file 'test/dir_symlink' is a directory

@audrium
Copy link
Member Author

audrium commented Sep 26, 2022

Can you also reproduce this issue when dealing with symlinks?

I was able to reproduce it. In fact after spending some time trying to fix the issue I think it would be better to drop the support of symlinks because it could easily lead to troubles. It actually appears that Go's filepath.Walk function does not follow symlinks by design because it's difficult to get it correct avoiding infinite loops.

As an experiment you could create a directory symlink which would point back to parent test folder and reana-client would break while uploading files! Also, while inspecting the current reana-client code I noticed it renames the symlinked files to their original names. It could also lead to troubles if a part of the workflow is designed to call a symlink by it's name (e.g while uploading file_symlink.txt it would be named outside.txt in the workflow).

As a quick solution to the problem I modified the code to detect the symlinked files and ignore it. We can discuss how we want to proceed with it and probably fix it in another PR as it appears to be a bit more difficult to do it right.

cmd/upload.go Outdated Show resolved Hide resolved
cmd/upload.go Outdated
Comment on lines 144 to 165
pathInfo, err := os.Stat(path)
if err != nil {
return nil, nil, err
}
if pathInfo.IsDir() {
dirs = append(dirs, path)
} else {
files = append(files, path)
}
Copy link
Member

@mdonadoni mdonadoni Sep 27, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Symlinks cause some troubles also here, as they can be appended to files.

Given that filepath.Walk works also on files, I am wondering if we can avoid dividing paths into files and dirs. This would make the code less complicated, as we wouldn't need the filterDirectories function. On the other hand, if a user specifies a file in directories (or a directory in files) in the specification we would need to check manually whether that is actually a directory or not to spot the error. What do you think?

(I also don't know if you would prefer to fix this issues later in a new PR)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea! I've changed the logic to use filepath.Walk on both files and directories and it works fine with less code, thanks! In this case even if user specifies a file in directories it works as expected.

On the other hand, if a user specifies a file in directories (or a directory in files) in the specification we would need to check manually whether that is actually a directory or not to spot the error. What do you think?

You mean we should report to a user if a file is found in directories and viceversa?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean we should report to a user if a file is found in directories and viceversa?

Yes, that is what I was referring to. I think we can discuss this in a separate issue, given that it also affects the python client!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, listing directories under files should be reported back to user, and vice versa. Since we have made that distinction, we should keep it.

(Alternatively, we can respect both, so if a user uses only files and puts directories there, we can "silently" accept them. This is something that could pave the way towards using only one directive in the future.)

BTW, regarding listing files and directories "unnecessarily", I have another observation. If a uses specifies the same file twice, it'll be uploaded twice, for example:

$ git diff
diff --git a/reana.yaml b/reana.yaml
index 4c69293..75d70e8 100644
--- a/reana.yaml
+++ b/reana.yaml
@@ -1,6 +1,11 @@
 version: 0.6.0
 inputs:
   files:
+    - code/gendata.C
+    - code/gendata.C
+    - code/gendata.C
+    - code/gendata.C
+    - code/gendata.C
     - code/gendata.C
     - code/fitdata.C
   parameters:

leads to:

$ reana-client run -w test
...
==> Uploading files...
==> Detected .gitignore file. Some files might get ignored.
==> SUCCESS: File /code/gendata.C was successfully uploaded.
==> SUCCESS: File /code/gendata.C was successfully uploaded.
==> SUCCESS: File /code/gendata.C was successfully uploaded.
==> SUCCESS: File /code/gendata.C was successfully uploaded.
==> SUCCESS: File /code/gendata.C was successfully uploaded.
==> SUCCESS: File /code/gendata.C was successfully uploaded.
==> SUCCESS: File /code/fitdata.C was successfully uploaded.
==> Starting workflow...

It would be nice to first make a list of all user-desired files and directories and walk through them, collect all the uploadable objects (avoiding symlinks), and then make a set out of them, so that we don't do any unnecessary upload the same object many times.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P.S. The above upload behaviour was observed with REANA 0.8 clusters. Using the same client with REANA master branch clusters , one gets:

==> Starting workflow...
==> ERROR: Cannot start workflow test.1:
Input path declared multiple times: code/gendata.C

However, instead of throwing an error, it may be better to accept such a configuration and simply "deduplicate" the desired files and directories to be uploaded inside the client?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Especially because reana-client validate doesn't complain about the same file being listed multiple times. So, if we accept such a configuration as valid, we should allow processing it... Fully agree that these concerns are something for a new issue.)

@tiborsimko
Copy link
Member

As a quick solution to the problem I modified the code to detect the symlinked files and ignore it.

Yes, I don't think many people use the symlinks in sources, so as an MVP we could detect whether a file object is a symlink, and if it is, print a visible warning to the user that it'll be ignored (e.g. as we do for gitignore and friends) and continue with uploading the rest.

@audrium
Copy link
Member Author

audrium commented Sep 28, 2022

Updated PR with the following changes:

  • detect whether a file path is a symlink and print a warning to the user
  • prohibit uploading the same file twice
  • detect whether a directory is specified under files directive and vice versa and report to the user (with an error)

cmd/upload.go Outdated
return err
}
if pathInfo.IsDir() {
return fmt.Errorf("directory is found in files directive: %s", file)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return fmt.Errorf("directory is found in files directive: %s", file)
return fmt.Errorf("found directory in `inputs.files`: %s", file)

WDYT?

cmd/upload.go Outdated
return err
}
if !pathInfo.IsDir() {
return fmt.Errorf("file is found in directories directive: %s", dir)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
return fmt.Errorf("file is found in directories directive: %s", dir)
return fmt.Errorf("found file in `inputs.directories`: %s", dir)

WDYT?

@audrium audrium merged commit ba97769 into reanahub:master Oct 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cli: implement upload functionality
4 participants