-
Notifications
You must be signed in to change notification settings - Fork 192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add --also-remote
option to verdi process dump
#6578
base: main
Are you sure you want to change the base?
Add --also-remote
option to verdi process dump
#6578
Conversation
Tries to open a connection to the remote Computer using the transport of the CalcJobNode, and `get`s the files and folders which were not originally dumped, either as they weren't part of the CalcJobNode's repository or retrieved.
469ecfa
to
5693a63
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #6578 +/- ##
==========================================
+ Coverage 77.51% 77.82% +0.31%
==========================================
Files 560 566 +6
Lines 41444 42077 +633
==========================================
+ Hits 32120 32741 +621
- Misses 9324 9336 +12 ☔ View full report in Codecov by Sentry. |
|
||
# Transport.get() works for files and folders, so we don't need to make a distinction | ||
for rem, loc in zip(remote_paths, local_paths): | ||
transport.get(str(rem), str(loc), ignore_nonexisting=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think transport.get()
has overwrite=True
for default.
Maybe in this case it makes sense to issue a warning (that I'm overwriting your local files)
A one other points, also mentioned by others:
- maybe one can pass glob patterns (e.g. all
*.log
files)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, good point! In principle, dumping has an overwrite argument, as well. Though, I'm getting a bit tired of populating this setting down the hierarchy to the different sub-WorkChains and CalcJobs. So I've been thinking of either making it global, for just the parent directory of the dumping, or possibly making it more fine-grained. The remote files are a case where the latter case could make sense.
Currently working on the possibility to provide explicit list of files or glob patterns to include/exclude! 🚀
Thanks @GeigerJ2 I like this feature, |
is_flag=True, | ||
default=False, | ||
show_default=True, | ||
help='If true, try to obtain also intermediate files on the remote computer that were not initially retrieved.', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps give a bit more detail here, e.g., mention that this requires opening a connection to the remote computer and that it possible cannot retrieve the files if the remote folder has been altered or deleted entirely in the meantime
) | ||
|
||
if also_remote: | ||
echo.echo_report( | ||
'`--also-remote` set to True. Will try to retrieve additional files from the `workdir` of the remote Computer.' # noqa: E501 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not really the workdir of the Computer (which is the base directory of the scratch) but rather of the calcjob itself
'`--also-remote` set to True. Will try to retrieve additional files from the `workdir` of the remote Computer.' # noqa: E501 | ||
) | ||
echo.echo_report( | ||
'If files are non-existent, they will be skipped silently. Check if the output files are what you expect.' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a good idea? will be difficult to tell if they werent copied because they no longer exist or because there was a problem
|
||
remote_workdir = calcjob_node.get_remote_workdir() | ||
if remote_workdir is None: | ||
raise NotExistentAttributeError(f"CalcJobNode <{calcjob_node.pk}> doesn't have a `remote_workdir`.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not caught, do you really want to let this let the entire dump fail? Perhaps just log a warning and move on
retrieve_list = list(calcjob_node.get_retrieve_list()) # type: ignore[arg-type] | ||
repository_list = calcjob_node.base.repository.list_object_names() | ||
exclude_list = retrieve_list + repository_list |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few things here:
- Shouldn't everything that it is in the retrieve list also be in the repository list? Maybe there may be more in the retrieve list, but that would then correspond to a file that doesn't exist, so it will presumable still not be there on the remote, so it can be ignored.
- a bigger problem though I think is that the
retrieve_list
can contain wildcards and I don't think you currently handle them in the code below right? If you just rely on the contents of the repository, this problem disappears.
|
||
# Getting the transport fails, propagate exception outwards | ||
# (could just remove the except, but being explicit might make it clearer here) | ||
except ConfigurationError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wait, a ConfigurationError
? Where is that being thrown? Looks like a different problem.
Tries to open a connection to the remote Computer using the transport of the CalcJobNode, and
get
s the files and folders which were not originally dumped, either as they weren't part of the CalcJobNode's repository or retrieved.Will add the tests once the implementation is somewhat stable.
Don't want to assign you officially for review, but still pinging @khsrali and @agoscinski here if you guys want to have a look 👀
Note:
With PR #6584, it'll be possible to give a warning to the user about the total size of files that would be written to disk. We might eventually decide that it takes too long and one has to open too many transports, just to obtain the total file size to warn a user, so one could just issue a generic warning. The problem is that these files and directories are on the remote computer (if not deleted already), so one has to connect to get information about them.
Also, the returned file size will differ, as some of the files in the directory to which the
RemoteData
points to, will have already been retrieved. This can be checked for, based on e.g.retrieve_list
,local_copy_list
, etc.