-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Created SCP wrapper #51
base: main
Are you sure you want to change the base?
Conversation
…e on both source and destination to update the provenance records
Contains the use of function present in #44 |
destination = cmd[-1] | ||
source = cmd[-2] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When reading the man-page, I noticed something: like cp
, scp
supports multiple arguments, with the last being the destination. scp remote0:file0.txt remote1:file1.txt remote2:destination_directory/
, and the others being sources.
This changes the "four-cases" approach a small bit, but I think it actually makes it a lot cleaner. We should have get_source_prov_info(host_path_pair) -> list[ProvInfo]
(think of a better name), which has two case local or remote. We should be able to "accumulate" all of the returned values together into one big list and send it to put_prov_info(host_path_pair, prov_info_list)
, which has two case: putting on a local or on a remote.
So there are still four cases: 2 in get_source_prov
and 2 in write_dest_prov
.
"ssh", | ||
f"-p {port}", | ||
f"{remote_user}@{remote_host}", | ||
f'python3 -c \'import os, json; stat_info = os.stat("{file_path}"); device_id = stat_info.st_dev; result = {{"device_major": os.major(device_id), "device_minor": os.minor(device_id)}}; print(json.dumps(result))\'', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need this? Stat has major and minor device as fields:
$ stat --help
...
%Hd major device number in decimal
%Ld minor device number in decimal
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@charmoniumQ I tried using f'stat -c "device_major: %t\ndevice_minor: %T\n" {file_path}'
instead but it gave me value 0 for both device-major
and device-minor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
…ic of get stats on remote function to avoid a conversion error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will have more comments later. But first things first.
port = extract_port_from_scp_command(cmd) | ||
if port is None: | ||
port = 22 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe translate_scp_to_ssh_args
will correctly set the port, so we should avoid special-handling of it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This port is for the other scp command we use to transfer the process-by-id and process_id_that_wrote_inode_version files. I believe all options of the main scp command will apply to the file transfer except for -r. Instead of including this option, I should collect the scp_options and use them to run the subsequent scp commands.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
except Exception as e: | ||
traceback.print_exc() | ||
print(str(e)) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's not re-handle the exception like this. We should use the default exception handler wherever possible. If the default one is giving you difficulties, let me know and we can figure out how to configure it differently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can remove this. The only advantage it gives me is that traceback.print_exc
takes me to the exact location where the error occurred, whereas the default exception handling only provides the line number and file.
prov_upload_src_local(src_inode_version, src_inode_metadata, cmd, src_file_path, destination, port) | ||
else: | ||
prov_upload_src_remote(sources[idx], src_inode_version, src_inode_metadata, cmd, destination, port) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should strive to take all the relevant information we would need to write the prov graph in get_source_info
, including the dataflow graph producing the source inode. Let's discuss if there is some corner case making this impossible.
If we can get all the source info up-front, we don't need to "care" whether the source is local or remote; just the src_inode_metadata
that we already got. However, the combined prov_upload(..., destination)
will have to have a case for local destination and remote destination.
I hope this suggestion simplifies the code, since it means we can basically delete one of the two prov_upload_src_*
functions.
…_options and sources
…e code to accommodate this change
Co-authored-by: Samuel Grayson <[email protected]>
Co-authored-by: Sam Grayson <[email protected]>
Handled all four cases for different sources and destinations. Tested using a SSH server on docker .
Key steps implemented when SCP is used to transfer files:
Retrieving Source Inode Information
src_inode_version
andsrc_inode_metadata
from the source.Uploading Files
dest_inode_version
anddest_inode_metadata
from the source.run get_prov_upstream function
Obtaining Remote Home Directory
Directory Creation on Destination
Uploading Source Inode Version and Process Id
src_inode_version
andprocess_id
from the source to the destination.Random Process ID Generation
random_pid
to uniquely reference the scp process involved.Creating SCP Process JSON
Linking SCP Process ID to Destination Inode Version
Copying SCP Process Information to Destination