Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Created SCP wrapper #51

Draft
wants to merge 40 commits into
base: main
Choose a base branch
from
Draft

Created SCP wrapper #51

wants to merge 40 commits into from

Conversation

Shofiya2003
Copy link
Contributor

Handled all four cases for different sources and destinations. Tested using a SSH server on docker .
Key steps implemented when SCP is used to transfer files:

  1. Retrieving Source Inode Information

    • Extracted the src_inode_version and src_inode_metadata from the source.
  2. Uploading Files

    • Uploaded the necessary files from the source to the destination.
    • Retrieving Destination Inode Information
    • Extracted the dest_inode_version and dest_inode_metadata from the source.
  3. run get_prov_upstream function

  4. Obtaining Remote Home Directory

    • Determined the remote home directory for the destination.
  5. Directory Creation on Destination

    • Created the necessary directories on the destination
  6. Uploading Source Inode Version and Process Id

    • Transferred the src_inode_version and process_id from the source to the destination.
  7. Random Process ID Generation

    • Generated a random_pid to uniquely reference the scp process involved.
  8. Creating SCP Process JSON

    • Created a JSON file representing the SCP process
  9. Linking SCP Process ID to Destination Inode Version

    • Added a reference to the SCP process ID in the destination's inode version metadata.
  10. Copying SCP Process Information to Destination

    • Transferred the SCP process information to a designated file on the destination.

@Shofiya2003 Shofiya2003 marked this pull request as draft August 15, 2024 20:46
@Shofiya2003
Copy link
Contributor Author

Contains the use of function present in #44

@Shofiya2003 Shofiya2003 marked this pull request as ready for review August 17, 2024 21:32
probe_src/python/probe_py/manual/cli.py Outdated Show resolved Hide resolved
probe_src/python/probe_py/manual/cli.py Outdated Show resolved Hide resolved
probe_src/python/probe_py/manual/cli.py Outdated Show resolved Hide resolved
probe_src/python/probe_py/manual/cli.py Outdated Show resolved Hide resolved
Comment on lines 213 to 214
destination = cmd[-1]
source = cmd[-2]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When reading the man-page, I noticed something: like cp, scp supports multiple arguments, with the last being the destination. scp remote0:file0.txt remote1:file1.txt remote2:destination_directory/, and the others being sources.

This changes the "four-cases" approach a small bit, but I think it actually makes it a lot cleaner. We should have get_source_prov_info(host_path_pair) -> list[ProvInfo] (think of a better name), which has two case local or remote. We should be able to "accumulate" all of the returned values together into one big list and send it to put_prov_info(host_path_pair, prov_info_list), which has two case: putting on a local or on a remote.

So there are still four cases: 2 in get_source_prov and 2 in write_dest_prov.

probe_src/python/probe_py/manual/cli.py Outdated Show resolved Hide resolved
probe_src/python/probe_py/manual/cli.py Outdated Show resolved Hide resolved
probe_src/python/probe_py/manual/cli.py Outdated Show resolved Hide resolved
"ssh",
f"-p {port}",
f"{remote_user}@{remote_host}",
f'python3 -c \'import os, json; stat_info = os.stat("{file_path}"); device_id = stat_info.st_dev; result = {{"device_major": os.major(device_id), "device_minor": os.minor(device_id)}}; print(json.dumps(result))\'',
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this? Stat has major and minor device as fields:

$ stat --help
...
  %Hd  major device number in decimal
  %Ld  minor device number in decimal
...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@charmoniumQ I tried using f'stat -c "device_major: %t\ndevice_minor: %T\n" {file_path}' instead but it gave me value 0 for both device-major and device-minor

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probe_src/tasks.md Show resolved Hide resolved
@charmoniumQ charmoniumQ marked this pull request as draft August 20, 2024 06:17
@Shofiya2003 Shofiya2003 marked this pull request as ready for review August 25, 2024 19:53
Copy link
Owner

@charmoniumQ charmoniumQ left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will have more comments later. But first things first.

probe_src/python/probe_py/manual/cli.py Outdated Show resolved Hide resolved
probe_src/python/probe_py/manual/remote_access.py Outdated Show resolved Hide resolved
probe_src/python/probe_py/manual/remote_access.py Outdated Show resolved Hide resolved
Comment on lines 48 to 50
port = extract_port_from_scp_command(cmd)
if port is None:
port = 22
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe translate_scp_to_ssh_args will correctly set the port, so we should avoid special-handling of it here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This port is for the other scp command we use to transfer the process-by-id and process_id_that_wrote_inode_version files. I believe all options of the main scp command will apply to the file transfer except for -r. Instead of including this option, I should collect the scp_options and use them to run the subsequent scp commands.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probe_src/python/probe_py/manual/remote_access.py Outdated Show resolved Hide resolved
Comment on lines 70 to 73
except Exception as e:
traceback.print_exc()
print(str(e))

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's not re-handle the exception like this. We should use the default exception handler wherever possible. If the default one is giving you difficulties, let me know and we can figure out how to configure it differently.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can remove this. The only advantage it gives me is that traceback.print_exc takes me to the exact location where the error occurred, whereas the default exception handling only provides the line number and file.

probe_src/python/probe_py/manual/remote_access.py Outdated Show resolved Hide resolved
probe_src/python/probe_py/manual/remote_access.py Outdated Show resolved Hide resolved
probe_src/python/probe_py/manual/remote_access.py Outdated Show resolved Hide resolved
Comment on lines 67 to 69
prov_upload_src_local(src_inode_version, src_inode_metadata, cmd, src_file_path, destination, port)
else:
prov_upload_src_remote(sources[idx], src_inode_version, src_inode_metadata, cmd, destination, port)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should strive to take all the relevant information we would need to write the prov graph in get_source_info, including the dataflow graph producing the source inode. Let's discuss if there is some corner case making this impossible.

If we can get all the source info up-front, we don't need to "care" whether the source is local or remote; just the src_inode_metadata that we already got. However, the combined prov_upload(..., destination) will have to have a case for local destination and remote destination.

I hope this suggestion simplifies the code, since it means we can basically delete one of the two prov_upload_src_* functions.

@charmoniumQ charmoniumQ marked this pull request as draft August 26, 2024 15:56
@Shofiya2003 Shofiya2003 marked this pull request as ready for review September 9, 2024 23:45
@charmoniumQ charmoniumQ marked this pull request as draft October 16, 2024 20:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants