Rsync non-zero exit status when rsync'ing seqrepo #171

Open
brendanreardon opened this issue Dec 17, 2024 · 9 comments
Labels
bug Something isn't working

Comments

@brendanreardon

Describe the bug
seqrepo pull (biocommons.seqrepo) currently fails with a non-zero exit status when following the All platforms Quick Start. The failure comes from the rsync call that lists dl.biocommons.org::seqrepo. It seems to be reproducible across seqrepo releases and was also reproduced by a colleague: I get this error with biocommons.seqrepo==0.6.9 and also with release 0.6.8.

To Reproduce
To reproduce the behavior, follow the All platforms Quick Start after installing the prerequisites:

(vrs) breardon@blueberry:seqrepo$ sudo mkdir -p /usr/local/share/seqrepo
(vrs) breardon@blueberry:seqrepo$ sudo chown $USER /usr/local/share/seqrepo
(vrs) breardon@blueberry:seqrepo$ seqrepo pull -i 2024-02-20
usage: rsync [-0468BCDEFHIKLOPRSTWVabcdghklnopqrtuvxyz] [-e program] [-f filter]
	[--8-bit-output] [--address=sourceaddr]
	[--append] [--backup-dir=dir] [--bwlimit=limit] [--cache | --no-cache]
	[--compare-dest=dir] [--contimeout] [--copy-dest=dir] [--copy-unsafe-links]
	[--del | --delete-after | --delete-before | --delete-during]
	[--delay-updates] [--dirs] [--no-dirs]
	[--exclude] [--exclude-from=file]
	[--extended-attributes]
	[--existing] [--force] [--ignore-errors]
	[--ignore-existing] [--ignore-non-existing] [--include]
	[--include-from=file] [--inplace] [--keep-dirlinks] [--link-dest=dir]
	[--max-delete=NUM] [--max-size=SIZE] [--min-size=SIZE]
	[--modify-window=sec] [--no-motd] [--numeric-ids]
	[--out-format=FMT] [--partial] [--password-file=pwfile] [--port=portnumber]
	[--progress] [--protocol] [--read-batch=file]
	[--remove-source-files] [--rsync-path=program] [--safe-links] [--size-only]
	[--sockopts=sockopts] [--specials] [--suffix] [--super] [--timeout=seconds]
	[--only-write-batch=file | --write-batch=file]
	source ... directory
Traceback (most recent call last):
  File "/opt/miniconda3/envs/vrs/bin/seqrepo", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/opt/miniconda3/envs/vrs/lib/python3.11/site-packages/biocommons/seqrepo/cli.py", line 732, in main
    opts.func(opts)
  File "/opt/miniconda3/envs/vrs/lib/python3.11/site-packages/biocommons/seqrepo/cli.py", line 539, in pull
    remote_instances = _get_remote_instances(opts)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/vrs/lib/python3.11/site-packages/biocommons/seqrepo/cli.py", line 64, in _get_remote_instances
    lines = subprocess.check_output(rsync_cmd).decode().splitlines()[1:]
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/vrs/lib/python3.11/subprocess.py", line 466, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/vrs/lib/python3.11/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['/usr/bin/rsync', '--no-motd', '--copy-dirlinks', 'dl.biocommons.org::seqrepo']' returned non-zero exit status 1.
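Because check_output raises on any non-zero exit, the traceback hides what rsync actually printed. A minimal Python sketch for running the same listing command directly and keeping its output; the argument list is copied from the CalledProcessError message above, while build_listing_cmd and probe are hypothetical helper names, not part of seqrepo:

```python
import subprocess
from typing import List, Tuple


def build_listing_cmd(rsync_exe: str = "/usr/bin/rsync",
                      url: str = "dl.biocommons.org::seqrepo") -> List[str]:
    """Argument list copied from the CalledProcessError message above."""
    return [rsync_exe, "--no-motd", "--copy-dirlinks", url]


def probe(cmd: List[str], timeout: float = 60.0) -> Tuple[int, str, str]:
    """Run cmd and return (exit status, stdout, stderr) instead of raising.

    Unlike subprocess.check_output, a non-zero exit does not raise here, so
    any usage text rsync prints stays visible. A missing executable still
    raises FileNotFoundError, and exceeding `timeout` raises
    subprocess.TimeoutExpired.
    """
    proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    return proc.returncode, proc.stdout, proc.stderr
```

For example, code, out, err = probe(build_listing_cmd()) shows the exit status together with whatever /usr/bin/rsync wrote to stdout or stderr; passing a different binary as rsync_exe lets you compare rsync builds side by side.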

Expected behavior
Successful rsync of seqrepo data.

Additional context
I'm running this on macOS 15.0.1, with openrsync protocol version 29 and rsync version 2.6.9.

Thank you!

brendanreardon added the bug (Something isn't working) label on Dec 17, 2024
@jsstevenson
Contributor

jsstevenson commented Dec 17, 2024

Yeah, this is strange. I'm seeing it happen in the rsync command in _get_remote_instances, which is:

rsync --no-motd --copy-dirlinks dl.biocommons.org::seqrepo

It's displaying the usage info, as if this were a syntax error. This command has worked in the past, and my local man page shows rsync [OPTION]... SRC as a legal invocation. (Edit: I think the issue is that I now have openrsync, for which it's not a legal invocation according to man openrsync.) I'm not sure whether macOS has updated its shipped rsync binary, but this is what I get from rsync --version:

openrsync: protocol version 29
rsync version 2.6.9 compatible
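The first line of rsync --version is enough to tell the two implementations apart in the outputs quoted in this thread. A minimal sketch, assuming that heuristic holds generally (it is based only on the openrsync output above and the rsync 3.3.0 output later in the thread); rsync_version_output and parse_rsync_version are hypothetical helpers:

```python
import re
import subprocess
from typing import Optional, Tuple


def rsync_version_output(exe: str = "rsync") -> str:
    """Text printed by `<exe> --version` (stdout followed by stderr)."""
    proc = subprocess.run([exe, "--version"], capture_output=True, text=True)
    return proc.stdout + proc.stderr


def parse_rsync_version(text: str) -> Tuple[str, Optional[int]]:
    """Return (implementation, protocol version) from `rsync --version` output.

    Heuristic based only on the first lines quoted in this thread:
      "openrsync: protocol version 29"            -> ("openrsync", 29)
      "rsync  version 3.3.0  protocol version 31" -> ("rsync", 31)
    """
    lines = text.strip().splitlines()
    if not lines:
        return "unknown", None
    first = lines[0]
    impl = "openrsync" if first.startswith("openrsync") else "rsync"
    match = re.search(r"protocol version (\d+)", first)
    protocol = int(match.group(1)) if match else None
    return impl, protocol
```

Usage: parse_rsync_version(rsync_version_output("/usr/bin/rsync")) reports which implementation a given binary is before seqrepo calls it.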

@korikuzma

korikuzma commented Dec 17, 2024

So I ran the following (off-premises, off-VPN):

$ nc -zv dl.biocommons.org 873

Connection to dl.biocommons.org port 873 [tcp/rsync] succeeded!

and then:

$ rsync -v --no-motd dl.biocommons.org::seqrepo .

rsync: warning: receiver has empty file list: exiting

@reece Could you weigh in on whether the seqrepo module is empty or missing, or offer guidance on how to resolve this issue?
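The nc -zv check above only confirms that the rsync daemon port on dl.biocommons.org accepts TCP connections; it says nothing about which rsync client is in use. Where nc is unavailable, the same reachability check can be sketched in Python (port_open is a hypothetical helper; 873 is the port nc reported as tcp/rsync):

```python
import socket


def port_open(host: str, port: int = 873, timeout: float = 5.0) -> bool:
    """Rough Python equivalent of `nc -zv host port`.

    Returns True if a TCP connection to host:port succeeds within `timeout`
    seconds. Port 873 is the rsync daemon port used by host::module URLs.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure, or timeout
        return False
```

In the situation described above, port_open("dl.biocommons.org") would be expected to return True, which points away from a network problem and toward the client.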

@jsstevenson
Contributor

jsstevenson commented Dec 18, 2024

I think this is an openrsync vs. rsync issue. If you install the latter from Homebrew, the command hardcoded in seqrepo works correctly:

[ main] ~/code/biocommons.seqrepo % /opt/homebrew/bin/rsync --no-motd --copy-dirlinks dl.biocommons.org::seqrepo
drwxrwxr-x          6,144 2024/11/25 10:06:07 .
dr-xr-xr-x          6,144 2016/08/27 23:25:40 2016-08-27
dr-xr-xr-x          6,144 2016/08/28 11:52:54 2016-08-28
dr-xr-xr-x          6,144 2016/09/06 13:36:03 2016-09-06
dr-xr-xr-x          6,144 2016/10/05 13:42:15 2016-10-04
dr-xr-xr-x          6,144 2016/10/26 00:38:49 2016-10-24
dr-xr-xr-x          6,144 2016/12/14 00:42:42 2016-12-13
dr-xr-xr-x          6,144 2017/07/04 18:40:02 2017-07-04
dr-xr-xr-x          6,144 2017/07/19 19:22:07 2017-07-19
dr-xr-xr-x          6,144 2017/09/08 12:34:12 2017-09-08
dr-xr-xr-x          6,144 2017/10/26 18:38:22 2017-10-26
dr-xr-xr-x          6,144 2017/11/18 17:11:49 2017-11-18
dr-xr-xr-x          6,144 2018/08/21 18:33:25 2018-08-21
dr-xr-xr-x          6,144 2018/10/03 15:24:55 2018-10-03
dr-xr-xr-x          6,144 2018/11/26 11:34:32 2018-11-26
dr-xr-xr-x          6,144 2019/06/21 22:44:11 2019-06-20
drwxr-xr-x          6,144 2020/04/12 23:49:03 2020-04-13
drwxr-xr-x          6,144 2020/10/15 20:46:04 2020-10-16
drwxr-xr-x          6,144 2020/10/27 12:40:39 2020-10-27
dr-xr-xr-x          6,144 2020/11/27 13:34:26 2020-11-27
dr-xr-xr-x          6,144 2021/01/04 19:46:39 2021-01-05
dr-xr-xr-x          6,144 2021/01/29 14:36:59 2021-01-29
dr-xr-xr-x          6,144 2024/02/20 15:05:03 2024-02-20
dr-xr-xr-x          6,144 2024/11/25 10:02:54 2024-05-23

I don't know if this will finish before I have to go to work, but it appears to be downloading properly:

[ main ⚙ .venv-312] ~/code/biocommons.seqrepo % seqrepo --rsync-exe /opt/homebrew/bin/rsync pull
receiving incremental file list
created directory /usr/local/share/seqrepo/2024-05-23.0vxk3h66
./
aliases.sqlite3
<downloading stuff>
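The traceback in the issue shows _get_remote_instances decoding this listing and dropping its first line (splitlines()[1:]). A rough sketch of turning such a listing into snapshot names, assuming every line ends with the entry name and that snapshot names look like YYYY-MM-DD; parse_instances and the date filter are hypothetical, not seqrepo's actual code:

```python
import re
from typing import List

# Hypothetical filter: keep only entries named like dated snapshots (YYYY-MM-DD).
SNAPSHOT_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def parse_instances(listing: str) -> List[str]:
    """Extract snapshot names from `rsync host::module` listing output.

    Assumes each line looks like the listing above:
      <perms> <size> <date> <time> <name>
    The "." entry and anything not matching SNAPSHOT_RE are skipped.
    """
    names = []
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # blank or unexpected line
        name = fields[-1]
        if SNAPSHOT_RE.match(name):
            names.append(name)
    # ISO-style dates sort chronologically as plain strings.
    return sorted(names)
```

On the listing above, the last name returned would be 2024-05-23, the snapshot that seqrepo pull then downloads.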

@korikuzma

@jsstevenson Ya, I agree with you

@brendanreardon
Author

Huh! Good catch, @jsstevenson. I installed rsync from conda-forge, pointed --rsync-exe at the new rsync path as you did, and it seems to be working for me too.

(vrs) breardon@blueberry:2024-12-20$ seqrepo --rsync-exe /opt/miniconda3/envs/vrs/bin/rsync pull
receiving incremental file list
created directory /usr/local/share/seqrepo/2024-05-23.e2onempw
./
aliases.sqlite3
    869,302,272  55%    8.81MB/s    0:01:17

rsync --version also no longer reports openrsync:

(vrs) breardon@blueberry:~$ rsync --version
rsync  version 3.3.0  protocol version 31
Copyright (C) 1996-2024 by Andrew Tridgell, Wayne Davison, and others.
Web site: https://rsync.samba.org/
Capabilities:
    64-bit files, 64-bit inums, 64-bit timestamps, 64-bit long ints,
    socketpairs, symlinks, symtimes, hardlinks, hardlink-specials,
    hardlink-symlinks, no IPv6, atimes, batchfiles, inplace, append, ACLs,
    xattrs, optional secluded-args, iconv, no prealloc, stop-at, crtimes
Optimizations:
    no SIMD-roll, no asm-roll, openssl-crypto, no asm-MD5
Checksum list:
    xxh128 xxh3 xxh64 (xxhash) md5 md4 sha1 none
Compress list:
    zstd lz4 zlibx zlib none
Daemon auth list:
    sha512 sha256 sha1 md5 md4

rsync comes with ABSOLUTELY NO WARRANTY.  This is free software, and you
are welcome to redistribute it under certain conditions.  See the GNU
General Public Licence for details.
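When several rsync builds are installed (Homebrew, conda-forge, the system one), choosing a value for --rsync-exe can be automated. A minimal sketch, assuming that a first line of --version output beginning with "openrsync" identifies openrsync, as in the outputs quoted in this thread; pick_rsync is a hypothetical helper and the candidate paths are only the ones mentioned above:

```python
import shutil
import subprocess
from typing import Callable, Iterable, Optional


def rsync_version(exe: str) -> str:
    """Text printed by `<exe> --version` (stdout followed by stderr)."""
    proc = subprocess.run([exe, "--version"], capture_output=True, text=True)
    return proc.stdout + proc.stderr


def pick_rsync(candidates: Iterable[str],
               version_of: Callable[[str], str] = rsync_version) -> Optional[str]:
    """Return the first candidate that exists, is executable, and is not openrsync."""
    for cand in candidates:
        # shutil.which resolves bare names via PATH and checks explicit paths directly.
        path = shutil.which(cand)
        if path is None:
            continue
        lines = version_of(path).strip().splitlines()
        if lines and not lines[0].startswith("openrsync"):
            return path
    return None
```

For example, pick_rsync(["/opt/homebrew/bin/rsync", "/opt/miniconda3/envs/vrs/bin/rsync", "rsync"]) returns a path suitable for seqrepo --rsync-exe, or None if only openrsync is available.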

@theferrit32
Contributor

I am on macOS 14.7.1 with the default Apple-shipped rsync (/usr/bin/rsync: version 2.6.9, protocol version 29), and this works for me:

SEQREPO_ROOT_DIR=$(pwd)/data seqrepo pull

I don't have openrsync installed. Perhaps, when it is installed, it shadows the macOS rsync command on the PATH.
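The shadowing hypothesis can be checked by listing every rsync on PATH in lookup order, roughly what which -a rsync prints. A small sketch; rsync_on_path is a hypothetical helper. A single /usr/bin/rsync entry does not rule out openrsync, since the traceback earlier in this thread shows /usr/bin/rsync itself being the binary that rejected the command.

```python
import os
from typing import List, Optional


def rsync_on_path(path_value: Optional[str] = None, name: str = "rsync") -> List[str]:
    """List every executable called `name` on PATH, in lookup order.

    The first entry is what a bare `rsync` resolves to; later entries are
    shadowed by it.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    found: List[str] = []
    for directory in path_value.split(os.pathsep):
        if not directory:
            # POSIX shells treat an empty component as the current directory;
            # it is ignored here for simplicity.
            continue
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK) and candidate not in found:
            found.append(candidate)
    return found
```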

@jsstevenson
Contributor

jsstevenson commented Jan 3, 2025

Yeah, @theferrit32, it'll switch for you when you upgrade to macOS 15. The only external confirmation I can find is a random LinkedIn post, but it tracks with what we're observing above.

BTW, rsync is licensed under the GPL, and openrsync seems to have been written for BSD; I assume that's why this happened.

@theferrit32
Contributor

@jsstevenson It's inconvenient that they'd swap out a core command-line utility and keep the same executable name even though its interface isn't backward compatible, and not mention it in the release notes. I'm not seeing this in the macOS or Xcode release notes. But thanks for finding that.

@jsstevenson
Contributor

"Inconvenient" is definitely one word for it
