Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Always fall back to creating file symlinks on Windows #1425

Merged
merged 7 commits into from
Jun 27, 2024

Conversation

EliahKagan
Copy link
Member

@EliahKagan EliahKagan commented Jun 26, 2024

Fixes #1420
Fixes #1421
Fixes #1422

Changes

This adds tests that symlinks can specify targets with unusual names that, on Windows, are invalid or reserved (#1420), which I verified locally to fail until changes are made to the code under test; as well as a regression test that symlinks to directories on Windows are created as directory symlinks (#1422; see verification that the test is able to fail).

This also modifies gix-fs/src/symlink.rs, using the technique proposed in #1420 to convert all errors accessing metadata to guess the proper symlink type, and not just NotFound errors, into false. This falls back to creating file symlinks in more situations, solving #1420 and #1421.

This does not contain new tests specifically for #1421. Since the same technique fixed both, I am unsure if that is needed at this time. There may also be various other error conditions that would have prevented checkout or been misinterpreted as collisions that this now addresses by falling back to creating file symlinks, as Git does.

Comparison to Git

There is one subtlety, though not a new one, in comparison to the Git behavior. On Windows, Git repeatedly rechecks if symbolic links that it created as file symlinks should be replaced with directory symlinks. This allows Git to check out the best type of symlink in more situations where ordering would otherwise matter, though it still will check out a file symlink when the target is an existing but dangling directory symlink (thus it is still not intuitive), and I think it may have performance implications. Even if it does not cause performance problems, I think it may need to be implemented very carefully if done for gitoxide, since the facilities in the Rust standard library access a large amount of metadata with multiple Windows API calls plus further fallback calls. (The algorithm Git for Windows uses for this is worst-case quadratic in the number of files involved. If only files from the index were involved, then it would be possible to implement a linear-work alternative by using topological sort. But the files involved need not all be from the index.)

However, that difference between Git and gitoxide in the treatment of symlinks was already present. This leaves that difference unchanged, and merely converts various situations that were previously errors into successful creation of file symlinks.

Manual testing

I have repeated the manual testing described in #1420 and #1421 after installing the version from this feature branch with cargo install --path ., further verifying the fixes.

With the repository that has a symlink to ???:

C:\Users\ek\src> gix clone [email protected]:EliahKagan/symlink-to-qmarks.git
 01:28:31 indexing done 7.0 objects in 0.00s (24.3K objects/s)
 01:28:31 decompressing done 1.5KB in 0.00s (5.0MB/s)
 01:28:31     Resolving done 7.0 objects in 0.05s (137.0 objects/s)
 01:28:31      Decoding done 1.5KB in 0.05s (29.6KB/s)
 01:28:31 writing index file done 1.3KB in 0.00s (5.1MB/s)
 01:28:31  create index file done 7.0 objects in 0.06s (110.0 objects/s)
 01:28:31          read pack done 1.1KB in 0.06s (16.9KB/s)
 01:28:31           checkout done 3.0 files in 0.00s (1.8K files/s)
 01:28:31            writing done 918B in 0.00s (549.7KB/s)
HEAD:refs/remotes/origin/HEAD (implicit)
        1f0a60ad33050fe90560d80b752f49d6b2121b06 HEAD symref-target:refs/heads/main -> refs/remotes/origin/HEAD [new]
+refs/heads/*:refs/remotes/origin/*
        1f0a60ad33050fe90560d80b752f49d6b2121b06 refs/heads/main -> refs/remotes/origin/main [new]
C:\Users\ek\src> cmd /c dir symlink-to-qmarks
 Volume in drive C is OS
 Volume Serial Number is B203-10FB

 Directory of C:\Users\ek\src\symlink-to-qmarks

06/26/2024  01:28 AM    <DIR>          .
06/26/2024  01:28 AM    <DIR>          ..
06/26/2024  01:28 AM    <DIR>          .git
06/26/2024  01:28 AM               617 COPYING
06/26/2024  01:28 AM               298 README.md
06/26/2024  01:28 AM    <SYMLINK>      symlink [???]
               3 File(s)            915 bytes
               3 Dir(s)  88,286,535,680 bytes free

With the repository that has a symlink to CON:

C:\Users\ek\src> gix clone [email protected]:EliahKagan/symlink-to-con.git
 01:29:00 indexing done 7.0 objects in 0.00s (21.1K objects/s)
 01:29:00 decompressing done 1.6KB in 0.00s (4.7MB/s)
 01:29:00     Resolving done 7.0 objects in 0.05s (137.0 objects/s)
 01:29:00      Decoding done 1.6KB in 0.05s (32.3KB/s)
 01:29:00 writing index file done 1.3KB in 0.00s (4.3MB/s)
 01:29:00  create index file done 7.0 objects in 0.06s (111.0 objects/s)
 01:29:00          read pack done 1.2KB in 0.07s (17.3KB/s)
 01:29:00           checkout done 3.0 files in 0.00s (1.8K files/s)
 01:29:00            writing done 1.1KB in 0.00s (646.6KB/s)
HEAD:refs/remotes/origin/HEAD (implicit)
        fe64a5bc3362f7657fbb9e5da495101b5d815fe5 HEAD symref-target:refs/heads/main -> refs/remotes/origin/HEAD [new]
+refs/heads/*:refs/remotes/origin/*
        fe64a5bc3362f7657fbb9e5da495101b5d815fe5 refs/heads/main -> refs/remotes/origin/main [new]
C:\Users\ek\src> cmd /c dir symlink-to-con
 Volume in drive C is OS
 Volume Serial Number is B203-10FB

 Directory of C:\Users\ek\src\symlink-to-con

06/26/2024  01:29 AM    <DIR>          .
06/26/2024  01:29 AM    <DIR>          ..
06/26/2024  01:29 AM    <DIR>          .git
06/26/2024  01:29 AM               617 COPYING
06/26/2024  01:29 AM               441 README.md
06/26/2024  01:29 AM    <SYMLINK>      symlink [CON]
               3 File(s)          1,058 bytes
               3 Dir(s)  88,287,436,800 bytes free

With the repository that has a symlink to a file that, on my system, cannot even have its metadata accessed due to giving a permission error:

C:\Users\ek\src> gix clone [email protected]:EliahKagan/symlink-to-inaccessible.git
 01:29:19 indexing done 19.0 objects in 0.00s (43.0K objects/s)
 01:29:19 decompressing done 3.6KB in 0.00s (7.7MB/s)
 01:29:19     Resolving done 19.0 objects in 0.05s (373.0 objects/s)
 01:29:19      Decoding done 3.6KB in 0.05s (70.1KB/s)
 01:29:19 writing index file done 1.6KB in 0.00s (5.4MB/s)
 01:29:19  create index file done 19.0 objects in 0.06s (305.0 objects/s)
 01:29:19          read pack done 2.5KB in 0.06s (40.2KB/s)
 01:29:19           checkout done 3.0 files in 0.00s (1.6K files/s)
 01:29:19            writing done 1.5KB in 0.00s (774.6KB/s)
HEAD:refs/remotes/origin/HEAD (implicit)
        77d5afef8ef5ff6f3af10ab900cd1a08e18fc1cc HEAD symref-target:refs/heads/main -> refs/remotes/origin/HEAD [new]
+refs/heads/*:refs/remotes/origin/*
        77d5afef8ef5ff6f3af10ab900cd1a08e18fc1cc refs/heads/main -> refs/remotes/origin/main [new]
C:\Users\ek\src> cmd /c dir symlink-to-inaccessible
 Volume in drive C is OS
 Volume Serial Number is B203-10FB

 Directory of C:\Users\ek\src\symlink-to-inaccessible

06/26/2024  01:29 AM    <DIR>          .
06/26/2024  01:29 AM    <DIR>          ..
06/26/2024  01:29 AM    <DIR>          .git
06/26/2024  01:29 AM               617 COPYING
06/26/2024  01:29 AM               826 README.md
06/26/2024  01:29 AM    <SYMLINK>      symlink [\Program Files\WindowsApps\MutableBackup]
               3 File(s)          1,443 bytes
               3 Dir(s)  88,286,195,712 bytes free

In addition, I have further tested that the changes do not break the creation of directory symlinks where that was already working, by cloning this test repository. Unlike the above results, this is the same as before the changes (because it was already working before the changes):

C:\Users\ek\src> gix clone [email protected]:EliahKagan/has-symlink-to-directory.git
 01:27:49 indexing done 9.0 objects in 0.00s (18.0K objects/s)
 01:27:49 decompressing done 1.6KB in 0.00s (2.9MB/s)
 01:27:49     Resolving done 9.0 objects in 0.05s (175.0 objects/s)
 01:27:49      Decoding done 1.6KB in 0.05s (31.4KB/s)
 01:27:49 writing index file done 1.3KB in 0.00s (5.5MB/s)
 01:27:49  create index file done 9.0 objects in 0.07s (127.0 objects/s)
 01:27:49          read pack done 1.2KB in 0.12s (9.6KB/s)
 01:27:49           checkout done 4.0 files in 0.00s (1.6K files/s)
 01:27:49            writing done 949B in 0.00s (380.7KB/s)
HEAD:refs/remotes/origin/HEAD (implicit)
        0e838aaff928701c050ec77dc46a66d16aa75464 HEAD symref-target:refs/heads/main -> refs/remotes/origin/HEAD [new]
+refs/heads/*:refs/remotes/origin/*
        0e838aaff928701c050ec77dc46a66d16aa75464 refs/heads/main -> refs/remotes/origin/main [new]
C:\Users\ek\src> cmd /c dir has-symlink-to-directory
 Volume in drive C is OS
 Volume Serial Number is B203-10FB

 Directory of C:\Users\ek\src\has-symlink-to-directory

06/26/2024  01:27 AM    <DIR>          .
06/26/2024  01:27 AM    <DIR>          ..
06/26/2024  01:27 AM    <DIR>          .git
06/26/2024  01:27 AM    <SYMLINKD>     a [b]
06/26/2024  01:27 AM    <DIR>          b
06/26/2024  01:27 AM               617 COPYING
06/26/2024  01:27 AM               331 README.md
               2 File(s)            948 bytes
               5 Dir(s)  88,283,258,880 bytes free

Other considerations

I have checked on another branch, as well as locally, that tests pass even without generated archives.

I have proceeded conservatively with respect to the shell script heredoc style issues raised in #1423, leaving the seemingly unnecessary use in place in <<- even where in existing tests that are directly relevant to the changes here, while using << in the new tests. I'd be pleased to change this in either direction; but that could probably be done separately and later.

These are effectively special cases of dangling symlinks.

They should be treated the same as ordinary dangling symlinks, but
the error kind isn't NotFound for these, so currently they are not
created. The new tests should pass once that is fixed.

See GitoxideLabs#1420.
When the metadata of a symlink's target cannot be obtained, even if
the error is something other than `NotFound`, this falls back to
creating file symbolic links. This only affects scenarios where
either the checkout would fail entirely or where the symlink would
have been treated as a collision and skipped (even though it was
not really a collision, since only its target had an error). Other
cases are not affected, and all exisitng scenarios where directory
symlink would be created will still create directory symlinks.

This builds on 31d02a8 (GitoxideLabs#1363) by supporting dangling symlinks even
when the target filenames are unusual, such as when the name is
invalid or reserved. Windows permits such symlinks to be created,
and going ahead and creating the matches the Git behavior.

This should also support other errors beisdes `NotFound`. For
example, some permissions-related errors, in some cases where
traversal or acccess (even to access metadata) are not allowed,
would fail to create a symlink. This should address that as well.

This works by using `Path::is_dir()` in the standard library, which
automatically converts all errors (not just `NotFound`) into
`false`. The logic here is thus quite similar to what was already
present, just more tolerant, even though the code itself is shorter
and simpler.

This fixes GitoxideLabs#1420, and also fixes GitoxideLabs#1421.
At least as `std::fs::metdata` currently works on Windows, this
entails that symlinks to directories on Windows are created as
directory symlinks rather than as file symlinks (since file
symlinks cannot always be automatically dereferenced, and accessing
metadata via the `stat`-like `std::fs::metadata` function is one
situation where that cannot be done).

This fixes GitoxideLabs#1422. Note that this issue pertains solely to what the
test suite covers; the issue is not a bug in the code under test,
and this commit does not modify the code under test.
Generated on an Ubuntu 22.04 LTS system by running all tests with
`cargo nextest run --all --no-fail-fast`, verifying that all tests
pass, and committing the newly created archive files, which
correspond to the new invalid and reserved target symlink tests
and the directory symlink test.
Copy link
Member

@Byron Byron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for the fix and all the research that went into it.

Your diligence (among other traits) I look up to and attempt to apply similar standards in my own work. My primary school teacher always said one should reach for the impossible to reach the possible :D.

With that said, I have a small request regarding deduplication of tests, but with that done I am looking forward to merging.

gix-worktree-state/tests/state/checkout.rs Outdated Show resolved Hide resolved
This helps to clarify, rather than obscure, their similarities,
including the similarity between the old test case and the new ones
that cover dangling symlinks whose targets are unusually named.

For now, the associated fixtures remain separate.
@EliahKagan EliahKagan requested a review from Byron June 26, 2024 17:10
Copy link
Member

@Byron Byron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work, thanks a lot!

@Byron Byron merged commit 6df6e84 into GitoxideLabs:main Jun 27, 2024
19 checks passed
@EliahKagan EliahKagan deleted the strange-symlink-targets branch June 27, 2024 05:31
EliahKagan added a commit to EliahKagan/gitoxide that referenced this pull request Jun 30, 2024
These are the changed archives generated by fixture scripts by
running `cargo nextest run --all --no-fail-fast` on an Ubuntu 22.04
system. (All tests passed, as expected.)

The changed archives divide into two cases. First, there are those
that are changed due to the stylistic changes to heredocs in the
preceding commit 2641f8b (which change the CRC32 hashes of the
scripts and thus cause archives to be regenerated):

* gix-index/tests/fixtures/generated-archives/v2_icase_name_clashes.tar
* gix-worktree-state/tests/fixtures/generated-archives/make_dangling_symlink.tar
* gix/tests/fixtures/generated-archives/make_rev_spec_parse_repos.tar

Second, there are those that were already being generated when the
tests were run, rather than using the committed archives:

* gix-diff/tests/fixtures/generated-archives/make_diff_repo.tar
* gix-odb/tests/fixtures/generated-archives/make_repo_multi_index.tar

These were being generated every time that `nextest` command was
run on Unix-like systems (or at least GNU/Linux and macOS). I don't
know why this was happening, but it suggests a bug somewhere.

- That oddity precedes 55c635a (GitoxideLabs#1425), I just didn't commit the
  two archives not related to the changes being made there at that
  time, since unlike here, that was not a cleanup PR.

- It precedes, or mostly precedes, the change in dcab79a (GitoxideLabs#1415). At
  least the first of those archives, make_diff_repo.tar, already
  behaved this way in its .tar.xz form before that, as noted in:
  GitoxideLabs#1361 (comment)
LuaKT pushed a commit to LMonitor/gitoxide that referenced this pull request Aug 20, 2024
These are the changed archives generated by fixture scripts by
running `cargo nextest run --all --no-fail-fast` on an Ubuntu 22.04
system. (All tests passed, as expected.)

The changed archives divide into two cases. First, there are those
that are changed due to the stylistic changes to heredocs in the
preceding commit 2641f8b (which change the CRC32 hashes of the
scripts and thus cause archives to be regenerated):

* gix-index/tests/fixtures/generated-archives/v2_icase_name_clashes.tar
* gix-worktree-state/tests/fixtures/generated-archives/make_dangling_symlink.tar
* gix/tests/fixtures/generated-archives/make_rev_spec_parse_repos.tar

Second, there are those that were already being generated when the
tests were run, rather than using the committed archives:

* gix-diff/tests/fixtures/generated-archives/make_diff_repo.tar
* gix-odb/tests/fixtures/generated-archives/make_repo_multi_index.tar

These were being generated every time that `nextest` command was
run on Unix-like systems (or at least GNU/Linux and macOS). I don't
know why this was happening, but it suggests a bug somewhere.

- That oddity precedes 55c635a (GitoxideLabs#1425), I just didn't commit the
  two archives not related to the changes being made there at that
  time, since unlike here, that was not a cleanup PR.

- It precedes, or mostly precedes, the change in dcab79a (GitoxideLabs#1415). At
  least the first of those archives, make_diff_repo.tar, already
  behaved this way in its .tar.xz form before that, as noted in:
  GitoxideLabs#1361 (comment)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants