-
-
Notifications
You must be signed in to change notification settings - Fork 317
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Check for executable bits that disagree with shebangs #1708
Conversation
It is rare that text files should be stored in a repository with a leading `#!` (shebang, a.k.a. hashbang) but `-x` permissions, or with no leading `#!` but `+x` permissions. But it is easy for this to happen by accident. This introduces a script to check for that situation -- currently only in files whose names end in `.sh`, which are the files in which this minor problem has tended to arise in this repository. This adds a `justfile` recipe for it, as well as a CI job that runs the script. (The CI job does not run it via `just`, since not doing so allows it to save time by not installing anything.) Currently, this: - Looks only at what is committed and staged, ignoring unstaged files and unstaged mode changes. (This is intended to allow it to be cross platform, because on Windows, Git repositories support the same modes as anywhere else, but the filesystem doesn't support Unix-style executable permissions.) - Is implemented as a shell script. Unlike `copy-packetline.sh`, there would be no major disadvantage to having this be Rust code instead, since it is never used to correct a problem that keeps Rust code from building. - Is called from a separate CI job than any others. But it could probably be called from one of the existing jobs instead. There are some files already in the repository that fail the new check, which should be given `+x` permissions. In this commit, they are kept as-is, and new files that should be detected as *wrongly* having `+x` permissions are added. This is to verify that the script is fully working as expected, including when run on CI. Once that is confirmed, the new test files can be removed, the scripts missing `+x` fixed, and the CI job made to run only on Ubuntu. (See the commented discussion in GitoxideLabs#1589 for further information.)
The version of `bash` on macOS is old and doed not have `lastpipe`. This uses process substitution instead. This also fixes a couple of bugs in the CI job: - Change the name of the CI job to be the same as the script basename and `justfile` recipe. (I'm not sure if `check-mode` or `check-modes` is better, but it should be the same for all.) - Add the job as a dependency of `tests-pass`. (Otherwise it will fail `check-blocking` if not explicitly listed as not being required for PR auto-merge. Since the problems it finds are usually easy to fix correctly, it should probably not be exempt.)
Five fixture scripts were found by `etc/check-mode.sh` to begin with `#!` but not be marked executable. This adds `+x` to them. (I inspected them to make sure that this, and not removing the shebangs, was the correct fix.) This commit also removes the two temporary test files that had been added to ensure `etc/check-mode.sh` is capable of identifying the opposite situation. The script is now expected to pass.
By running it only on Ubuntu (not macOS and Windows), and removing parts that were only needed for running it on Windows. Unlike `etc/copy-packetline.sh`, the `etc/check-mode.sh` script probably does not need to be made easy to retest on other platforms after changes, becaus it does not delete (nor otherwise modify) the repository, and becuase it is considerably shorter, simpler, and less reliant on subtleties of standard tools that vary across platforms.
Thanks a lot for your help, it's much appreciated!
I agree, and am happy to start with |
I put the actual and expected bytes in symbolic form for clearer debug output when running the script with Various approaches with So, yes, this is an area where it might become more intuitive when implemented in Rust. But really the part that I don't like is: Lines 6 to 7 in 34efe03
There's a lot of code in the world that uses Process substitution does not remove any newlines. It can be used, followed by the removal of exactly one newline character, but this isn't really less complicated or less confusing than that code. I think this is more natural to express in most languages, including Rust. |
Ignorance is a bliss. I am not aware of this, so I wouldn't have cared for it, and it should still have worked fine as long as |
In the rest of this comment, I always use In a Git repository The problem is that command substitution removes any number of trailing newlines. So in a repository This is only a problem for trailing newlines. Newlines that have any non-newline character anywhere after them, including consecutive newlines that have a non-newline character anywhere after them, are not a problem. It is rare for a filename to end in
This problem is usually minor, because it is rarely triggered and the impact of it being triggered is usually limited, and the case can sometimes be made for not bothering to avoid it. I especially try to avoid it in scripts that make changes. Having already addressed it in gitoxide/etc/copy-packetline.sh Lines 14 to 23 in 34efe03
In both scripts, the problem is avoided. In a repository (This would also work without passing Then, of course, it is necessary to remove This is not the only way to deal with the problem. For example, process substitution can be used, which preserves all newlines. Then one can make sure to remove a single newline. But because one must still remove the newline, using process substitution does not necessarily entail simpler code. I do not recommend being especially worried about this problem. However, what makes it even easier not to be worried is if one avoids it. So one of my goals is to make it easier to be confident about the code's correctness. Or, more importantly, to be able to place one's attention on other areas of the code, if any, that are suspected to be wrong. The treatment of trailing newlines is not the only modification performed in command substitution on output.
But the behavior of dropping |
Thanks a lot for elaborating! In any case, I think the explainer really would be great in its own "What you probably didn't know about
In this moment I think I'd definitely write it in Rust, just to not have to worry about that particular issue (and probably many more) anymore. |
In GitoxideLabs#1708, `check-mode.sh` deliberately parsed `git ls-files -sz` records using default nonempty `IFS`, to conveniently read fields into variables. But this would strip leading and trailing whitespace from the paths. Trailing whitespace is not currently present, since only files with `.sh` suffixes are processed, so the only paths that would be displayed incorrectly were those with leading whitespace. (We do not intentionally have such files in the repository, but they can arise accidentally, and they might also, in principle, be added at some point to support tests.) This was *sort of* okay, because the script never accesses the repository or filesystem using the paths, but only displays them to the user. But this was still bad for a few reasons: - The paths were at least in principle ambiguous. - A path with accidental leading whitespace would not be noticed. - The `path` variable looks like it would be usable, so it could be easily misused in future changes, if not fixed. - Having to think through whether or not this parsing bug is excessively serious, when reasoning about the code, may make the code more cumbersome to understand and maintain. This is to say that I should never have done it in that way. Therefore, this replaces the default-`IFS` logic with empty-`IFS` calls to `read` and regular-expression processing, using `bash`'s `[[` keyword, which supports a `=~` operator that matches against a regular expression and populates the `BASH_REMATCH` array variable with captures. (This approach to parsing `git ls-file -sz` is based on code I wrote in https://github.com/EliahKagan/gitscripts/blob/master/git-lsx.) As discussed in GitoxideLabs#1708, this script may be converted to Rust. This change is not intended as a substitute for that. It should actually make that slightly easier, in that a regex approach has clearer semantics, whether it is translated into code that uses regex or not. (If regex are used, it will probably still need adjustment, as not all dialects reocognize POSIX classes like `[[:digit:]]`.) This commit includes some files to test the fix in the script. They are at the top level of the repository, since the bug this fixes only occurs with whitespace at the beginning of the first path component (relative to the root of the working tree). These files will be removed shortly. The messages for them are: mode +x but no shebang: $'\tb.sh' mode -x but has shebang: \ a.sh Before the fix, they would have been shown as `a.sh` instead of `\ a.sh`, and `b.sh` instead of `$'\tb.sh'`. Note that the approach to quoting has not itself changed: the `%q` format specifier for the `printf` builtin in `bash` formats them this, when the paths are parsed in such a way that the characters that would need quoting if used in the shell are not lost.
In GitoxideLabs#1708, `check-mode.sh` deliberately parsed `git ls-files -sz` records using default nonempty `IFS`, to conveniently read fields into variables. But this would strip leading and trailing whitespace from the paths. Trailing whitespace is not currently present, since only files with `.sh` suffixes are processed, so the only paths that would be displayed incorrectly were those with leading whitespace. (We do not intentionally have such files in the repository, but they can arise accidentally. They might also, in principle, be added at some point to support tests. They would not likely be kept, because they are incompatible with Windowws and `git` will not check them out there, but they could plausibly exist in some commits on a feature branch, even with no mistakes.) This was *sort of* okay, because the script never accesses the repository or filesystem using the paths, but only displays them to the user. But this was still bad for a few reasons: - The paths were at least in principle ambiguous. - A path with accidental leading whitespace would not be noticed. - The `path` variable looks like it would be usable, so it could be easily misused in future changes, if not fixed. - Having to think through whether or not this parsing bug is excessively serious, when reasoning about the code, may make the code more cumbersome to understand and maintain. This is to say that I should never have done it in that way. Therefore, this replaces the default-`IFS` logic with empty-`IFS` calls to `read` and regular-expression processing, using `bash`'s `[[` keyword, which supports a `=~` operator that matches against a regular expression and populates the `BASH_REMATCH` array variable with captures. (This approach to parsing `git ls-file -sz` is based on code I wrote in https://github.com/EliahKagan/gitscripts/blob/master/git-lsx.) As discussed in GitoxideLabs#1708, this script may be converted to Rust. This change is not intended as a substitute for that. It should actually make that slightly easier, in that a regex approach has clearer semantics, whether it is translated into code that uses regex or not. (If regex are used, it will probably still need adjustment, as not all dialects reocognize POSIX classes like `[[:digit:]]`.) This commit includes some files to test the fix in the script. They are at the top level of the repository, since the bug this fixes only occurs with whitespace at the beginning of the first path component (relative to the root of the working tree). These files will be removed shortly. The messages for them are: mode +x but no shebang: $'\tb.sh' mode -x but has shebang: \ a.sh Before the fix, they would have been shown as `a.sh` instead of `\ a.sh`, and `b.sh` instead of `$'\tb.sh'`. Note that the approach to quoting has not itself changed: the `%q` format specifier for the `printf` builtin in `bash` formats them this, when the paths are parsed in such a way that the characters that would need quoting if used in the shell are not lost.
In GitoxideLabs#1708, `check-mode.sh` deliberately parsed `git ls-files -sz` records using default nonempty `IFS`, to conveniently read fields into variables. But this would strip leading and trailing whitespace from the paths. Trailing whitespace is not currently present, since only files with `.sh` suffixes are processed, so the only paths that would be displayed incorrectly were those with leading whitespace. (We do not intentionally have such files in the repository, but they can arise accidentally. They might also, in principle, be added at some point to support tests. They would not likely be kept, because they are incompatible with Windowws and `git` will not check them out there, but they could plausibly exist in some commits on a feature branch, even with no mistakes.) This was *sort of* okay, because the script never accesses the repository or filesystem using the paths, but only displays them to the user. But this was still bad for a few reasons: - The paths were at least in principle ambiguous. - A path with accidental leading whitespace would not be noticed. - The `path` variable looks like it would be usable, so it could be easily misused in future changes, if not fixed. - Having to think through whether or not this parsing bug is excessively serious, when reasoning about the code, may make the code more cumbersome to understand and maintain. This is to say that I should never have done it in that way. Therefore, this replaces the default-`IFS` logic with empty-`IFS` calls to `read` and regular-expression processing, using `bash`'s `[[` keyword, which supports a `=~` operator that matches against a regular expression and populates the `BASH_REMATCH` array variable with captures. (This approach to parsing `git ls-file -sz` is based on code I wrote in https://github.com/EliahKagan/gitscripts/blob/master/git-lsx.) As discussed in GitoxideLabs#1708, this script may be converted to Rust. This change is not intended as a substitute for that. It should actually make that slightly easier, in that a regex approach has clearer semantics, whether it is translated into code that uses regex or not. (If regex are used, it will probably still need adjustment, as not all dialects recognize POSIX classes like `[[:digit:]]`.) This commit includes some files to test the fix in the script. They are at the top level of the repository, since the bug this fixes only occurs with whitespace at the beginning of the first path component (relative to the root of the working tree). These files will be removed shortly. The messages for them are: mode +x but no shebang: $'\tb.sh' mode -x but has shebang: \ a.sh Before the fix, they would have been shown as `a.sh` instead of `\ a.sh`, and `b.sh` instead of `$'\tb.sh'`. Note that the approach to quoting has not itself changed: the `%q` format specifier for the `printf` builtin in `bash` formats them this, when the paths are parsed in such a way that the characters that would need quoting if used in the shell are not lost.
Yes, I have opened #1709 to fix a mistake I made in the script. 🤦♂️ |
As discussed in comments in #1589, this adds a shell script to report
.sh
files that have#!
but are not executable, or that don't have#!
but are executable. It also adds ajustfile
recipe to call it, and a CI job, though I suspect it might be slightly preferable to include it in or combine it with another CI job. Both for mode metadata and for file (blob) contents, this usesgit
to get information from the repository reflecting what is committed and staged, but not any unstaged files or changes.This found five fixture scripts that should have been marked executable but weren't. The opposite situation was not found, so I temporarily created some files to test it (see commit messages for details) and then removed them. The five fixture scripts that lacked
+x
now have it, so the new script reports success when run locally as well as when run on CI. The new script does not modify the repository; I ranchmod
to add+x
to the files it found.Not all
.sh
files in the repository have, or should have, executable permissions, because some are "libraries" rather than "programs": in particular, the journey tests have files they source, that only make sense to source and should not be run, and that do not have shebangs. The new script does not misdetect these as wrong, because their shebang status and executable status agree: they have no shebangs, and are not marked executable.Ideally we would scan not just
.sh
files but all files that Git considers to be text: those that are marked as such with.gitattributes
, as well as all files where this status is not specified by.gitattributes
but that Git auto-detects as text. After all, if a.rs
file were given+x
permissions, then that would be a mistake, which would be convenient to catch. We may also have other scripts besides shell scripts in the project at some point. Furthermore,justfile
is not currently scanned.Since all the mistakes of the kind this catches have happened in
.sh
files, I think this change as it stands may be worthwhile. I am interested in extending this to operate on all files Git considers to be text in its own operations. But my thinking is that it makes sense to put that off until I've decided whether I want to rewrite this in Rust. Maybe it will also have bearing on whether this should be rewritten in Rust.