Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[osh] Test ${a[@]:-}, ${a[*]:-}, etc. based on the string result #2215

Merged
merged 9 commits into from
Jan 3, 2025

Conversation

akinomyoga
Copy link
Collaborator

This follows the behavioral change from Bash 4.4 to 5.0.

osh/word_eval.py Outdated Show resolved Hide resolved
Copy link
Contributor

@andychu andychu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK thank you, it looks like this improves bash compatibility so it's good

Although now I am sort of interested in "${@:-}" and so forth because it's a POSIX construct


I wonder if you can also improve test case 18, because it looks like we are missing coverage there. I think it may hit these same code paths

Or add another test case below 18


I am a bit surprised by the difference between

set -- ""
set -- "" ""

and also zsh is different

#### $@ and - and +
echo argv=${@-minus}
echo argv=${@+plus}
echo argv=${@:-minus}
echo argv=${@:+plus}

is_falsey = False
break
else:
is_falsey = len(strs) == 0 or (len(strs) == 1 and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm this seems worth documenting in doc/ref/chap-word-lang.md under op-test ?

I didn't know this !

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is actually a combination of $* / $@, double quoting, and IFS. I just calculated the width of the resulting length without actually calculating the string, which may look non-trivial. I'll add documentation and also a code comment here.

Copy link
Collaborator Author

@akinomyoga akinomyoga Jan 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I realized that OSH has another divergence from Bash:

$ bash -c 'IFS=; set -- "1x2" "3x4"; echo $*'
1x2 3x4
$ bin/osh -c 'IFS=; set -- "1x2" "3x4"; echo $*'
1x23x4

This is actually specified by POSIX, so all shells except OSH is consistent:

$ busybox sh -c 'IFS=; set -- "1x2" "3x4"; echo $*'
1x2 3x4
$ dash -c 'IFS=; set -- "1x2" "3x4"; echo $*'
1x2 3x4
$ ksh -c 'IFS=; set -- "1x2" "3x4"; echo $*'
1x2 3x4
$ mksh -c 'IFS=; set -- "1x2" "3x4"; echo $*'
1x2 3x4
$ zsh -c 'IFS=; set -- "1x2" "3x4"; echo $*'
1x2 3x4
$ yash -c 'IFS=; set -- "1x2" "3x4"; echo $*'
1x2 3x4

@andychu Is there a way to tell whether the currently evaluated variable substitution is inside double quotations or not? SimpleVarSub and BracedVarSub don't seem to contain the necessary context information.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added 228ea80.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK interesting, yeah that looks like a bug

Regarding IFS=, we have to rewrite word splitting ... long story, but you can see there are multiple bugs reported over the years in test/spec.sh word-split

This is one of the biggest remaining areas of divergence

I am not sure if this is the same bug


For $*, is the problem simply that we should ignore IFS?? We should always use a space rather than the join char? Not sure

$ dash -c 'IFS=x; set -- a b; echo $*'
a b

$ dash -c 'IFS=; set -- a b; echo $*'
a b

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically I am not sure that we need to see if it's double quoted or not, for this case?

But I know that is an issue with word splitting in general. We have a complex set of data structures like Piece that know about quoting

Copy link
Collaborator Author

@akinomyoga akinomyoga Jan 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Basically I am not sure that we need to see if it's double quoted or not, for this case?

I think so (edit: I added a TODO and a failing test in commit 7ea2d8b). POSIX says (list formatting is added by me):

$*

Expands to the positional parameters, starting from one, initially producing one field for each positional parameter that is set.

  • When the expansion occurs in a context where field splitting will be performed, any empty fields may be discarded and each of the non-empty fields shall be further split as described in 2.6.5 Field Splitting.
  • When the expansion occurs in a context where field splitting will not be performed, the initial fields shall be joined to form a single field with the value of each parameter separated by the first character of the IFS variable if IFS contains at least one character, or separated by a if IFS is unset, or with no separation if IFS is set to a null string.

So, IFS is used only in the context where field splitting (word splitting) is inactive. OSH now uses IFS regardless of the context, which cause the deviation.

But I know that is an issue with word splitting in general. We have a complex set of data structures like Piece that know about quoting

I think we can include the context information in SimpleVarSub and BracedVarSub at the parser or lexer stage (instead of analyzing the AST on the evaluation time); i.e., $* in echo "$*" and echo $* would be treated as different constructs in the parser.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a simple solution may be to convert $* to $@ when $* is used in a context where field splitting is active. It seems $* and $@ are functionally equivalent in the context where field splitting is performed.

# a : 'no-colon' 'with-colon'
# a[0]: 'no-colon' 'with-colon'
# a[@]: 'no-colon' 'with-colon'
# a[*]: 'no-colon' 'with-colon'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it could be useful to split this test up like the other one, since bash agrees with OSH on the first part

Although it is a bit annoying because you have to share test-hyphen

I suppose that could be source spec/testdata/var-op-test-helper.sh or something

Or I guess it is OK to copy and paste too, since the function is short

Copy link
Collaborator Author

@akinomyoga akinomyoga Jan 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@akinomyoga
Copy link
Collaborator Author

I wonder if you can also improve test case 18, because it looks like we are missing coverage there. I think it may hit these same code paths

Or add another test case below 18

I added a set of tests in commit 655d14d.

@akinomyoga akinomyoga changed the base branch from master to soil-staging January 3, 2025 14:08
Copy link
Contributor

@andychu andychu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK thanks, this looks good ... I guess the main point is that is an array with 1 value of empty string is considered empty


Thanks for adding the docs

I am going to adjust them a bit, and also adjust the test failures

osh/word_eval.py Show resolved Hide resolved
argv=
argv=plus
## END
## BUG mksh/osh STDOUT:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I usually let OSH fail and adjust allowed_failures

i.e. we don't usually have BUG osh, which is yellow - we make it RED

This is basically a matter of convention

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually wondered if it would be possible to add another type of confirmed failure (e.g. "TODO") that produces red characters (or characters of another color) in the table. With the current strategy,

  • When the number of allowed failures is increased by some changes, we cannot identify which failure was the increased one. We need to check out an older commit, build it, and compare the test results manually. Or one needs to open the github-jobs website, looking up the corresponding page, and compare the results.
  • Even when the problem is fixed in the process of implementing another feature, the failure can be left to be FAIL if the result does not exactly match the expected result (which probably describes the Bash behavior).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm well technically ALL the ~100 failures now are TODO ! :-) Some of them will be classified as "OK osh", but never "BUG osh" I think ... our bugs are just things we haven't done yet

And notice there are ZERO FAIL on bash/dash/mksh/zsh ... so FAIL is really reserved for OSH

and BUG is reserved for the other shells

At least that is the convention I follow, but we can talk about it more if it causes a problem

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if I could call it "a problem", but at least the following situation can be improved:

  • When the number of allowed failures is increased by some changes, we cannot identify which failure was the increased one. We need to check out an older commit, build it, and compare the test results manually. Or one needs to open the github-jobs website, looking up the corresponding page, and compare the results.
  • Even when the problem is fixed in the process of implementing another feature, the failure can be left to be FAIL if the result does not exactly match the expected result (which probably describes the Bash behavior).

doc/ref/chap-word-lang.md Outdated Show resolved Hide resolved
When the variable substituion is inside double quotations, the `[*]` form joins
the elements by the first character of `IFS`:

$ printf '<%s>\n' "${a[*]}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think these cases can more concisely explained by a table, with 4 entries or 8 entries, like:

$@
"$@"
$*
"$*"

@andychu andychu merged commit c326e7c into soil-staging Jan 3, 2025
18 checks passed
@andychu
Copy link
Contributor

andychu commented Jan 3, 2025

OK thanks

I also fixed the ${*:-} bug on top of this - 31ce9f6

Thanks for finding it and writing the tests - that is the hard part!

@akinomyoga akinomyoga deleted the array-op-test branch January 3, 2025 22:39
@akinomyoga
Copy link
Collaborator Author

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants