Fix whitespace in number parsing #3195

Open
wants to merge 2 commits into
base: master


thaliaarchi
Contributor

@thaliaarchi thaliaarchi commented Nov 2, 2024

This is two separate number parsing bug fixes:

Do not skip leading whitespace in jvp_strtod

jvp_strtod skips leading whitespace, but its decnum counterpart decNumberFromString (called within jv_number_with_literal) does not. The two are called interchangeably, so behavior is inconsistent depending on whether the decnum feature is enabled. Additionally, classify, used in the token scanner, only considers [ \t\n\r] to be whitespace, but jvp_strtod consumes the larger set [ \t\n\v\f\r], so those extra characters are considered literals.

Commit ce0e788 (improve tonumber/0 performance by parsing input as number literal, 2024-03-02) changed tonumber to not accept leading or trailing whitespace. However, this case was missed.

Changing this deviates from the behavior of strtod from <stdlib.h> and is technically a breaking API change, since jvp_strtod is a public symbol.
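
For readers unfamiliar with the standard behavior being discussed, here is a minimal sketch of the difference. It is not the code from this patch: `parse_double_strict` is a hypothetical helper built on the standard `strtod`, and it additionally rejects trailing characters, which in jq is the caller's job rather than `jvp_strtod`'s.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper (not jq's jvp_strtod): parse a double the way
 * strtod does, but fail on input that starts with whitespace. */
static bool parse_double_strict(const char *s, double *out) {
    /* In the "C" locale, isspace matches [ \t\n\v\f\r], which is the
     * set of leading characters that strtod silently skips. */
    if (*s == '\0' || isspace((unsigned char)*s))
        return false;

    char *end;
    double d = strtod(s, &end);

    /* end == s: nothing was parsed; *end != '\0': trailing characters. */
    if (end == s || *end != '\0')
        return false;

    *out = d;
    return true;
}
```

With this sketch, `"123"` parses, while `" 123"`, `"\v123"` and `"123 "` are all rejected. Plain `strtod` would accept the first two of those rejected inputs, because it skips `\v` and `\f` as well as the four characters that `classify` treats as whitespace.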

$ ./configure
$ make -j8
$ ./jq -n '" 123" | tonumber'
jq: error (at <unknown>): string (" 123") cannot be parsed as a number
$ ./jq -n '"123 " | tonumber'
jq: error (at <unknown>): string ("123 ") cannot be parsed as a number

$ make clean
$ ./configure --disable-decnum
$ make -j8
$ ./jq -n '" 123" | tonumber'
123
$ ./jq -n '"123 " | tonumber'
jq: error (at <unknown>): string ("123 ") cannot be parsed as a number

Handle input errors for --indent

Handle malformed and overflowing arguments. Also, forbid leading and trailing spaces to match the behavior of tonumber.
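
The checks described above can be sketched roughly as follows. This is an illustration only, not the code in this PR: `parse_indent_arg` and its `max` parameter are hypothetical names, and the sketch relies only on the standard `strtol` and `errno`.

```c
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper (not taken from the patch): parse an option value
 * as a decimal integer in the range [0, max]. */
static bool parse_indent_arg(const char *arg, int max, int *out) {
    /* strtol skips leading whitespace, so reject it before calling. */
    if (*arg == '\0' || isspace((unsigned char)*arg))
        return false;

    char *end;
    errno = 0;
    long n = strtol(arg, &end, 10);

    /* Malformed input: no digits parsed, or characters left over
     * (this also catches trailing whitespace). */
    if (end == arg || *end != '\0')
        return false;

    /* Overflow is reported through errno; also enforce the range. */
    if (errno == ERANGE || n < 0 || n > max)
        return false;

    *out = (int)n;
    return true;
}
```

A caller would pass the option string and whatever upper bound it enforces, for example `parse_indent_arg(text, 7, &indent)`, and print a usage error when the function returns false.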

This will produce a merge conflict in conjunction with PR #3194 ("Parse short options in order given"), because they modify the same area. However, they are logically independent and can be merged in either order. After the first merge, I will rebase the second onto the first. (Done.)

Handle malformed and overflowing arguments. Also, forbid leading and
trailing spaces to match the behavior of tonumber from commit ce0e788
(improve tonumber/0 performance by parsing input as number literal,
2024-03-02).

`jvp_strtod` skips leading whitespace, but its decnum counterpart
`decNumberFromString` (called within `jv_number_with_literal`) does not.
Those two are called interchangeably, so it leads to inconsistent
behavior depending on whether the decnum feature is enabled.
Additionally, `classify`, used in the token scanner, only considers
[ \t\n\r] to be whitespace, but `jvp_strtod` consumes the larger set
[ \t\n\v\f\r], so those extra characters are considered literals.

Changing this deviates from the behavior of `strtod` from <stdlib.h> and
is technically a breaking API change, since it is a public symbol.
@emanuele6
Member

The patch is fine, but I don't understand why we want tonumber/0 in the next version of jq to stop ignoring leading and trailing whitespace, instead of also skipping whitespace before and after the number when using decNum, so that the behaviour does not change.

Member

@emanuele6 emanuele6 left a comment


Because that is what gojq also does, I guess
