-
Notifications
You must be signed in to change notification settings - Fork 281
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: allow backslash-escaped special characters in double-quoted strings (#700) #703
base: master
Are you sure you want to change the base?
fix: allow backslash-escaped special characters in double-quoted strings (#700) #703
Conversation
Tests succeed before any changes:
|
Repro test case fails properly before fix applied:
|
Tests pass, including repro test case, after fix and linting applied:
|
It could be argued that we should accept single-quoted strings containing special characters as well, since the strings are quoted. Obviously, though, since backslash-escapes are not active in single-quoted YAML scalars, the special characters would be embedded in such strings without escaping. However:
Any wider change in this area likely requires more thought to understand whether it would be a breaking change. (Note, this fix actually accepts such embedded special characters in double-quoted strings, since we can't really tell that they were escaped in the original YAML, unless we do something a lot more invasive; however, testing seems to show that It's my opinion that the narrower fix is most appropriate here, and that the wider, more invasive change should go into a backlog if it's at all desired. |
Unsure why this would be the case, as the |
@adrienverge, this feels ready for review. |
Hello @jmknoble, thanks for contributing! This fix looks good.
Understood. I agree with your conclusion: let's fix double-quotes for now. However I'm not comfortable with relying on an English string ( >>> from yaml.reader import Reader
>>> Reader('').check_printable('\u0020')
>>> Reader('').check_printable('\u001b')
yaml.reader.ReaderError: unacceptable character #x001b: special characters are not allowed
👌 Also can you squash all your commits into one, with a commit message starting with |
I get the brittleness concern. How does this look? --- a/yamllint/rules/quoted_strings.py
+++ b/yamllint/rules/quoted_strings.py
@@ -204,12 +204,21 @@ def _quote_match(quote_type, token_style):
(quote_type == 'double' and token_style == '"'))
-def _quotes_are_needed(string, is_inside_a_flow):
+def _quotes_are_needed(string, style, is_inside_a_flow):
# Quotes needed on strings containing flow tokens
if is_inside_a_flow and set(string) & {',', '[', ']', '{', '}'}:
return True
+ if style == '"':
+ try:
+ yaml.reader.Reader('').check_printable('key: ' + string)
+ except yaml.reader.ReaderError:
+ # Special characters in a double-quoted string
+ # are assumed to have been backslash-escaped
+ return True
+
loader = yaml.BaseLoader('key: ' + string)
+
# Remove the 5 first tokens corresponding to 'key: ' (StreamStartToken,
# BlockMappingStartToken, KeyToken, ScalarToken(value=key), ValueToken)
for _ in range(5):
Let me know what you think of the above, please, before I go to the trouble of doing that. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this look?
This looks good! Except for the last added empty line: it seems unrelated?
Ok, @adrienverge, let me know if you approve of the changes as of 3f84294, and I will then close this PR and open a new one with a squashed commit. Proposed commit message: quoted-strings: only-when-needed: allow escaped special characters in double-quoted strings
- Fixes #700
- Accepts escaped special characters in double quoted strings (`key: "\u001b"`)
- Still throws exception for unescaped (embedded) special characters, whether or not quotes are present
- To reproduce: `yamllint -d 'rules: {quoted-strings: {required: only-when-needed}}' - <<<'key: "\u001b"'` |
Hello @jmknoble, Thanks for the update! No need to open a new pull request, let's track this feature's history at a single place. You can push force to this one. Otherwise I'll squash myself before merging, with a commit message like:
Before this I just have a question, regarding your comment #703 (comment). Do you confirm that writing def test_only_when_needed_special_characters(self):
conf = 'quoted-strings: {required: only-when-needed}\n'
self.check('---\n'
'k1: "\\u001b"\n',
conf)
+ self.check('---\n'
+ "k1: '\\u001b'\n",
+ conf, problem=(2, 5))
+ self.check('---\n'
+ 'k1: \\u001b\n',
+ conf)
self.assertRaises(yaml.reader.ReaderError, self.check,
'---\n'
'k1: "\u001b"\n',
conf)
self.assertRaises(yaml.reader.ReaderError, self.check,
'---\n'
"k1: '\u001b'\n",
conf)
self.assertRaises(yaml.reader.ReaderError, self.check,
'---\n'
'k1: \u001b\n',
conf) |
According to the YAML 1.2.1 specification:
I take this to mean that there are three cases (or four, depending on how you look at it) (all examples below are in YAML scalar notation):
The nuance is that the same YAML scalar content will produce different, but equally valid, results depending on how it is quoted. For example, with the following Python script: from pprint import pprint
import yaml
values = ["\\u001b", "\\\\u001b", "\u001b"]
quotes = ['"', "'", ""]
i = 0
yaml_cases = []
for value in values:
for quote in quotes:
i += 1
yaml_cases.append(f"key{i}: {quote}{value}{quote}")
print("yaml_cases = ", end="")
pprint(yaml_cases)
print("\nResults:\n")
for yaml_case in yaml_cases:
try:
result = yaml.safe_load(yaml_case)
print(repr(yaml_case), "=>", repr(result))
except yaml.reader.ReaderError as e:
print(repr(yaml_case), "=>", repr(e)) we get the following results:
As far as I can tell, that means the tests you suggest are correct. I'll add those to this PR. |
quoted-strings
areonly-when-needed
#700