-
Notifications
You must be signed in to change notification settings - Fork 101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support overwriting response's mime and charset #184
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- HTTPie crashes (quickly) if it can't parse the encoding.
- HTTPie recognizes some encoding names
encoding_rs
doesn't, notablyutf16
. (It has to be spelledutf-16
.)
First one shouldn't be too hard, just change Args
to response_charset: Option<Encoding>
and do the parsing earlier.
The second one is tricky:
- HTTPie defers to Python (this table), we defer to encoding-rs (this table).
- encoding-rs doesn't support all the encodings Python does. That's too bad if you find a server that serves up UTF-7 but there's not much we can do about it.
- Python normalizes the names. It seems to ignore case as well as most non-alphanumeric ASCII characters (like
-
and*
and\x7F
, but not.
for some reason, and not unicode characters). So--response-charset=~~~~UtF////16@@
just works. - Python supports some aliases we don't (and vice versa, but that's less important). For example
u8
/u16
for UTF-8/UTF-16.
I think we can get quite far by first normalizing it by removing all ASCII non-alphanumeric characters except for _
, -
and :
(lowercasing isn't necessary, encoding-rs does that) and then:
- Try it as-is.
- Check if it's one of the aliases
u8
/u16
/u32
/utf
. - Try it with all
-
and_
removed. - Try it with all
_
turned into-
. - Try it with all
-
turned into_
. - Try it with all
-
and_
removed and a-
inserted before the first digit.
If all that fails, we can put a link to the table in the error message.
|
||
match normalized_encoding.as_str() { | ||
"u8" | "utf" => return Ok(encoding_rs::UTF_8), | ||
"u16" => return Ok(encoding_rs::UTF_16LE), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
encoding_rs
associates the label utf-16
with UTF_16LE
but I am not sure if it is the same in Python.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also don't think that encoding_rs
supports utf-32
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On my phone (ARM) it's little-endian:
>>> 'a'.encode('utf_16_be').decode('utf16')
'愀'
>>> 'a'.encode('utf_16_le').decode('utf16')
'a'
I wouldn't be surprised if it depended on the machine's architecture. x86 is also little-endian so we'd agree on most machines.
In any case, encoding-rs is made for the web, so if it disagrees with Python then it's probably Python which is wrong.
src/cli.rs
Outdated
fn parse_encoding_label() { | ||
assert_eq!( | ||
parse_encoding("~~~~UtF////16@@").unwrap(), | ||
encoding_rs::UTF_16LE | ||
); | ||
assert_eq!(parse_encoding("utf_8").unwrap(), encoding_rs::UTF_8); | ||
assert_eq!(parse_encoding("utf8").unwrap(), encoding_rs::UTF_8); | ||
assert_eq!(parse_encoding("utf-8").unwrap(), encoding_rs::UTF_8); | ||
assert_eq!( | ||
parse_encoding("iso8859_6").unwrap(), | ||
encoding_rs::ISO_8859_6 | ||
); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any test cases you think are worth adding here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd also try u8
(for the alias), utf16
, iso_8859-2:1987
(for the colon), l1
, elot-928
(the original has a required underscore), utf_16_be
, utf16be
, utf-16-be
.
And maybe "notreal"
and ""
to round it off?
I guess that's a lot of cases. You could put them in a const &[(&str, &Encoding)]
and test in a loop.
src/cli.rs
Outdated
Err(Error::with_description( | ||
&format!( | ||
"{} is not a supported encoding, please refer to https://encoding.spec.whatwg.org/#names-and-labels\ | ||
for supported encodings", | ||
encoding | ||
), | ||
ErrorKind::InvalidValue, | ||
)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is missing a space after the link, and you get the bold red "error:" twice because structopt wraps another clap error around this one:
error: Invalid value for '--response-charset ': error: utf32 is not a supported encoding, please refer to https://encoding.spec.whatwg.org/#names-and-labelsfor supported encodings
I think an error type of &'static str
would work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are flags other than --response-charset
that are currently returning a clap error instead of a &str. I fix those as well.
src/cli.rs
Outdated
fn parse_encoding_label() { | ||
assert_eq!( | ||
parse_encoding("~~~~UtF////16@@").unwrap(), | ||
encoding_rs::UTF_16LE | ||
); | ||
assert_eq!(parse_encoding("utf_8").unwrap(), encoding_rs::UTF_8); | ||
assert_eq!(parse_encoding("utf8").unwrap(), encoding_rs::UTF_8); | ||
assert_eq!(parse_encoding("utf-8").unwrap(), encoding_rs::UTF_8); | ||
assert_eq!( | ||
parse_encoding("iso8859_6").unwrap(), | ||
encoding_rs::ISO_8859_6 | ||
); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd also try u8
(for the alias), utf16
, iso_8859-2:1987
(for the colon), l1
, elot-928
(the original has a required underscore), utf_16_be
, utf16be
, utf-16-be
.
And maybe "notreal"
and ""
to round it off?
I guess that's a lot of cases. You could put them in a const &[(&str, &Encoding)]
and test in a loop.
Ticks another item from #4