Skip to content

Latest commit

 

History

History
21 lines (15 loc) · 1.66 KB

FAQ.md

File metadata and controls

21 lines (15 loc) · 1.66 KB

Why do some HTTP headers show up mangled?

HTTP header values are officially only supposed to contain ASCII. Other bytes are "opaque data":

Historically, HTTP has allowed field content with text in the ISO-8859-1 charset [ISO-8859-1], supporting other charsets only through use of [RFC2047] encoding. In practice, most HTTP header field values use only a subset of the US-ASCII charset [USASCII]. Newly defined header fields SHOULD limit their field values to US-ASCII octets. A recipient SHOULD treat other octets in field content (obs-text) as opaque data.

(RFC 7230)

In practice some headers are for some purposes treated like UTF-8, which supports all languages and characters in Unicode. But if you try to access header values through a browser's fetch() API or view them in the developer tools then they tend to be decoded as ISO-8859-1, which only supports a very limited number of characters and may not be the actual intended encoding.

xh as of version 0.23.0 shows the ISO-8859-1 decoding by default to avoid a confusing difference with web browsers. If the value looks like valid UTF-8 then it additionally shows the UTF-8 decoding.

That is, the following request:

xh -v https://example.org Smile:☺

Displays the Smile header like this:

Smile: â�º (UTF-8: ☺)

The server will probably see � instead of the smiley. Or it might see after all. It depends!