Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

->decoded_content should decode application/json, etc [rt.cpan.org #82963] #72

Closed
oalders opened this issue Mar 31, 2017 · 6 comments
Closed

Comments

@oalders
Copy link
Member

oalders commented Mar 31, 2017

Migrated from rt.cpan.org#82963 (status was 'open')

Requestors:

From [email protected] on 2013-01-25 22:13:06:

Currently $response->decoded_content will decode the bytes of e.g.
"Content-type: text/json; charset=UTF-8" messages because it knows
"text/*" is ... text.

It would be nice if this could be extended to also decode the text for
content-types such as "application/json; charset=UTF-8",
"application/javascript; charset=ISO-8859-15", etc.

From [email protected] on 2013-07-21 18:05:19:

On Fri Jan 25 17:13:06 2013, MAUKE wrote:
> Currently $response->decoded_content will decode the bytes of e.g.
> "Content-type: text/json; charset=UTF-8" messages because it knows
> "text/*" is ... text.
> 
> It would be nice if this could be extended to also decode the text for
> content-types such as "application/json; charset=UTF-8",
> "application/javascript; charset=ISO-8859-15", etc.

Bump, just ran into the same issue after a few hours.
in HTTP::Headers->content_is_text,
shouldn't the presence of charset in the content-type imply that the content is characters, ie text?

From [email protected] on 2013-08-13 09:27:23:

Second on this.

When I say decode, I know what I am doing - currently there is no way to force it.

  $response->decoded_content(charset => 'utf-8')

Adding (charset_strict => 1, raise_error => 1) doesn't help.

Better yet, the content type I get is
  Content-Type: application/json; charset=UTF-8
Maybe content_is_text() should returns true if the charset is present in the content-type header?

From [email protected] on 2014-08-27 03:12:50:

Third.  Currently, the code says:

if ($self->content_is_text || (my $is_xml = $self->content_is_xml)) {

Examples where LWP currently breaks include:

application/json
application/yaml
application/x-yaml
application/pdf
application/* (that isn't +xml)

The Content-Type really shouldn't matter.  If the Content-Type is "pork/beans; charset=UTF-8", it should still be decoded.

If the remote agent broadcasted a charset, it's telling us that it had encoded that data with that character set.  We shouldn't care if the data inside the onion is text, audio, application-specific, some proprietary format, whatever.

Please remove this 'if' line.  It's a pretty intelligent interface, so it would be a waste of code to have other folks design their own decoding interface just because of this restriction.
@vanHoesel
Copy link
Member

As described in a lengthy comment on Always charset decode, regardless of content type, application/json none of the examples mentioned here above should have any charset-decoding done automagically.

As recommended in RFC 6657 – MIME Charset Default Update §3b New Rules for Default "charset" Parameter Values for "text/*" Media Types, text/* SHOULD

require explicit unconditional inclusion of the "charset" parameter, eliminating the need for a default value.

And thus Content-Type MUST matter and the charset parameter for any other media type is just 'auxiliary meta info' for a processor of the 'binary' data like application/*. It is not the other way around , a charset param DOES NOT mean that it is of type type/*.

Saying «It's a pretty intelligent interface» is a great compliment, but I prefer an interface to do less guess work and do less automagically things under the hood. Especially regarding encoding/decoding layers. We all know it goes wrong too many times.

Sure, I'd like to have an intelligent interface, but not on the cost to make 'semi-core' Perl modules going to do the wrong thing. Maybe a HTTP::Message::MIME::ApplicationJSON © would be a better alternative ?

Conclusion

$mess->decoded_contnent is doing the right thing

this issue SHOULD be closed

@dracos
Copy link

dracos commented Mar 11, 2018

Looks like https://metacpan.org/release/LWP-JSON-Tiny might meet your criteria (haven't tried it myself though), for anyone else coming across this.

@vanHoesel
Copy link
Member

or a naughty over-ride HTTP::Message::JSON

@dracos
Copy link

dracos commented Mar 11, 2018

Yes, that's from the same package :)

@vanHoesel
Copy link
Member

as described in #99, unless we add extra parameters or an entire new method, this can not be solved without breaking any existing API based on JSON

@oalders
Copy link
Member Author

oalders commented Apr 3, 2018

Closing this as we're working towards a solution in #99

@oalders oalders closed this as completed Apr 3, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants