DOMDocument::loadHTML() UTF-8 issue #131

Merged 1 commit on May 10, 2016

techi602
Contributor

DOMDocument::loadHTML() treats UTF-8 HTML as ISO-8859-1 unless an XML encoding declaration is prepended to the input:
http://stackoverflow.com/a/8218649/1703973
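
A minimal sketch of the behaviour described above, assuming PHP with the dom extension and a libxml2 build that falls back to ISO-8859-1 when no encoding is declared (other builds may behave differently). The sample string and variable names are illustrative and not taken from the library:

```php
<?php
// Sketch only: compares loading UTF-8 HTML with and without an XML
// encoding declaration. Names and sample text are hypothetical.

libxml_use_internal_errors(true); // keep parser warnings out of the output

$html = '<p>Žluťoučký</p>';

// 1) No encoding hint: the parser may read each UTF-8 byte as a separate
//    ISO-8859-1 character, which garbles the non-ASCII letters.
$naive = new DOMDocument();
$naive->loadHTML($html);
$naiveText = $naive->getElementsByTagName('p')->item(0)->textContent;

// 2) XML encoding declaration prepended: the input is read as UTF-8.
$hinted = new DOMDocument();
$hinted->loadHTML('<?xml encoding="utf-8" ?>' . $html);
$hintedText = $hinted->getElementsByTagName('p')->item(0)->textContent;

echo 'without hint: ', $naiveText, PHP_EOL;
echo 'with hint:    ', $hintedText, PHP_EOL; // expected to match the input text
```

Note that the prepended declaration ends up in the parsed document as an extra node, so code using this trick usually has to strip it again before serializing.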

@jeroenvdheuvel

I made a similar pull request, #137. I chose to keep the characters UTF-8 encoded so that the library user stays in control, instead of the library deciding that some characters need to be escaped.

@tijsverkoyen can you take a look at both pull requests?

@jeroenvdheuvel

jeroenvdheuvel commented May 10, 2016

The downside of this approach compared with #137 is that createDomDocumentFromHtml now alters the inner content of HTML and XML nodes. For instance, Žluťoučký is changed to its entity-encoded form (Ž becomes &#381;, and so on).

This can be a good thing, but when I call CssToInlineStyles->convert, its documentation says:

Will inline the $css into the given $html
Remark: if the html contains <style>-tags those will be used, the rules
in $css will be appended.

I would assume that my content stays intact and that only the styles are inlined. This pull request actually changes the content by converting UTF-8 characters to their HTML entities. That puts the library in control instead of the user, who was already able to escape the UTF-8 characters themselves.
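
To make the difference concrete, here is a small sketch of what entity conversion does to the markup. It assumes the mbstring extension, and mb_encode_numericentity() is only a stand-in for whatever conversion the pull request performs; the sample string and variable names are illustrative:

```php
<?php
// Illustration only: mb_encode_numericentity() stands in for the entity
// conversion discussed above (an assumption, not the library's code).

$original = '<p>Žluťoučký</p>';

// Encode every code point from U+0080 upward as a decimal numeric entity.
// Map format: [first code point, last code point, offset, mask].
$map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
$encoded = mb_encode_numericentity($original, $map, 'UTF-8');

echo $encoded, PHP_EOL;
// <p>&#381;lu&#357;ou&#269;k&#253;</p>

// A browser renders both strings the same way, yet the markup differs.
var_dump($encoded === $original); // bool(false)

// Decoding the entities recovers the original string.
$decoded = html_entity_decode($encoded, ENT_QUOTES | ENT_HTML5, 'UTF-8');
var_dump($decoded === $original); // bool(true)
```

The rendered text is the same either way, but a caller who compares or post-processes the returned markup sees different bytes, which is the concern raised here.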

@techi602
Contributor Author

@jeroenvdheuvel this issue is already being fixed in #145.
