-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Switch to he
for decoding HTML entities
#3
Conversation
I'm reluctant to merge this because Don't get me wrong, you seem to have a great understanding of character encoding in HTML and JavaScript (far more extensive than mine at the moment) but the code looks like something that's been spat out of an automated build tool. My suggestion would be to begin by building a pure node.js module. Compile that with If nobody has requested support for all those other really obscure package managers, don't support them. Just tell people where to download the standalone file and be done with it. As it stands, I'd rather fix the bugs in |
Ouch. If you honestly prefer compact, “readable” code that doesn’t deliver on its promise over slightly more convoluted but properly tested and fully working code, I don’t really know what to say. By the way,
Why does this matter? Also, note that after
What good is easy to understand code if it’s broken?
The same thing goes for (part of) the tests, by the way: This way, whenever the official Needless to say, doing all this manually would be soul-crushing and error-prone.
That’s because it is. Why is that a problem, though?
My tests make use of
There’s no way (that I know of) to fix all the bugs in
|
I prefer working code, but I'm not convinced we can't get to working code without anything in the repository that's not directly related to that project. Are you aware of any bugs in Having the project go through some sort of build tool is fine. The problem is that the results of that build file are mostly what you see when you open the repository on GitHub. Even the stuff in You missed my point with the tests. According to the output of In general, adding build steps to JavaScript vastly slows down development. It makes sense to have an automated step to download the entities and it makes sense to add a build step to deal with all the backwards, non-CommonJS module systems. These are both great because they only need to be run once, when you do a release. Here, for reference, is your entire module, without any requirement for a build step: var regexEncode = require('./regexEncode.js');
var encodeMap = require('./data/encode-map.json');
var decodeMap = require('./data/decode-map.json');
var regexAstralSymbols = /[\uD800-\uDBFF][\uDC00-\uDFFF]/g;
var regexEscape = /[&<>"']/g;
var escapeMap = {
'&': '&',
'<': '<',
'"': '"',
'\'': ''',
// See http://mathiasbynens.be/notes/ambiguous-ampersands: the following
// is not strictly necessary unless part of a tag or an unquoted attribute
// value. We’re only escaping it to match existing `htmlEscape`
// implementations.
'>': '>'
};
// Inspired by `ucs2encode` in http://mths.be/punycode
var codePointToSymbol = function(codePoint) {
var output = '';
if (codePoint > 0xFFFF) {
codePoint -= 0x10000;
output += String.fromCharCode(codePoint >>> 10 & 0x3FF | 0xD800);
codePoint = 0xDC00 | codePoint & 0x3FF;
}
output += String.fromCharCode(codePoint);
return output;
};
exports.escape = function(string) {
return string.replace(regexEscape, function($0) {
return escapeMap[$0]; // no need to check `has()` here
});
};
exports.encode = function(string) {
return string
.replace(regexEncode, function($0) {
return encodeMap[$0]; // no need to check `has()` here
})
.replace(regexAstralSymbols, function($0) {
// http://mathiasbynens.be/notes/javascript-encoding#surrogate-formulae
var high = $0.charCodeAt(0);
var low = $0.charCodeAt(1);
var codePoint = (high - 0xD800) * 0x400 + low - 0xDC00 + 0x10000;
return '&#x' + codePoint.toString(16).toUpperCase() + ';';
})
// Replace other non-ASCII chars with a hexadecimal escape
.replace(/[^\0-\x7F]/g, function($0) {
return '&#x' + $0.charCodeAt(0).toString(16).toUpperCase() + ';';
});
};
exports.unescape =
exports.decode = function(html) {
return html
.replace(/&#([0-9]+);?/g, function($0, codePoint) {
return codePointToSymbol(codePoint);
})
.replace(/&#[xX]([0-9a-fA-F]+);?/g, function($0, hexDigits) {
var codePoint = parseInt(hexDigits, 16);
return codePointToSymbol(codePoint);
})
.replace(/&([0-9a-zA-Z]+;?)/g, function($0, reference) {
if ({}.hasOwnProperty.call(decodeMap, reference)) {
return decodeMap[reference];
} else {
// ambiguous ampersand; see http://mths.be/notes/ambiguous-ampersands
return '&' + reference;
}
});
} My guess is that by using a With this, your repository would consist of:
If it were me, I wouldn't include As for your final point:
This totally misses how software development works. When errors occur and when changes occur, any code I take on is code I have to maintain. For example, every time there's a character encoding bug, I have to consider the possibility that it's caused by |
Wow, @ForbesLindesay's code is ridiculously nicer than what's in the |
Also, a minor critique of your directory structure @ForbesLindesay: the data/ directory should probably not be in the Git repository, but should be added to the package with a |
I agree it looks much nicer this way. If But I want there to be a single build of
|
Browserify is well written, maintainable code that thousands of people are using. It's pretty safe to rely on. In any case, my point is that I'm deciding between If you don't trust browserify, write you own module to produce standalone builds for use in non node.js environments. If you wanted to, you could create something that takes an approximation of the ES6 module format as its input and outputs something that can be digested by node/browsers/r.js etc. That would be pretty neat, but all of that stuff is unrelated to If you think browserify has problems that can be improved, submit pull requests. @substack is pretty on the ball about accepting those pull requests when they're good and they fix real problems. |
Thanks to your suggestions, the directory structure for he is now a bit cleaner. Check it out. For example, https://github.com/mathiasbynens/he/tree/master/data contains all the data as separate JSON files. I do intend to keep the export logic for various platforms in the main |
Any update on this? |
Fixes #2.