Replace escaped Unicode chars (`\u20ac`) in stored JSON? #173

jimallman · 2017-02-22T21:31:44Z

While chasing a Unicode-related bug, I realized that our stored JSON (on GitHub) has ugly escaped Unicode characters, e.g. in this study and this tree collection.

These Unicode characters are handled gracefully in our indexing and web apps, but the escape sequences aren't strictly needed, since we store all JSON as UTF-8. Meanwhile, they're hideous and make it hard to read and search the stored files on GitHub.

1. Is this something we want or need to fix?
2. Would this fix apply to all document types (studies, tree collections, tax. amendments)?
3. Are there other clients or use cases that would be broken by this change?

If we want to restore pretty Unicode for data saved in the future, it seems to all boil down to a single call to json.dump in peyotl that's used for all JSON docs. If we add ensure_ascii=False to this call as shown here, it should save Unicode characters directly (sans escape) in phylesystem.
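To illustrate the difference, here is a minimal sketch using only the standard-library `json` module. It is not the actual peyotl code: the `dump_to_string` helper and the sample document are hypothetical, and an in-memory buffer stands in for the real output file.

```python
import io
import json

# Hypothetical sample document containing a non-ASCII character (the euro sign).
doc = {"label": "Price: \u20ac5"}


def dump_to_string(obj, ensure_ascii):
    """Serialize obj with json.dump into an in-memory text buffer."""
    buf = io.StringIO()
    json.dump(obj, buf, ensure_ascii=ensure_ascii)
    return buf.getvalue()


escaped = dump_to_string(doc, ensure_ascii=True)    # the json module's default
readable = dump_to_string(doc, ensure_ascii=False)  # the proposed change

print(escaped)   # {"label": "Price: \u20ac5"}
print(readable)  # {"label": "Price: €5"}

# Both forms parse back to the same data, so readers of the JSON are unaffected.
assert json.loads(escaped) == json.loads(readable) == doc
```

When writing to a real file rather than a buffer, the file would also need to be opened with an explicit UTF-8 encoding (in Python 3, `open(path, "w", encoding="utf-8")`) so the unescaped characters are written correctly regardless of the platform default.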

jimallman · 2017-02-22T21:32:54Z

See related Python docs for json.dump here.

jar398 · 2017-02-22T23:06:42Z

1. Yes, 'we' want to fix it (I have always urged the project to be UTF-8 only)
2. Apply everywhere
3. I doubt it, and if there are, we'll find out and can fix them

jimallman added enhancement question labels Feb 22, 2017

jimallman changed the title ~~Replace encoded Unicode chars (\u20ac) in stored JSON?~~ Replace escaped Unicode chars (\u20ac) in stored JSON? Feb 22, 2017
