Taxonomy files are UTF-8 #146

jar398 · 2016-10-19T02:35:30Z

The documentation for class FileReader says: "The constructors of this class assume that the default character encoding and the default byte-buffer size are appropriate. To specify these values yourself, construct an InputStreamReader on a FileInputStream."

I have no idea what the default encoding is, but the results I'm seeing from the taxon_info service are gibberish (e.g. the synonym in OTT id 3717384). I bet that if TaxonomyLoaderOTT.java were changed to follow the above advice, the results would be better. (It would be necessary to check that the JSON is being written in UTF-8 as well.)
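For illustration, the advice quoted from the FileReader documentation could look roughly like the sketch below. This is not the actual code in TaxonomyLoaderOTT.java; the class name `Utf8Files` and the helper names `openUtf8Reader` and `openUtf8Writer` are hypothetical, and the sketch assumes Java 7 or later so that `StandardCharsets` is available. The writer helper covers the second point, that output such as the JSON should also be written in UTF-8.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the existing code base.
public class Utf8Files {

    // Opens a file for reading with an explicit UTF-8 decoder,
    // instead of relying on FileReader and the platform default encoding.
    public static BufferedReader openUtf8Reader(String path) throws IOException {
        return new BufferedReader(
            new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8));
    }

    // Opens a file for writing with an explicit UTF-8 encoder,
    // so the bytes on disk do not depend on the platform default either.
    public static BufferedWriter openUtf8Writer(String path) throws IOException {
        return new BufferedWriter(
            new OutputStreamWriter(new FileOutputStream(path), StandardCharsets.UTF_8));
    }
}
```

With something like this, a call such as `new BufferedReader(new FileReader(path))` would become `Utf8Files.openUtf8Reader(path)`, and the decoding would no longer vary with the JVM's default charset.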

(This problem may affect treemachine as well, but it doesn't deal with synonyms, which is where most of the fancy characters lie.)

jar398 · 2017-04-02T22:44:51Z

Fixed on March 4; need to verify that it really works, then close the issue.

jar398 mentioned this issue Mar 4, 2017

Implement id aliases #148

Merged

jar398 mentioned this issue Apr 2, 2017

Unicode problem - Ceramium tetricum var. ß pectinatum OpenTreeOfLife/reference-taxonomy#317

Open
