Exported JSON cannot be loaded #14

Open
rlaemmel opened this issue Jun 10, 2013 · 1 comment
@rlaemmel
Collaborator

It looks like the exported JSON has an encoding problem.

Suppose you are extracting http://en.wikipedia.org/wiki/Category:Programming_languages_created_in_the_1940s

This gives you JSON like this:

[{"id":1,"title":"Programming languages created in the 1940s","level":0,"transitivePages":1,"pages":1,"transitiveSubcategories":0,"parentCategories":0,"subcategories":0,"type":"Category"},{"start":1,"type":"ContainsPage","end":2},{"id":2,"title":"Plankalk<9F>l","type":"Page"}]

Loading it in, say, Python gives this:

>>> import json
>>> jsonfile = open("bug.json", 'rb')
>>> cgraph = json.load(jsonfile)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 278, in load
    **kw)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 326, in loads
    return _default_decoder.decode(s)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 382, in raw_decode
    obj, end = self.scan_once(s, idx)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x9f in position 8: invalid start byte
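For files that were already exported, one possible workaround is to decode the bytes explicitly before parsing. The sketch below assumes the file was written in Mac OS Roman. That is only a guess, based on the 0x9f byte (the code for "ü" in that encoding) and the Mac Python path in the traceback. `load_export` and its `fallback` parameter are hypothetical names, and the code targets Python 3 rather than the 2.7 used above.

```python
import json

# A shortened version of the exported data: byte 0x9F stands in for "ü".
raw = b'[{"id":2,"title":"Plankalk\x9fl","type":"Page"}]'


def load_export(data, fallback="mac_roman"):
    """Parse exported JSON bytes, trying UTF-8 first and a fallback codec second."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: the exporter wrote the file in the fallback encoding.
        text = data.decode(fallback)
    return json.loads(text)


pages = load_export(raw)
print(pages[0]["title"])  # Plankalkül
```

Trying UTF-8 first means correctly encoded exports still load unchanged; the fallback only applies when strict UTF-8 decoding fails.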
ghost assigned dmosen Jun 10, 2013
dmosen pushed a commit that referenced this issue Jun 10, 2013
System default encoding was used, forced to use UTF-8.
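The exporter's code is not shown in this thread, so the following is only a Python illustration of the idea in the commit message: name the encoding explicitly when writing instead of relying on the platform default. `export_json` and the file name are hypothetical.

```python
import json
import os
import tempfile

graph = [{"id": 2, "title": "Plankalk\u00fcl", "type": "Page"}]


def export_json(obj, path):
    """Write obj as JSON, forcing UTF-8 regardless of the system default."""
    with open(path, "w", encoding="utf-8") as f:
        # ensure_ascii=False writes "ü" as a literal character,
        # so the file encoding decides which bytes end up on disk.
        json.dump(obj, f, ensure_ascii=False)


path = os.path.join(tempfile.mkdtemp(), "export.json")
export_json(graph, path)

# Read the file back as raw bytes and decode it as UTF-8.
with open(path, "rb") as f:
    data = f.read()
reloaded = json.loads(data.decode("utf-8"))
```

With the encoding fixed at write time, "ü" is stored as the two bytes 0xC3 0xBC and the file loads with any UTF-8 JSON parser.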
@dmosen
Owner

dmosen commented Jun 11, 2013

Left open for review.

2 participants