ValueError: Can't clean for JSON for intake.catalog.local.LocalCatalogEntry #31

Open
scottyhq opened this issue Dec 6, 2019 · 6 comments
Labels
bug Something isn't working

Comments

@scottyhq
Collaborator

scottyhq commented Dec 6, 2019

Running into an error when outputting an intake.catalog.local.LocalCatalogEntry in a Jupyter notebook. print(entry) works, but display(entry) raises ValueError: Can't clean for JSON.

Pinging @jhamman and @martindurant for help sorting this one out. I think it's likely a simple fix.

import intake 
import intake_stac
print(intake.__version__) #0.5.3
print(intake_stac.__version__) #0.2.1

cat = intake.open_stac_catalog('https://storage.googleapis.com/pdd-stac/disasters/catalog.json')
list(cat)
entry = cat['Houston-East-20170831-103f-100d-0f4f-RGB']
type(entry) #intake.catalog.local.LocalCatalogEntry
print(entry)
"""
name: Houston-East-20170831-103f-100d-0f4f-RGB
container: catalog
plugin: ['stac_item']
description: 
direct_access: True
user_parameters: []
metadata: 
args: 
  stac_obj: Houston-East-20170831-103f-100d-0f4f-RGB
"""
display(entry)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/srv/conda/envs/notebook/lib/python3.7/site-packages/IPython/core/formatters.py in __call__(self, obj)
    916             method = get_real_method(obj, self.print_method)
    917             if method is not None:
--> 918                 method()
    919                 return True
    920 

/srv/conda/envs/notebook/lib/python3.7/site-packages/intake/catalog/entry.py in _ipython_display_(self)
    113         }, metadata={
    114             'application/json': {'root': contents["name"]}
--> 115         }, raw=True)
    116 
    117     def __getattr__(self, attr):

/srv/conda/envs/notebook/lib/python3.7/site-packages/IPython/core/display.py in display(include, exclude, metadata, transient, display_id, *objs, **kwargs)
    309     for obj in objs:
    310         if raw:
--> 311             publish_display_data(data=obj, metadata=metadata, **kwargs)
    312         else:
    313             format_dict, md_dict = format(obj, include=include, exclude=exclude)

/srv/conda/envs/notebook/lib/python3.7/site-packages/IPython/core/display.py in publish_display_data(data, metadata, source, transient, **kwargs)
    120         data=data,
    121         metadata=metadata,
--> 122         **kwargs
    123     )
    124 

/srv/conda/envs/notebook/lib/python3.7/site-packages/ipykernel/zmqshell.py in publish(self, data, metadata, source, transient, update)
    127         # hooks before potentially sending.
    128         msg = self.session.msg(
--> 129             msg_type, json_clean(content),
    130             parent=self.parent_header
    131         )

/srv/conda/envs/notebook/lib/python3.7/site-packages/ipykernel/jsonutil.py in json_clean(obj)
    189         out = {}
    190         for k,v in iteritems(obj):
--> 191             out[unicode_type(k)] = json_clean(v)
    192         return out
    193     if isinstance(obj, datetime):

/srv/conda/envs/notebook/lib/python3.7/site-packages/ipykernel/jsonutil.py in json_clean(obj)
    189         out = {}
    190         for k,v in iteritems(obj):
--> 191             out[unicode_type(k)] = json_clean(v)
    192         return out
    193     if isinstance(obj, datetime):

/srv/conda/envs/notebook/lib/python3.7/site-packages/ipykernel/jsonutil.py in json_clean(obj)
    189         out = {}
    190         for k,v in iteritems(obj):
--> 191             out[unicode_type(k)] = json_clean(v)
    192         return out
    193     if isinstance(obj, datetime):

/srv/conda/envs/notebook/lib/python3.7/site-packages/ipykernel/jsonutil.py in json_clean(obj)
    189         out = {}
    190         for k,v in iteritems(obj):
--> 191             out[unicode_type(k)] = json_clean(v)
    192         return out
    193     if isinstance(obj, datetime):

/srv/conda/envs/notebook/lib/python3.7/site-packages/ipykernel/jsonutil.py in json_clean(obj)
    195 
    196     # we don't understand it, it's probably an unserializable object
--> 197     raise ValueError("Can't clean for JSON: %r" % obj)

ValueError: Can't clean for JSON: Houston-East-20170831-103f-100d-0f4f-RGB
@jhamman
Collaborator

jhamman commented Dec 6, 2019

We inherit the _ipython_display_ method from intake's CatalogEntry. My guess is that there are some bits of metadata on the stac object that cannot be handled by IPython's JSON cleaning step. We should be able to override this behavior (or fix this upstream with sat-stac).

https://github.com/intake/intake/blob/a4d216d1378fc8eaedc6796c1516317316ec6a8e/intake/catalog/entry.py#L106-L115

@scottyhq
Collaborator Author

So, the error is due to datetime objects in the metadata.

Adding this to the above code results in a more informative error:

import json
json.dumps(entry.metadata)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-1-d97249792b93> in <module>
     13 import json
     14 #json.dumps(entry)
---> 15 json.dumps(entry.metadata)

/srv/conda/envs/notebook/lib/python3.7/json/__init__.py in dumps(obj, skipkeys, ensure_ascii, check_circular, allow_nan, cls, indent, separators, default, sort_keys, **kw)
    229         cls is None and indent is None and separators is None and
    230         default is None and not sort_keys and not kw):
--> 231         return _default_encoder.encode(obj)
    232     if cls is None:
    233         cls = JSONEncoder

/srv/conda/envs/notebook/lib/python3.7/json/encoder.py in encode(self, o)
    197         # exceptions aren't as detailed.  The list call should be roughly
    198         # equivalent to the PySequence_Fast that ''.join() would do.
--> 199         chunks = self.iterencode(o, _one_shot=True)
    200         if not isinstance(chunks, (list, tuple)):
    201             chunks = list(chunks)

/srv/conda/envs/notebook/lib/python3.7/json/encoder.py in iterencode(self, o, _one_shot)
    255                 self.key_separator, self.item_separator, self.sort_keys,
    256                 self.skipkeys, _one_shot)
--> 257         return _iterencode(o, 0)
    258 
    259 def _make_iterencode(markers, _default, _encoder, _indent, _floatstr,

/srv/conda/envs/notebook/lib/python3.7/json/encoder.py in default(self, o)
    177 
    178         """
--> 179         raise TypeError(f'Object of type {o.__class__.__name__} '
    180                         f'is not JSON serializable')
    181 

TypeError: Object of type datetime is not JSON serializable

The metadata looks like this:

{'datetime': datetime.datetime(2017, 8, 31, 17, 24, 57, 555491, tzinfo=tzlocal()),
 'provider': 'Planet',
 'license': 'CC-BY-SA',
 'eo:cloud_cover': 2,
 'eo:gsd': 3.7,
 'eo:sun_azimuth': 145.5,
 'eo:sun_elevation': 64.9,
 'eo:view_angle': 0.2,
 'pl:epsg_code': 32615,
 'pl:ground_control': True,
 'pl:instrument': 'PS2',
 'pl:provider': 'planetscope',
 'bbox': [-95.73737276800716,
  29.561332400220497,
  -95.05332428370095,
  30.157560439570304],
 'geometry': {'type': 'Polygon',
  'coordinates': [[[-95.73737276800716, 30.14525788823348],
    [-95.06532619920118, 30.157560439570304],
    [-95.05332428370095, 29.57334931237589],
    [-95.7214758280382, 29.561332400220497],
    [-95.73737276800716, 30.14525788823348]]]},
 'date': datetime.date(2017, 8, 31),
 'catalog_dir': ''}
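
To see which keys are responsible, one option is to try serializing each value on its own. This is only a rough sketch: the helper name `non_json_keys` and the sample dictionary are made up for illustration (the sample is a trimmed copy of the metadata above, with a UTC timezone assumed in place of tzlocal()).

```python
import datetime
import json


def non_json_keys(metadata):
    """Map each key whose value json.dumps rejects to that value's type name."""
    bad = {}
    for key, value in metadata.items():
        try:
            json.dumps(value)
        except (TypeError, ValueError):
            bad[key] = type(value).__name__
    return bad


# Trimmed, illustrative copy of the metadata shown above
sample_md = {
    "datetime": datetime.datetime(2017, 8, 31, 17, 24, 57, 555491,
                                  tzinfo=datetime.timezone.utc),
    "provider": "Planet",
    "eo:cloud_cover": 2,
    "bbox": [-95.73737276800716, 29.561332400220497,
             -95.05332428370095, 30.157560439570304],
    "date": datetime.date(2017, 8, 31),
    "catalog_dir": "",
}

problem_keys = non_json_keys(sample_md)
print(problem_keys)
```

On this sample, only the 'datetime' and 'date' entries are reported.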

The following can fix the issue, but I'm confused as to where this should go in the codebase:

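import datetime

def convert_datetime(o):
    """Return date/datetime values as strings; pass everything else through."""
    # datetime.datetime subclasses datetime.date, so a single check covers both
    if isinstance(o, datetime.date):
        return str(o)
    return o

md = entry.metadata
# Only top-level values are converted; nested containers are left untouched
clean = {k: convert_datetime(v) for k, v in md.items()}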
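
The conversion above only looks at top-level values. If datetimes can also sit inside nested dicts or lists, a recursive variant might look like the following. This is a sketch under assumptions: `clean_for_json` is a hypothetical name, ISO 8601 strings are used instead of str(), and the sample metadata (including the 'nested' key) is invented for the example.

```python
import datetime
import json


def clean_for_json(obj):
    """Recursively replace date/datetime values with ISO 8601 strings."""
    # datetime.datetime subclasses datetime.date, so one check covers both
    if isinstance(obj, datetime.date):
        return obj.isoformat()
    if isinstance(obj, dict):
        return {str(k): clean_for_json(v) for k, v in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [clean_for_json(v) for v in obj]
    return obj


# Illustrative metadata; the 'nested' entry is hypothetical
md = {
    "datetime": datetime.datetime(2017, 8, 31, 17, 24, 57, 555491,
                                  tzinfo=datetime.timezone.utc),
    "date": datetime.date(2017, 8, 31),
    "eo:cloud_cover": 2,
    "nested": {"acquired": [datetime.date(2017, 8, 31)]},
}

clean = clean_for_json(md)
print(json.dumps(clean, indent=2))
```

Anything that is not a date, dict, list, or tuple is passed through unchanged, so other non-serializable types would still raise in json.dumps.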

Maybe @ian-r-rose has a suggestion based on this intake pull request intake/intake#327

@martindurant
Member

At what point is JSON encoding required? I'm pretty sure that YAML has no problem with this.

@ian-r-rose

Interesting, I hadn't considered that there would be datetime objects in the metadata.

@martindurant when I added the custom __repr__ I used JSON for the mimetype, with the knowledge that it was a widely-supported one by a variety of frontends (JupyterLab, nteract, etc).

Is there a fuller accounting of the non-JSON-able types that might pop up in the metadata? If not, the basic workaround that @scottyhq points to seems reasonable to me, if a bit fragile. We could pass str as the default serializer function to the JSON serialization, to try to cover any object that might appear in the metadata.
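
A minimal sketch of the str-as-default idea, using only the standard library's json.dumps and its `default` parameter. The sample dictionary is illustrative (a trimmed copy of the metadata from this thread, with UTC assumed for the timezone); how this would be wired into the entry's display method is not shown here.

```python
import datetime
import json

# Illustrative metadata with values the default encoder rejects
md = {
    "datetime": datetime.datetime(2017, 8, 31, 17, 24, 57, 555491,
                                  tzinfo=datetime.timezone.utc),
    "date": datetime.date(2017, 8, 31),
    "eo:gsd": 3.7,
    "pl:ground_control": True,
}

# json.dumps calls default() for any value it cannot serialize itself,
# so str() acts as a catch-all fallback here
encoded = json.dumps(md, default=str)
decoded = json.loads(encoded)
print(decoded)
```

Note that `default` is only consulted for values. Dictionary keys that are not str, int, float, bool, or None still raise a TypeError, and the round trip yields strings rather than the original objects.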

@martindurant
Member

msgpack and yaml are the serialisers of reference in intake; but no, there is no list of expected metadata contents, and some drivers might choose to store more complex things if they are not to be serialised at all.

@scottyhq
Collaborator Author

Thanks for the input. As far as I can tell we are using satstac to read JSON metadata that starts as strings ("datetime": "2019-10-31T19:02:13.439292+00:00"), but gets converted to datetime objects - @matthewhanson can confirm: https://github.com/pangeo-data/intake-stac/blob/d2c3f01b2e9931da7b1d87aaa67c7e0c107c5fc7/intake_stac/catalog.py#L207

So another short-term solution is just to not convert the strings for intake metadata.

@jhamman jhamman added the bug Something isn't working label Mar 3, 2020
@jhamman jhamman mentioned this issue Aug 18, 2020

4 participants