Can't serialize to JSON #591

Closed
mattwthompson opened this issue May 4, 2020 · 7 comments
Labels
Collection Related to the collections layer

Comments

@mattwthompson
Contributor

Describe the bug
I wanted to save a collection to disk to avoid downloading a large dataset every time I ran a test or restarted a notebook.

To Reproduce

import json

import qcportal as ptl


client = ptl.FractalClient()
ds = client.get_collection('OptimizationDataset', 'OpenFF Optimization Set 1')
ds.to_json(filename='data.json')

raises TypeError: Object of type set is not JSON serializable
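
The same error can be reproduced without the dataset, using only the standard library. This is a minimal sketch that assumes a set sits somewhere in the data being dumped; the payload and its `tags` key are made up for illustration.

```python
import json

# Hypothetical stand-in for the collection data; only the set matters.
payload = {"name": "example", "tags": {"a", "b"}}

try:
    json.dumps(payload)
    error_message = None
except TypeError as exc:
    # The json module has no encoding for Python sets.
    error_message = str(exc)

print(error_message)
```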

Expected behavior
I expected to be able to save this out to JSON.

Additional context
There is data in the collection object (screenshot not preserved).

@loriab suggested on Slack that a set may have snuck in somewhere. This is probably a terrible collection to debug on since it includes something like 20,000 records.
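
For a large nested object, one way to track down a stray set is to walk the structure and report where sets occur. The sketch below is an illustration only: `find_sets` is a hypothetical helper, not part of qcportal, and the sample data is made up.

```python
from typing import Any, Iterator


def find_sets(obj: Any, path: str = "$") -> Iterator[str]:
    """Yield a path for every set found in nested dicts, lists, and tuples."""
    if isinstance(obj, (set, frozenset)):
        yield path
    elif isinstance(obj, dict):
        for key, value in obj.items():
            yield from find_sets(value, f"{path}[{key!r}]")
    elif isinstance(obj, (list, tuple)):
        for index, value in enumerate(obj):
            yield from find_sets(value, f"{path}[{index}]")


# Made-up data: one set hidden among JSON-friendly values.
data = {"name": "example", "records": [{"ok": 1}], "flags": {"x", "y"}}
set_paths = list(find_sets(data))
print(set_paths)
```

Running it on the dict that is about to be dumped should narrow the problem to specific keys without inspecting every record by hand.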

@mattwthompson mattwthompson changed the title from "JSON" to "Can't serialize to JSON" May 4, 2020
@bennybp
Contributor

bennybp commented May 6, 2020

Ok, I see the problem. The history key is a set.

@bennybp bennybp added the Collection Related to the collections layer label Nov 2, 2020
@dotsdl
Collaborator

dotsdl commented Nov 3, 2020

Diving into this one now.

@dotsdl
Collaborator

dotsdl commented Nov 3, 2020

@bennybp I see you made a commit that addressed history being a set in a branch on your fork. Are you planning to merge this? Otherwise I can make the change in a PR here.

@bennybp
Contributor

bennybp commented Nov 3, 2020

I started to fix it in that PR but abandoned it. It is a little more involved than just changing it to a list (there are some places that use the set functionality that also have to be modified).

Looking at the database, the database stores this info as JSON. Not entirely sure where this gets converted from a set to a list on the backend...
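
One way to keep a set in memory while still emitting JSON is to convert it only at serialization time, through the `default` hook of `json.dumps`. This is a generic sketch, not what QCFractal does; `encode_sets` is a hypothetical helper and the payload is made up.

```python
import json


def encode_sets(obj):
    """Fallback for json.dumps: write sets as sorted lists."""
    if isinstance(obj, (set, frozenset)):
        return sorted(obj)
    # Anything else that json cannot handle is still an error.
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")


payload = {"history": {"b", "a"}}
text = json.dumps(payload, default=encode_sets)
print(text)  # {"history": ["a", "b"]}
```

Sorting gives stable output but assumes the set members are mutually comparable; `list(obj)` would avoid that assumption at the cost of a nondeterministic order.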

@dotsdl
Collaborator

dotsdl commented Nov 3, 2020

Ah cool, thank you for that clarification. Working on a solution that doesn't fail in other places.

dotsdl added a commit that referenced this issue Nov 4, 2020
…Collections

Addresses #591

Pydantic models support direct JSON serialization; we take advantage of
this here to support use of e.g. sets in a model.

The `to_json` and `from_json` methods also technically dealt with dicts,
not JSON strings; this is a distinction that matters since not all dicts
are valid JSON constructs. We resolve that distinction here.

Explicit round-trip tests for the `*_json` and `*_dict` methods are still needed.
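
To illustrate the dict versus JSON string distinction and the kind of round-trip test mentioned in the commit message, here is a small sketch. `MiniCollection` is a hypothetical stand-in written with a standard-library dataclass; it is not a QCFractal or Pydantic class, and the real method behavior may differ.

```python
import json
from dataclasses import dataclass, field


@dataclass
class MiniCollection:
    """Hypothetical stand-in for a collection; not a QCFractal class."""

    name: str
    history: set = field(default_factory=set)

    def to_dict(self) -> dict:
        # A Python dict; it may hold values (like sets) that JSON cannot express.
        return {"name": self.name, "history": set(self.history)}

    @classmethod
    def from_dict(cls, data: dict) -> "MiniCollection":
        return cls(name=data["name"], history=set(data["history"]))

    def to_json(self) -> str:
        # A JSON string; the set is written as a sorted list.
        data = self.to_dict()
        data["history"] = sorted(data["history"])
        return json.dumps(data)

    @classmethod
    def from_json(cls, text: str) -> "MiniCollection":
        return cls.from_dict(json.loads(text))


original = MiniCollection(name="example", history={"b", "a"})

# Round-trip checks for both representations.
assert MiniCollection.from_dict(original.to_dict()) == original
assert MiniCollection.from_json(original.to_json()) == original
```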
@mattwthompson
Contributor Author

I ran into this again today. For my purposes, the quickest solution is to just pop the history. There's probably a way to map it onto a list, but I don't think I need it for my use case, and just data['history'] = list(data['history']) did not completely work: it was happy to write to disk but could not be read back. I didn't look further into why.

import json

import qcportal

client = qcportal.FractalClient(verify=False)

dataset = client.get_collection(
    "OptimizationDataset",
    "OpenFF Iodine Chemistry Optimization Dataset v1.0",
)

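# Write the collection to disk, dropping the non-serializable history set.
with open("dataset.json", "w") as file:
    data = dataset.to_json()
    data.pop("history")
    json.dump(data, file)

# Read the saved data back in.
with open("dataset.json", "r") as file:
    data = json.load(file)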
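
The thread does not say why the list version could not be read back. One possible cause, offered only as a guess: if the history entries are tuples, JSON writes them as arrays and they come back as lists, which cannot be members of a set. A sketch with made-up entries:

```python
import json

# Made-up history entries; the real contents are an assumption.
history = {("default", "a"), ("default", "b")}

text = json.dumps(sorted(history))
loaded = json.loads(text)  # each tuple comes back as a list

try:
    set(loaded)
    failure = None
except TypeError as exc:
    failure = str(exc)  # lists are unhashable, so set() rejects them

# Converting each entry back to a tuple restores the original set.
restored = {tuple(entry) for entry in loaded}
assert restored == history
```

If this is indeed the cause, converting entries back to tuples before rebuilding the collection would be the missing step on the read side.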

@bennybp
Contributor

bennybp commented Sep 14, 2023

Superseded by #740 for v0.50

@bennybp bennybp closed this as completed Sep 14, 2023