Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AttributeError during unpickling DataContainer: '_allow_compute' attribute missing #239

Open
n0rdp0l opened this issue Oct 30, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@n0rdp0l
Copy link

n0rdp0l commented Oct 30, 2024

Describe the bug
I am encountering an AttributeError when unpickling a DataContainer object from the xeofs package. The error message is:

AttributeError: 'DataContainer' object has no attribute '_allow_compute'

This occurs when I attempt to unpickle a dictionary containing a DataContainer instance using dill. It seems that during the unpickling process, the __init__ method is not called, so the _allow_compute attribute is not initialized. As a result, when __setitem__ is called during unpickling, it tries to access _allow_compute, which doesn't exist.

Reproducible Minimal Working Example

import dill
from xeofs.data_container.data_container import DataContainer

# Create a DataContainer object and add an item
data_container = DataContainer()
data_container['key'] = 'value'

# Serialize the DataContainer object
with open('data_container.pkl', 'wb') as f:
    dill.dump(data_container, f)

# Deserialize the DataContainer object
with open('data_container.pkl', 'rb') as f:
    loaded_data_container = dill.load(f)  # This line raises the AttributeError

Traceback

File ".../bwcore/forecasting/forecaster.py", line 76, in load_forecast_dict
    forecast_dict = dill.load(f)
  File ".../site-packages/dill/_dill.py", line 297, in load
    return Unpickler(file, ignore=ignore, **kwds).load()
  File ".../site-packages/dill/_dill.py", line 452, in load
    obj = StockUnpickler.load(self)
  File ".../site-packages/xeofs/data_container/data_container.py", line 24, in __setitem__
    self._allow_compute[__key] = self._allow_compute.get(__key, True)
AttributeError: 'DataContainer' object has no attribute '_allow_compute'

Expected behavior
I expect the DataContainer object to be successfully unpickled without errors, with all its attributes properly initialized.

Desktop (please complete the following information):

  • OS: Ubuntu 24.04 LTS aarch64
  • Python [3.10]
  • xeofs version [3.0.2]

Additional context
I believe the issue arises because __init__ is not called during unpickling, so any attributes initialized there (like _allow_compute) are missing. I came across this explanation which mentions that __init__ is not called during unpickling.

To address this, I did a monkey patch to the __setitem__ method in the DataContainer class to check for the existence of _allow_compute and initialize it if necessary:

class DataContainer(dict):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._allow_compute = dict({k: True for k in self.keys()})

    def __setitem__(self, __key: str, __value: DataArray) -> None:
        super().__setitem__(__key, __value)
        # Ensure _allow_compute exists
        if not hasattr(self, '_allow_compute'):
            self._allow_compute = {}
        self._allow_compute[__key] = self._allow_compute.get(__key, True)

With this change, the unpickling process completes without errors.

I'm not entirely sure if this issue is due to something on my side, since my pipeline uses Python 3.10, and perhaps the pickling behavior differs in this version. If this is indeed a bug, perhaps implementing __getstate__ and __setstate__ methods in the DataContainer class could ensure that _allow_compute is properly handled during pickling and unpickling.

Could you please look into this issue? Any guidance or confirmation on whether this is a known issue or if there's a better way to handle it would be appreciated. If you need any more information let me know :)

@n0rdp0l n0rdp0l added the bug Something isn't working label Oct 30, 2024
@slevang
Copy link
Contributor

slevang commented Oct 30, 2024

I can confirm that this is not pickle-able. Seems like a reasonable feature request and solution. Inheriting from UserDict also seems to work but I don't really know the nuances.

Feel free to make a PR with tests to prevent regression. Besides the DataContainer, have you checked that we can pickle all the other model components?

Looking at the DataContainer again, I wonder if it makes sense to rework this to be a DataTree, since we're using that as a common structure elsewhere now. That would be a much more involved PR though. @nicrie do you have any particular reasons why you chose this subclassed dictionary structure?

@nicrie
Copy link
Contributor

nicrie commented Oct 31, 2024

@nicrie do you have any particular reasons why you chose this subclassed dictionary structure?

Nope I just wanted a simple, consistent way to handle data across all models. Totally agree with @slevang that reworking DataContainer to inherit from DataTree would be the ideal approach.

No strong feelings on the other parts -- I'd be fine with a temporary fix like @n0rdp0l suggested, which could later be replaced with a larger update using DataTrees.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants