dill load and sklearn clone result in error #1026
I could not 100% reproduce the issue, thus I had to make some small changes:

```python
from sklearn.datasets import make_regression
from sklearn.base import clone
import numpy as np
import torch
import skorch
import dill

dill.__version__  # 0.3.6

X, y = make_regression()
X, y = X.astype(np.float32), y.astype(np.float32).reshape(-1, 1)  # added

base_model = skorch.NeuralNetRegressor(torch.nn.Linear(100, 1))
cloned_model = clone(base_model)
dumped_model = dill.loads(dill.dumps(base_model))
cloned_dumped_model = clone(dumped_model)

base_model.fit(X, y)           # works
cloned_model.fit(X, y)         # works
dumped_model.fit(X, y)         # THIS ALREADY FAILS FOR ME
cloned_dumped_model.fit(X, y)  # fails with same error
```

First, could you please confirm that my snippet produces the same error for you? Second, is the error you get also:
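For context, `sklearn.base.clone` rebuilds the estimator from its class and constructor parameters. A simplified, stdlib-only sketch of that idea (the real implementation uses `get_params(deep=False)` and validates the result; `Est` and `simple_clone` here are hypothetical names for illustration):

```python
# rough sketch of what sklearn.base.clone does: construct a fresh instance
# of the same class from the estimator's reported parameters
def simple_clone(estimator):
    klass = type(estimator)          # class identity matters here
    params = estimator.get_params()  # sklearn-style parameter dict
    return klass(**params)

class Est:
    def __init__(self, alpha=1.0):
        self.alpha = alpha
    def get_params(self):
        return {"alpha": self.alpha}

e = Est(alpha=2.0)
c = simple_clone(e)
assert c is not e          # a new, unfitted instance
assert c.alpha == e.alpha  # with the same parameters
```

Because `clone` goes through `type(estimator)`, anything that disturbs the identity of the class object can break it.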
I can reproduce it, yes. And indeed the dumped version also dies. What does work is:

```python
dumped_model = dill.loads(pickle.dumps(base_model))
dumped_model.fit(X, y)
```

The error is the same: the loss is None here. The process looks okay to me and goes through all the initializations. I have tracked it to the train_step function, where printing the optimizers gives an empty list. But when you take the models themselves and print the pre-fit attributes, everything looks good! Quite frustrating.
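The "attributes look fine before fit, but the optimizer list is empty inside train_step" symptom fits a lazy-initialization design, where expensive components are only created during `fit()`/`initialize()`. A minimal stdlib sketch of that pattern (`LazyNet` and its attribute names are hypothetical, not skorch's actual internals):

```python
# hypothetical sketch of lazy initialization: the optimizer list is only
# populated inside initialize(), so if that step is skipped or broken,
# the object reaches its training step with nothing to optimize with
class LazyNet:
    def __init__(self, lr=0.01):
        self.lr = lr  # note: no optimizer created here yet

    def initialize(self):
        self.optimizers_ = [("sgd", object())]  # stand-in for a real optimizer
        return self

    def fit(self):
        if not getattr(self, "optimizers_", None):
            raise RuntimeError("no optimizers; initialization did not run")
        return self

net = LazyNet()
try:
    net.fit()           # fails: nothing has been initialized
except RuntimeError:
    pass
net.initialize().fit()  # works once initialize() has run
```

Under this pattern, the pre-fit attributes (`lr` above) can look perfectly normal even when the initialization machinery is the part that broke.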
Okay, after some checks:

```python
from sklearn.datasets import make_regression
from sklearn.base import clone
import numpy as np
import torch
import skorch
import dill
import pickle

dill.__version__  # 0.3.6

X, y = make_regression()
X, y = X.astype(np.float32), y.astype(np.float32).reshape(-1, 1)  # added

base_model = skorch.NeuralNetRegressor(torch.nn.Linear(100, 1))
cloned_model = clone(base_model)
dumped_model = dill.loads(dill.dumps(base_model))
dumped_fitted_model = dill.loads(dill.dumps(base_model.fit(X, y)))
cloned_dumped_model = clone(dumped_model)
cloned_dumped_fitted_model = clone(cloned_dumped_model)

base_model.fit(X, y)                  # works
cloned_model.fit(X, y)                # works
dumped_model.fit(X, y)                # fails
dumped_fitted_model.fit(X, y)         # works
cloned_dumped_fitted_model.fit(X, y)  # fails
cloned_dumped_model.fit(X, y)         # fails with same error
```
Thanks for investigating further. This is super strange IMO, because the … Edit: just checked it, dill does call …
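One plausible mechanism (an assumption on my part, not confirmed in the thread): dill by default serializes class *definitions* by value, so the reloaded instance can carry a freshly reconstructed class object that is distinct from the one still living in the module, even though both have the same name. A stdlib-only simulation of that identity break:

```python
# simulate by-value class serialization: building the "same" class twice
# yields two distinct class objects, which breaks identity-based checks
def make_class():
    class Net:
        def __init__(self, lr=0.01):
            self.lr = lr
    return Net

NetA = make_class()  # stands in for the original module-level class
NetB = make_class()  # stands in for a by-value reconstruction after loading

obj = NetB(lr=0.1)
assert NetA.__name__ == NetB.__name__  # same name...
assert NetA is not NetB                # ...but different class objects
assert not isinstance(obj, NetA)       # so isinstance/type checks fail
```

If skorch's initialization or `clone` relies anywhere on the identity of the class (or on class-level attributes), a by-value reconstructed class would explain the breakage.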
We could print a trace of execution with the final fit and diff across, maybe?

Sorry, I don't understand. How can this be done?
I was thinking pdb might be of some help here, will report if I manage anything. In the meantime, I have found that dumping byref with dill avoids the failure:

```python
# works
dill.loads(dill.dumps(base_model, byref=True)).fit(X, y)
clone(dill.loads(dill.dumps(base_model.fit(X, y), byref=True))).fit(X, y)
```
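This fits the picture: `byref=True` makes dill store classes the way pickle always does, by *reference* (module path plus qualified name, re-imported on load) instead of by value. A small stdlib demonstration of that by-reference behaviour, using `collections.OrderedDict` as an example class:

```python
import pickle
from collections import OrderedDict

# pickle records the module path and qualified name of a class, then
# re-imports it on load -- the strategy dill's byref=True switches to
payload = pickle.dumps(OrderedDict)
assert b"collections" in payload and b"OrderedDict" in payload

restored = pickle.loads(payload)
assert restored is OrderedDict  # the very same class object, not a copy
```

With by-reference serialization the reloaded instance points at the original class object, so identity-based machinery like `clone` keeps working.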
Dumping a skorch model with dill and then reloading it (no matter whether with dill or pickle) makes it incompatible with sklearn.base.clone, apparently because some attributes become empty (the optimizers, I think, but I had no time to investigate further). This behaviour occurs with neither pickle nor joblib.

This makes functions such as cross_val_predict unusable after loading a previously dumped model.

To reproduce: python=3.10. Tried with a bunch of versions for dill / torch / skorch / sklearn; all bug out.