cloudpickle is not stable in notebooks #538

Open
shobsi opened this issue Jun 28, 2024 · 3 comments

Comments

@shobsi

shobsi commented Jun 28, 2024

Consider this code in a notebook cell:

import cloudpickle

MY_PI = 3.1415

def get_pi():
  return MY_PI

print(cloudpickle.dumps(get_pi))

Every time I rerun this cell I get a different output, unlike in a Python script, where the output is consistent. I am trying to use cloudpickle to capture the function and persist it in storage for later use. I want to update the stored copy only when the behavior of the function materially changes, but because of this behavior in the notebook I end up making redundant, costly updates to the storage. Is there a way I can avoid this?

@shobsi
Author

shobsi commented Jul 12, 2024

Gentle ping! Any recommendations are much appreciated.

@shobsi
Author

shobsi commented Jul 19, 2024

To share more of what I observed: the output of the above code contains a string like ipython-input-x-y, where x is the cell's execution count and y is a hash of the code in the cell. Since the execution count changes every time the cell is rerun, the overall output changes.
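The filename described above is stored in each function's code object as `co_filename`. As far as I know, IPython has compiled cells under names of the form `<ipython-input-N-HASH>` (N being the execution count, HASH a truncated hash of the cell source), and cloudpickle pickles notebook-defined functions by value, code object included, so a new filename means new bytes. The stdlib-only sketch below uses `marshal` as a stand-in for cloudpickle (an assumption made so it runs without extra packages), and the two cell names are hypothetical.

```python
import marshal

# Same source as the notebook cell; only the compile-time filename differs.
SRC = "MY_PI = 3.1415\n\ndef get_pi():\n    return MY_PI\n"


def define_get_pi(filename):
    """Compile SRC as if it came from `filename` and return its get_pi."""
    namespace = {}
    exec(compile(SRC, filename, "exec"), namespace)
    return namespace["get_pi"]


# Hypothetical IPython-style names for two runs of the same cell.
run1 = define_get_pi("<ipython-input-1-0123456789ab>")
run2 = define_get_pi("<ipython-input-2-0123456789ab>")

# Serialize the code objects; the filename string is written into the bytes.
blob1 = marshal.dumps(run1.__code__)
blob2 = marshal.dumps(run2.__code__)

print(run1.__code__.co_filename)    # <ipython-input-1-0123456789ab>
print(b"ipython-input-1" in blob1)  # True
print(blob1 == blob2)               # False
```

Both functions behave identically (`run1() == run2()`), yet their serialized code differs because of the recorded filename alone.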

@ogrisel
Contributor

ogrisel commented Aug 5, 2024

I gave it a try using the main branch of cloudpickle and I cannot reproduce: the output is stable when rerunning the same cell within the same active notebook session. It does change after a notebook restart, because the temporary file that ipykernel uses internally to store the cell source changes in that case.
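Since the varying part is the source filename recorded in the code object, one possible user-side workaround is to pickle a copy of the function whose code objects carry a fixed `co_filename`. The sketch below is hypothetical: `with_fixed_filename` and the `"<notebook>"` placeholder are illustrative names, not cloudpickle API, and it relies on `CodeType.replace()` (Python 3.8+).

```python
import types


def with_fixed_filename(fn, filename="<notebook>"):
    """Hypothetical helper: copy `fn` with every code object's co_filename pinned.

    Nested code objects (inner functions, lambdas) carry their own
    co_filename, so they are rewritten recursively.
    """

    def rewrite(code):
        consts = tuple(
            rewrite(c) if isinstance(c, types.CodeType) else c
            for c in code.co_consts
        )
        return code.replace(co_filename=filename, co_consts=consts)

    clone = types.FunctionType(
        rewrite(fn.__code__), fn.__globals__, fn.__name__,
        fn.__defaults__, fn.__closure__,
    )
    # The FunctionType() call above does not copy these, so carry them over.
    clone.__kwdefaults__ = fn.__kwdefaults__
    clone.__qualname__ = fn.__qualname__
    clone.__dict__.update(fn.__dict__)
    return clone


# Simulate a notebook cell compiled under a hypothetical IPython-style name.
SRC = "MY_PI = 3.1415\n\ndef get_pi():\n    return MY_PI\n"
ns = {}
exec(compile(SRC, "<ipython-input-3-0123456789ab>", "exec"), ns)

pinned = with_fixed_filename(ns["get_pi"])
print(ns["get_pi"].__code__.co_filename)  # <ipython-input-3-0123456789ab>
print(pinned.__code__.co_filename)        # <notebook>
print(pinned())                           # 3.1415
```

In the notebook the idea would be to pickle `with_fixed_filename(get_pi)` instead of `get_pi`. This removes only the filename as a source of variation; I have not verified that every other part of cloudpickle's output is byte-for-byte stable across restarts, and tracebacks from the unpickled function would report `<notebook>` rather than the real cell.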

Here is an updated version of the snippet I used in my notebook cell to get a richer output:

import cloudpickle
from pickletools import dis
from hashlib import sha256

MY_PI = 3.1415

def get_pi():
  return MY_PI

dumped_get_pi = cloudpickle.dumps(get_pi)
print(sha256(dumped_get_pi).hexdigest())
dis(dumped_get_pi)  # dis() prints the disassembly itself and returns None

I used ipykernel version 6.29.5 and jupyterlab version 4.2.4.
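To pin down which part of the payload changes between two runs, one option is to diff the `pickletools.dis` output of two dumps. The sketch below is a stdlib-only illustration; `disassembly` and `diff_pickles` are hypothetical helper names, and the two payloads are stand-ins built with plain `pickle` (in the notebook they would be two `cloudpickle.dumps(get_pi)` results captured before and after a restart).

```python
import difflib
import io
import pickle
import pickletools


def disassembly(payload: bytes) -> list:
    """Return the pickletools.dis() output for `payload` as a list of lines."""
    buf = io.StringIO()
    pickletools.dis(payload, out=buf)
    return buf.getvalue().splitlines()


def diff_pickles(a: bytes, b: bytes) -> list:
    """Unified diff of two pickles' disassemblies (empty list if identical)."""
    return list(
        difflib.unified_diff(
            disassembly(a), disassembly(b),
            fromfile="first", tofile="second", lineterm="",
        )
    )


# Stand-in payloads that differ only in an embedded, hypothetical filename.
first = pickle.dumps(("get_pi", "<ipython-input-1-0123456789ab>"))
second = pickle.dumps(("get_pi", "<ipython-input-2-0123456789ab>"))

for line in diff_pickles(first, second):
    print(line)
```

The `-`/`+` lines point directly at the opcodes whose arguments differ, which is quicker than comparing two full disassemblies by eye.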

Anyways, I am not sure we want to make cloudpickle too magical with respect to the handling of Jupyter's implementation details.
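If cloudpickle's output is left as-is, another user-side option for the original goal (update storage only on a material change) is to decide from a separate fingerprint that ignores where the code was defined, and only re-pickle and write when that fingerprint changes. The sketch below is a hypothetical helper, not part of cloudpickle: it hashes bytecode, names, constants and defaults plus the reprs of referenced module-level globals, and skips `co_filename` and line-number tables. Assumed limitations: closure contents are not hashed, reprs that embed memory addresses or depend on hash randomization (e.g. sets of strings) are unstable, and bytecode differs across Python versions.

```python
import hashlib
import types


def behavior_fingerprint(fn, _seen=None):
    """Hypothetical helper: hash what a function does, not where it was defined."""
    seen = set() if _seen is None else _seen
    seen.add(id(fn))
    h = hashlib.sha256()
    global_names = []

    def feed_code(code):
        # Bytecode and symbol tables; co_filename and line tables are skipped.
        h.update(code.co_code)
        h.update(repr(code.co_names).encode())
        h.update(repr(code.co_varnames).encode())
        global_names.extend(code.co_names)
        for const in code.co_consts:
            if isinstance(const, types.CodeType):
                feed_code(const)  # nested functions, lambdas, comprehensions
            else:
                h.update(repr(const).encode())

    feed_code(fn.__code__)
    h.update(repr(fn.__defaults__).encode())
    h.update(repr(fn.__kwdefaults__).encode())

    # dict.fromkeys de-duplicates while keeping first-seen order.
    for name in dict.fromkeys(global_names):
        if name not in fn.__globals__:
            continue  # attribute names, builtins, or undefined globals
        value = fn.__globals__[name]
        if isinstance(value, types.FunctionType):
            if id(value) not in seen:
                # Helper functions the code calls are fingerprinted recursively.
                h.update(behavior_fingerprint(value, seen).encode())
        else:
            # Caveat: reprs that embed memory addresses are not stable.
            h.update(f"{name}={value!r}".encode())
    return h.hexdigest()
```

In the notebook, one could store `behavior_fingerprint(get_pi)` next to the pickled payload and call `cloudpickle.dumps` plus the storage write only when the stored fingerprint differs. Changing `MY_PI` changes the fingerprint, while rerunning the cell or restarting the kernel should not.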

2 participants