Skip to content

illinoisdata/kishu

Repository files navigation

build status codecov Python version PyPi version

Kishu: Versioned and Undoable Notebook System

Kishu is a system for intelligent versioning of notebook session states on Jupyter-based platforms (e.g. JupyterLab, Jupyter Hub). Kishu efficiently creates checkpoints of both the variable and code states of a notebook session, allowing users to both undo cell executions and manage branching states containing objects such as machine learning models, plots, and dataframes through a Git-like commit and checkout interface.

Using Kishu

Kishu can be installed from PyPI:

pip install kishu jupyterlab_kishu

Once installed, you are ready to use Kishu in your notebook workflows for undoing cell executions and managing branching notebook states.

Note: installing jupyterlab_kishu will also install jupyterlab into your environment.

Initializing Kishu

Initialize Kishu to your notebook: To start protecting your notebook session, Kishu can be initialized and attached through the Kishu > Initialize/Re-attach option under the Kishu tab (left) or in the Jupyter command palette (right). Alternatively, you can use the shortcut Ctrl+K then Ctrl+I / ⌘+K then ⌘+I.



You will see the below notification if Kishu is initialized successfully:



Undoable Notebook Workflow

Automatic Tracking: Once initialized, Kishu will start automatically saving the variable state after each cell execution. Undoing a cell execution: To undo your latest cell execution, you can use the Kishu > Initialize/Re-attach option under the Kishu tab (left) or in the Jupyter command palette (right). Alternatively, you can use the shortcut Ctrl+K then Ctrl+Z / ⌘+K then ⌘+Z.



If successful, you will see the following notification, and your variable state will be rolled back to that before your latest cell execution.



Only affects variable state: Undoing cell executions only affects the variable state (right). The code state (i.e., the cells you write, left) is untouched. This can be useful, for example, to 'un-drop' a dataframe column dropped by a cell while keeping the cell code itself intact:



Branching Notebook Workflow

Kishu can also be used to manage branching code and variable states; it supports making checkpoints of the notebook and variable state at any point during a notebook session, which can be returned to later via a checkout.

Committing to make a checkpoint: Kishu can store the current state of your notebook, including both the variable state and your code state, with the Kishu > Commit option under the Kishu tab (left) or in the Jupyter command palette (right). Alternatively, you can use the shortcut Ctrl+K then Ctrl+C / ⌘+K then ⌘+C.



You will be then prompted to enter a commit message. Use something memorable so you can easily return to this commit later:



You will see the below message if the commit is successful:



Checkout to a checkpoint: You can return to any session state that has been committed in the past with the Kishu > Checkout option under the Kishu tab (left) or in the Jupyter command palette (right). Alternatively, you can use the shortcut Ctrl+K then Ctrl+V / ⌘+K then ⌘+V.



This will bring up a menu for you to select the appropriate checkpoint:



You will see the below message if the checkout is successful:



Checking out will replace both the current variable and code state with that of the selected checkpoint (see below). It will also overwrite your current variable and code state; commit to make a checkpoint before checking out if you wish to keep your current notebook state.



Configuring Kishu

Kishu can be configured through editing the ~/.kishu/config.ini file. A full list of configurable options can be found here.

Supported Libraries

This is the current list of libraries, their versions, and their classes supported by Kishu:

- ✅ : supported: All changes to instances of this class are always captured.
- ❓ : unstable: Kishu may report changes on non-changes to instances of this class, i.e., false positives.
- ❌ : failing: Some changes to an instance of this class may not be captured.

    ✅ arrow==1.3.0, arrow.arrow.Arrow
    ✅ astropy==5.2.2, astropy.convolution.Box2DKernel
    ✅ astropy==5.2.2, astropy.convolution.Gaussian2DKernel
    ✅ astropy==5.2.2, astropy.io.fits.HDUList
    ✅ astropy==5.2.2, astropy.io.fits.PrimaryHDU
    ✅ astropy==5.2.2, astropy.modeling.fitting.LinearLSQFitter
    ✅ astropy==5.2.2, astropy.modeling.functional_models.Ellipse2D
    ✅ astropy==5.2.2, astropy.modeling.functional_models.Linear1D
    ✅ astropy==5.2.2, astropy.modeling.polynomial.Polynomial1D
    ✅ astropy==5.2.2, astropy.modeling.polynomial.Polynomial2D
    ✅ astropy==5.2.2, astropy.nddata.NDData
    ✅ astropy==5.2.2, astropy.nddata.NDDataRef
    ✅ astropy==5.2.2, astropy.stats.SigmaClip
    ✅ astropy==5.2.2, astropy.table.QTable
    ✅ astropy==5.2.2, astropy.units.Quantity
    ✅ astropy==5.2.2, astropy.visualization.PercentileInterval
    ✅ astropy==5.2.2, astropy.wcs.WCS
    ✅ bokeh==2.4.3, bokeh.plotting._figure.figure
    ✅ catboost==1.2.3, catboost
    ✅ dask==2023.5.0, dask
    ✅ dataprep==0.4.5, dataprep.datasets
    ✅ dataprep==0.4.5, dataprep.eda.intermediate.Intermediate
    ✅ dill==0.3.8, dill
    ✅ gensim==4.3.2, gensim
    ✅ gym==0.26.2, gym
    ✅ ipywidgets==7.8.5, ipywidgets
    ✅ keras==2.13.1, keras.src.initializers.initializers.RandomNormal
    ✅ keras==2.13.1, keras.src.initializers.initializers.RandomUniform
    ✅ keras==2.13.1, keras.src.layers.core.dense.Dense
    ✅ keras==2.13.1, keras.src.optimizers.schedules.learning_rate_schedule.ExponentialDecay
    ✅ lightgbm==4.3.0, lightgbm.basic.Dataset
    ✅ llm==0.13.1, llm.default_plugins.openai_models.Chat
    ✅ lmfit==1.2.2, lmfit.parameter.Parameters
    ✅ matplotlib==3.7.5, matplotlib.colors.ListedColormap
    ✅ matplotlib==3.7.5, matplotlib.dates.AutoDateFormatter
    ✅ matplotlib==3.7.5, matplotlib.dates.WeekdayLocator
    ✅ matplotlib==3.7.5, matplotlib.ticker.AutoLocator
    ✅ networkx==3.1, networkx.classes.digraph.DiGraph
    ✅ networkx==3.1, networkx.classes.graph.Graph
    ✅ nltk==3.8.1, nltk.stem.porter.PorterStemmer
    ✅ numpy==1.24.3, ast
    ✅ numpy==1.24.3, copy
    ✅ numpy==1.24.3, datetime.time
    ✅ numpy==1.24.3, datetime.timedelta
    ✅ numpy==1.24.3, hashlib
    ✅ numpy==1.24.3, itertools
    ✅ numpy==1.24.3, json
    ✅ numpy==1.24.3, numpy.ndarray
    ✅ numpy==1.24.3, numpy.ndarray
    ✅ numpy==1.24.3, pickle
    ✅ numpy==1.24.3, random.Random
    ✅ numpy==1.24.3, re.Pattern
    ✅ numpy==1.24.3, urllib.request.Request
    ✅ numpy==1.24.3, uuid.UUID
    ✅ opencv-python==4.9.0.80, cv2
    ✅ optuna==3.5.0, optuna.Study
    ✅ pandas==1.5.3, pandas.DataFrame
    ✅ pandas==1.5.3, pandas.Series
    ✅ pathlib==1.0.1, pathlib.PosixPath
    ✅ photoutils==0.0.1, photutils.psf.matching.CosineBellWindow
    ✅ photoutils==0.0.1, photutils.psf.matching.HanningWindow
    ✅ photoutils==0.0.1, photutils.utils.CutoutImage
    ✅ photoutils==0.0.1, photutils.utils.ImageDepth
    ✅ plotly==5.18.0, plotly.express
    ✅ plotly==5.18.0, plotly.figure_factory
    ✅ plotly==5.18.0, plotly.graph_objects
    ✅ plotly==5.18.0, plotly.graph_objs
    ✅ plotly==5.18.0, plotly.io
    ✅ plotly==5.18.0, plotly.offline
    ✅ plotly==5.18.0, plotly.subplots
    ✅ polars==0.14.29, polars.DataFrame
    ✅ prophet==1.1.5, prophet.Prophet
    ✅ pyspark==3.5.1, pyspark.sql
    ✅ qiskit==0.45.0, qiskit.QuantumCircuit
    ✅ scikit-image==0.21.0, skimage
    ✅ scikit-image==0.21.0, skimage.morphology
    ✅ scikit-learn==1.3.2, sklearn.cluster
    ✅ scikit-learn==1.3.2, sklearn.cluster
    ✅ scikit-learn==1.3.2, sklearn.compose
    ✅ scikit-learn==1.3.2, sklearn.datasets
    ✅ scikit-learn==1.3.2, sklearn.datasets
    ✅ scikit-learn==1.3.2, sklearn.decomposition
    ✅ scikit-learn==1.3.2, sklearn.discriminant_analysis
    ✅ scikit-learn==1.3.2, sklearn.dummy
    ✅ scikit-learn==1.3.2, sklearn.ensemble
    ✅ scikit-learn==1.3.2, sklearn.feature_extraction.text
    ✅ scikit-learn==1.3.2, sklearn.feature_selection
    ✅ scikit-learn==1.3.2, sklearn.impute
    ✅ scikit-learn==1.3.2, sklearn.impute
    ✅ scikit-learn==1.3.2, sklearn.impute
    ✅ scikit-learn==1.3.2, sklearn.kernel_ridge
    ✅ scikit-learn==1.3.2, sklearn.linear_model
    ✅ scikit-learn==1.3.2, sklearn.linear_model
    ✅ scikit-learn==1.3.2, sklearn.manifold
    ✅ scikit-learn==1.3.2, sklearn.metrics
    ✅ scikit-learn==1.3.2, sklearn.metrics.pairwise
    ✅ scikit-learn==1.3.2, sklearn.mixture
    ✅ scikit-learn==1.3.2, sklearn.model_selection
    ✅ scikit-learn==1.3.2, sklearn.multiclass
    ✅ scikit-learn==1.3.2, sklearn.naive_bayes
    ✅ scikit-learn==1.3.2, sklearn.neighbors
    ✅ scikit-learn==1.3.2, sklearn.neural_network
    ✅ scikit-learn==1.3.2, sklearn.pipeline
    ✅ scikit-learn==1.3.2, sklearn.preprocessing
    ✅ scikit-learn==1.3.2, sklearn.random_projection
    ✅ scikit-learn==1.3.2, sklearn.svm
    ✅ scikit-learn==1.3.2, sklearn.tree
    ✅ scikit-learn==1.3.2, sklearn.utils
    ✅ scipy==1.10.1, scipy.interpolate
    ✅ scipy==1.10.1, scipy.ndimage
    ✅ scipy==1.10.1, scipy.ndimage.interpolate
    ✅ scipy==1.10.1, scipy.optimize
    ✅ scipy==1.10.1, scipy.signal
    ✅ scipy==1.10.1, scipy.signal.windows
    ✅ scipy==1.10.1, scipy.sparse
    ✅ scipy==1.10.1, scipy.spatial
    ✅ scipy==1.10.1, scipy.spatial
    ✅ scipy==1.10.1, scipy.spatial.distance
    ✅ scipy==1.10.1, scipy.spatial.distance._hausdorff
    ✅ scipy==1.10.1, scipy.special
    ✅ scipy==1.10.1, scipy.stats
    ✅ statsmodels==0.14.1, statsmodels.api
    ✅ tensorflow==2.13.1, tensorflow
    ✅ tensorflow==2.13.1, tensorflow.keras.models
    ✅ tensorflow==2.13.1, tensorflow.keras.optimizers
    ✅ textblob==0.17.1, textblob.TextBlob
    ✅ torch==2.4.1, torch
    ✅ torch==2.4.1, torch.nn
    ✅ torch==2.4.1, torch.nn.functional
    ✅ torch==2.4.1, torch.utils.data
    ✅ transformers==4.38.2, huggingface
    ✅ transformers==4.38.2, transformers
    ✅ typing==3.7.4.3, typing
    ✅ wordcloud==1.9.3, wordcloud.WordCloud
    ❓ matplotlib==3.7.5, matplotlib.Axes
    ❓ matplotlib==3.7.5, matplotlib.Axes
    ❓ seaborn==0.13.0, seaborn
    ❓ torch==2.4.1, torch.optim
    ❓ polars==0.14.29, polars.LazyFrame
    ❓ matplotlib==3.7.5, matplotlib.colors.BoundaryNorm
    ❓ matplotlib==3.7.5, matplotlib.lines.Line2D
    ❓ matplotlib==3.7.5, matplotlib.patches.Ellipse
    ❓ matplotlib==3.7.5, matplotlib.patches.Arrow
    ❓ matplotlib==3.7.5, matplotlib.image.AxesImage
    ❓ matplotlib==3.7.5, matplotlib.image.FigureImage
    ❓ matplotlib==3.7.5, matplotlib.offsetbox.AnchoredOffsetbox
    ❓ astropy==5.2.2, astropy.visualization.mpl_normalize.ImageNormalize
    ❓ astropy==5.2.2, astropy.wcs.Celprm
    ❓ matplotlib==3.7.5, 'mpl_toolkits.mplot3d.art3d.Line3DCollection

Migrating Checkpoints

Kishu's per-notebook checkpoint files are stored under the ~/.kishu/ directory. For example, Kishu will store the checkpoint files for notebook with name Untitled under ~/.kishu/Untitled. These checkpoints can be copied, along with the notebook, to a new machine where Kishu (and other libraries identical to the source environment) is installed for elastic migration.

Limitations

Kishu may fail to correctly checkpoint notebook sessions containing the following items:

Silent Pickling Errors

Kishu relies on the assumption that any object, when pickled then unpickled, is identical to the original object, and does not automatically detect cases where this assumption is violated (i.e., silent pickling errors). This is typically caused by errors in the object class' reduce function which acts as its pickling instructions; for example, an object with the below reduction (incorrectly) returns an empty (byte)string when pickled.

  def __reduce__(self):
      return ""

As a potential workaround, you can add object classes with incorrect reductions to a blocklist in Kishu's config to inform it to never try to store (and always recompute) objects belonging to these classes.

Non-Deterministic and Unpicklable Objects

Kishu relies on cell replay to reconstruct unpicklable objects (e.g., generators). However, if the unpicklable object itself is created through non-deterministic means, Kishu will fail to exactly recreate it on undo/checkout, for example (assuming the seed for random was not set):

  nondet_gen = (i for i in range(random.randint(5, 10)))

Learn More

Kishu's efficiency is enabled by its low-overhead session state monitoring, deduplicated variable storage, and optimized recomputation-assisted checkout. Our papers on Kishu can be found here; don't forget to star our repository and cite our papers if you like our work!

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published