nschung/jupyterlab-docker

OKDP Jupyter Images

Build, test, tag, and push jupyter images

OKDP jupyter docker images are built from the jupyter docker-stacks source dockerfiles. The project includes a read-only copy of the jupyter docker-stacks repository as a git-subtree subproject.

The project leverages the features provided by jupyter docker-stacks:

  • Build from the original source docker files
  • Customize the images through docker build arguments (--build-arg)
  • Run the original tests at every pipeline trigger

The project provides up-to-date jupyter lab images, especially for pyspark.

Images build workflow

Build/Test

The main build pipeline contains 6 main reusable workflows:

  1. build-test-base: docker-stacks-foundation, base-notebook, minimal-notebook, scipy-notebook
  2. build-test-datascience: r-notebook, julia-notebook, tensorflow-notebook, pytorch-notebook
  3. build-test-spark: pyspark-notebook, all-spark-notebook
  4. tag-push: push the built images to the container registry (main branch only)
  5. auto-rerun: partially re-run jobs in case of failures (github runner issues/main branch only)
  6. unit-tests: run the unit tests (okdp extension) at every pipeline trigger

[Figure: build pipeline]

The build is based on the version compatibility matrix.

The build-matrix section defines the component versions to build. It acts as a filter on the parent compatibility-matrix section, limiting the version combinations to build. The build process ensures that only compatible versions are built.

For example, the following build-matrix:

build-matrix:
  python_version: ['3.9', '3.10', '3.11']
  spark_version: [3.2.4, 3.3.4, 3.4.2, 3.5.0]
  java_version: [11, 17]
  scala_version: [2.12]

builds the following version combinations, as allowed by the compatibility-matrix section:

  • spark3.3.4-python3.10-java17-scala2.12
  • spark3.5.0-python3.11-java17-scala2.12
  • spark3.4.2-python3.11-java17-scala2.12
  • spark3.2.4-python3.9-java11-scala2.12

By default, if no filter is specified:

build-matrix:

all compatible version combinations are built.
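The filtering behaviour described above can be sketched in a few lines of Python. This is only an illustration, not the project's actual implementation: the `COMPATIBILITY_MATRIX` contents, the `build_combinations` helper, and the data layout are all assumptions made for the example, chosen so that the result matches the four combinations listed above.

```python
from itertools import product

# Hypothetical compatibility matrix (NOT the project's real one): each entry
# lists versions that are assumed to be mutually compatible.
COMPATIBILITY_MATRIX = [
    {"spark_version": ["3.2.4"], "python_version": ["3.9"],
     "java_version": ["11"], "scala_version": ["2.12", "2.13"]},
    {"spark_version": ["3.3.4"], "python_version": ["3.10"],
     "java_version": ["17"], "scala_version": ["2.12", "2.13"]},
    {"spark_version": ["3.4.2", "3.5.0"], "python_version": ["3.11"],
     "java_version": ["17"], "scala_version": ["2.12", "2.13"]},
]

# The build-matrix from the example, as a YAML parser would typically load it
# (unquoted 11, 17 and 2.12 become numbers, quoted values stay strings).
BUILD_MATRIX = {
    "python_version": ["3.9", "3.10", "3.11"],
    "spark_version": ["3.2.4", "3.3.4", "3.4.2", "3.5.0"],
    "java_version": [11, 17],
    "scala_version": [2.12],
}

KEYS = ("spark_version", "python_version", "java_version", "scala_version")


def build_combinations(compatibility_matrix, build_matrix=None):
    """Return the compatible combinations that pass the build-matrix filter.

    A missing or empty filter for a key keeps every compatible version.
    """
    build_matrix = build_matrix or {}
    combinations = []
    for entry in compatibility_matrix:
        allowed = []
        for key in KEYS:
            # Compare as strings so numeric YAML values match string versions.
            wanted = {str(v) for v in build_matrix.get(key) or []}
            allowed.append([v for v in entry[key] if not wanted or v in wanted])
        # product() yields nothing if any component has no allowed version.
        for spark, python, java, scala in product(*allowed):
            combinations.append(
                f"spark{spark}-python{python}-java{java}-scala{scala}"
            )
    return combinations


if __name__ == "__main__":
    for name in build_combinations(COMPATIBILITY_MATRIX, BUILD_MATRIX):
        print(name)
```

Note that the python versions are quoted in the build-matrix: an unquoted 3.10 would be read by a YAML parser as the number 3.1.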

Finally, all the images are tested against the original tests at every pipeline trigger.

Push

Development images, tagged with the suffix -<GIT-BRANCH>-latest (e.g. spark3.2.4-python3.9-java11-scala2.12-<GIT-BRANCH>-latest), are produced at every pipeline run, regardless of the git branch (main or not).

The official images are pushed to the container registry when:

  1. The workflow is triggered on the main branch only and
  2. The tests are completed successfully

This prevents pull requests or development branches from pushing the official images before they are reviewed and tested. It also gives the flexibility to test against the development images (-<GIT-BRANCH>-latest) before they are officially pushed.
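The push rules above can be summarised as a small decision helper. The sketch below is only illustrative: the function names `dev_tag` and `should_push_official` are hypothetical and are not taken from the project's workflows, which implement these rules in the GitHub Actions pipeline itself.

```python
def dev_tag(base_tag: str, git_branch: str) -> str:
    """Build a development tag with the -<GIT-BRANCH>-latest suffix."""
    return f"{base_tag}-{git_branch}-latest"


def should_push_official(git_branch: str, tests_passed: bool,
                         main_branch: str = "main") -> bool:
    """Official images are pushed only from the main branch, after tests pass."""
    return git_branch == main_branch and tests_passed


if __name__ == "__main__":
    base = "spark3.2.4-python3.9-java11-scala2.12"
    # "my-feature" is a made-up branch name for the example.
    print(dev_tag(base, "my-feature"))
    print(should_push_official("my-feature", tests_passed=True))
    print(should_push_official("main", tests_passed=True))
```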

Tagging

The project builds the images with long-format tags. Each tag combines the versions of several compatible components.

There are multiple tag levels; which one to use depends on the stability and reproducibility you need.

Here are some examples:

scipy-notebook:

  • python-3.11-2024-02-06
  • python-3.11.7-2024-02-06
  • python-3.11.7-hub-4.0.2-lab-4.1.0
  • python-3.11.7-hub-4.0.2-lab-4.1.0-2024-02-06

datascience-notebook:

  • python-3.9-2024-02-06
  • python-3.9.18-2024-02-06
  • python-3.9.18-hub-4.0.2-lab-4.1.0
  • python-3.9.18-hub-4.0.2-lab-4.1.0-2024-02-06
  • python-3.9.18-r-4.3.2-julia-1.10.0-2024-02-06
  • python-3.9.18-r-4.3.2-julia-1.10.0-hub-4.0.2-lab-4.1.0
  • python-3.9.18-r-4.3.2-julia-1.10.0-hub-4.0.2-lab-4.1.0-2024-02-06

pyspark-notebook:

  • spark-3.5.0-python-3.11-java-17-scala-2.12
  • spark-3.5.0-python-3.11-java-17-scala-2.12-2024-02-06
  • spark-3.5.0-python-3.11.7-java-17.0.9-scala-2.12.18-hub-4.0.2-lab-4.1.0
  • spark-3.5.0-python-3.11.7-java-17.0.9-scala-2.12.18-hub-4.0.2-lab-4.1.0-2024-02-06
  • spark-3.5.0-python-3.11.7-r-4.3.2-java-17.0.9-scala-2.12.18-hub-4.0.2-lab-4.1.0
  • spark-3.5.0-python-3.11.7-r-4.3.2-java-17.0.9-scala-2.12.18-hub-4.0.2-lab-4.1.0-2024-02-06

Check the container registry for more images and tags.
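Judging from the examples above, a long-format tag is a sequence of component-version pairs, optionally followed by a build date. The following sketch splits such a tag into its parts. The `parse_tag` helper and its regular expressions are assumptions inferred from the example tags, not part of the project's tagging extension.

```python
import re

# Optional trailing build date, e.g. "-2024-02-06".
DATE_RE = re.compile(r"-(\d{4}-\d{2}-\d{2})$")
# A component name followed by a dotted version, e.g. "python-3.11.7".
PAIR_RE = re.compile(r"([a-z]+)-(\d+(?:\.\d+)*)")


def parse_tag(tag: str) -> dict:
    """Split a long-format tag into {component: version} plus an optional date."""
    components = {}
    date_match = DATE_RE.search(tag)
    if date_match:
        components["date"] = date_match.group(1)
        tag = tag[:date_match.start()]  # drop the date before matching pairs
    for name, version in PAIR_RE.findall(tag):
        components[name] = version
    return components


if __name__ == "__main__":
    print(parse_tag("spark-3.5.0-python-3.11-java-17-scala-2.12-2024-02-06"))
```

A tag without a date suffix, such as python-3.11.7-hub-4.0.2-lab-4.1.0, simply yields no "date" key.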

Running GitHub Actions

GitHub container registry credentials

Create the following secrets and configuration variables when running with your own GitHub account or organization:

| Variable | Type | Default | Description |
| --- | --- | --- | --- |
| REGISTRY | Configuration variable | ghcr.io | Container registry |
| REGISTRY_USERNAME | Secret variable | | Container registry username |
| REGISTRY_ROBOT_TOKEN | Secret variable | | Container registry password or access token (scopes: write:packages, delete:packages) |

Running with GitHub

By default, the workflow runs automatically on the following events:

  • Push to the main branch with changes matching the configured path filters
  • Pull request on any branch

Running locally with act

Act can be used to build and test locally.

Here is an example command:

$ act  --container-architecture linux/amd64  \
       -W .github/workflows/main.yml \
       --env ACT_SKIP_TESTS=<true|false> \
       --var REGISTRY=ghcr.io  \
       --secret REGISTRY_USERNAME=<GITHUB_OWNER> \
       --secret REGISTRY_ROBOT_TOKEN=<GITHUB_CONTAINER_REGISTRY_TOKEN> \
       --rm

Set the option --container-architecture linux/amd64 if you are running locally on Apple M1/M2 chips.
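If you prefer to script the local run, the same command line can be assembled programmatically. The sketch below is only a convenience wrapper around the example command: the `act_command` helper is hypothetical, it uses only the options shown above, and it assumes act is installed and available on your PATH.

```python
import shlex


def act_command(registry_username: str, registry_token: str,
                skip_tests: bool = True, registry: str = "ghcr.io",
                workflow: str = ".github/workflows/main.yml") -> list:
    """Assemble the argument list for the act invocation shown above."""
    return [
        "act",
        "--container-architecture", "linux/amd64",
        "-W", workflow,
        "--env", f"ACT_SKIP_TESTS={'true' if skip_tests else 'false'}",
        "--var", f"REGISTRY={registry}",
        "--secret", f"REGISTRY_USERNAME={registry_username}",
        "--secret", f"REGISTRY_ROBOT_TOKEN={registry_token}",
        "--rm",
    ]


if __name__ == "__main__":
    # Placeholder credentials; replace them with your own values.
    cmd = act_command("<GITHUB_OWNER>", "<GITHUB_CONTAINER_REGISTRY_TOKEN>")
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting issues, and `shlex.join` is used only to print a copy-pasteable version of the command.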

For more information:

$ act  --help

OKDP custom extensions

  1. Tagging extension, based on the original jupyter docker-stacks source files
  2. Patches applied to the original jupyter docker-stacks in order to run the tests
  3. Version compatibility matrix, used to generate all the compatible version combinations for pyspark
  4. Unit tests, which test the okdp extension at every pipeline run
