OKDP jupyter docker images are built from the jupyter docker-stacks source dockerfiles. The project includes a read-only copy of the jupyter docker-stacks repository as a git-subtree sub-project.
The project leverages the features provided by jupyter docker-stacks:
- Build from the original source docker files
- Customize the images by using docker build-arg build arguments
- Run the original tests at every pipeline trigger
The project provides up-to-date jupyter lab images, especially for pyspark.
The main build pipeline contains 6 main reusable workflows:
- build-test-base: docker-stacks-foundation, base-notebook, minimal-notebook, scipy-notebook
- build-test-datascience: r-notebook, julia-notebook, tensorflow-notebook, pytorch-notebook
- build-test-spark: pyspark-notebook, all-spark-notebook
- tag-push: push the built images to the container registry (main branch only)
- auto-rerun: partially re-run jobs in case of failures (github runner issues/main branch only)
- unit-tests: run the unit tests (okdp extension) at every pipeline trigger
The build is based on the version compatibility matrix.
The build-matrix section defines the component versions to build. It acts as a filter on the parent compatibility-matrix section to limit the version combinations to build. The build process ensures that only compatible versions are built.
For example, the following build-matrix:
build-matrix:
  python_version: ['3.9', '3.10', '3.11']
  spark_version: [3.2.4, 3.3.4, 3.4.2, 3.5.0]
  java_version: [11, 17]
  scala_version: [2.12]
builds the following version combinations, as constrained by the compatibility-matrix section:
- spark3.3.4-python3.10-java17-scala2.12
- spark3.5.0-python3.11-java17-scala2.12
- spark3.4.2-python3.11-java17-scala2.12
- spark3.2.4-python3.9-java11-scala2.12
By default, if no filter is specified:
build-matrix:
all compatible version combinations are built.
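The filtering behaviour described above can be sketched in a few lines of Python. This is only an illustration, not the project's implementation: the `COMPATIBILITY_MATRIX` contents and the `compatible_combinations` helper are hypothetical, and the real values live in the compatibility-matrix section.

```python
from itertools import product

# Hypothetical compatibility matrix (assumption for this sketch): each entry
# groups the versions that are known to work together.
COMPATIBILITY_MATRIX = [
    {"python_version": ["3.9"], "spark_version": ["3.2.1", "3.2.4"],
     "java_version": ["11"], "scala_version": ["2.12", "2.13"]},
    {"python_version": ["3.10"], "spark_version": ["3.3.4"],
     "java_version": ["17"], "scala_version": ["2.12", "2.13"]},
    {"python_version": ["3.11"], "spark_version": ["3.4.2", "3.5.0"],
     "java_version": ["17"], "scala_version": ["2.12", "2.13"]},
]

KEYS = ("spark_version", "python_version", "java_version", "scala_version")


def compatible_combinations(compatibility_matrix, build_matrix=None):
    """Return the tags of all compatible combinations allowed by the filter."""
    build_matrix = build_matrix or {}
    combinations = []
    for entry in compatibility_matrix:
        allowed = []
        for key in KEYS:
            wanted = build_matrix.get(key)
            if wanted is None:
                # A missing key means "no filter" for that component.
                allowed.append(list(entry[key]))
            else:
                # YAML may load 11 or 2.12 as numbers, so compare as strings.
                wanted_set = {str(w) for w in wanted}
                allowed.append([v for v in entry[key] if v in wanted_set])
        # An empty list for any component yields no combination for this entry.
        for spark, python, java, scala in product(*allowed):
            combinations.append(f"spark{spark}-python{python}-java{java}-scala{scala}")
    return combinations


build_matrix = {
    "python_version": ["3.9", "3.10", "3.11"],
    "spark_version": ["3.2.4", "3.3.4", "3.4.2", "3.5.0"],
    "java_version": [11, 17],
    "scala_version": [2.12],
}
filtered = compatible_combinations(COMPATIBILITY_MATRIX, build_matrix)
unfiltered = compatible_combinations(COMPATIBILITY_MATRIX)
```

With this hypothetical matrix, `filtered` holds the four combinations listed above, while `unfiltered` holds every compatible combination. Note that `'3.10'` has to stay quoted in YAML, otherwise it is read as the number 3.1.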
Finally, all the images are tested against the original tests at every pipeline trigger.
Development images with the tag suffix -<GIT-BRANCH>-latest (e.g. spark3.2.4-python3.9-java11-scala2.12-<GIT-BRANCH>-latest) are produced at every pipeline run, regardless of the git branch (main or not).
The official images are pushed to the container registry when:
- The workflow is triggered on the main branch only and
- The tests are completed successfully
This prevents pull requests or development branches from pushing the official images before they are reviewed or tested. It also provides the flexibility to test against the development images (-<GIT-BRANCH>-latest) before they are officially pushed.
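The push rules above can be summarised as a small decision function. The sketch below is illustrative only; `development_tag` and `should_push_official` are hypothetical names, and replacing `/` in branch names is an assumption of this example rather than documented behaviour.

```python
def development_tag(image_tag: str, git_branch: str) -> str:
    """Build the development tag: <image tag>-<GIT-BRANCH>-latest."""
    # Assumption: "/" is not valid in a docker tag, so it is replaced here.
    safe_branch = git_branch.replace("/", "-")
    return f"{image_tag}-{safe_branch}-latest"


def should_push_official(git_branch: str, tests_passed: bool) -> bool:
    """Official images are pushed only from main and only after the tests pass."""
    return git_branch == "main" and tests_passed


dev = development_tag("spark3.2.4-python3.9-java11-scala2.12", "feature/foo")
```

Development tags are produced for every branch, so `development_tag` has no branch check, whereas `should_push_official` requires both conditions.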
The project builds the images with long-format tags. Each tag combines multiple compatible component versions.
There are multiple tag levels; which format to use depends on your needs in terms of stability and reproducibility.
Here are some examples:
- python-3.11-2024-02-06
- python-3.11.7-2024-02-06
- python-3.11.7-hub-4.0.2-lab-4.1.0
- python-3.11.7-hub-4.0.2-lab-4.1.0-2024-02-06
- python-3.9-2024-02-06
- python-3.9.18-2024-02-06
- python-3.9.18-hub-4.0.2-lab-4.1.0
- python-3.9.18-hub-4.0.2-lab-4.1.0-2024-02-06
- python-3.9.18-r-4.3.2-julia-1.10.0-2024-02-06
- python-3.9.18-r-4.3.2-julia-1.10.0-hub-4.0.2-lab-4.1.0
- python-3.9.18-r-4.3.2-julia-1.10.0-hub-4.0.2-lab-4.1.0-2024-02-06
- spark-3.5.0-python-3.11-java-17-scala-2.12
- spark-3.5.0-python-3.11-java-17-scala-2.12-2024-02-06
- spark-3.5.0-python-3.11.7-java-17.0.9-scala-2.12.18-hub-4.0.2-lab-4.1.0
- spark-3.5.0-python-3.11.7-java-17.0.9-scala-2.12.18-hub-4.0.2-lab-4.1.0-2024-02-06
- spark-3.5.0-python-3.11.7-r-4.3.2-java-17.0.9-scala-2.12.18-hub-4.0.2-lab-4.1.0
- spark-3.5.0-python-3.11.7-r-4.3.2-java-17.0.9-scala-2.12.18-hub-4.0.2-lab-4.1.0-2024-02-06
Please check the container registry for more images and tags.
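To make the tag levels concrete, here is a minimal sketch that derives the four python tag levels shown in the examples from a set of component versions. The `python_image_tags` helper is hypothetical and only mirrors the pattern visible in the examples; it is not the project's tagging extension.

```python
def major_minor(version: str) -> str:
    """Reduce a full version such as 3.11.7 to its major.minor form (3.11)."""
    return ".".join(version.split(".")[:2])


def python_image_tags(python_version: str, hub_version: str,
                      lab_version: str, build_date: str) -> list:
    """Return the tag levels, from least to most specific."""
    full = f"python-{python_version}"
    long_tag = f"{full}-hub-{hub_version}-lab-{lab_version}"
    return [
        f"python-{major_minor(python_version)}-{build_date}",  # short, dated
        f"{full}-{build_date}",                                # full version, dated
        long_tag,                                              # all components
        f"{long_tag}-{build_date}",                            # all components, dated
    ]


tags = python_image_tags("3.11.7", "4.0.2", "4.1.0", "2024-02-06")
```

The dated long tag pins every component and the build date, so it is the most reproducible choice. The shorter tags are easier to read but pin fewer components.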
Create the following secrets and configuration variables when running with your own github account or organization:
Variable | Type | Default | Description |
---|---|---|---|
REGISTRY | Configuration variable | ghcr.io | Container registry |
REGISTRY_USERNAME | Secret variable | | Container registry username |
REGISTRY_ROBOT_TOKEN | Secret variable | | Container registry password or access token (Scopes: write:packages/delete:packages) |
By default, the workflow runs automatically on the following events:
- Push on the main branch with changes on the configured paths filters
- Pull request on any branch
Act can be used to build and test locally.
Here is an example command:
$ act --container-architecture linux/amd64 \
    -W .github/workflows/main.yml \
    --env ACT_SKIP_TESTS=<true|false> \
    --var REGISTRY=ghcr.io \
    --secret REGISTRY_USERNAME=<GITHUB_OWNER> \
    --secret REGISTRY_ROBOT_TOKEN=<GITHUB_CONTAINER_REGISTRY_TOKEN> \
    --rm
Set the option --container-architecture linux/amd64 if you are running locally on Apple M1/M2 chips.
For more information:
$ act --help
- Tagging extension is based on the original jupyter docker-stacks source files
- Patches applied to the original jupyter docker-stacks in order to run the tests
- Version compatibility matrix to generate all the compatible version combinations for pyspark
- Unit tests in order to test okdp extension at every pipeline run