
Docker Image Management

Damon McCullough edited this page Oct 28, 2024 · 6 revisions

Our images

We maintain five Docker images, stored on Docker Hub (alongside some older, out-of-date images we used to maintain).

base - an image with Python, our standard command-line utilities, and (currently) my forked GDAL installed. We created it because we needed my GDAL fork to work around a bug in 3.8. Building GDAL from source takes a long time, so it made sense to have a "base" image that rarely gets updated. (The bug appears to be resolved in 3.9.1, but that release also changes the API slightly, which breaks the library.)

base feeds into three of our other four images. Let's call these our "production" images:

  • build-base: base image, with python packages needed for build and dcpy installed
  • build-geosupport: base image, with Geosupport Desktop and its Python bindings installed
  • dev: base image, with tools like sqlfluff, black, mypy, etc. Used for the dev container and for running tests

Finally, we have docker-geosupport, a bare image with just Python and Geosupport. It is potentially a bit more public-facing. It is not built from base, and we don't build "dev" versions of it.
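For illustration only (this is not code from our repo), the relationships above can be modeled as a simple parent map. The names PARENT, build_chain, and downstream_of are hypothetical; the map follows the description above, where build-base, build-geosupport, and dev are each built directly from base and docker-geosupport stands alone.

```python
from typing import Optional

# Hypothetical model of our image hierarchy: each image maps to the image it is
# built from, or None if it is not built from another one of our images.
PARENT: dict[str, Optional[str]] = {
    "base": None,
    "build-base": "base",
    "build-geosupport": "base",
    "dev": "base",
    "docker-geosupport": None,  # standalone; no dev versions are built
}


def build_chain(image: str) -> list[str]:
    """Return the images needed to build `image`, root first."""
    chain: list[str] = []
    current: Optional[str] = image
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return list(reversed(chain))


def downstream_of(image: str) -> list[str]:
    """Return the images built directly from `image`, sorted by name."""
    return sorted(name for name, parent in PARENT.items() if parent == image)
```

Calling downstream_of("base") lists the three "production" images, which is a quick way to see what a change to the base folder will affect.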

How we maintain them - code

Code for our images lives mostly in the docker folder within admin/run_environment. This folder contains:

  • five folders, one per Docker image. Each contains:
    • Dockerfile
    • setup.sh - for logic complex enough that it would be clunky to handle in the Dockerfile
      • either the Dockerfile or setup.sh can use config.sh (which is always copied into the build folder) to reference its variables and/or call its utilities
    • requirements.txt - Python requirements. Versions should not be specified here; they are determined by python/constraints.txt (relative to the repo root). Keep in mind that three of these images are built from base, so changes to the base folder are reflected in the others.
  • config.sh - bash script containing shared logic used by multiple images: utilities like "install geosupport" and string variables like "DEV_APT_PACKAGES".
  • delete.sh - standalone bash script that deletes a specific tag of an image on Docker Hub. Used in a recurring GitHub Action to clean out unused dev images (see the section below on dev images).
  • geosupport_versions.py - Python script that looks up the latest Geosupport version and prints it to stdout for use in bash scripts
  • publish.sh - standalone bash script for publishing these images. Takes two required arguments and one optional one:
    • image - name of the image
    • tag - tag of the image being published
    • base_tag (optional) - only relevant for build-base, build-geosupport, and dev; the tag of the base image to build from

publish.sh takes these arguments, figures out which image is being built, copies the needed files into the relevant folder (our Python constraints and config.sh), determines the Geosupport version if necessary, builds the image with the given tag, and pushes it to Docker Hub.
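The decisions publish.sh makes before building can be sketched in Python as below. This is a hypothetical mirror of the bash script, not its actual logic: plan_publish and PublishPlan are invented names, and two details are assumptions not stated on this page: that a missing base_tag falls back to "latest", and which images need a Geosupport version lookup.

```python
from dataclasses import dataclass
from typing import Optional

# Images built from base, for which base_tag is meaningful (per the list above).
BASE_DERIVED = {"build-base", "build-geosupport", "dev"}
ALL_IMAGES = BASE_DERIVED | {"base", "docker-geosupport"}
# ASSUMPTION: images that need the latest Geosupport version looked up.
NEEDS_GEOSUPPORT = {"build-geosupport", "docker-geosupport"}


@dataclass
class PublishPlan:
    image: str
    tag: str
    base_tag: Optional[str]
    files_to_copy: list[str]
    lookup_geosupport_version: bool


def plan_publish(image: str, tag: str, base_tag: Optional[str] = None) -> PublishPlan:
    """Hypothetically mirror publish.sh's checks before `docker build` runs."""
    if image not in ALL_IMAGES:
        raise ValueError(f"unknown image: {image!r}")
    if image in BASE_DERIVED:
        # ASSUMPTION: build from base:latest when no base_tag is given.
        resolved_base: Optional[str] = base_tag or "latest"
    else:
        resolved_base = None  # base_tag is irrelevant for base and docker-geosupport
    return PublishPlan(
        image=image,
        tag=tag,
        base_tag=resolved_base,
        # Shared files copied into the image's folder before building.
        files_to_copy=["python/constraints.txt", "config.sh"],
        lookup_geosupport_version=image in NEEDS_GEOSUPPORT,
    )
```

For example, plan_publish("dev", "dev-my-branch", "dev-my-branch") describes building a dev image on top of a matching dev tag of base.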

If you want to add functionality to our images, it most likely belongs either in config.sh (then used as appropriate within the three primary images: the two build images and dev) or in the base folder. Keep in mind, though, that base takes longer to build, so anything we'll likely want to iterate on is easier to develop in config.sh and the three downstream images. For installs in general, GDAL (in base) and Geosupport (in config.sh) are good examples of how we define these things.

We may well simplify this down the road and move toward fewer images. As dcpy grows, the build images are getting bigger, and we're getting less benefit from keeping some of them slightly lighter-weight.

Dev Images (NOT dev containers!)

In hindsight we probably should have picked a different name to avoid confusion. These are not images you use to build a dev container locally; they are tags of our images specific to dev/feature branches.

Dev images are built automatically as part of PR tests if relevant code changes have been made in the PR.
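Dev images use tags of the form dev-{branch}. This page doesn't say how branch names are turned into tags, so the sketch below is an assumption: Docker tags may only contain ASCII letters, digits, underscores, periods, and hyphens (up to 128 characters), so a branch like feature/foo can't be used verbatim. The hypothetical dev_tag helper replaces any disallowed character with "-" and truncates to the length limit.

```python
import re

# Docker tags may contain only ASCII letters, digits, '_', '.', and '-',
# and are limited to 128 characters.
_INVALID_TAG_CHARS = re.compile(r"[^A-Za-z0-9_.-]")
MAX_TAG_LENGTH = 128


def dev_tag(branch: str) -> str:
    """Build a hypothetical dev-{branch} tag, replacing characters Docker rejects."""
    safe_branch = _INVALID_TAG_CHARS.sub("-", branch)
    # The "dev-" prefix guarantees the tag doesn't start with '.' or '-'.
    return f"dev-{safe_branch}"[:MAX_TAG_LENGTH]
```

With this scheme, a branch named fvk/fix-gdal would map to the tag dev-fvk-fix-gdal.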

flowchart
1[Changes to admin/run_environment/docker/base in pr?] -- Yes --> 2[Changes to admin/run_environment/docker/base since last push?]
1 -- No --> 3[Use base:latest]
2 -- Yes --> 11["Build and use base:dev-{branch}"]
3 --> 6[Changes to admin/run_environment in pr?]
6 -- No --> 7[Use build-geosupport/dev:latest]
6 -- Yes --> 8[Changes to admin/run_environment since last push?]
8 -- No --> 9["Use existing build-geosupport/dev:dev-{branch}"]
8 -- Yes --> 10["Build and use build-geosupport/dev:dev-{branch}"]
2 -- No --> 4["Use existing base:dev-{branch}"]
4 --> 8
11 --> 8
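The flowchart's decision logic can be written as a plain function. This is an illustration, not the code our CI actually runs: choose_images, touches, and Decision are hypothetical names, and we assume CI can supply two lists of changed file paths (changed anywhere in the PR, and changed since the last push). The node numbers in the comments refer to the flowchart.

```python
from dataclasses import dataclass

BASE_DIR = "admin/run_environment/docker/base"
RUN_ENV_DIR = "admin/run_environment"


@dataclass
class Decision:
    base_tag: str      # tag of base to use
    base_action: str   # "use" an existing image or "build" a new one
    image_tag: str     # tag of build-geosupport/dev to use
    image_action: str  # "use" or "build"


def touches(paths: list[str], directory: str) -> bool:
    """True if any changed path lies inside `directory`."""
    return any(p == directory or p.startswith(directory + "/") for p in paths)


def choose_images(branch: str, changed_in_pr: list[str],
                  changed_since_last_push: list[str]) -> Decision:
    dev_tag = f"dev-{branch}"
    if touches(changed_in_pr, BASE_DIR):
        # Nodes 2, 4, 11: base changed in this PR, so use its dev tag,
        # rebuilding it only if base changed since the last push.
        base_tag = dev_tag
        base_action = "build" if touches(changed_since_last_push, BASE_DIR) else "use"
    else:
        # Node 3: base untouched, so use base:latest.
        base_tag, base_action = "latest", "use"
        if not touches(changed_in_pr, RUN_ENV_DIR):
            # Node 7: nothing in run_environment changed in this PR.
            return Decision(base_tag, base_action, "latest", "use")
    # Node 8: rebuild build-geosupport/dev only if run_environment changed since last push.
    image_action = "build" if touches(changed_since_last_push, RUN_ENV_DIR) else "use"
    return Decision(base_tag, base_action, dev_tag, image_action)
```

Note that a change under the base folder is also a change under admin/run_environment, which is why the base branch of the flowchart skips node 6 and goes straight to node 8.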

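Because the test checks for changes since the last push (to decide whether to rebuild the image), and we cancel in-progress tests when a new push arrives, you can in theory end up with a push that looks for an existing dev image that was never built. For example, if one push changes admin/run_environment and you push again before that run finishes building the image, the second run sees no relevant changes since the last push and looks for a dev-{branch} image that doesn't exist. In this case, the easiest fix is to close and reopen your PR, which re-runs all tests.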