Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify docker release process in the release pipeline #9928

Merged
merged 6 commits into from
Apr 15, 2024
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions .changes/unreleased/Under the Hood-20240412-174824.yaml
mikealfare marked this conversation as resolved.
Show resolved Hide resolved
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
kind: Under the Hood
body: Update Docker release process to reflect distributed release workflows
time: 2024-04-12T17:48:24.29846-04:00
custom:
Author: mikealfare
Issue: "9928"
133 changes: 28 additions & 105 deletions docker/Dockerfile
Original file line number Diff line number Diff line change
@@ -1,133 +1,56 @@
##
# Generic dockerfile for dbt image building.
# See README for operational details
##

# Top level build args
ARG build_for=linux/amd64

##
# base image (abstract)
##
# Please do not upgrade beyond python3.10.7 currently as dbt-spark does not support
# 3.11py and images do not get made properly
FROM --platform=$build_for python:3.10.7-slim-bullseye as base

# N.B. The refs updated automagically every release via bumpversion
ARG [email protected]
ARG [email protected]
ARG [email protected]
ARG [email protected]
ARG [email protected]
ARG [email protected]
# special case args
ARG dbt_spark_version=all
ARG dbt_third_party
# this image gets published to GHCR for production use
ARG py_version=3.10.7

FROM python:$py_version-slim-bullseye as base

# System setup
RUN apt-get update \
&& apt-get dist-upgrade -y \
&& apt-get install -y --no-install-recommends \
git \
ssh-client \
software-properties-common \
make \
build-essential \
ca-certificates \
libpq-dev \
build-essential=12.9 \
ca-certificates=20210119 \
git=1:2.30.2-1+deb11u2 \
libpq-dev=13.14-0+deb11u1 \
make=4.3-4.1 \
openssh-client=1:8.4p1-5+deb11u3 \
software-properties-common=0.96.20.2-2.1 \
&& apt-get clean \
&& rm -rf \
/var/lib/apt/lists/* \
/tmp/* \
/var/tmp/*

# Env vars
ENV PYTHONIOENCODING=utf-8
ENV LANG=C.UTF-8

# Update python
RUN python -m pip install --upgrade pip setuptools wheel --no-cache-dir
RUN python -m pip install --upgrade "pip==24.0" "setuptools==69.2.0" "wheel==0.43.0" --no-cache-dir

# Set docker basics
WORKDIR /usr/app/dbt/
ENTRYPOINT ["dbt"]

##
# dbt-core
##
FROM base as dbt-core
RUN python -m pip install --no-cache-dir "git+https://github.com/dbt-labs/${dbt_core_ref}#egg=dbt-core&subdirectory=core"

##
# dbt-postgres
##
FROM base as dbt-postgres
RUN python -m pip install --no-cache-dir "git+https://github.com/dbt-labs/${dbt_postgres_ref}#egg=dbt-postgres&subdirectory=plugins/postgres"
ARG commit_ref=main

HEALTHCHECK CMD dbt --version || exit 1

##
# dbt-redshift
##
FROM base as dbt-redshift
RUN python -m pip install --no-cache-dir "git+https://github.com/dbt-labs/${dbt_redshift_ref}#egg=dbt-redshift"
WORKDIR /usr/app/dbt/
ENTRYPOINT ["dbt"]

RUN python -m pip install --no-cache-dir "dbt-core @ git+https://github.com/dbt-labs/dbt-core@${commit_ref}#subdirectory=core"

##
# dbt-bigquery
##
FROM base as dbt-bigquery
RUN python -m pip install --no-cache-dir "git+https://github.com/dbt-labs/${dbt_bigquery_ref}#egg=dbt-bigquery"

FROM base as dbt-postgres

##
# dbt-snowflake
##
FROM base as dbt-snowflake
RUN python -m pip install --no-cache-dir "git+https://github.com/dbt-labs/${dbt_snowflake_ref}#egg=dbt-snowflake"
ARG commit_ref=main

##
# dbt-spark
##
FROM base as dbt-spark
RUN apt-get update \
&& apt-get dist-upgrade -y \
&& apt-get install -y --no-install-recommends \
python-dev \
libsasl2-dev \
gcc \
unixodbc-dev \
&& apt-get clean \
&& rm -rf \
/var/lib/apt/lists/* \
/tmp/* \
/var/tmp/*
RUN python -m pip install --no-cache-dir "git+https://github.com/dbt-labs/${dbt_spark_ref}#egg=dbt-spark[${dbt_spark_version}]"
HEALTHCHECK CMD dbt --version || exit 1

WORKDIR /usr/app/dbt/
ENTRYPOINT ["dbt"]

RUN python -m pip install --no-cache-dir "dbt-postgres @ git+https://github.com/dbt-labs/dbt-core@${commit_ref}#subdirectory=plugins/postgres"


##
# dbt-third-party
##
FROM dbt-core as dbt-third-party
RUN python -m pip install --no-cache-dir "${dbt_third_party}"

##
# dbt-all
##
FROM base as dbt-all
RUN apt-get update \
&& apt-get dist-upgrade -y \
&& apt-get install -y --no-install-recommends \
python-dev \
libsasl2-dev \
gcc \
unixodbc-dev \
&& apt-get clean \
&& rm -rf \
/var/lib/apt/lists/* \
/tmp/* \
/var/tmp/*
RUN python -m pip install --no-cache "git+https://github.com/dbt-labs/${dbt_redshift_ref}#egg=dbt-redshift"
RUN python -m pip install --no-cache "git+https://github.com/dbt-labs/${dbt_bigquery_ref}#egg=dbt-bigquery"
RUN python -m pip install --no-cache "git+https://github.com/dbt-labs/${dbt_snowflake_ref}#egg=dbt-snowflake"
RUN python -m pip install --no-cache "git+https://github.com/dbt-labs/${dbt_spark_ref}#egg=dbt-spark[${dbt_spark_version}]"
RUN python -m pip install --no-cache "git+https://github.com/dbt-labs/${dbt_postgres_ref}#egg=dbt-postgres&subdirectory=plugins/postgres"
ARG dbt_third_party

RUN python -m pip install --no-cache-dir "${dbt_third_party}"
69 changes: 9 additions & 60 deletions docker/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,13 +5,9 @@ This docker file is suitable for building dbt Docker images locally or using wit
## Building an image:
This Dockerfile can create images for the following targets, each named after the database they support:
* `dbt-core` _(no db-adapter support)_
* `dbt-postgres`
* `dbt-redshift`
* `dbt-bigquery`
* `dbt-snowflake`
* `dbt-spark`
* `dbt-third-party` _(requires additional build-arg)_
* `dbt-all` _(installs all of the above in a single image)_

For platform-specific images, please refer to that platform's repository (eg. `dbt-labs/dbt-postgres`)

In order to build a new image, run the following docker command.
```
Expand All @@ -22,53 +18,27 @@ docker build --tag <your_image_name> --target <target_name> <path/to/dockerfile>

---

By default the images will be populated with the most recent release of `dbt-core` and whatever database adapter you select. If you need to use a different version you can specify it by git ref using the `--build-arg` flag:
By default the images will be populated with `dbt-core` on `main`.
If you need to use a different version you can specify it by git ref (tag, branch, sha) using the `--build-arg` flag:
```
docker build --tag <your_image_name> \
--target <target_name> \
--build-arg <arg_name>=<git_ref> \
--build-arg commit_ref=<git_ref> \
<path/to/dockerfile>
```
valid arg names for versioning are:
* `dbt_core_ref`
* `dbt_postgres_ref`
* `dbt_redshift_ref`
* `dbt_bigquery_ref`
* `dbt_snowflake_ref`
* `dbt_spark_ref`

---
>**NOTE:** Only override a _single_ build arg for each build. Using multiple overrides may lead to a non-functioning image.

---

If you wish to build an image with a third-party adapter you can use the `dbt-third-party` target. This target requires you provide a path to the adapter that can be processed by `pip` by using the `dbt_third_party` build arg:
If you wish to build an image with a third-party adapter you can use the `dbt-third-party` target.
This target requires you provide a path to the adapter that can be processed by `pip` by using the `dbt_third_party` build arg:
```
docker build --tag <your_image_name> \
--target dbt-third-party \
--build-arg dbt_third_party=<pip_parsable_install_string> \
<path/to/dockerfile>
```
This can also be combined with the `commit_ref` build arg to specify a version of `dbt-core`.

### Examples:
To build an image named "my-dbt" that supports redshift using the latest releases:
```
cd dbt-core/docker
docker build --tag my-dbt --target dbt-redshift .
```

To build an image named "my-other-dbt" that supports bigquery using `dbt-core` version 0.21.latest and the bigquery adapter version 1.0.0b1:
```
cd dbt-core/docker
docker build \
--tag my-other-dbt \
--target dbt-bigquery \
--build-arg [email protected] \
--build-arg [email protected] \
.
```

To build an image named "my-third-party-dbt" that uses [Materilize third party adapter](https://github.com/MaterializeInc/materialize/tree/main/misc/dbt-materialize) and the latest release of `dbt-core`:
To build an image named "my-third-party-dbt" that uses the latest release of [Materialize third party adapter](https://github.com/MaterializeInc/materialize/tree/main/misc/dbt-materialize) and the latest dev version of `dbt-core`:
```
cd dbt-core/docker
docker build --tag my-third-party-dbt \
Expand All @@ -78,27 +48,6 @@ docker build --tag my-third-party-dbt \
```


## Special cases
There are a few special cases worth noting:
* The `dbt-spark` database adapter comes in three different versions named `PyHive`, `ODBC`, and the default `all`. If you wish to overide this you can use the `--build-arg` flag with the value of `dbt_spark_version=<version_name>`. See the [docs](https://docs.getdbt.com/reference/warehouse-profiles/spark-profile) for more information.

```
docker build --tag my_dbt \
--target dbt-postgres \
--build-arg [email protected] \
<path/to/dockerfile>
```

* If you need to build against another architecture (linux/arm64 in this example) you can overide the `build_for` build arg:
```
docker build --tag my_dbt \
--target dbt-postgres \
--build-arg build_for=linux/arm64 \
<path/to/dockerfile>
```

Supported architectures can be found in the python docker [dockerhub page](https://hub.docker.com/_/python).

## Running an image in a container:
The `ENTRYPOINT` for this Dockerfile is the command `dbt` so you can bind-mount your project to `/usr/app` and use dbt as normal:
```
Expand Down