Merging dev-rafidka into main #41

Merged: 12 commits from dev-rafidka into main, Apr 22, 2024
Conversation

@rafidka (Contributor) commented Feb 21, 2024

Issue #, if available: Multiple issues, addressed by the individual commits.

Description of changes: 9 commits, which can be seen below.


By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Docker Compose setup for Airflow 2.8.0
To aid with development, I am introducing different image types,
controlled by the following two build arguments:

- `build_type`: This build argument has three possible values:
  - `standard`: This is the standard build type; it is what customers
    use.
  - `explorer`: The 'explorer' build type is almost identical to the
    'standard' build type, but it doesn't include the entrypoint. This is
    useful for debugging purposes: you can run the image and look around
    its content without starting Airflow, which might require further
    setup.
  - `explorer-root`: This is similar to the 'explorer' build type, but it
    additionally uses the root user, giving the user of this Docker
    image elevated permissions. The user can thus install packages,
    remove packages, or do anything else requiring root.
- `dev`: When this build argument is set to True, additional packages
  are installed to aid with development.

For each combination of these two build arguments, a different Docker
image is generated. Thus, we are currently generating these images:

- `amazon-mwaa/airflow:2.8.0`
- `amazon-mwaa/airflow:2.8.0-dev`
- `amazon-mwaa/airflow:2.8.0-explorer`
- `amazon-mwaa/airflow:2.8.0-explorer-dev`
- `amazon-mwaa/airflow:2.8.0-explorer-privileged`
- `amazon-mwaa/airflow:2.8.0-explorer-privileged-dev`
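
As an illustration, the tag matrix above can be derived mechanically from the two build arguments. The sketch below is hypothetical; the repository's actual `generate-dockerfiles.py` (referenced later in this review) may construct the tags differently, and the mapping of the `explorer-root` build type to the `-explorer-privileged` tag suffix is inferred from the list above:

```python
# Sketch: deriving the image tag matrix from the two build arguments.
AIRFLOW_VERSION = "2.8.0"

# Assumed mapping of build_type values to tag suffixes.
BUILD_TYPE_SUFFIXES = {
    "standard": "",
    "explorer": "-explorer",
    "explorer-root": "-explorer-privileged",
}


def image_tags() -> list[str]:
    """Return one tag per (build_type, dev) combination."""
    tags = []
    for suffix in BUILD_TYPE_SUFFIXES.values():
        for dev in (False, True):
            dev_suffix = "-dev" if dev else ""
            tags.append(f"amazon-mwaa/airflow:{AIRFLOW_VERSION}{suffix}{dev_suffix}")
    return tags


# image_tags() yields exactly the six tags listed above.
```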
All Dockerfiles share most of the steps, apart from a couple of steps at
the end. As such, to cut build time, I extracted the common steps into a
separate base Docker image that all the other images build on top of.
More information about this command can be found in #18.
To make sure that developers don't accidentally use `pip install`
directly, I implemented a script that scans the whole repository for
this and reports an error if it finds any such case.
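
A minimal sketch of what such a check could look like is below. This is illustrative, not the actual `pip_install_check.py`: the scanned file extensions and the allowlist entry are assumptions (the allowlist mechanism itself is described later in this thread):

```python
#!/usr/bin/env python3
"""Illustrative sketch of a check that flags direct `pip install` usage."""
import re
import sys
from pathlib import Path

# Matches both `pip install` and `pip3 install`.
PIP_INSTALL_RE = re.compile(r"\bpip3? install\b")

# Hypothetical allowlist: scripts permitted to call pip directly, e.g. a
# script that installs Python and therefore has to bootstrap pip itself.
ALLOWLIST = {Path("images/airflow/2.8.0/install-python.sh")}


def main() -> int:
    failures = []
    for pattern in ("*.sh", "*.py"):
        for path in Path(".").rglob(pattern):
            # Skip hidden directories (e.g. .git) and allowlisted scripts.
            if any(part.startswith(".") for part in path.parts):
                continue
            if path in ALLOWLIST:
                continue
            for lineno, line in enumerate(
                path.read_text(encoding="utf-8").splitlines(), start=1
            ):
                if PIP_INSTALL_RE.search(line):
                    failures.append(f"{path}:{lineno}: {line.strip()}")
    for failure in failures:
        print(failure, file=sys.stderr)
    return 1 if failures else 0


if __name__ == "__main__":
    sys.exit(main())
```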

While at this, I also introduced the `quality-checks` folder, which
contains all scripts for ensuring the quality of the repository. I moved
`lint_bash.sh` and `lint_python.sh` into it and put the new script,
`pip_install_check.py`, under it as well. This way we have a central
place for all such quality-check scripts, which are only expected to
multiply in number as the repository gets bigger, more contributors get
involved, and more quality control is required. The new `quality-checks`
folder also contains a script, `run_all.py`, that walks through the
`quality-checks` directory and executes every executable script it finds.

Accordingly, I also updated the GitHub workflows and pre-commit
configuration to use the `run_all.py` script instead of manually listing
all quality check scripts.
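
For illustration, a runner along these lines could look like the following sketch (the real `run_all.py` may differ in details such as ordering and how failures are reported):

```python
#!/usr/bin/env python3
"""Sketch of a runner that executes every quality-check script."""
import os
import subprocess
import sys
from pathlib import Path

# Assumed: this runner lives inside the quality-checks/ directory.
QUALITY_CHECKS_DIR = Path(__file__).parent


def main() -> int:
    exit_code = 0
    for script in sorted(QUALITY_CHECKS_DIR.iterdir()):
        # Skip this runner itself and anything that isn't executable.
        if script.name == Path(__file__).name:
            continue
        if not (script.is_file() and os.access(script, os.X_OK)):
            continue
        print(f"Running {script.name}...")
        result = subprocess.run([str(script)])
        if result.returncode != 0:
            print(f"{script.name} failed with code {result.returncode}")
            exit_code = 1
    return exit_code


if __name__ == "__main__":
    sys.exit(main())
```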
- Now using a VSCode workspace. Not only does this improve repo navigation, but
  it also allows using multiple Python interpreters, which is required since we
  use different Python requirements for the repo code vs. the Docker images code.
- Use Pyright for type checking.
- Use ruff for Python linting.
These open-source Docker images will be used both externally by our
customers willing to experiment with the images in native Docker and
internally within an Amazon MWAA setup (which relies on Fargate). This
commit involves multiple small changes to make this possible:

- Introduced a `/healthcheck.sh` script which is used by Fargate to
  monitor health status. This script currently always returns a success
  status (exit code 0) just to make the integration possible. In the
  future, we need to:
  - Improve this script to do some real checks.
  - Move this script to a better location (scripts shouldn't be placed
    at the root).
- Supported reading database credentials from a JSON-formatted
  environment variable, `MWAA__DB__CREDENTIALS`, containing the username
  and password. This is needed because Amazon MWAA employs Secrets
  Manager to pass the credentials safely to the Fargate container in a
  JSON-formatted object.
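
A sketch of how such a variable might be consumed is below; the exact JSON key names and the error handling are assumptions, not necessarily what the entrypoint does:

```python
import json
import os


def get_db_credentials() -> tuple[str, str]:
    """Read DB credentials from the MWAA__DB__CREDENTIALS variable.

    Sketch only: the JSON key names ("username"/"password") are assumed;
    the actual parsing in this repository may differ.
    """
    creds = json.loads(os.environ["MWAA__DB__CREDENTIALS"])
    return creds["username"], creds["password"]


# Example: MWAA__DB__CREDENTIALS='{"username": "airflow", "password": "..."}'
```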

During the work on this, I temporarily downgraded the Airflow version to
2.7.2, since this is a version we support internally, which should make
testing easier.
Mercury2699 previously approved these changes Feb 21, 2024
To make the setup work without needing an actual AWS account for SQS, I
made the necessary changes to use a local SQS queue server provided by
ElasticMQ.
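
For example, a local client could talk to ElasticMQ instead of real SQS along these lines (the endpoint URL, port, and dummy credentials are assumptions about the local setup, not values from this repository):

```python
import boto3

# Sketch: point boto3 at a local ElasticMQ server instead of real SQS.
# 9324 is ElasticMQ's default port; the region and dummy credentials are
# placeholders required by boto3 but ignored by ElasticMQ.
sqs = boto3.client(
    "sqs",
    endpoint_url="http://localhost:9324",
    region_name="us-east-1",
    aws_access_key_id="dummy",
    aws_secret_access_key="dummy",
)

queue_url = sqs.create_queue(QueueName="test-queue")["QueueUrl"]
sqs.send_message(QueueUrl=queue_url, MessageBody="hello from ElasticMQ")
print(sqs.receive_message(QueueUrl=queue_url)["Messages"][0]["Body"])
```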
To aim for higher code quality, I added pydocstyle to our quality
checks. This will enforce documenting all code.
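
For instance, pydocstyle flags missing docstrings; D100 and D103 below are real pydocstyle error codes, though the example module itself is made up:

```python
"""Example module docstring; pydocstyle's D100 requires one per module."""


def add(a: int, b: int) -> int:
    """Return the sum of a and b; D103 requires function docstrings."""
    return a + b
```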
Mercury2699 previously approved these changes Feb 26, 2024
rafidka added a commit that referenced this pull request Apr 19, 2024

* Checked for major version in verify_python_version
* More documentation in `generate_base_dockerfile`
* Bumped version to 2.9.0
* Support passing SSL mode for Postgres connection.
* Downgraded to Python 3.11.9 since we don't want to go to Python 3.12
  before sufficient adoption.
* Remove version pinning for Amazon providers since this is covered by
  the Airflow constraints file.
* Update the `requirements.txt` used for development. Removed all but
  the requirements we want, and left the rest for pip to install
  automatically. This makes updating the file easier.
* `db_lock` method: renamed `timeout` to `timeout_ms` for clarity.
* Check for both `pip install` and `pip3 install` in
  `pip_install_check.py`.
* Support an allowlist in `pip_install_check.py` in case some scripts
  need to use `pip install` directly, e.g. the script to install Python
  since it needs to update `pip`.
@rafidka merged commit 38e0d65 into main on Apr 22, 2024. 2 checks passed.
@rafidka deleted the dev-rafidka branch on April 25, 2024.