Merging dev-rafidka into main #41
Merged
Conversation
Docker Compose setup for Airflow 2.8.0
To aid with development, I am introducing different image types, controlled by the following two build arguments:

- `build_type`: This build argument has three possible values:
  - `standard`: This is the standard build type; it is what customers use.
  - `explorer`: The 'explorer' build type is almost identical to the 'standard' build type, but it doesn't include the entrypoint. This is useful for debugging purposes: you can run the image and look around its content without starting Airflow, which might require further setup.
  - `explorer-root`: This is similar to the 'explorer' build type, but it additionally uses the root user, giving the user of this Docker image elevated permissions. The user can thus install packages, remove packages, or do anything else requiring elevated permissions.
- `dev`: When this build argument is set to True, additional packages are installed to aid with development.

For each combination of these two build arguments, a different Docker image is generated (a sketch of this mapping follows the list below). Thus, we are currently generating these images:

- `amazon-mwaa/airflow:2.8.0`
- `amazon-mwaa/airflow:2.8.0-dev`
- `amazon-mwaa/airflow:2.8.0-explorer`
- `amazon-mwaa/airflow:2.8.0-explorer-dev`
- `amazon-mwaa/airflow:2.8.0-explorer-privileged`
- `amazon-mwaa/airflow:2.8.0-explorer-privileged-dev`
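The following minimal Python sketch shows how the two build arguments map to the tags above. The `TAG_SUFFIXES` mapping (in particular, `explorer-root` publishing under the `explorer-privileged` name) is inferred from the tag list, not taken from the repository's actual build scripts.

```python
from itertools import product

AIRFLOW_VERSION = "2.8.0"

# Suffix each build_type contributes to the image tag. Note that the
# "explorer-root" build type appears to publish under the
# "explorer-privileged" name (an inference from the tag list above).
TAG_SUFFIXES = {
    "standard": "",
    "explorer": "-explorer",
    "explorer-root": "-explorer-privileged",
}


def image_tag(build_type: str, dev: bool) -> str:
    """Return the image tag for a given build_type/dev combination."""
    suffix = TAG_SUFFIXES[build_type] + ("-dev" if dev else "")
    return f"amazon-mwaa/airflow:{AIRFLOW_VERSION}{suffix}"


# Prints the six tags listed above.
for build_type, dev in product(TAG_SUFFIXES, [False, True]):
    print(image_tag(build_type, dev))
```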
All Dockerfiles share most of their steps, apart from a couple of steps at the end. As such, to cut build time, I extracted the common steps into a separate Docker image on top of which all the other images build.
More information can be found in #18.
To make sure that developers don't accidentally use `pip install` directly, I implemented a script that scans the whole repository and reports an error if it finds any such case. While at this, I also introduced the `quality-checks` folder, which contains all scripts for ensuring the quality of the repository. I moved `lint_bash.sh` and `lint_python.sh` under it, alongside the new script, `pip_install_check.py`. This way we have a central place for all such quality-check scripts, which are only expected to multiply in number as the repository grows bigger, more contributors get involved, and more quality control is required. The new `quality-checks` folder also contains a script, `run_all.py`, that walks through the `quality-checks` directory and executes every executable script it finds. Accordingly, I also updated the GitHub workflows and the pre-commit configuration to use the `run_all.py` script instead of manually listing all quality-check scripts. A simplified sketch of both ideas follows.
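The sketch below is a hypothetical, simplified version of what `pip_install_check.py` and `run_all.py` could look like; it is not the repository's actual implementation, and the regex and skip rules are illustrative.

```python
#!/usr/bin/env python3
"""Simplified sketch of a pip-install scanner and a quality-check runner."""
import os
import re
import subprocess
from pathlib import Path

# Flag both `pip install` and `pip3 install`.
PIP_INSTALL_RE = re.compile(r"\bpip3?\s+install\b")


def check_no_direct_pip_install(repo_root: Path) -> int:
    """Scan the repository and report any direct use of pip install."""
    errors = 0
    for path in repo_root.rglob("*"):
        if not path.is_file() or ".git" in path.parts:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for lineno, line in enumerate(text.splitlines(), start=1):
            if PIP_INSTALL_RE.search(line):
                print(f"{path}:{lineno}: direct `pip install` found")
                errors += 1
    return errors


def run_all(checks_dir: Path) -> int:
    """Execute every executable script under the quality-checks directory."""
    failures = 0
    for script in sorted(checks_dir.iterdir()):
        if script.is_file() and os.access(script, os.X_OK):
            failures += subprocess.run([str(script)]).returncode != 0
    return failures
```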
- Now using a VS Code workspace. Not only does this improve repository navigation, but it also allows using multiple Python interpreters, which is required since we use different Python requirements for the repository code vs. the Docker images code.
- Use Pyright for type checking.
- Use ruff for Python linting.
These open-source Docker images will be used both externally, by customers willing to experiment with the images in native Docker, and internally, within an Amazon MWAA setup (which relies on Fargate). This commit involves multiple small changes to make this possible:

- Introduced a `/healthcheck.sh` script, which is used by Fargate to monitor health status. This script currently always returns a success status (exit code 0), just to make the integration possible. In the future, we need to:
  - Improve this script to do some real checks.
  - Move this script to a better location (scripts shouldn't be placed at the root).
- Supported reading database credentials from a JSON-formatted environment variable, `MWAA__DB__CREDENTIALS`, containing the username and password. This is needed because Amazon MWAA employs Secrets Manager to pass the credentials safely to the Fargate container in a JSON-formatted object (a sketch of this parsing appears below).

During this work, I temporarily downgraded the Airflow version to 2.7.2 since this is a version we internally support, which should make testing easier.
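Here is a minimal sketch of reading the credentials from that environment variable. The variable name is from this change; the exact JSON key names (`username`/`password`) are assumptions for illustration.

```python
import json
import os


def read_db_credentials() -> tuple[str, str]:
    """Read DB credentials from the MWAA__DB__CREDENTIALS environment variable.

    The JSON key names used here are assumptions; Secrets Manager delivers
    the secret to the Fargate container as a JSON-formatted object.
    """
    creds = json.loads(os.environ["MWAA__DB__CREDENTIALS"])
    return creds["username"], creds["password"]


# Example:
#   MWAA__DB__CREDENTIALS='{"username": "airflow", "password": "s3cret"}'
```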
Mercury2699 previously approved these changes on Feb 21, 2024
Mercury2699 reviewed on Feb 21, 2024
Mercury2699 reviewed on Feb 22, 2024
To make the setup work without needing an actual SQS account, I made the necessary changes to use a local SQS queue server provided by ElasticMQ.
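For illustration, here is a minimal sketch of pointing boto3 at a local ElasticMQ endpoint. The port (9324 is ElasticMQ's default), the queue name, and the dummy credentials are all illustrative; ElasticMQ does not validate credentials.

```python
import boto3

# Point the SQS client at a local ElasticMQ server instead of real AWS.
sqs = boto3.client(
    "sqs",
    endpoint_url="http://localhost:9324",  # ElasticMQ's default port
    region_name="us-east-1",
    aws_access_key_id="dummy",  # placeholders; ElasticMQ ignores them
    aws_secret_access_key="dummy",
)

queue_url = sqs.create_queue(QueueName="test-queue")["QueueUrl"]
sqs.send_message(QueueUrl=queue_url, MessageBody="hello from local SQS")
```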
To aim for higher code quality, I added pydocstyle to our quality checks. This will enforce documenting all code.
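As a small illustration of what pydocstyle enforces, the first function below would be flagged (D103, missing docstring in a public function) while the second would pass:

```python
def undocumented(x):
    # pydocstyle would report D103: missing docstring in public function
    return x * 2


def documented(x: int) -> int:
    """Double the given value."""
    return x * 2
```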
Mercury2699 previously approved these changes on Feb 26, 2024
dhegberg reviewed on Feb 28, 2024
Mercury2699 reviewed on Feb 28, 2024
mayushko26 reviewed on Feb 29, 2024
rafidka added a commit that referenced this pull request on Apr 19, 2024
* Checked for the major version in `verify_python_version`
* More documentation in `generate_base_dockerfile`
* Bumped version to 2.9.0
* Support passing SSL mode for the Postgres connection.
* Downgraded to Python 3.11.9 since we don't want to go to Python 3.12 before sufficient adoption.
* Removed version pinning for Amazon providers since this is covered by the Airflow constraints file.
* Updated the `requirements.txt` used for development: removed all but the requirements we want and left the rest for pip to install automatically. This makes updating the file easier.
* `db_lock` method: renamed `timeout` to `timeout_ms` for clarity.
* Check for both `pip install` and `pip3 install` in `pip_install_check.py`.
* Support an allowlist in `pip_install_check.py` in case some scripts need to use `pip install` directly, e.g. the script that installs Python, since it needs to update `pip` (see the sketch after this list).
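A hypothetical sketch of layering an allowlist onto the `pip install` check described earlier; the allowlisted path is illustrative, not the repository's actual allowlist.

```python
import re
from pathlib import Path

# Scripts allowed to call pip directly, e.g. the script that installs
# Python, since it needs to update pip itself. (This path is illustrative.)
ALLOWLIST = {Path("images/airflow/install_python.sh")}

# Match both `pip install` and `pip3 install`.
PIP_INSTALL_RE = re.compile(r"\bpip3?\s+install\b")


def is_violation(path: Path, line: str) -> bool:
    """Flag a direct pip/pip3 install unless the file is allowlisted."""
    return path not in ALLOWLIST and bool(PIP_INSTALL_RE.search(line))
```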
dhegberg approved these changes on Apr 22, 2024
Mercury2699 approved these changes on Apr 22, 2024
Issue #, if available: Multiple issues, specified by the individual commits.
Description of changes: 9 commits, which can be seen below.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.