Add Github workflow to populate the persistent source schema #715
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,110 @@ | ||
# See [Persistent Source Schema](/GLOSSARY.md#persistent-source-schema) | ||
# Populating the source schema via this workflow ensures that it's done with the same settings as the tests. | ||
|
||
name: Reload Test Data in SQL Engines | ||
|
||
# We don't want multiple workflows trying to create the same table. | ||
concurrency: | ||
group: POPULATE_PERSISTENT_SOURCE_SCHEMA | ||
cancel-in-progress: true | ||
Comment on lines +8 to +9: I am very curious to see how this works. |
||
|
||
on: | ||
pull_request: | ||
types: [labeled] | ||
Comment on lines +11 to +13: Please add
Reply: Added, but iterating on this PR has been a huge pain as it can't be tested locally. |
||
workflow_dispatch: | ||
|
||
env: | ||
# Unclear on how to make 'Reload Test Data in SQL Engines' a constant here as it does not work here. | ||
PYTHON_VERSION: "3.8" | ||
|
||
jobs: | ||
snowflake-populate: | ||
environment: DW_INTEGRATION_TESTS | ||
if: > | ||
github.event_name == 'workflow_dispatch' | ||
|| (github.event.action == 'labeled' && github.event.label.name == 'Reload Test Data in SQL Engines') | ||
name: Snowflake | ||
runs-on: ubuntu-latest | ||
steps: | ||
- name: Check-out the repo | ||
uses: actions/checkout@v3 | ||
|
||
- name: Populate w/Python ${{ env.PYTHON_VERSION }} | ||
uses: ./.github/actions/run-mf-tests | ||
with: | ||
python-version: ${{ env.PYTHON_VERSION }} | ||
mf_sql_engine_url: ${{ secrets.MF_SNOWFLAKE_URL }} | ||
mf_sql_engine_password: ${{ secrets.MF_SNOWFLAKE_PWD }} | ||
parallelism: 1 | ||
make-target: "populate-persistent-source-schema-snowflake" | ||
|
||
redshift-populate: | ||
environment: DW_INTEGRATION_TESTS | ||
name: Redshift | ||
if: > | ||
github.event_name == 'workflow_dispatch' | ||
|| (github.event.action == 'labeled' && github.event.label.name == 'Reload Test Data in SQL Engines') | ||
runs-on: ubuntu-latest | ||
steps: | ||
- name: Check-out the repo | ||
uses: actions/checkout@v3 | ||
|
||
- name: Populate w/Python ${{ env.PYTHON_VERSION }} | ||
uses: ./.github/actions/run-mf-tests | ||
with: | ||
python-version: ${{ env.PYTHON_VERSION }} | ||
mf_sql_engine_url: ${{ secrets.MF_REDSHIFT_URL }} | ||
mf_sql_engine_password: ${{ secrets.MF_REDSHIFT_PWD }} | ||
parallelism: 1 | ||
make-target: "populate-persistent-source-schema-redshift" | ||
|
||
bigquery-populate: | ||
environment: DW_INTEGRATION_TESTS | ||
name: BigQuery | ||
if: > | ||
github.event_name == 'workflow_dispatch' | ||
|| (github.event.action == 'labeled' && github.event.label.name == 'Reload Test Data in SQL Engines') | ||
runs-on: ubuntu-latest | ||
steps: | ||
- name: Check-out the repo | ||
uses: actions/checkout@v3 | ||
|
||
- name: Populate w/Python ${{ env.PYTHON_VERSION }} | ||
uses: ./.github/actions/run-mf-tests | ||
with: | ||
python-version: ${{ env.PYTHON_VERSION }} | ||
mf_sql_engine_url: ${{ secrets.MF_BIGQUERY_URL }} | ||
mf_sql_engine_password: ${{ secrets.MF_BIGQUERY_PWD }} | ||
parallelism: 1 | ||
make-target: "populate-persistent-source-schema-bigquery" | ||
|
||
databricks-populate: | ||
environment: DW_INTEGRATION_TESTS | ||
name: Databricks SQL Warehouse | ||
if: > | ||
github.event_name == 'workflow_dispatch' | ||
|| (github.event.action == 'labeled' && github.event.label.name == 'Reload Test Data in SQL Engines') | ||
runs-on: ubuntu-latest | ||
steps: | ||
- name: Check-out the repo | ||
uses: actions/checkout@v3 | ||
|
||
- name: Populate w/Python ${{ env.PYTHON_VERSION }} | ||
uses: ./.github/actions/run-mf-tests | ||
with: | ||
python-version: ${{ env.PYTHON_VERSION }} | ||
mf_sql_engine_url: ${{ secrets.MF_DATABRICKS_SQL_WAREHOUSE_URL }} | ||
mf_sql_engine_password: ${{ secrets.MF_DATABRICKS_PWD }} | ||
parallelism: 1 | ||
make-target: "populate-persistent-source-schema-databricks" | ||
|
||
remove-label: | ||
Comment: Oh, if this works we should TOTALLY add it to the SQL engine tests. I've got some updates I want to make over there, so I can do that once this is in. |
||
name: Remove Label After Populating Test Data | ||
runs-on: ubuntu-latest | ||
needs: [ snowflake-populate, redshift-populate, bigquery-populate, databricks-populate] | ||
if: github.event.action == 'labeled' && github.event.label.name == 'Reload Test Data in SQL Engines' | ||
steps: | ||
- name: Remove Label | ||
uses: actions-ecosystem/action-remove-labels@v1 | ||
with: | ||
labels: 'Reload Test Data in SQL Engines' |
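
The four populate jobs share the same gating intent: run on a manual dispatch, or when the 'Reload Test Data in SQL Engines' label is added to a pull request. A minimal Python model of that intent is sketched here, assuming the event name and payload are passed in as plain values. The function and parameter names are hypothetical and are not part of GitHub Actions.

```python
from typing import Any, Mapping

# Label name taken from the workflow above.
RELOAD_LABEL = "Reload Test Data in SQL Engines"


def should_populate(event_name: str, event: Mapping[str, Any]) -> bool:
    """Return True when the populate jobs are meant to run (hypothetical helper)."""
    # A manual run is identified by the event name, not by the payload.
    if event_name == "workflow_dispatch":
        return True
    # Otherwise, only run when the specific label was just added to a PR.
    return (
        event_name == "pull_request"
        and event.get("action") == "labeled"
        and event.get("label", {}).get("name") == RELOAD_LABEL
    )
```

In GitHub Actions, the event type is exposed as `github.event_name`. A `workflow_dispatch` payload carries no `action` field, so a comparison of `github.event.action` against `'workflow_dispatch'` would not match on a manual run.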
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,13 @@ | ||
# Glossary | ||
|
||
## Persistent source schema | ||
Many tests generate and execute SQL that depends on tables containing test data. By default, a
pytest fixture creates a temporary schema and populates it with the tables that are required by
the tests. This schema is referred to as the source schema. Creating the source schema (and
the associated tables) can be a slow process for some SQL engines. Since these tables generally
do not change often, functionality was added to use a source schema that is assumed to already
exist when running tests and persists between runs (a persistent source schema). In addition,
functionality was added to create the persistent source schema based on table definitions in the
repo. Because the name of the source schema is generated from a hash of the data that's
supposed to be in the schema, concurrent runs with the same data target the same schema. Creating
and populating the persistent source schema should therefore not be done concurrently, as there
are race conditions when creating tables and inserting data. |
Comment: Yeah, I guess if this ever happens in different schemas the last one in will be wrong anyway.
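
As a rough illustration of the glossary entry, the sketch below derives a schema name from a hash of the table contents. It is not the repo's implementation: the function name, the `mf_test_src` prefix, the JSON serialization, and the 12-character digest are all assumptions made for the example.

```python
import hashlib
import json
from typing import Mapping, Sequence


def persistent_schema_name(
    table_rows: Mapping[str, Sequence[Sequence[object]]],
    prefix: str = "mf_test_src",  # hypothetical prefix
) -> str:
    """Derive a schema name from a hash of the table data (illustrative scheme)."""
    # Serialize deterministically so identical data always yields identical bytes.
    payload = json.dumps(table_rows, sort_keys=True, default=str).encode("utf-8")
    # Truncate the digest to keep the schema name short (assumed length).
    digest = hashlib.sha256(payload).hexdigest()[:12]
    return f"{prefix}_{digest}"


# Two runs with the same data compute the same name, so they would
# create and fill the same schema if allowed to run at the same time.
data = {"bookings": [[1, "2020-01-01"], [2, "2020-01-02"]]}
name_a = persistent_schema_name(data)
name_b = persistent_schema_name(data)
changed = persistent_schema_name({"bookings": [[1, "2020-01-01"]]})
```

Under this kind of scheme, changing the test data changes the schema name, while identical data maps to one shared schema. That shared target is why the workflow serializes runs with a `concurrency` group.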