Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix config change detection not working for multiple sortkey in materialized views #841

Conversation

lvitti
Copy link
Contributor

@lvitti lvitti commented Jun 10, 2024

resolves #838

Problem

Having multiple sort keys and without changes between the model and the Materialized View, the change check between model and view ends up returning as a result that it has changes. This is because svv_table_info.sortkey1 is being used instead of use the list of sort key that the Materialized View have

Solution

We can get the list of SortKeys using the mv_tbl related table

Checklist

  • I have read the contributing guide and understand what's expected of me
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc) or this PR has already received feedback and approval from Product or DX

Steps To Reproduce:

  1. Make sure to delete the MV first.
    drop materialized view if exists dev.public.mv_double_sort

  2. Add an mv to the project with more than 1 sort key:

-- models/mv_double_sort.sql
{{
    config(
        materialized = 'materialized_view',
        dist = 'b',
        sort = ['b', 'a'],
        on_configuration_change = 'fail'
    )
}}

select a, b, c from foo

  1. $ dbt --debug build -s my_double_sort Should create the materialized view
  2. Run again $ dbt --debug build -s my_double_sort and should refresh the materialized view instead of drop and recreate the view

@lvitti lvitti requested a review from a team as a code owner June 10, 2024 12:59
Copy link

cla-bot bot commented Jun 10, 2024

Thanks for your pull request, and welcome to our community! We require contributors to sign our Contributor License Agreement and we don't seem to have your signature on file. Check out this article for more information on why we have a CLA.

In order for us to review and merge your code, please submit the Individual Contributor License Agreement form attached above above. If you have questions about the CLA, or if you believe you've received this message in error, please reach out through a comment on this PR.

CLA has not been signed by users: @lvitti

@cla-bot cla-bot bot added the cla:yes label Jun 10, 2024
@lvitti
Copy link
Contributor Author

lvitti commented Jun 12, 2024

@mikealfare mikealfare self-assigned this Aug 8, 2024
@mikealfare mikealfare added the community This PR is from a community member label Aug 8, 2024
Copy link
Contributor

@mikealfare mikealfare left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall pretty good. Nice tests too. I only had a few things, with two things blocking:

  • lazy load agate (blocking)
  • consider sort key order in the sort key (blocking)
  • style change for the walrus operator
  • name change to columns
  • remove schema and table

dbt/adapters/redshift/relation_configs/sort.py Outdated Show resolved Hide resolved
dbt/adapters/redshift/relation_configs/sort.py Outdated Show resolved Hide resolved
JOIN pg_attribute a ON a.attrelid = c.oid
WHERE
n.nspname ilike '{{ relation.schema }}'
AND c.relname LIKE 'mv_tbl__{{ relation.identifier }}__%'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you tried this with multiple config changes for the materialized view? Is there only ever one relation of the form mv_tbl__{{ relation.identifier }}__%? Do we need to look at the latest version? I'm asking because I'm genuinely not sure how Redshift manages that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the way we're using it, they have only one relationship.
However, I'll do a deeper analysis regarding that and get back to you with feedback.
Good observation, thanks

tests/unit/test_relation.py Outdated Show resolved Hide resolved
Copy link
Contributor

@mikealfare mikealfare left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is almost where it needs to be. The only issue I really noticed is the indirect comparison of None to an integer.

Please also run pre-commit; the code quality check is failing due to some whitespace issues.

logs
check yaml...............................................................Passed
check json...........................................(no files to check)Skipped
fix end of files.........................................................Passed
trim trailing whitespace.................................................Failed
- hook id: trailing-whitespace
- exit code: 1
- files were modified by this hook

Fixing dbt/adapters/redshift/relation_configs/sort.py

check for case conflicts.................................................Passed
Check for dbt-core usage in an adapter...................................Passed
black....................................................................Failed
- hook id: black
- files were modified by this hook

reformatted dbt/adapters/redshift/relation_configs/sort.py
reformatted dbt/adapters/redshift/relation_configs/materialized_view.py

All done! ✨ 🍰 ✨
2 files reformatted, 67 files left unchanged.

flake8...................................................................Passed
mypy.....................................................................Passed
pre-commit hook(s) made changes.
If you are seeing this message in CI, reproduce locally with: `pre-commit run --all-files`.
To run `pre-commit` as part of git workflow, use `pre-commit install`.
All changes made by hooks:
diff --git a/dbt/adapters/redshift/relation_configs/materialized_view.py b/dbt/adapters/redshift/relation_configs/materialized_view.py
index e65a7cf..e51fb61 100644
--- a/dbt/adapters/redshift/relation_configs/materialized_view.py
+++ b/dbt/adapters/redshift/relation_configs/materialized_view.py
@@ -206,7 +206,7 @@ class RedshiftMaterializedViewConfig(RedshiftRelationConfigBase, RelationConfigV
                 {"dist": RedshiftDistConfig.parse_relation_results(materialized_view)}
             )
 
-        if columns:= relation_results.get("columns"):
+        if columns := relation_results.get("columns"):
             sort_columns = [row for row in columns.rows if row.get("sort_key_position") > 0]
             if sort_columns:
                 config_dict.update(
diff --git a/dbt/adapters/redshift/relation_configs/sort.py b/dbt/adapters/redshift/relation_configs/sort.py
index 5bdb4c9..f38d5a1 100644
--- a/dbt/adapters/redshift/relation_configs/sort.py
+++ b/dbt/adapters/redshift/relation_configs/sort.py
@@ -137,7 +137,6 @@ class RedshiftSortConfig(RedshiftRelationConfigBase, RelationConfigValidationMix
 
         return config_dict
 
-
     @classmethod
     def parse_relation_results(cls, relation_results_entry: "agate.MappedSequence") -> dict:
         """
@@ -161,7 +160,7 @@ class RedshiftSortConfig(RedshiftRelationConfigBase, RelationConfigValidationMix
         Returns: a standard dictionary describing this `RedshiftSortConfig` instance
         """
         sort_config = []
-        
+
         sorted_columns = sorted(relation_results_entry, key=lambda x: x["sort_key_position"])
         for column in sorted_columns:
             if column.get("sort_key_position"):
@@ -169,6 +168,7 @@ class RedshiftSortConfig(RedshiftRelationConfigBase, RelationConfigValidationMix
 
         return {"sortkey": sort_config}
 
+
 @dataclass(frozen=True, eq=True, unsafe_hash=True)
 class RedshiftSortConfigChange(RelationConfigChange, RelationConfigValidationMixin):
     context: RedshiftSortConfig

tests/unit/test_materialized_view.py Show resolved Hide resolved
fix sort_key_position for non sort_key column
Copy link
Contributor

@mikealfare mikealfare left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @lvitti!

@mikealfare mikealfare enabled auto-merge (squash) August 9, 2024 21:13
@mikealfare mikealfare merged commit 8bbdbf2 into dbt-labs:main Aug 9, 2024
23 of 24 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla:yes community This PR is from a community member ok to test
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug] Materialized view config change identification not working with multiple sort keys
3 participants