[#2479] Allow unique_id to take a list #4618
Conversation
This works with both Snowflake + BigQuery when I run locally. As a follow-up to merging this PR, we should look to add a simple integration test to both of those plugins:

```sql
{{ config(
    materialized = 'incremental',
    unique_key = ['id1', 'id2']
) }}

select 1 as id1, 2 as id2
```
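For context, with a list `unique_key` the generated merge statement should join on every key column. A rough sketch of what the compiled SQL would look like (schema and relation names here are hypothetical, not from this PR):

```sql
merge into analytics.my_model as DBT_INTERNAL_DEST
using analytics.my_model__dbt_tmp as DBT_INTERNAL_SOURCE
on DBT_INTERNAL_SOURCE.id1 = DBT_INTERNAL_DEST.id1
   and DBT_INTERNAL_SOURCE.id2 = DBT_INTERNAL_DEST.id2
when matched then update set
    id1 = DBT_INTERNAL_SOURCE.id1,
    id2 = DBT_INTERNAL_SOURCE.id2
when not matched then insert (id1, id2)
    values (DBT_INTERNAL_SOURCE.id1, DBT_INTERNAL_SOURCE.id2)
```

An integration test can then run the model twice and assert that no duplicate (id1, id2) pairs appear in the target table.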
I think the first step in getting the tests to pass should be rebasing against / merging in main. When I run locally, I see: `Running with dbt=0.21.0-rc1`.
```sql
DBT_INTERNAL_SOURCE.{{ unique_key }} = DBT_INTERNAL_DEST.{{ unique_key }}
{% endset %}
{% do predicates.append(unique_key_match) %}
{% if unique_key is sequence and unique_key is not mapping and unique_key is not string %}
```
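Filling in the macro context around the reviewed line, the branching in `default__get_merge_sql` looks roughly like this (a sketch of the approach, not the exact merged code):

```sql
{% if unique_key %}
    {% if unique_key is sequence and unique_key is not mapping and unique_key is not string %}
        {# list form: append one equality predicate per key column #}
        {% for key in unique_key %}
            {% set this_key_match %}
                DBT_INTERNAL_SOURCE.{{ key }} = DBT_INTERNAL_DEST.{{ key }}
            {% endset %}
            {% do predicates.append(this_key_match) %}
        {% endfor %}
    {% else %}
        {# string form: single predicate, unchanged behavior #}
        {% set unique_key_match %}
            DBT_INTERNAL_SOURCE.{{ unique_key }} = DBT_INTERNAL_DEST.{{ unique_key }}
        {% endset %}
        {% do predicates.append(unique_key_match) %}
    {% endif %}
{% endif %}
```

The `is sequence and is not mapping and is not string` guard is needed because Jinja treats strings (and dicts) as sequences too.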
This update to `default__get_merge_sql` will work for Snowflake + BigQuery out of the box. For other databases:
- Postgres + Redshift don't support `merge`; they use `get_delete_insert_merge_sql` instead. We'd need to refactor the `delete` statement to support multiple unique keys, which is tricky enough that I think we can open a new issue for it. No need to block this PR in the meantime.
- Spark reimplements this as `spark__get_merge_sql`, so we'll need to make a similar change over there if we want to support the same functionality.
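For the Postgres/Redshift side, one possible shape for a multi-key delete+insert is sketched below (illustrative only; relation names are hypothetical and the follow-up issue should settle the real approach):

```sql
-- delete rows in the target that match the temp table on ALL key columns
delete from analytics.my_model
using analytics.my_model__dbt_tmp as tmp
where my_model.id1 = tmp.id1
  and my_model.id2 = tmp.id2;

-- then insert the fresh rows
insert into analytics.my_model (id1, id2)
select id1, id2
from analytics.my_model__dbt_tmp;
```

`delete ... using` is supported by both Postgres and Redshift, which is part of why the refactor is plausible but still deserves its own issue and tests.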
For the Spark change here, @gshank maybe you can pair with @McKnight-42 or @VersusFacit on making that change.
@McKnight-42 @VersusFacit - this behavior difference would be good to document in the adapter-specific pages if it isn't there yet
I think we'll also want integration tests for Snowflake + BigQuery, to ensure this change actually achieves the desired result
`unique_key` can be a string or a list.
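The two accepted forms side by side (illustrative configs, not taken from the PR itself):

```sql
-- single key, as before
{{ config(materialized='incremental', unique_key='id') }}

-- composite key, new in this PR
{{ config(materialized='incremental', unique_key=['id1', 'id2']) }}
```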
extend the functionality to support both single and multiple keys

Signed-off-by: triedandtested-dev (Bryan Dunkley) <[email protected]>
Force-pushed from 72885a1 to 3299e57
I'm not quite sure what the flow should be here. Should this PR be merged first and then the separate adapter changes be addressed?
@gshank I'm not opposed to doing those changes in a separate PR as long as this change doesn't break anything w/o those other changes. If we do a separate PR, can you please open a tracking issue so we don't lose track of doing the work?
@leah I think they would have to be separate PRs, because they're different repositories, right? You can't put changes from multiple repos into the same PR. I don't know if this change would break anything in the other repos, and I'm not sure how to test them against a dbt-core branch.
@gshank that is correct, but I just didn't know if we needed to coordinate with the other repos. I don't want to merge this change if it causes the other repos to start failing tests; instead we could coordinate to merge all the PRs the same day, if that makes sense.
This is a really good callout in general. In this particular case, I don't believe that this change will (or should) cause failing tests in those other repos. (If it does, that failing test will be meaningful, and indicate we've probably done something wrong here!) It is incumbent on us to follow up with:
Got some context and read up. LGTM
```diff
@@ -2,6 +2,7 @@
 ### Features
 - New Dockerfile to support specific db adapters and platforms. See docker/README.md for details ([#4495](https://github.com/dbt-labs/dbt-core/issues/4495), [#4487](https://github.com/dbt-labs/dbt-core/pull/4487))
+- Allow unique_key to take a list ([#2479](https://github.com/dbt-labs/dbt-core/issues/2479), [#4618](https://github.com/dbt-labs/dbt-core/pull/4618))
```
Let's make sure to add the contributor from the original PR to the changelog for v1.1:
- [@triedandtested-dev](https://github.com/triedandtested-dev) ([#4618](https://github.com/dbt-labs/dbt-core/pull/4618))
By the way, I noticed that this list option is missing from the docs here:
https://docs.getdbt.com/reference/resource-configs/unique_key#use-a-combination-of-two-columns-as-a-unique-key
> By the way, I noticed that this list option is missing from the docs here:
> https://docs.getdbt.com/reference/resource-configs/unique_key#use-a-combination-of-two-columns-as-a-unique-key
Good call @cdabel !
Just opened an issue for that here: dbt-labs/docs.getdbt.com#4642
I encounter an error when running an incremental BigQuery model with multiple unique keys. The error I am getting is:
This is caused by the compiled code:
Config used is:
Am I doing something wrong, or is this dbt code that fails on BigQuery with these settings?
Heya @HeddeCrisp ! It looks like you have the right syntax. Your error is familiar and strikes me as what used to happen before we merged this. I did a little incremental model test with the following config:

```sql
{{
    config(
        materialized='incremental',
        unique_key=['id', 'first_name', 'last_name']
    )
}}
```

And my dbt version is:
So this to me implies there's not an error on our end with this feature, but rather you might be using a dbt-bigquery version that didn't have this feature active yet. What does
I'm receiving an error when I pass a list.
dbt-labs/dbt-core#4618 Signed-off-by: xtyuns <[email protected]>
resolves #2479
Description
This derives from pull request #4159.
Checklist
- I have updated the CHANGELOG.md and added information about my change