Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Model Contract Alters Numeric Precision #10430

Closed
2 tasks done
dbernett-amplify opened this issue Jul 11, 2024 · 4 comments
Closed
2 tasks done

Model Contract Alters Numeric Precision #10430

dbernett-amplify opened this issue Jul 11, 2024 · 4 comments
Labels
bug Something isn't working model_contracts

Comments

@dbernett-amplify
Copy link

dbernett-amplify commented Jul 11, 2024

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

  • I have a column that is explicitly defined as number(3,1) in my upstream model.

  • I select that column in my incremental model.

  • My incremental models specifies on_schema_change = 'fail'.

  • My incremental has a contract that defines the data_type for that column as "number".

  • If that incremental model has the contract disabled...

    • When I build it as a full refresh, the column comes out as number(3,1).
    • On a subsequent incremental build, it also comes out as number(3,1).
  • If that incremental model has the contract enabled...

    • When I built it as a full refresh, the column comes out as number(38,0).
    • On a subsequent incremental build, the on_schema_change yells about the column type having changed to number(3,1) and the model fails.

We have observed this behavior in several different models recently.

Eventually, I realized that I need to explicitly specify "number(3,1)" as the data_type in the contract in order to get it to come out as number(3,1) on that initial full-refresh, and to then not break on the subsequent incremental build, but it feels non-obvious that just specifying "number" as the data type in the contract is going to change the precision from whatever dbt would otherwise make it to (38,0).

On a related note, it would be helpful if the documentation specified what the allowed values are for the data_type argument in a contract. I often feel like I'm guessing at what it wants.

Expected Behavior

I would expect that dbt will make the precision whatever it should be according to the sql, and that a contract that specifies "number" as the data_type will just check that the data is in fact some type of number. Not that the contract will actually change the data type.

It's also worth noting that if I mis-specify the column in the contract as "number(2,0)", it does not throw an error. It just changes the column to number(2,0) and I lose my decimal. Again, I would expect that the contract is just describing the data types specified in the sql and that if they mismatch, I'll see an error, not that the contract is actually changing the data types from what they are according to the sql.

Steps To Reproduce

I'm working in the Cloud IDE running dbt 1.7 and using Snowflake.

Relevant log output

No response

Environment

- OS:
- Python:
- dbt:

Which database adapter are you using with dbt?

snowflake

Additional Context

No response

@dbernett-amplify dbernett-amplify added bug Something isn't working triage labels Jul 11, 2024
@dbernett-amplify dbernett-amplify changed the title Incremental Model with Contract Alters Numeric Precision Model Contract Alters Numeric Precision Jul 11, 2024
@dbeatty10 dbeatty10 self-assigned this Jul 11, 2024
@dbeatty10
Copy link
Contributor

Thanks for reaching out about this @dbernett-amplify !

I was able to to reproduce the same error messages as you -- see below for the simple project files I used.

Project files & commands

models/_models.yml

models:
  - name: incremental_model
    config:
      contract:
        enforced: true
    columns:
      - name: my_number
        data_type: number

models/upstream_model.sql

select cast(1.2 as number(3,1)) as my_number

models/incremental_model.sql

{{
    config(
        materialized='incremental',
        on_schema_change = 'fail',
    )
}}

select * from {{ ref("upstream_model") }}

Commands:

dbt build --full-refresh
dbt build

It sounds like there are two things that were different than you expected:

  1. When a dbt contract is enforced, the data_type is set within the DDL
  2. Needing to specify a scale/precision within data_type for numeric data types

1st difference

The 1st difference was an intentional choice we made during the implementation of model contracts, and it is described here:

When building a model with a defined contract, dbt will do two things differently:

  1. dbt will run a "preflight" check to ensure that the model's query will return a set of columns with names and data types matching the ones you have defined. This check is agnostic to the order of columns specified in your model (SQL) or YAML spec.
  2. dbt will include the column names, data types, and constraints in the DDL statements it submits to the data platform, which will be enforced while building or updating the model's table.

2nd difference

The 2nd difference Is documented here, but I agree that we could improve the visibility in both the documentation and the CLI output.

For example, we raise a warning whenever there's a numeric data type in an enforced contract that is missing scale/precision.

But it's 1) buried in the logs and, 2) not clear what users should do as a result:

image

Summary

So based on the above, I think dbt is behaving as we intend it to, and this isn't a bug. So I'm going to close this issue accordingly. But I think there's room for improvement in our product documentation and warning and error messages.

I just raised this PR to make some updates to the docs -- open to any suggestions you have for improving the documentation or warning/error messages!

@dbeatty10 dbeatty10 closed this as not planned Won't fix, can't repro, duplicate, stale Jul 12, 2024
@dbeatty10 dbeatty10 removed the triage label Jul 12, 2024
@dbeatty10 dbeatty10 removed their assignment Jul 12, 2024
mirnawong1 pushed a commit to dbt-labs/docs.getdbt.com that referenced this issue Jul 12, 2024
[Preview](https://docs-getdbt-com-git-dbeatty10-patch-1-dbt-labs.vercel.app/reference/resource-configs/contract)

## What are you changing in this pull request and why?

The
[`contracts`](https://docs.getdbt.com/reference/resource-configs/contract#example)
page includes some important information as it relates to size,
precision, and scale when enforcing contracts, but it's hard to pick out
visually.

I want to be send hyperlinks to folks that points to just the right
section. For example, in this issue:
dbt-labs/dbt-core#10430.

## Checklist
- [x] Review the [Content style
guide](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/content-style-guide.md)
so my content adheres to these guidelines.
@dbernett-amplify
Copy link
Author

@dbeatty10 thanks! I appreciate your help here.

One thing worth calling out is that the docs say:

Note that you need to specify a varchar size or numeric scale, otherwise dbt relies on default values. For example, if a numeric type defaults to a precision of 38 and a scale of 0, then the numeric column stores 0 digits to the right of the decimal (it only stores whole numbers), which might cause it to fail contract enforcement.

That's actually not true. It doesn't fail contract enforcement. If your specification in the contract doesn't match what's actually in the column, then it just coerces your data to whatever the contract says. Which in many ways is a bigger deal because that happens silently. So probably worth calling that out in the docs?

I'd also suggest swapping the order of the two paragraphs in that section so that you're telling people first about all the stuff they DO have to worry about and second about what they don't have to worry about. Otherwise, people might stop after the first sentence or two and assume that the gist of the section is that precision, scale, etc. don't matter.

@dbernett-amplify
Copy link
Author

@dbeatty10 and then I guess one other follow up thought I have is that it feels a little inconsistent that if you specify just "number" as the data_type then the initial build throws off only a warning, but a subsequent incremental build will throw a failure.

@dbernett-amplify
Copy link
Author

@dbeatty10 also wanted to follow up with another slightly different issue.

  • In this new case, the precision for the column in question was not specified the sql anywhere (in the model or any upstream model or anything) (whereas in the earlier case it was).
  • In the model contract, I specified "number(38,0)"
  • It built fine on the first full build and did in fact come out as number(38,0).
  • On a subsequent incremental build, it threw an error, saying that that column is actually "number(19,0)".

So if you specify a different precision in the contract from what snowflake would infer from the data, it will coerce the data to your specified prevision on the full-refresh, but on the subsequent incremental build it will not coerce it, and will throw an error (if you've set on_schema_change = 'fail').

So basically to make an incremental model with on_schema_change work with a model contract you have to both:

  1. Specify it as a number with a precision and scale in the contract
  2. Explicitly cast it to a number with that same precision and scale in the sql

To be honest, that feels like a lot of work. So would be great if that could change. But if not, then that should at least be called out in the docs.

Thanks again for your help!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working model_contracts
Projects
None yet
Development

No branches or pull requests

2 participants