
[CT-1376] [Bug] Invalid Number Precision error for incremental model in 1.3 #6107

Closed

kellyldougan opened this issue Oct 19, 2022 · 23 comments

Labels: regression, Team:Adapters (Issues designated for the adapter area of the code)

Comments

@kellyldougan

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

I have an incremental model that works in version 1.2 but not in 1.3. The model has a config block of:

{{
    config(
        materialized='incremental',
        strategy='delete+insert',
        pre_hook= "{% if is_incremental() %}
                 TRUNCATE IF EXISTS {{ this }}
                 {% endif %}",
        on_schema_change='sync_all_columns',
    )
}}

and when I run the model in 1.3, I get the following error:

SQL compilation error: cannot change column COST_PCT from type NUMBER(38,1) to NUMBER(38,0) because changing the scale of a number is not supported.

Expected Behavior

I'm expecting this model to run incrementally without errors.

Steps To Reproduce

Using version 1.2, in Snowflake, create a seed test_seed with the following data:

col_1,col_2
1,2
3,4
4,5
0,100
40,10

and create an incremental model as follows:

{{
    config(
        materialized='incremental',
        strategy='delete+insert',
        pre_hook= "{% if is_incremental() %}
                 TRUNCATE IF EXISTS {{ this }}
                 {% endif %}",
        on_schema_change='sync_all_columns'
    )
}}

with seed_ref as (
select 
    to_number(col_1) as col_1,
    to_number(col_2) as col_2
from {{ ref('test_seed') }}
)

select
    round(col_1 / col_2 * 100, 1) as cost_pct,
    round(col_1 * (col_1 / col_2), 2) as cost_dollars
from seed_ref

Then rerun the incremental model in version 1.3, and you will get the following error:

Invalid number precision: 39. Must be between 0 and 38.
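
For concreteness, the full command sequence would be something like the following sketch (the model name test_incremental is a placeholder; the issue doesn't name the model file):

    # with dbt 1.2 installed
    dbt seed --select test_seed
    dbt run --select test_incremental    # first run builds the table

    # upgrade to dbt 1.3, then run the model incrementally:
    dbt run --select test_incremental    # fails with the precision error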

Relevant log output

No response

Environment

- OS: Cloud IDE
- Python: n/a
- dbt: 1.3

Which database adapter are you using with dbt?

snowflake

Additional Context

No response

@kellyldougan kellyldougan added bug Something isn't working triage labels Oct 19, 2022
@github-actions github-actions bot changed the title [Bug] Invalid Number Precision error for incremental model in 1.3 [CT-1376] [Bug] Invalid Number Precision error for incremental model in 1.3 Oct 19, 2022
@jtcohen6 jtcohen6 added the Team:Adapters Issues designated for the adapter area of the code label Oct 20, 2022
@joshuataylor
Contributor

joshuataylor commented Oct 23, 2022

Also found this; no idea why this changed in 1.3.

My workaround, since I didn't want to rebuild the model, was to cast the failing column to ::number(38, 0).

This wouldn't have worked anyway...

https://docs.snowflake.com/en/sql-reference/data-types-numeric.html#number
Numbers up to 38 digits, with an optional precision and scale:

@celine-geelen

I can confirm the same issue as @kellyldougan. Reverting to version 1.2.2 resolved the issue for now.

@leahwicz leahwicz added regression and removed bug Something isn't working labels Oct 27, 2022
@rloredo

rloredo commented Nov 7, 2022

+1
It seems related to the on_schema_change parameter.

The workaround I've found is to remove on_schema_change='sync_all_columns' and use the default, which is 'ignore'.
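
In other words, a minimal sketch of the workaround config (or omit on_schema_change entirely, since 'ignore' is the default):

    {{
        config(
            materialized='incremental',
            incremental_strategy='delete+insert',
            on_schema_change='ignore'
        )
    }}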

@dbeatty10
Contributor

dbeatty10 commented Nov 15, 2022

@joshuataylor noted here (and affirmed by @rloredo above) that:

This seems to be for sync_all_columns, ignore works.

@VersusFacit
Contributor

Hi folks! I've been trying to replicate this, but can't seem to trigger the behavior detailed here. There are a few people who seem to have reported this for 1.3. Any evidence it's also present on other versions as of now? Do we have a simplest-possible example that triggers this?

I'm working on this in the background but would love to see your responses to home in on this faster.

@rloredo

rloredo commented Nov 15, 2022

Hey @VersusFacit, the steps to reproduce above give me the expected (failing) result.
First, create the seed and incremental model with 1.2. Then, update to 1.3 and everything will fail.
Even if you run a full refresh of the model in 1.3 and then do a normal run, it will fail.
If you change the on_schema_change='sync_all_columns' part or omit it, it will work.

Note that this is only failing in Snowflake (at least, all of us here are using SF), and for models that were first created using a 1.2 version.

@dbeatty10 dbeatty10 self-assigned this Nov 16, 2022
@dbeatty10
Contributor

I have not been able to reproduce this personally yet. I will spin up a repo showing the steps I tried to see if anyone else can reproduce, and I'll share a link in a follow-up comment.

Here's where I suspect the issue is arising (which was introduced in dbt-labs/dbt-snowflake#93).

p.s. if this is confirmed to only affect Snowflake, we'll transfer this issue to the dbt-snowflake repo.

@dbeatty10
Contributor

Could someone try the following and see if it works? e.g., @kellyldougan, @rloredo or anyone else that was able to reproduce this.

It contains two changes, namely:

  • configure incremental_strategy instead of strategy
  • add col_1 as the unique_key to force creation of a temporary table rather than a view in dbt-snowflake 1.3
{{
    config(
        materialized='incremental',
        incremental_strategy='delete+insert',
        unique_key='col_1',
        pre_hook= "{% if is_incremental() %}
                 TRUNCATE IF EXISTS {{ this }}
                 {% endif %}",
        on_schema_change='sync_all_columns'
    )
}}

with
seed_ref as (

    select 
        to_number(col_1) as col_1,
        to_number(col_2) as col_2
    from {{ ref('test_seed') }}

)

select
    col_1,
    round(col_1 / col_2 * 100, 1) as cost_pct,
    round(col_1 * (col_1 / col_2), 2) as cost_dollars
from seed_ref

dbeatty10 added a commit to dbeatty10/dbt-sandcastles that referenced this issue Nov 16, 2022
@dbeatty10
Contributor

As noted earlier, I was not able to trigger the Invalid number precision error in my local environment.

Here's the repo and instructions I used to try to reproduce this:
https://github.com/dbeatty10/dbt-sandcastles/tree/b79e3fc673885ac4076dc420fc91f4bc8be2cbe9

Could someone try out the instructions in this repo and let me know if you are able to reproduce it in your own environment or not?

@rloredo

rloredo commented Nov 16, 2022

@dbeatty10 I can't access the link you shared. I can try tomorrow at work.

@dbeatty10
Contributor

@rloredo I accidentally had the repo as private 😅 -- I've made it public now so hopefully you can access it.

@rloredo

rloredo commented Nov 17, 2022

@dbeatty10 I couldn't reproduce it using the steps you shared.
However, I tried to change the on_schema_change parameter in the failing model and got the same error.

Let me explain a little bit what it looks like. In Snowflake, I have:

  • a schema called enriched
  • a transient table called reservations
  • a column called net_price with the type NUMBER(38,6)

When I update to 1.3 and run dbt run -s reservations, there's a query

alter table "MY_DWH"."ENRICHED"."RESERVATIONS" alter "NET_PRICE" set data type NUMBER(38,2);

That fails with the message

SQL compilation error: cannot change column NET_PRICE from type NUMBER(38,6) to NUMBER(38,2) because changing the scale of a number is not supported.

So, to reproduce it, I think we need to mimic these 38,6 and 38,2 schema changes. What I don't know is why that query is being triggered if we didn't change any definition in the model...

I hope this helps a little bit with reproducing it.

@dbeatty10
Contributor

What happens if you add the following macro to your project and then try to re-run using 1.3?

{% macro dbt_snowflake_get_tmp_relation_type(strategy, unique_key, language) %}
    -- always use a table (never a view) for the incremental temp relation
    {{ return('table') }}
{% endmacro %}

Does it still give that error message?

This will help confirm or deny whether a new feature for incremental models introduced in 1.3 is affecting your models.

@rloredo

rloredo commented Nov 18, 2022

@dbeatty10 I tried this macro.
Now, instead of a view, it creates a temporary table for the incremental step. However, after that step is done, it still runs the other query I mentioned above (the one that changes the number precision).
So I think it's a step that comes after that. The queries being run look something like this (newest on top, using the macro; without the macro, a view is created instead of a temporary table):

[Screenshot of the Snowflake query history]

@dbeatty10
Contributor

Thanks for trying that out -- this helps eliminate one possibility.

You see those describe table commands?
When you use dbt-core/snowflake 1.2, they must be returning number(38, 6), but they must be returning number(38, 2) when you switch to dbt-core/snowflake 1.3.

When the incremental model sees the change from number(38, 6) to number(38, 2), it tries to do the alter table step to update the data type (which fails) 💥

Key question

The key question for us to figure out is:

  • why is the column type number(38, 6) in 1.2, but then number(38, 2) in 1.3?

What is the expression for your column that is triggering the error?

Is it really involving rounding like round(col_1 / col_2 * 100, 2) as net_price? Or is the expression taking some other form?

Out of curiosity, are you using any macros to transform the net_price within the reservations (either directly or indirectly)?

Within Snowflake, the "scale and precision of the output of an arithmetic operation depends on the scale and precision of the input(s)." Each type of arithmetic operation has different rules. For example, dividing a number(38, 0) by a number(38, x) will yield a number(38, 6) per these rules.
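
As a runnable illustration of those rules (a hypothetical scratch table, not from this issue):

    create or replace temporary table scale_demo as
    select
        10::number(38, 0) / 4::number(38, 0)           as plain_division,
        round(10::number(38, 0) / 4::number(38, 0), 2) as rounded_division;

    describe table scale_demo;
    -- PLAIN_DIVISION    NUMBER(38,6)  (division adds 6 digits of scale)
    -- ROUNDED_DIVISION  NUMBER(38,2)  (round(x, 2) pins the scale to 2)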

Examining the data types

If you look in your logs/dbt.log file, you will see separate output and commands like the following for dbt-core 1.2:

[info ] [MainThread]: Running with dbt=1.2
create or replace temporary table DEV_RLOREDO.enriched.enr_reservations__dbt_tmp  as
...

then:

describe table DEV_RLOREDO.enriched.enr_reservations__dbt_tmp
...
describe table DEV_RLOREDO.enriched.enr_reservations

and finally:

drop view if exists DEV_RLOREDO.enriched.enr_reservations__dbt_tmp cascade

Then, without that macro I gave you, it might be slightly different for dbt-core 1.3 (using a temporary view instead of a temporary table):

[info ] [MainThread]: Running with dbt=1.3
create or replace  view DEV_RLOREDO.enriched.enr_reservations__dbt_tmp
...

then:

describe table DEV_RLOREDO.enriched.enr_reservations__dbt_tmp
...
describe table DEV_RLOREDO.enriched.enr_reservations

and finally:

drop view if exists DEV_RLOREDO.enriched.enr_reservations__dbt_tmp cascade

You could copy these sections of your log file and run them in the Snowflake web UI to inspect the data types via the describe commands.

One more question

Is the SQL logic that you see in the logs for the temporary table/view exactly the same when you run in 1.3 vs. 1.2? Or is it slightly different in some way?

create or replace temporary table DEV_RLOREDO.enriched.enr_reservations__dbt_tmp  as

[SQL logic to examine for differences]

@rloredo

rloredo commented Nov 18, 2022

So... now I have the error in both versions.
It seems that it's caused by adding a round() to a column that originally was a number with a higher scale.

The original table has number(38,6), while the new one (a view in 1.3, a temporary table in 1.2) with rounding is number(38,2).
The sync_all_columns option tries to update the type (without luck, because it is not supported). This makes sense, because the schema changed.

Running a full refresh should create a new table with column type number(38,2), and it does, solving the syncing issue.
I don't know why it didn't work when I tried it before (perhaps the rush of needing to fix it?).

What I take from this is that the behaviour of dbt's sync_all_columns is the expected one: the scale of the type changed, and the column should be synced. However, the sync is not accepted by Snowflake, because changing the scale of a number is not supported.

It could be argued that the command should avoid updating number precision/scale (they are parameters of a type, not a type itself), and this way we would avoid the problem, BUT we would then fail to preserve the correct type of the column. Also, a "sync_all_but_not_number" option would be an inelegant solution...

I guess the fix for the error is full refreshing the model, and if that doesn't work, dropping the table to "force" the full refresh.
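
i.e., something like (using the reservations model from above):

    dbt run --select reservations --full-refresh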

Thanks @dbeatty10 for all your time triaging this!

@dbeatty10
Contributor

You're welcome @rloredo!

@kellyldougan, @joshuataylor and/or @celine-geelen
We will end up closing this issue if we can't replicate it -- we'll open it back up again if we are able to replicate it later on.

Eager to troubleshoot this further if there are no changes to the column definitions and you can replicate that:

  1. 1.2 works for you
  2. switching to 1.3 does not
  3. but switching back to 1.2 still works

I put up an example repo with instructions here, but I was unable to replicate any 1.2 -> 1.3 issues.

Through discussion and troubleshooting with @rloredo, this appears to be explained by the definition of a column changing rather than a difference between dbt-snowflake 1.2 and 1.3.

Namely, Snowflake allows the precision of a number column to be increased or decreased, but it does not allow the scale to be either increased or decreased. Read more here.

But we're open to the possibility that this type of error can have multiple causes -- we just haven't seen any evidence of this yet (and we'd need some!).

Example

If you have a column in a sync_all_columns incremental model defined like:

pizza_slices / people as slices_per_person

Assuming pizza_slices and people are both integers (e.g. number(38, 0)), the result of the division will be number(38, 6) per Snowflake's division rules.

But the scale will decrease to number(38, 2) and raise an error when running dbt if the definition is changed to this:

round(pizza_slices / people, 2) as slices_per_person

Any increase or decrease of the scale of a number within an incremental model using a config of on_schema_change='sync_all_columns' will result in an error like SQL compilation error: cannot change column ... from type ... to ... because changing the scale of a number is not supported.

Solutions

In that case, you'd have at least two options to solve this:

  1. Add the applicable cast (like @joshuataylor described here); see the sketch below
  2. Use --full-refresh to rebuild the table from scratch
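
For option 1, a minimal sketch: casting the expression back to the existing column's type means describe no longer reports a scale change, so no alter is attempted:

    -- cast back to the current column's type (number(38, 6) in this example)
    -- so sync_all_columns sees no scale change and skips the unsupported alter
    round(pizza_slices / people, 2)::number(38, 6) as slices_per_person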

@joshuataylor
Contributor

joshuataylor commented Nov 18, 2022

Apologies for the delay in getting back to you. I was able to replicate this on our account using the steps outlined in the first post. Could it be related to weird SF feature flags, region-based rollouts, or a myriad of other fun things? I was also getting it with the latest main branch.

I'll dig into this over the weekend and record the exact replication steps (if possible).

We're on us-east-1.

@dbeatty10
Contributor

Thanks for replicating @joshuataylor 🙏

Could you try using the custom macro described here to override certain behavior in 1.3? I'd be very curious if this does/doesn't work for you.

Could it be related to weird SF feature flags, region based rollouts or a myriad of other fun things?

Yes, it could! (But hopefully not. 🤞) I did do some initial exploration to see if any relevant feature flags jumped out, but came up empty. I didn't dig into any release notes to see if describe table/view functionality is changing.

Ideally, we'd be able to modify this example so that I can replicate as well. I've only tested using the AWS_US_WEST_2 region.

Thanks for your help getting to the bottom of this.

@joshuataylor
Contributor

I'm so confused by this, as I was hitting this every time about 2 weeks ago!

I was doing the exact steps in the initial ticket and could replicate it 100% of the time (us-east-1 Snowflake).

It's not happening now, and I'm so confused as to why! The only thing I can think of is some quirky bug being hit in Snowflake that they have maybe since fixed?

@dbeatty10
Contributor

🤷 Yeah, definitely a head-scratcher!

I obviously believe what you were experiencing, especially since there were several of you reporting this at the same time.

My best guess at the moment is that describe table/view within Snowflake was returning numeric data types with lower scale for views specifically.

If so, that would explain why it worked again once you went back to dbt-core 1.2: 1.2 always uses a temporary table rather than a temporary view to determine data types, whereas dbt-core 1.3 will preferentially use views when it can (as a performance optimization).
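
For example, a quick check (illustrative object names, not from this thread) that could test that guess in a Snowflake worksheet:

    -- both objects wrap the same expression; if describe reported a different
    -- scale for the view, that would explain the spurious alter column
    create or replace table scale_check_t as
    select round(10::number(38, 0) / 4::number(38, 0), 2) as net_price;

    create or replace view scale_check_v as
    select round(10::number(38, 0) / 4::number(38, 0), 2) as net_price;

    describe table scale_check_t;  -- NET_PRICE NUMBER(38,2)
    describe table scale_check_v;  -- should match; the suspected bug would differ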

@dbeatty10 dbeatty10 removed the triage label Nov 23, 2022
@kellyldougan
Author

I'm no longer running into this issue, seems like Snowflake pushed a fix? Going to close this for now.

@dbeatty10
Contributor

I'm no longer running into this issue, seems like Snowflake pushed a fix? Going to close this for now.

Sounds good @kellyldougan, thanks again for reporting this. We can re-open in the future if needed.
