Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Unit Testing] Allow explicit precision testing in unit tests #9884

Open
1 task done
Tracked by #8283
emmyoop opened this issue Apr 9, 2024 · 5 comments
Open
1 task done
Tracked by #8283

[Unit Testing] Allow explicit precision testing in unit tests #9884

emmyoop opened this issue Apr 9, 2024 · 5 comments
Labels
enhancement New feature or request unit tests Issues related to built-in dbt unit testing functionality user docs [docs.getdbt.com] Needs better documentation

Comments

@emmyoop
Copy link
Member

emmyoop commented Apr 9, 2024

Housekeeping

  • I am a maintainer of dbt-core

Short description

We cannot currently unit test precision. The expected outputs get rounded to be the expected datatypes and therefore are not correctly parsed.

from @MichelleArk in #9627

I think it would likely be in get_expected_sql, since that applies casting to the 'expected' structure based on the sql of the model being tested.

I think there are a couple ways we can approach a fix here:

Explicit option for users to opt out of the automatic casting of expect values. E.g. something called '_dbt_skip_casting' or '_dbt_use_literal'. I think this option should only be valid on the expect rows as opposed to input ones, since there isn't any benefit to being able to test your model given a different input schema as those are fixed in production.

Acceptance criteria

  • A unit test can be written to test precision

Suggested Tests

model.sql

with

source as (

    select * from {{ source('stripe',  'payment') }}

),

staged as (

    select
        id as payment_id,
        orderid as order_id,
        paymentmethod as payment_method,
        status,
        -- amount is stored in cents, convert it to dollars
        round(amount / 100, 2) as amount,
        created as created_at
    from source

)

select * from staged

failing unit test yaml

unit_tests:
  - name: test_dollar_conversion # this is the unique name of the test
    model: stg_payments # name of the model I'm unit testing
    given: # the mock data for your inputs
      - input: source('stripe',  'payment')
        rows:
         - {id: 2, amount: 50}
         - {id: 3, amount: 33.33}
    expect: # the expected output given the inputs above
      rows:
        - {payment_id: 2, amount: 0.5}
        - {payment_id: 3, amount: 0.3333}

passing unit test yaml

unit_tests:
  - name: test_dollar_conversion # this is the unique name of the test
    model: stg_payments # name of the model I'm unit testing
    given: # the mock data for your inputs
      - input: source('stripe',  'payment')
        rows:
         - {id: 2, amount: 50}
         - {id: 3, amount: 33.33}
    expect: # the expected output given the inputs above
      rows:
        - {payment_id: 2, amount: 0.50}
        - {payment_id: 3, amount: 0.33}

Impact to Other Teams

none

Will backports be required?

no

Context

Outcome of spike in #9627

This will required a change to the macros in dbt-adapters and tests in dbt-core. The tests in dbt-core will not pass until a new version of dbt-adapters is released.

@emmyoop emmyoop added the user docs [docs.getdbt.com] Needs better documentation label Apr 9, 2024
@emmyoop emmyoop added the High Severity bug with significant impact that should be resolved in a reasonable timeframe label Apr 15, 2024
@graciegoheen graciegoheen added bug Something isn't working pre-release Bug not yet in a stable release labels Apr 17, 2024
@graciegoheen graciegoheen added this to the v1.8 milestone Apr 17, 2024
@ChenyuLInx
Copy link
Contributor

@graciegoheen Can you take a look at spike outcome here and see whether you think this ticket is worth pursuing?

@ChenyuLInx
Copy link
Contributor

We should document it do not work if we are not moving forward on this ticket.

@graciegoheen graciegoheen removed this from the v1.8 milestone May 6, 2024
@graciegoheen graciegoheen added enhancement New feature or request and removed bug Something isn't working High Severity bug with significant impact that should be resolved in a reasonable timeframe labels May 7, 2024
@graciegoheen graciegoheen added this to the v1.9 milestone May 7, 2024
@graciegoheen
Copy link
Contributor

I think:

  • we should try to support this
  • I'm not convinced the suggested interface
expect:
      rows:
        - {payment_id: 3, dollar: .333}            # expected
        - {payment_id: 4, dollar: 0.333323, _dbt_skip_casting: ["dollar"]}

is the best place to put this configuration

The configuration is:

  • only relevant for expect (?)
  • specific to X amount of columns

I'll sync with unit testing squad on what interface would be best here

@graciegoheen graciegoheen removed the pre-release Bug not yet in a stable release label May 7, 2024
@dbeatty10
Copy link
Contributor

What would be the consequences if we cast the expected output to the widest scale/precision/character size for the relevant data type? i.e., numeric without precision and scale rather than numeric(10, 2)?

It might give a failing unit test the way that we are after.

My understanding is that we use "wide" data types by default when enforcing model contracts. I'm not sure if there's prior art we could re-use here.

Code example

models/source_data.sql

select 2 as id, 50 as amount union all
select 3 as id, 33.33 as amount

models/stg_payments.sql

with

source as (

    select * from {{ ref("source_data") }}

),

staged as (

    select
        id as payment_id,
        -- amount is stored in cents, convert it to dollars
        cast(round(amount / 100, 2) as numeric(10, 2)) as amount
    from source

)

select * from staged

models/_unit.yml

unit_tests:
  - name: test_dollar_conversion
    model: stg_payments
    given:
      - input: ref("source_data")
        rows:
         - {id: 2, amount: 50}
         - {id: 3, amount: 33.33}
    expect:
      rows:
        - {payment_id: 2, amount: 0.5}
        - {payment_id: 3, amount: 0.3333}
dbt build
cp target/run/my_project/models/_unit.yml/models/test_dollar_conversion.sql analyses/test_dollar_conversion__current.sql
cp analyses/test_dollar_conversion__current.sql analyses/test_dollar_conversion__proposed.sql

Then modify analyses/test_dollar_conversion__proposed.sql like this:

55c55
<     cast(0.5 as numeric(10,2))
---
>     cast(0.5 as numeric)
63c63
<     cast(0.3333 as numeric(10,2))
---
>     cast(0.3333 as numeric)
dbt show -s analyses/test_dollar_conversion__current.sql
dbt show -s analyses/test_dollar_conversion__proposed.sql

Experiment

I tried changing this line to be column.dtype instead of column.data_type.

It worked when I tried it, but only if I quoted the "0.50" portion.

    expect:
      rows:
        - {payment_id: 2, amount: "0.50"}
        - {payment_id: 3, amount: 0.33}

Otherwise, I got this error:

image

I didn't try any other adapters other than dbt-postgres. The behavior might depend on the way each Python driver renders the value to a string 🤷.

Using the sql format like this also worked along with the data_type -> dtype change:

    expect:
        format: sql
        rows: |
          select 2 as payment_id, cast(0.50 as numeric(10, 2)) as amount union all
          select 3 as payment_id, cast(0.33 as numeric(10, 2)) as amount

@KaneMorgan
Copy link

We ran into this in and had a related discussion on the dbt slack here

What would be the consequences if we cast the expected output to the widest scale/precision/character size for the relevant data type?

For Redshift at least casting to just Numeric results in (18,0)

An option to use the types defined in our contract for the model would have been a nice way to solve this from our perspective. Being not too familiar with the implementation I thought the contract was the source of the truth for the expected data types. But that may not be possible if the contract uses "wide" data types

@graciegoheen graciegoheen removed this from the v1.9 milestone Aug 31, 2024
@dbeatty10 dbeatty10 added the unit tests Issues related to built-in dbt unit testing functionality label Sep 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request unit tests Issues related to built-in dbt unit testing functionality user docs [docs.getdbt.com] Needs better documentation
Projects
None yet
Development

No branches or pull requests

5 participants