[Unit Testing] Allow explicit precision testing in unit tests #9884

emmyoop · 2024-04-09T15:49:45Z

Housekeeping

I am a maintainer of dbt-core

Short description

We cannot currently unit test precision. The expected outputs get rounded to be the expected datatypes and therefore are not correctly parsed.

from @MichelleArk in #9627

I think it would likely be in get_expected_sql, since that applies casting to the 'expected' structure based on the sql of the model being tested.

I think there are a couple ways we can approach a fix here:

Explicit option for users to opt out of the automatic casting of expect values. E.g. something called '_dbt_skip_casting' or '_dbt_use_literal'. I think this option should only be valid on the expect rows as opposed to input ones, since there isn't any benefit to being able to test your model given a different input schema as those are fixed in production.

Acceptance criteria

A unit test can be written to test precision

Suggested Tests

model.sql

with

source as (

    select * from {{ source('stripe',  'payment') }}

),

staged as (

    select
        id as payment_id,
        orderid as order_id,
        paymentmethod as payment_method,
        status,
        -- amount is stored in cents, convert it to dollars
        round(amount / 100, 2) as amount,
        created as created_at
    from source

)

select * from staged

failing unit test yaml

unit_tests:
  - name: test_dollar_conversion # this is the unique name of the test
    model: stg_payments # name of the model I'm unit testing
    given: # the mock data for your inputs
      - input: source('stripe',  'payment')
        rows:
         - {id: 2, amount: 50}
         - {id: 3, amount: 33.33}
    expect: # the expected output given the inputs above
      rows:
        - {payment_id: 2, amount: 0.5}
        - {payment_id: 3, amount: 0.3333}

passing unit test yaml

unit_tests:
  - name: test_dollar_conversion # this is the unique name of the test
    model: stg_payments # name of the model I'm unit testing
    given: # the mock data for your inputs
      - input: source('stripe',  'payment')
        rows:
         - {id: 2, amount: 50}
         - {id: 3, amount: 33.33}
    expect: # the expected output given the inputs above
      rows:
        - {payment_id: 2, amount: 0.50}
        - {payment_id: 3, amount: 0.33}

Impact to Other Teams

none

Will backports be required?

no

Context

Outcome of spike in #9627

This will required a change to the macros in dbt-adapters and tests in dbt-core. The tests in dbt-core will not pass until a new version of dbt-adapters is released.

The text was updated successfully, but these errors were encountered:

ChenyuLInx · 2024-05-06T16:29:37Z

@graciegoheen Can you take a look at spike outcome here and see whether you think this ticket is worth pursuing?

ChenyuLInx · 2024-05-06T16:30:01Z

We should document it do not work if we are not moving forward on this ticket.

graciegoheen · 2024-05-07T00:46:08Z

I think:

we should try to support this
I'm not convinced the suggested interface

expect:
      rows:
        - {payment_id: 3, dollar: .333}            # expected
        - {payment_id: 4, dollar: 0.333323, _dbt_skip_casting: ["dollar"]}

is the best place to put this configuration

The configuration is:

only relevant for expect (?)
specific to X amount of columns

I'll sync with unit testing squad on what interface would be best here

dbeatty10 · 2024-05-07T15:50:01Z

What would be the consequences if we cast the expected output to the widest scale/precision/character size for the relevant data type? i.e., numeric without precision and scale rather than numeric(10, 2)?

It might give a failing unit test the way that we are after.

My understanding is that we use "wide" data types by default when enforcing model contracts. I'm not sure if there's prior art we could re-use here.

Code example

models/source_data.sql

select 2 as id, 50 as amount union all
select 3 as id, 33.33 as amount

models/stg_payments.sql

with

source as (

    select * from {{ ref("source_data") }}

),

staged as (

    select
        id as payment_id,
        -- amount is stored in cents, convert it to dollars
        cast(round(amount / 100, 2) as numeric(10, 2)) as amount
    from source

)

select * from staged

models/_unit.yml

unit_tests:
  - name: test_dollar_conversion
    model: stg_payments
    given:
      - input: ref("source_data")
        rows:
         - {id: 2, amount: 50}
         - {id: 3, amount: 33.33}
    expect:
      rows:
        - {payment_id: 2, amount: 0.5}
        - {payment_id: 3, amount: 0.3333}

dbt build
cp target/run/my_project/models/_unit.yml/models/test_dollar_conversion.sql analyses/test_dollar_conversion__current.sql
cp analyses/test_dollar_conversion__current.sql analyses/test_dollar_conversion__proposed.sql

Then modify analyses/test_dollar_conversion__proposed.sql like this:

55c55
<     cast(0.5 as numeric(10,2))
---
>     cast(0.5 as numeric)
63c63
<     cast(0.3333 as numeric(10,2))
---
>     cast(0.3333 as numeric)

dbt show -s analyses/test_dollar_conversion__current.sql
dbt show -s analyses/test_dollar_conversion__proposed.sql

Experiment

I tried changing this line to be column.dtype instead of column.data_type.

It worked when I tried it, but only if I quoted the "0.50" portion.

    expect:
      rows:
        - {payment_id: 2, amount: "0.50"}
        - {payment_id: 3, amount: 0.33}

Otherwise, I got this error:

I didn't try any other adapters other than dbt-postgres. The behavior might depend on the way each Python driver renders the value to a string 🤷.

Using the sql format like this also worked along with the data_type -> dtype change:

    expect:
        format: sql
        rows: |
          select 2 as payment_id, cast(0.50 as numeric(10, 2)) as amount union all
          select 3 as payment_id, cast(0.33 as numeric(10, 2)) as amount

KaneMorgan · 2024-07-31T16:40:48Z

We ran into this in and had a related discussion on the dbt slack here

What would be the consequences if we cast the expected output to the widest scale/precision/character size for the relevant data type?

For Redshift at least casting to just Numeric results in (18,0)

An option to use the types defined in our contract for the model would have been a nice way to solve this from our perspective. Being not too familiar with the implementation I thought the contract was the source of the truth for the expected data types. But that may not be possible if the contract uses "wide" data types

emmyoop added the user docs [docs.getdbt.com] Needs better documentation label Apr 9, 2024

emmyoop mentioned this issue Apr 9, 2024

[Spike+] can't unit test precision #9627

Closed

2 tasks

emmyoop added the High Severity bug with significant impact that should be resolved in a reasonable timeframe label Apr 15, 2024

graciegoheen added bug Something isn't working pre-release Bug not yet in a stable release labels Apr 17, 2024

graciegoheen added this to the v1.8 milestone Apr 17, 2024

graciegoheen mentioned this issue Apr 17, 2024

Docs Changes Needed from dbt-core Issue #9627 dbt-labs/docs.getdbt.com#5254

Closed

graciegoheen removed this from the v1.8 milestone May 6, 2024

graciegoheen mentioned this issue May 6, 2024

[CT-2911] [Epic] Unit testing dbt models #8283

Closed

graciegoheen added enhancement New feature or request and removed bug Something isn't working High Severity bug with significant impact that should be resolved in a reasonable timeframe labels May 7, 2024

graciegoheen added this to the v1.9 milestone May 7, 2024

graciegoheen removed the pre-release Bug not yet in a stable release label May 7, 2024

graciegoheen removed this from the v1.9 milestone Aug 31, 2024

mchonofsky mentioned this issue Sep 20, 2024

[Feature] Support epsilon / tolerance for float values in unit test expected values. #10470

Open

3 tasks

dbeatty10 added the unit tests Issues related to built-in dbt unit testing functionality label Sep 20, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Unit Testing] Allow explicit precision testing in unit tests #9884

[Unit Testing] Allow explicit precision testing in unit tests #9884

emmyoop commented Apr 9, 2024

ChenyuLInx commented May 6, 2024

ChenyuLInx commented May 6, 2024

graciegoheen commented May 7, 2024

dbeatty10 commented May 7, 2024

Code example

KaneMorgan commented Jul 31, 2024

[Unit Testing] Allow explicit precision testing in unit tests #9884

[Unit Testing] Allow explicit precision testing in unit tests #9884

Comments

emmyoop commented Apr 9, 2024

Housekeeping

Short description

Acceptance criteria

Suggested Tests

Impact to Other Teams

Will backports be required?

Context

ChenyuLInx commented May 6, 2024

ChenyuLInx commented May 6, 2024

graciegoheen commented May 7, 2024

dbeatty10 commented May 7, 2024

Code example

Experiment

KaneMorgan commented Jul 31, 2024