[Bug] All test failures are stored even when a limit config is set #373

hvassard · 2024-01-26T10:58:34Z

Is this a new bug in dbt-core?

I believe this is a new bug in dbt-core
I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

Hi there, firstly thanks for the amazing job you're doing, I am huge fan of dbt !

I use dbt core with dbt-redshift.

I wrote a test on a simple seed to check the behavior of the store_failures test config when used in addition to the limit config.

The seed I test (seeds/customers.csv) looks like this and has 100 rows (from ID 1 to 100):

ID,FIRST_NAME,LAST_NAME
1,Michael,P.
2,Shawn,M.
3,Kathleen,P.
4,Jimmy,C.
5,Katherine,R.
6,Sarah,R.
7,Martin,M.
8,Frank,R.
9,Jennifer,F.
10,Henry,W.

Download file here : customers.csv

The test definition (seeds/_seeds.yml) looks like this:
TLDR : The "ID" column must be strictly lower than 50. Store the failures. But stop if more than 20 failures.

version: 2

seeds:
  - name: customers
    +quote_columns: false
    columns:
      - name: ID
        tests:
          - dbt_utils.accepted_range:
              name: my_test_with_limit_and_store_failures
              max_value: 50
              inclusive: true
              config:
                store_failures: true
                warn_if: "<5"
                error_if: ">=5"
                limit: 20

What I experience when running dbt test -s customers :

The test log seems OK
But my test audit table generated in my warehouse contains all the failed row (50) and not 20 (the limit specified).

Expected Behavior

I guess only the 20 first row must be stored in the audit table (based on the store_failure config documentation pictured here)

Steps To Reproduce

python3 -m venv .venv. 
.venv/bin/activate; pip install -r requirements.txt
dbt deps
dbt seed
dbt test -s customers

Where requirements.txt is

dbt-core==1.7.6
dbt-redshift==1.7.2

Relevant log output

No response

Environment

- OS:Ubuntu 22.04.3 LTS
- Python: 3.9.7
- dbt: 1.7.6

Which database adapter are you using with dbt?

redshift

Additional Context

I noticed this SQL file generated in the target folder if it can help

The text was updated successfully, but these errors were encountered:

dbeatty10 · 2024-01-26T16:44:51Z

Hi there, firstly thanks for the amazing job you're doing, I am huge fan of dbt !

❤️

Thanks for the kind words @hvassard.

I see what you are saying. It looks to me like the solution would be to move this further up in the logic.

Side note: check out store_failures_as if you haven't already! It allows storing failures either as a table or a view (or not at all via ephemeral).

More detail

The logic that is used to store the failures is located in a file like target/compiled/my_project/seeds/_seeds.yml/my_test_with_limit_and_store_failures.sql:

with meet_condition as(
  select *
  from "db"."feature_456"."customers"
),

validation_errors as (
  select *
  from meet_condition
  where
    -- never true, defaults to an empty result set. Exists to ensure any combo of the `or` clauses below succeeds
    1 = 2
    -- records with a value <= max_value are permitted. The `not` flips this to find records that don't meet the rule.
    or not ID <= 5
)

select *
from validation_errors

In my case, it does a "create table as" into this table:

"db"."feature_456_dbt_test__audit"."my_test_with_limit_and_store_failures"

And then the actual test logic is in target/run/my_project/seeds/_seeds.yml/my_test_with_limit_and_store_failures.sql :

select
      count(*) as failures,
      count(*) != 0 as should_warn,
      count(*) != 0 as should_error
    from (
      
        select *
        from "db"."feature_456_dbt_test__audit"."my_test_with_limit_and_store_failures"
    
      limit 2
    ) dbt_internal_test

So you can see the first one does not have the limit clause applied, whereas the second one does.

Potential solution

So the solution might be to move this to here and here instead.

dbeatty10 · 2024-12-06T21:25:22Z

Transferred from dbt-core #9463 to #373

dbeatty10 · 2024-12-06T23:38:27Z

Reprex

Create these files:

seeds/customers.csv

id,first_name,last_name
1,Michael,P.
2,Shawn,M.
3,Kathleen,P.
4,Jimmy,C.
5,Katherine,R.
6,Sarah,R.

seeds/_seeds.yml

seeds:
  - name: customers
    columns:
      - name: id
        tests:
          - dbt_utils.accepted_range:
              name: my_test_with_limit_and_store_failures
              max_value: 1
              inclusive: true
              config:
                store_failures_as: table
                limit: 2

Run this command:

dbt build -s customers --full-refresh

Inspect the table and see that it has 5 results when according to the docs, it should only have 2:

When true, store_failures saves all records (up to limit) that failed the test. Failures are saved in a new table with the name of the test. By default, store_failures uses the schema {{ profile.schema }}_dbt_test__audit, but you can configure the schema suffix to a different value.

hvassard added bug Something isn't working triage labels Jan 26, 2024

dbeatty10 self-assigned this Jan 26, 2024

dbeatty10 removed the triage label Jan 26, 2024

dbeatty10 transferred this issue from dbt-labs/dbt-core Dec 6, 2024

This was referenced Dec 6, 2024

Respect the limit config when storing test failures #375

Draft

Respect the limit config when storing test failures #376

Draft

dbeatty10 removed their assignment Dec 6, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Bug] All test failures are stored even when a limit config is set #373

[Bug] All test failures are stored even when a limit config is set #373

hvassard commented Jan 26, 2024

dbeatty10 commented Jan 26, 2024

dbeatty10 commented Dec 6, 2024

dbeatty10 commented Dec 6, 2024

[Bug] All test failures are stored even when a limit config is set #373

[Bug] All test failures are stored even when a limit config is set #373

Comments

hvassard commented Jan 26, 2024

Is this a new bug in dbt-core?

Current Behavior

Expected Behavior

Steps To Reproduce

Relevant log output

Environment

Which database adapter are you using with dbt?

Additional Context

dbeatty10 commented Jan 26, 2024

More detail

Potential solution

dbeatty10 commented Dec 6, 2024

dbeatty10 commented Dec 6, 2024

Reprex