Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Regression] BigQuery: Unit tests using sources with sharded tables no longer work after CTE naming change #11075

Open
2 tasks done
antitoine opened this issue Nov 29, 2024 · 3 comments
Labels
bug Something isn't working regression unit tests Issues related to built-in dbt unit testing functionality

Comments

@antitoine
Copy link

antitoine commented Nov 29, 2024

Is this a regression in a recent version of dbt-core?

  • I believe this is a regression in dbt-core functionality
  • I have searched the existing issues, and I could not find an existing issue for this regression

Current Behavior

Unit tests fail due to an invalid subquery name, as soon as the identifier contains an asterisk (*).

Expected/Previous Behavior

I expect the CTE name to use the name and not theidentifier field (as before).
Alternatively :

  • specify the CTE name
  • escape CTE name
  • enclose the CTE name in quotes

Steps To Reproduce

  1. Setup a schema.yml for referencing a sharded table:
version: 2

sources:
  - name: my_dataset
    database: my_database
    tables:
      - name: my_table
        identifier: my_table_*
  1. Use it in a model named my_model.sql:
{{
    config(
        materialized = 'incremental',
        incremental_strategy = 'insert_overwrite',
        on_schema_change = 'append_new_columns'
    )
}}

SELECT t.my_field
FROM {{ source('my_dataset', 'my_table') }} t
  1. Add unit tests on it:
version: 2
unit_tests:
  - name: my_failing_cte_test
    description: "Not working test because of CTE name"
    model: my_model
    given:
      - input: source('my_dataset', 'my_table')
        rows:
          - { my_field: 10 }
          - { my_field: 20 }
    expect:
      rows:
          - { my_field: 10 }
          - { my_field: 20 }

Relevant log output

Generated query:

create or replace table `my-project`.`testing`.`my_failing_cte_test__dbt_tmp__fcd73216-184a-43b9-a5e7-a3759c4ea2f7`
OPTIONS(
  expiration_timestamp=TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 12 hour)
)
as (
  select * from (
    with __dbt__cte__my_table_* as ( -- <<<<<<<<<<<<< ISSUE HERE
...
  An error occurred during execution of unit test 'my_failing_cte_test'. There may be an error in the unit test definition: check the data types.
   Database Error
    Syntax error: Expected keyword AS but got "*" at [19:41].

Environment

- OS: Debian Bullseye
- Python: 3.11.3
- dbt (working version): 1.9.0-a1
- dbt (regression version): 1.9.0-b1

Which database adapter are you using with dbt?

bigquery

Additional Context

I think the regression was introduced by this feature:

More exactly by this PR: https://github.com/dbt-labs/dbt-core/pull/10290/files#diff-256fed5ccb449b0e82952615a14120463e22688ddd6454f891463d7f7623551aR374

@antitoine antitoine added bug Something isn't working regression triage labels Nov 29, 2024
@Kayrnt
Copy link
Contributor

Kayrnt commented Nov 29, 2024

A workaround is to wrap the source in an ephemeral and then materialize that ephemeral model as SQL fixture in the unit test.

@dbeatty10 dbeatty10 added the unit tests Issues related to built-in dbt unit testing functionality label Dec 2, 2024
@dbeatty10 dbeatty10 changed the title [Regression] Unit tests no longer work after CTE naming change [Regression] BigQuery: Unit tests using sources with sharded tables no longer work after CTE naming change Dec 4, 2024
@dbeatty10
Copy link
Contributor

dbeatty10 commented Dec 4, 2024

@antitoine I see what you are saying about sources that use the * syntax for sharded tables in BigQuery.

The solution here could be similar to that of a separate issue related to sources in unit tests: #10433

There are some ideas of how to solve this in #10433 (comment) and #10433 (comment).

Workaround

Thanks for describing a workaround @Kayrnt 🧠

Would you be willing to share a simple code example to demonstrate it?

@Kayrnt
Copy link
Contributor

Kayrnt commented Dec 4, 2024

Workaround

Keeping @antitoine example, we create a new model (my_model_ephemeral.sql):

{{
    config(
        materialized = 'ephemeral',
    )
}}

SELECT t.my_field
FROM {{ source('my_dataset', 'my_table') }} t

Then in my_model.sql, we update it to:

{{
    config(
        materialized = 'incremental',
        incremental_strategy = 'insert_overwrite',
        on_schema_change = 'append_new_columns'
    )
}}

SELECT t.my_field
FROM {{ ref('my_model_ephemeral') }} t

and update the test accordingly:

version: 2
unit_tests:
  - name: my_failing_cte_test
    description: "working test because of ephemeral layer trick"
    model: my_model
    given:
      - input: ref('my_model_ephemeral')
      - format: sql
        rows: |
          SELECT 10 AS my_field
          UNION ALL
          SELECT 20 AS my_field
    expect:
      rows:
          - { my_field: 10 }
          - { my_field: 20 }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working regression unit tests Issues related to built-in dbt unit testing functionality
Projects
None yet
Development

No branches or pull requests

3 participants