[Bug] --empty flag not working on Pseudo-columns #1243

christopherekfeldt · 2024-05-16T07:54:52Z

Is this a new bug in dbt-bigquery?

I believe this is a new bug in dbt-bigquery
I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

When trying out the --empty flag on my models, I get failures on all models that use the pseudo-column "_PARTITIONTIME" in their logic. Here is my query, which worked perfectly fine before.

{% set src_cpc_raw = source('customer_preference_center', 'customer_preference') -%}

select
    customerId,
    customerId_token,
    preference,
    preferenceInd,
    createTS,
    updateTS,
    operator,
    ingstn_ts,
    ingestion_dt
from (
    select
        customerId,
        customerId_token,
        centralPreferences.preference,
        centralPreferences.preferenceInd,
        parse_timestamp("%Y-%m-%d %k:%M:%E6S", createTS) as createTS,
        parse_timestamp("%Y-%m-%d %k:%M:%E6S", updateTS) as updateTS,
        centralPreferences.operator,
        ingstn_ts,
        _PARTITIONTIME as ingestion_dt
    from
        {{ src_cpc_raw }},
        unnest(centralPreferences) as centralPreferences
)
{% if is_incremental() %}
    where date(ingestion_dt) >= date_sub("{{ latest_partition_filter(src_cpc_raw) }}", interval 1 day)
{% endif %}
qualify row_number() over (partition by customerId_token, preference order by updateTS desc, ingstn_ts desc) = 1

But now dbt has swapped out the source reference for a subquery that doesn't take the pseudo-column into consideration:

/* {"app": "dbt", "dbt_version": "1.8.0", "profile_name": "etlapp", "target_name": "lab", "node_id": "model.batch_framework_module.harm_customer_preference_center__centralpreference"} */   

    create or replace table `ad25-p-datalab-fg2h`.`dbt_christopher`.`harm_customer_preference_center__centralpreference`
      
    
    cluster by ingstn_ts

    OPTIONS(
      description="""Incremental model for central preferences""",
    
      labels=[('batchfw_status', 'managed')]
    )
    as (
      select
    customerId,
    customerId_token,
    preference,
    preferenceInd,
    createTS,
    updateTS,
    operator,
    ingstn_ts,
    ingestion_dt
from (
    select
        customerId,
        customerId_token,
        centralPreferences.preference,
        centralPreferences.preferenceInd,
        parse_timestamp("%Y-%m-%d %k:%M:%E6S", createTS) as createTS,
        parse_timestamp("%Y-%m-%d %k:%M:%E6S", updateTS) as updateTS,
        centralPreferences.operator,
        ingstn_ts,
        _PARTITIONTIME as ingestion_dt
    from
        (select * from `ab73-np-rawlay-dev-3324`.`customer_preference_center`.`customer_preference` where false limit 0),
        unnest(centralPreferences) as centralPreferences
)

qualify row_number() over (partition by customerId_token, preference order by updateTS desc, ingstn_ts desc) = 1
    );

This gives the following error in BigQuery: "Unrecognized name: _PARTITIONTIME at [37:9]"

Expected Behavior

I expect the subquery to work with pseudo-columns as well.

Steps To Reproduce

Use similar SQL logic.
Run dbt build -s model_name --empty

Relevant log output

No response

Environment

- OS: ubuntu:rolling
- Python: 3.9.11
- dbt-core: 1.8.0
- dbt-bigquery: 1.8.0

Additional Context

No response


jtcohen6 · 2024-05-16T11:30:31Z

@christopherekfeldt Thanks for the report!

The mechanism we're using for --empty is to wrap the source() and ref() calls in a subquery with select * ... where false limit 0. This * doesn't pass along pseudo-columns.
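To make the failure mode concrete, here is a toy Python model of the name resolution involved. It is only a sketch: the column lists and function names are made up for illustration, and none of this is dbt or BigQuery code. The one fact it relies on is that `select *` in BigQuery expands to the regular columns only.

```python
# Toy model only: hypothetical column lists, not dbt or BigQuery internals.
REGULAR_COLUMNS = ["customerId", "customerId_token", "ingstn_ts"]
PSEUDO_COLUMNS = ["_PARTITIONTIME"]


def columns_visible_on_table():
    """Names that resolve when the table is queried directly."""
    return REGULAR_COLUMNS + PSEUDO_COLUMNS


def columns_visible_through_select_star():
    """Names that resolve on top of a `select *` subquery.

    `select *` expands to the regular columns only, so pseudo-columns
    are not carried out of the subquery.
    """
    return list(REGULAR_COLUMNS)


def wrap_for_empty(relation_name):
    """Build the kind of subquery that appears in the compiled SQL above."""
    return f"(select * from {relation_name} where false limit 0)"
```

In this model, `_PARTITIONTIME` resolves against `columns_visible_on_table()` but not against `columns_visible_through_select_star()`, which mirrors the "Unrecognized name" error in the report.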

The first idea that came to mind:

- We first access BQ metadata to figure out if the source/ref relation is an ingestion-time partitioned table
- If it is, we include the pseudo-column. But even then, it must be aliased, so your subsequent query (referencing it as _PARTITIONTIME) will still fail

select *, _PARTITIONTIME as partition_time
from dbt_jcohen.myingestiontable
where false limit 0
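As a rough sketch of that first idea (not an actual or planned dbt implementation), the subquery builder could branch on whether the relation is ingestion-time partitioned. The function name and the boolean parameter are hypothetical; in practice the flag would have to come from BigQuery metadata.

```python
def empty_subquery(relation: str, ingestion_time_partitioned: bool) -> str:
    """Build a zero-row subquery for `relation` (hypothetical helper).

    For ingestion-time partitioned tables, the pseudo-column is selected
    explicitly and aliased, as in the SQL example above.
    """
    columns = "*"
    if ingestion_time_partitioned:
        # The alias is required, so downstream SQL would have to use
        # `partition_time` rather than `_PARTITIONTIME`.
        columns = "*, _PARTITIONTIME as partition_time"
    return f"(select {columns} from {relation} where false limit 0)"
```

The aliasing is the catch: a model that still refers to `_PARTITIONTIME` on top of this subquery would keep failing, because only `partition_time` is exposed.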

Other ideas:

- Append where false limit 0 without wrapping in a subquery (but this won't play nice with other where statements, unnest, etc.)
- Allow you to opt out this particular source() from the default --empty subquery, but access flags.EMPTY to apply your own conditional filter

jtcohen6 · 2024-05-16T12:28:30Z

In the meantime, you can at least avoid the error by specifying .render() on any refs/sources that you don't want dbt to turn into where false limit 0 subqueries.
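A toy Python model of why `.render()` sidesteps the problem: the relation's default string form is the wrapped subquery when an empty limit is set, while `render()` returns only the plain name. The class and field names here are illustrative assumptions, not dbt's actual relation class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ToyRelation:
    """Illustrative stand-in for a relation object; not dbt's real class."""

    name: str
    limit: Optional[int] = None  # 0 stands in for "--empty was passed"

    def render(self) -> str:
        # Plain identifier, never wrapped in a subquery.
        return self.name

    def __str__(self) -> str:
        # This is what `{{ relation }}` would interpolate in this toy model.
        if self.limit == 0:
            return f"(select * from {self.name} where false limit 0)"
        return self.render()
```

With `limit=0`, `str(rel)` gives the subquery that hides `_PARTITIONTIME`, whereas `rel.render()` gives the bare table name, so the pseudo-column stays reachable.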

If we added support for flags.EMPTY, then you could write something like:

{% set src_cpc_raw = source('customer_preference_center', 'customer_preference') -%}

    select
        ...,
        _PARTITIONTIME as ingestion_dt
    from
        {{ src_cpc_raw.render() }},   -- this will be rendered simply into `project.dataset.identifier` (no subquery)
        unnest(centralPreferences) as centralPreferences
     where 1=1
{% if flags.EMPTY %}
    and false                         -- instead, I manually add the "empty" filter here (no "limit 0", since it cannot come before the "and" condition and "qualify" below)
{% endif %}
{% if is_incremental() %}
    and date(ingestion_dt) >= date_sub("{{ latest_partition_filter(src_cpc_raw) }}", interval 1 day)
{% endif %}
qualify row_number() over (partition by customerId_token, preference order by updateTS desc, ingstn_ts desc) = 1

github-christophe-oudar · 2024-06-27T14:51:36Z

My suggestion to solve this issue is related to dbt-labs/dbt-core#8560:
we need to be able to override the rendering from sources/refs.

For sources, we could add a parameter to the macro to include those metadata fields. For refs, since this is tied to the "time_ingestion_partitioning": True config, we should be able to detect them ourselves.

github-actions · 2024-12-25T02:01:52Z

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

github-actions · 2025-01-01T02:07:33Z

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

christopherekfeldt added the type:bug and triage:product labels May 16, 2024

jtcohen6 mentioned this issue May 16, 2024

Document known limitations to --empty flag dbt-labs/docs.getdbt.com#5520

Open

1 task

jtcohen6 removed the triage:product label May 16, 2024

jtcohen6 mentioned this issue May 16, 2024

[Feature] --empty flag should be exposed in flags variable dbt-labs/dbt-core#10152

Closed

2 tasks

github-actions bot added the Stale label Dec 25, 2024

github-actions bot closed this as not planned Jan 1, 2025
