Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] --empty flag not working on Pseudo-columns #1243

Closed
2 tasks done
christopherekfeldt opened this issue May 16, 2024 · 5 comments
Closed
2 tasks done

[Bug] --empty flag not working on Pseudo-columns #1243

christopherekfeldt opened this issue May 16, 2024 · 5 comments
Labels
Stale type:bug Something isn't working

Comments

@christopherekfeldt
Copy link

christopherekfeldt commented May 16, 2024

Is this a new bug in dbt-bigquery?

  • I believe this is a new bug in dbt-bigquery
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

When trying out the empty flag on my models I get failures on all models that uses the pseudo-column "_PARTITIONTIME" in their logic. Here is my query, it has worked perfectly fine prior.

{% set src_cpc_raw = source('customer_preference_center', 'customer_preference') -%}

select
    customerId,
    customerId_token,
    preference,
    preferenceInd,
    createTS,
    updateTS,
    operator,
    ingstn_ts,
    ingestion_dt
from (
    select
        customerId,
        customerId_token,
        centralPreferences.preference,
        centralPreferences.preferenceInd,
        parse_timestamp("%Y-%m-%d %k:%M:%E6S", createTS) as createTS,
        parse_timestamp("%Y-%m-%d %k:%M:%E6S", updateTS) as updateTS,
        centralPreferences.operator,
        ingstn_ts,
        _PARTITIONTIME as ingestion_dt
    from
        {{ src_cpc_raw }},
        unnest(centralPreferences) as centralPreferences
)
{% if is_incremental() %}
    where date(ingestion_dt) >= date_sub("{{ latest_partition_filter(src_cpc_raw) }}", interval 1 day)
{% endif %}
qualify row_number() over (partition by customerId_token, preference order by updateTS desc, ingstn_ts desc) = 1

But now it has swapped out the logic with a subquery that doesn't take the pseudo column into consideration:

/* {"app": "dbt", "dbt_version": "1.8.0", "profile_name": "etlapp", "target_name": "lab", "node_id": "model.batch_framework_module.harm_customer_preference_center__centralpreference"} */   

    create or replace table `ad25-p-datalab-fg2h`.`dbt_christopher`.`harm_customer_preference_center__centralpreference`
      
    
    cluster by ingstn_ts

    OPTIONS(
      description="""Incremental model for central preferences""",
    
      labels=[('batchfw_status', 'managed')]
    )
    as (
      select
    customerId,
    customerId_token,
    preference,
    preferenceInd,
    createTS,
    updateTS,
    operator,
    ingstn_ts,
    ingestion_dt
from (
    select
        customerId,
        customerId_token,
        centralPreferences.preference,
        centralPreferences.preferenceInd,
        parse_timestamp("%Y-%m-%d %k:%M:%E6S", createTS) as createTS,
        parse_timestamp("%Y-%m-%d %k:%M:%E6S", updateTS) as updateTS,
        centralPreferences.operator,
        ingstn_ts,
        _PARTITIONTIME as ingestion_dt
    from
        (select * from `ab73-np-rawlay-dev-3324`.`customer_preference_center`.`customer_preference` where false limit 0),
        unnest(centralPreferences) as centralPreferences
)

qualify row_number() over (partition by customerId_token, preference order by updateTS desc, ingstn_ts desc) = 1
    );

Giving the error in BigQuery: "Unrecognized name: _PARTITIONTIME at [37:9]"

Expected Behavior

I expect the subquery to work with pseudo-columns as well.

Steps To Reproduce

  1. Use similiar SQL logic.
  2. Run dbt build -s model_name --empty

Relevant log output

No response

Environment

- OS: ubuntu:rolling
- Python: 3.9.11
- dbt-core: 1.8.0
- dbt-bigquery: 1.8.0

Additional Context

No response

@jtcohen6
Copy link
Contributor

jtcohen6 commented May 16, 2024

@christopherekfeldt Thanks for the report!

The mechanism we're using for --empty is to wrap the source() and ref() calls in a subquery with select * ... where false limit 0. This * doesn't pass along pseudo-columns.

The first idea that came to mind:

  • we first access BQ metadata to figure out if the source/ref relation is an ingestion-time partitioned table
  • if it is, we include the pseudo-column — but even then, it must be aliased, so your subsequent query (referencing it as _PARTITIONTIME) will still fail
select *, _PARTITIONTIME as partition_time
from dbt_jcohen.myingestiontable
where false limit 0

Other ideas:

  • Append where false limit 0 without wrapping in a subquery (but this won't play nice with other where statements, unnest, etc)
  • Allow you to opt out this particular source() from the default --empty subquery, but access flags.EMPTY to apply your own conditional filter

@jtcohen6
Copy link
Contributor

jtcohen6 commented May 16, 2024

In the meantime, you can at least avoid the error by specifying .render() on any refs/sources that you don't want dbt to turn into where false limit 0 subqueries.

If we added support for flags.EMPTY, then you could write something like:

{% set src_cpc_raw = source('customer_preference_center', 'customer_preference') -%}

    select
        ...,
        _PARTITIONTIME as ingestion_dt
    from
        {{ src_cpc_raw.render() }},   -- this will be rendered simply into `project.dataset.identifier` (no subquery)
        unnest(centralPreferences) as centralPreferences
     where 1=1
{% if flags.EMPTY %}
    and false limit 0                 -- instead, I manually add the "empty limit" here
{% endif %}
{% if is_incremental() %}
    and date(ingestion_dt) >= date_sub("{{ latest_partition_filter(src_cpc_raw) }}", interval 1 day)
{% endif %}
qualify row_number() over (partition by customerId_token, preference order by updateTS desc, ingstn_ts desc) = 1

@github-christophe-oudar
Copy link
Contributor

My suggestion to solve this issue is related to dbt-labs/dbt-core#8560:
we need to be able to override the rendering from sources/refs.

For sources, we could have a way to add a parameter to the macro to add those metadata fields and for refs, since it would be related to "time_ingestion_partitioning": True,, we should be able to detect them by ourselves.

Copy link
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions github-actions bot added the Stale label Dec 25, 2024
Copy link
Contributor

github-actions bot commented Jan 1, 2025

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Jan 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Stale type:bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants