You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I have searched the existing issues, and I could not find an existing issue for this bug
Current Behavior
In bigquery, I’m trying to read only the partitions not yet processed downstream. Bigquery is tricky for this because it’s implementation is … particular.
Avoiding a full column scan comes down to schema design (harder to change) or adding a predicate like where ingest_at > (select date_sub(current_timestamp(), interval 4 day)
I’ve tried writing jinja for this but it quickly got complex and testing that jinja is a whole new problem.
Bigquery has a procedural language that enables:
declare most_recent_watermark
default (
selectmax(ingest_at)
from {{ this }}
where ingest_at > (
select
date_sub(current_timestamp(),
interval 4 day
)
)
)
select*From {{ ref(“my_table”) }}
where
ingest_at > most_recent_watermark and
ingest_at > (
select
date_sub(current_timestamp(),
interval 4 day
)
)
In the bigquery console, stubbing the jinja relation functions, this does exactly what I want.
In a dbt model, it compiles but on run it throws an error:
syntax error: Unexpected keyword DECLARE at [15:1]
Is there a workaround?
Expected Behavior
When I copy/paste the compiled query into the bigquery console it does exactly what I want.
Steps To Reproduce
declare most_recent_watermark
default (
selectmax(ingest_at)
from {{ this }}
where ingest_at > (
select
date_sub(current_timestamp(),
interval 4 day
)
)
)
select*From {{ ref(“my_table”) }}
where
ingest_at > most_recent_watermark and
ingest_at > (
select
date_sub(current_timestamp(),
interval 4 day
)
)
Since I think sql_header allows you to use DECLARE, I'm going to close this issue as not needed. I've also opened up an issue to clarify the docs: dbt-labs/docs.getdbt.com#4890
Example
Something like this might work for you the way you hoped for:
{%- call set_sql_header(config) -%}
declare most_recent_watermark
default (
selectmax(ingest_at)
from {{ this }}
where ingest_at > (
select
date_sub(current_timestamp(),
interval 4 day
)
)
);
{%- endcall %}
select*from {{ ref("my_table") }}
where
ingest_at > most_recent_watermark and
ingest_at > (
select
date_sub(current_timestamp(),
interval 4 day
)
)
Is this a new bug in dbt-bigquery?
Current Behavior
In bigquery, I’m trying to read only the partitions not yet processed downstream. Bigquery is tricky for this because it’s implementation is … particular.
Avoiding a full column scan comes down to schema design (harder to change) or adding a predicate like where ingest_at > (select date_sub(current_timestamp(), interval 4 day)
I’ve tried writing jinja for this but it quickly got complex and testing that jinja is a whole new problem.
Bigquery has a procedural language that enables:
In the bigquery console, stubbing the jinja relation functions, this does exactly what I want.
In a dbt model, it compiles but on run it throws an error:
Is there a workaround?
Expected Behavior
When I copy/paste the compiled query into the bigquery console it does exactly what I want.
Steps To Reproduce
Relevant log output
See above
Environment
Additional Context
No response
The text was updated successfully, but these errors were encountered: