Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

“Syntax error: Unexpected keyword DECLARE” on valid query string #1096

Closed
2 tasks done
dbrtly opened this issue Feb 7, 2024 · 1 comment
Closed
2 tasks done
Labels
type:bug Something isn't working

Comments

@dbrtly
Copy link
Contributor

dbrtly commented Feb 7, 2024

Is this a new bug in dbt-bigquery?

  • I believe this is a new bug in dbt-bigquery
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

In bigquery, I’m trying to read only the partitions not yet processed downstream. Bigquery is tricky for this because it’s implementation is … particular.

Avoiding a full column scan comes down to schema design (harder to change) or adding a predicate like where ingest_at > (select date_sub(current_timestamp(), interval 4 day)

I’ve tried writing jinja for this but it quickly got complex and testing that jinja is a whole new problem.

Bigquery has a procedural language that enables:

declare most_recent_watermark 
default (
  select max(ingest_at)
  from {{ this }}
  where ingest_at > (
    select 
      date_sub(current_timestamp(),
      interval 4 day
    )
  )
)


select *
From {{ ref(“my_table”) }}
where 
  ingest_at > most_recent_watermark and
  ingest_at > (
    select 
      date_sub(current_timestamp(),
      interval 4 day
    )
  )

In the bigquery console, stubbing the jinja relation functions, this does exactly what I want.

In a dbt model, it compiles but on run it throws an error:

syntax error: Unexpected keyword DECLARE at [15:1]

Is there a workaround?

Expected Behavior

When I copy/paste the compiled query into the bigquery console it does exactly what I want.

Steps To Reproduce

declare most_recent_watermark 
default (
  select max(ingest_at)
  from {{ this }}
  where ingest_at > (
    select 
      date_sub(current_timestamp(),
      interval 4 day
    )
  )
)


select *
From {{ ref(“my_table”) }}
where 
  ingest_at > most_recent_watermark and
  ingest_at > (
    select 
      date_sub(current_timestamp(),
      interval 4 day
    )
  )

Relevant log output

See above

Environment

- OS: windows
- Python: 3.10
- dbt-core: 1.7.7
- dbt-bigquery: 1.6.9

Additional Context

No response

@dbrtly dbrtly added type:bug Something isn't working triage:product labels Feb 7, 2024
@dbeatty10
Copy link
Contributor

Thanks for raising this @dbrtly!

Returning the largest partition

Does #286 / #285 look like it would solve the problem you are going for?

Using sql_header

In general, sql_header is how you can use DECLARE, like described here. But one limitation is that it can't use the ref or source macros.

Using {{ this }} happens to work (as long as a custom generate_alias_name isn't used).

Since I think sql_header allows you to use DECLARE, I'm going to close this issue as not needed. I've also opened up an issue to clarify the docs: dbt-labs/docs.getdbt.com#4890

Example

Something like this might work for you the way you hoped for:

{%- call set_sql_header(config) -%}

declare most_recent_watermark
default (
  select max(ingest_at)
  from {{ this }}
  where ingest_at > (
    select 
    date_sub(current_timestamp(),
    interval 4 day
    )
  )
);

{%- endcall %}

select *
from {{ ref("my_table") }}
where 
  ingest_at > most_recent_watermark and
  ingest_at > (
    select 
      date_sub(current_timestamp(),
      interval 4 day
    )
  )

@dbeatty10 dbeatty10 closed this as not planned Won't fix, can't repro, duplicate, stale Feb 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type:bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants