Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ADAP-544] [Feature] Support Dynamic Tables as a materialization (following base Materialized View ergonomics) #603

Closed
Fleid opened this issue May 10, 2023 · 18 comments · Fixed by #615 or #684
Assignees
Labels
Milestone

Comments

@Fleid
Copy link
Contributor

Fleid commented May 10, 2023

For more context about Dynamic Tables (DTs), see this post.

materialized: dynamic_table

This issue is the local implementation of the core feature: Let's add Materialized View as a materialization, finally.
The original spec is the baseline. In every aspect it's the same feature, with the exceptions described below.

Alias

The local alias for this materialization is dynamic_table, not materialized_view. The later is not supported and should return an error (to avoid confusion with Snowflake's MVs)

Schema management

Metadata caching/querying should be updated to add the following sources in order to support Dynamic Tables:

  • table level via SHOW DYNAMIC TABLES - doc
  • column level via DESCRIBE DYNAMIC TABLE - doc

Configurations

Available parameters:

  • LAG TARGET_LAG : string (<num> { seconds | minutes | hours | days })
  • WAREHOUSE : string
  • REFRESH_STRATEGY : {no_wait,refreshed}
  • From models: query_tag

Refresh strategy is described below.

Out of scope for the moment (not clear yet if applicable): copy_grants, persist_docs, clustering management

Lifecycles

  • Lifecycle (dbt run) :
    • If necessary backup (exists with different materialization)
    • If necessary CREATE OR REPLACE DYNAMIC TABLE (doesn't exist) and apply refresh_strategy Else
      • Check status (see below)
        • If FAILED then CREATE OR REPLACE DYNAMIC TABLE and apply refresh_strategy Else
          • If the options of the object in the database are different from the current model, depending on on_configuration_change
          • skip : Do nothing, aka stage changes until the next full refresh
          • fail : Fail the run
          • apply : [default] Apply the changes
  • Lifecycle (dbt run --full-refresh) :
    • If necessary backup (exists with different materialization OR exists and full-refresh)
    • CREATE OR REPLACE DYNAMIC TABLE and apply refresh_strategy

Status check

The refresh status of a DT is obtained via the following query:

SELECT TOP 1 state
FROM TABLE(INFORMATION_SCHEMA.DYNAMIC_TABLE_REFRESH_HISTORY(NAME => '...'))
ORDER BY refresh_version DESC;

The return code is a string among:

SCHEDULED, EXECUTING, SUCCEEDED, FAILED, CANCELLED, or SKIPPED

Refresh strategy

To ensure downstream models are able to query the DT in a proper state, dbt may have to wait for the DT to be successfully refreshed before moving on. The refresh_strategy setting offers 2 options:

Refresh strategy Behavior
no_wait default: dbt will not wait, it will proceed as soon as the DDL statement is issued
refresh optional: dbt will wait until the DT is in the SUCCEEDED state, polling every few seconds its status

In order to avoid waiting indefinitely, in refresh mode, the polling will stop and the run will fail when:

  • after the TARGET_LAG period has lapsed, any state other than SCHEDULED, EXECUTING is returned
  • after 2 TARGET_LAG periods have lapsed
@Fleid Fleid added the type:enhancement New feature or request label May 10, 2023
@Fleid Fleid added this to the 1.6 milestone May 10, 2023
@Fleid Fleid self-assigned this May 10, 2023
@github-actions github-actions bot changed the title [Feature] Support Dynamic Tables as a materialization (following base Materialized View ergonomics) [ADAP-544] [Feature] Support Dynamic Tables as a materialization (following base Materialized View ergonomics) May 10, 2023
@mikealfare mikealfare self-assigned this May 16, 2023
@mikealfare
Copy link
Contributor

@Fleid Do we actually store the 'lag' parameter in Snowflake anywhere? If not, I don't know how I'd track that and apply the effects of on_configuration_change.

Alternatively, it seems like this might not be something that we would need to store, and really is just a way to augment how dbt deploys the dynamic table changes. In other words, if nothing else has changed, in particular the refresh_strategy, then lag doesn't matter. If that's the case, then I think I'm good. Just let me know if that's the case.

@amychen1776
Copy link

@mikealfare You can see the target lag via SHOW DYNAMIC TABLES and information_schema.dynamic_table_graph_history

@sfc-gh-dsotolongo
Copy link

LAG : string ( { seconds | minutes | hours | days })

Snowflake is renaming this to TARGET_LAG (though LAG will still work for compatibility)

refresh | dbt will wait until the DT is in the SUCCEEDED state, polling every few seconds its status

DTs also support a manual refresh operation (in 7.19, early June). Invoking that would be more convenient than waiting for the schedule to kick it off.

@Fleid
Copy link
Contributor Author

Fleid commented May 31, 2023

Hey @sfc-gh-dsotolongo! Thanks for stopping by ;)
The forced refresh mode is covered there.

@mikealfare we should rename LAG to TARGET_LAG if that's possible. Doesn't need to happen by beta3 if they have a compatibility mode.

@mikealfare
Copy link
Contributor

This got closed via automation but there is still more development on it.

@mikealfare
Copy link
Contributor

Yay automation

@mikealfare mikealfare reopened this Jul 6, 2023
@arnoN7
Copy link

arnoN7 commented Jul 31, 2023

I tried the dynamic table implementation it works great! Thanks for doing it! If i'm not wrong the parameter for the warehouse is snowflake_warehouse not warehouse isn't it ?

@mikealfare
Copy link
Contributor

If i'm not wrong the parameter for the warehouse is snowflake_warehouse not warehouse isn't it ?

That's correct. We already had a parameter for warehouse (called snowflake_warehouse); we elected to stick with that one versus keep two floating around.

@cherns31
Copy link

cherns31 commented Aug 7, 2023

Hi. Is there any documentation on how to create DTs using dbt?

@amychen1776
Copy link

@cherns31 We have our our configuration docs and we have a developer blog

@cherns31
Copy link

cherns31 commented Aug 10, 2023

It's weird bc I don't find this topic at the configuration docs.

@martanthony
Copy link

@cherns31 Are you viewing the docs for an older version of dbt? There's a dropdown at the top of the page that needs to be set to 'v1.6' for the Dynamic Tables documentation to be visible at the link @amychen1776 sent.

@cherns31
Copy link

Oh yes, my bad 😊

@sfc-gh-pkommini
Copy link

Is there any official documentation on dbt docs? Did the materialization make it to a release?

@mikealfare
Copy link
Contributor

@sfc-gh-pkommini These docs may need to be updated with the 1.7.0 release that went out yesterday, but here's a link. Dynamic tables have been in since dbt-snowflake>=1.6.0.

@wylbee
Copy link

wylbee commented Nov 9, 2023

Apologies if I am missing it, but did/will the forced refresh strategy or something equivalent get implemented? Seeing the link from this thread leading to a closed issue, and in general would be interested in a way to manually ensure DTs are refreshed as part of a dbt run.

@mikealfare
Copy link
Contributor

@wylbee I created an issue to capture this and break it out of this feature: #859. At the moment, I don't believe a Snowflake user can create a dynamic table (through dbt or otherwise) without also executing the logic and populating the table. But if Snowflake ever adds that functionality, then this feature becomes possible.

@mikealfare
Copy link
Contributor

The remaining functionality described in the feature above has been broken out into separate stories: #667, #858, and #859. With that context, I'm closing this feature as complete.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment