diff --git a/website/docs/reference/resource-configs/snowflake-configs.md b/website/docs/reference/resource-configs/snowflake-configs.md
index a59bc8dee00..809667ef685 100644
--- a/website/docs/reference/resource-configs/snowflake-configs.md
+++ b/website/docs/reference/resource-configs/snowflake-configs.md
@@ -9,6 +9,397 @@ To-do:
 - use the reference doc structure for this article / split into separate articles
 --->
+## Iceberg table format
+
+The dbt-snowflake adapter supports the Iceberg table format. It is available for three Snowflake materializations:
+
+- [Table](/docs/build/materializations#table)
+- [Incremental](/docs/build/materializations#incremental)
+- [Dynamic](#dynamic-tables)
+
+For now, creating Iceberg tables requires opting in with a [behavior flag](/reference/global-configs/behavior-changes) because of a performance impact: Snowflake does not support `is_iceberg` in the `SHOW OBJECTS` query, which dbt depends on for metadata.
+
+To use Iceberg, set the `enable_iceberg_materializations` flag to `True` in your `dbt_project.yml`:
+
+```yaml
+flags:
+  enable_iceberg_materializations: True
+```
+
+The following configurations are supported:
+
+| Field | Type | Required | Description | Sample input |
+|-------|------|----------|-------------|--------------|
+| Table format | String | Yes | Configures the object table format. | `iceberg` |
+| External volume | String | Yes | Specifies the identifier (name) of the external volume where the Iceberg table stores its metadata files and data in Parquet format. If you don't specify this parameter, the Iceberg table defaults to the external volume for the schema, database, or account. The schema takes precedence over the database, and the database takes precedence over the account. | `my_s3_bucket` |
+| Base location subpath | String | No | Defines the directory path where Snowflake writes the table data and metadata files. If you change this after creating the table, Snowflake writes a new table at the new path and leaves the existing one in place. | `jaffle_marketing_folder` |
+
+### Example configuration
+
+To configure an Iceberg table materialization in dbt, refer to the example configuration:
+
+```sql
+{{
+  config(
+    materialized = "table",
+    table_format = "iceberg",
+    external_volume = "s3_iceberg_snow",
+  )
+}}
+
+select * from {{ ref('raw_orders') }}
+```
+
+### Base location
+
+Snowflake's `CREATE ICEBERG TABLE` DDL requires a `base_location` to be provided. dbt defines this parameter on the user's behalf to streamline usage and enforce basic isolation of table data within the `EXTERNAL VOLUME`. By default, dbt generates a `base_location` string of the form `_dbt/{SCHEMA_NAME}/{MODEL_NAME}`.
+
+#### Base location subpath
+
+We recommend using dbt's auto-generated `base_location`. However, if you need to customize the resulting `base_location`, dbt allows users to configure a `base_location_subpath`. When specified, the subpath is appended to the end of the previously described `base_location` pattern.
+
+For example, `config(base_location_subpath="prod")` will generate a `base_location` of the form `_dbt/{SCHEMA_NAME}/{MODEL_NAME}/prod`.
+
+A theoretical (but not recommended) use case is reusing an `EXTERNAL VOLUME` while maintaining isolation across development and production environments. We recommend against this because storage permissions should be configured on the external volume and underlying storage, not on paths that any analytics engineer can modify.
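+
+To illustrate how the pieces compose, here is a sketch. The model, schema, and subpath names are hypothetical, and `s3_iceberg_snow` reuses the volume from the example above: a model named `marketing_orders` built in an `analytics` schema with this config would write to `_dbt/analytics/marketing_orders/prod` on the external volume.
+
+```sql
+{{
+  config(
+    materialized = "table",
+    table_format = "iceberg",
+    external_volume = "s3_iceberg_snow",
+    base_location_subpath = "prod",
+  )
+}}
+
+-- materialized as an Iceberg table under _dbt/analytics/marketing_orders/prod
+select * from {{ ref('raw_orders') }}
+```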
+
+#### Rationale
+
+dbt manages `base_location` on behalf of users to enforce best practices. With Snowflake-managed Iceberg format tables, the user owns and maintains the data storage of the tables in an external storage solution (the declared `external_volume`). The `base_location` parameter declares where to write the data within the external volume. The Snowflake Iceberg catalog keeps track of your Iceberg table regardless of where the data lives within the declared `external_volume` and the provided `base_location`. However, Snowflake permits passing anything into the `base_location` field, including an empty string, and even allows reusing the same path across multiple tables. This behavior could result in future technical debt because it limits the ability to:
+
+- Navigate the underlying object store (S3/Azure blob)
+- Read Iceberg tables via an object-store integration
+- Grant schema-specific access to tables via the object store
+- Use a crawler pointed at the tables within the external volume to build a new catalog with another tool
+
+To maintain best practices, we enforce an input. Currently, we do not support overriding the default `base_location` input, but we will consider doing so based on user feedback.
+
+In summary, dbt-snowflake does not support arbitrary definition of `base_location` for Iceberg tables. Instead, dbt, by default, writes your tables within a `_dbt/{SCHEMA_NAME}/{TABLE_NAME}` prefix to ensure easier object-store observability and auditability.
+
+### Limitations
+
+There are some limitations to the implementation you need to be aware of:
+
+- When you build an Iceberg table with dbt, the final model is materialized in the Iceberg format. However, dbt often creates intermediary objects as temporary and transient tables for certain materializations, such as incremental ones. It is not possible to configure these intermediary objects to also be Iceberg-formatted. You may see non-Iceberg tables in the logs; they are created to support specific materializations and are dropped after use.
+- You cannot incrementally update a preexisting incremental model to be an Iceberg table. To do so, you must fully rebuild the table with the `--full-refresh` flag.
+
+## Dynamic tables
+
+The Snowflake adapter supports [dynamic tables](https://docs.snowflake.com/en/user-guide/dynamic-tables-about).
+This materialization is specific to Snowflake, which means that any model configuration that
+would normally come along for the ride from `dbt-core` (e.g. as with a `view`) may not be available
+for dynamic tables. This gap will decrease in future patches and versions.
+While this materialization is specific to Snowflake, it very much follows the implementation
+of [materialized views](/docs/build/materializations#Materialized-View).
+In particular, dynamic tables have access to the `on_configuration_change` setting.
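+
+Before the full parameter reference, here is a minimal sketch of a complete dynamic table model. The upstream `stg_orders` ref, lag value, and warehouse name are hypothetical placeholders:
+
+```sql
+-- Snowflake refreshes the dynamic table automatically based on target_lag
+{{ config(
+    materialized="dynamic_table",
+    target_lag="5 minutes",
+    snowflake_warehouse="transforming_wh",
+) }}
+
+select
+    order_id,
+    status,
+    updated_at
+from {{ ref('stg_orders') }}
+```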
+
+Dynamic tables are supported with the following configuration parameters.
+
+For dbt v1.8 and earlier:
+
+| Parameter | Type | Required | Default | Change Monitoring Support |
+|--------------------|------------|----------|-------------|---------------------------|
+| [`on_configuration_change`](/reference/resource-configs/on_configuration_change) | `<string>` | no | `apply` | n/a |
+| [`target_lag`](#target-lag) | `<string>` | yes | | alter |
+| [`snowflake_warehouse`](#configuring-virtual-warehouses) | `<string>` | yes | | alter |
+
+For dbt v1.9 and later, `refresh_mode` and `initialize` are also supported:
+
+| Parameter | Type | Required | Default | Change Monitoring Support |
+|--------------------|------------|----------|-------------|---------------------------|
+| [`on_configuration_change`](/reference/resource-configs/on_configuration_change) | `<string>` | no | `apply` | n/a |
+| [`target_lag`](#target-lag) | `<string>` | yes | | alter |
+| [`snowflake_warehouse`](#configuring-virtual-warehouses) | `<string>` | yes | | alter |
+| [`refresh_mode`](#refresh-mode) | `<string>` | no | `AUTO` | refresh |
+| [`initialize`](#initialize) | `<string>` | no | `ON_CREATE` | n/a |
+
+On dbt v1.8 and earlier, configure a dynamic table in `dbt_project.yml`:
+
+```yaml
+models:
+  [<resource-path>](/reference/resource-configs/resource-path):
+    [+](/reference/resource-configs/plus-prefix)[materialized](/reference/resource-configs/materialized): dynamic_table
+    [+](/reference/resource-configs/plus-prefix)[on_configuration_change](/reference/resource-configs/on_configuration_change): apply | continue | fail
+    [+](/reference/resource-configs/plus-prefix)[target_lag](#target-lag): downstream | <time-delta>
+    [+](/reference/resource-configs/plus-prefix)[snowflake_warehouse](#configuring-virtual-warehouses): <warehouse-name>
+```
+
+Or in a model properties file:
+
+```yaml
+version: 2
+
+models:
+  - name: [<model-name>]
+    config:
+      [materialized](/reference/resource-configs/materialized): dynamic_table
+      [on_configuration_change](/reference/resource-configs/on_configuration_change): apply | continue | fail
+      [target_lag](#target-lag): downstream | <time-delta>
+      [snowflake_warehouse](#configuring-virtual-warehouses): <warehouse-name>
+```
+
+Or in the model's config block:
+
+```jinja
+{{ config(
+    [materialized](/reference/resource-configs/materialized)="dynamic_table",
+    [on_configuration_change](/reference/resource-configs/on_configuration_change)="apply" | "continue" | "fail",
+    [target_lag](#target-lag)="downstream" | "<num> seconds | minutes | hours | days",
+    [snowflake_warehouse](#configuring-virtual-warehouses)="<warehouse-name>",
+) }}
+```
+
+On dbt v1.9 and later, the same three formats apply, with the additional `refresh_mode` and `initialize` parameters. In `dbt_project.yml`:
+
+```yaml
+models:
+  [<resource-path>](/reference/resource-configs/resource-path):
+    [+](/reference/resource-configs/plus-prefix)[materialized](/reference/resource-configs/materialized): dynamic_table
+    [+](/reference/resource-configs/plus-prefix)[on_configuration_change](/reference/resource-configs/on_configuration_change): apply | continue | fail
+    [+](/reference/resource-configs/plus-prefix)[target_lag](#target-lag): downstream | <time-delta>
+    [+](/reference/resource-configs/plus-prefix)[snowflake_warehouse](#configuring-virtual-warehouses): <warehouse-name>
+    [+](/reference/resource-configs/plus-prefix)[refresh_mode](#refresh-mode): AUTO | FULL | INCREMENTAL
+    [+](/reference/resource-configs/plus-prefix)[initialize](#initialize): ON_CREATE | ON_SCHEDULE
+```
+
+Or in a model properties file:
+
+```yaml
+version: 2
+
+models:
+  - name: [<model-name>]
+    config:
+      [materialized](/reference/resource-configs/materialized): dynamic_table
+      [on_configuration_change](/reference/resource-configs/on_configuration_change): apply | continue | fail
+      [target_lag](#target-lag): downstream | <time-delta>
+      [snowflake_warehouse](#configuring-virtual-warehouses): <warehouse-name>
+      [refresh_mode](#refresh-mode): AUTO | FULL | INCREMENTAL
+      [initialize](#initialize): ON_CREATE | ON_SCHEDULE
+```
+
+Or in the model's config block:
+
+```jinja
+{{ config(
+    [materialized](/reference/resource-configs/materialized)="dynamic_table",
+    [on_configuration_change](/reference/resource-configs/on_configuration_change)="apply" | "continue" | "fail",
+    [target_lag](#target-lag)="downstream" | "<num> seconds | minutes | hours | days",
+    [snowflake_warehouse](#configuring-virtual-warehouses)="<warehouse-name>",
+    [refresh_mode](#refresh-mode)="AUTO" | "FULL" | "INCREMENTAL",
+    [initialize](#initialize)="ON_CREATE" | "ON_SCHEDULE",
+) }}
+```
+
+Learn more about these parameters in Snowflake's [docs](https://docs.snowflake.com/en/sql-reference/sql/create-dynamic-table).
+
+### Target lag
+
+Snowflake allows two configuration scenarios for scheduling automatic refreshes:
+
+- **Time-based** — Provide a value of the form `<num> { seconds | minutes | hours | days }`. For example, if the dynamic table needs to be updated every 30 minutes, use `target_lag='30 minutes'`.
+- **Downstream** — Applicable when the dynamic table is referenced by other dynamic tables. In this scenario, `target_lag='downstream'` allows refreshes to be controlled at the target, instead of at each layer.
+
+Learn more about `target_lag` in Snowflake's [docs](https://docs.snowflake.com/en/user-guide/dynamic-tables-refresh#understanding-target-lag).
+
+### Refresh mode
+
+Snowflake allows three options for refresh mode:
+
+- **AUTO** — Enforces an incremental refresh of the dynamic table by default. If the `CREATE DYNAMIC TABLE` statement does not support the incremental refresh mode, the dynamic table is automatically created with the full refresh mode.
+- **FULL** — Enforces a full refresh of the dynamic table, even if the dynamic table can be incrementally refreshed.
+- **INCREMENTAL** — Enforces an incremental refresh of the dynamic table. If the query that underlies the dynamic table can't perform an incremental refresh, dynamic table creation fails and displays an error message.
+
+Learn more about `refresh_mode` in [Snowflake's docs](https://docs.snowflake.com/en/user-guide/dynamic-tables-refresh).
+
+### Initialize
+
+Snowflake allows two options for initialize:
+
+- **ON_CREATE** — Refreshes the dynamic table synchronously at creation. If this refresh fails, dynamic table creation fails and displays an error message.
+- **ON_SCHEDULE** — Refreshes the dynamic table at the next scheduled refresh.
+
+Learn more about `initialize` in [Snowflake's docs](https://docs.snowflake.com/en/user-guide/dynamic-tables-refresh).
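+
+Putting these options together, here is an illustrative sketch. The warehouse name and lag value are placeholders, and `refresh_mode`/`initialize` assume a dbt version that supports them. This configures a dynamic table that targets a 60-minute lag, forces incremental refresh, and populates synchronously at creation:
+
+```jinja
+{{ config(
+    materialized="dynamic_table",
+    target_lag="60 minutes",
+    snowflake_warehouse="transforming_wh",
+    refresh_mode="INCREMENTAL",
+    initialize="ON_CREATE",
+) }}
+```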
+
+### Limitations
+
+As with materialized views on most data platforms, there are limitations associated with dynamic tables. Some worth noting include:
+
+- Dynamic table SQL has a [limited feature set](https://docs.snowflake.com/en/user-guide/dynamic-tables-tasks-create#query-constructs-not-currently-supported-in-dynamic-tables).
+- Dynamic table SQL cannot be updated; the dynamic table must go through a `--full-refresh` (DROP/CREATE).
+- Dynamic tables cannot be downstream from materialized views, external tables, or streams.
+- Dynamic tables cannot reference a view that is downstream from another dynamic table.
+
+Find more information about dynamic table limitations in Snowflake's [docs](https://docs.snowflake.com/en/user-guide/dynamic-tables-tasks-create#dynamic-table-limitations-and-supported-functions).
+
+In addition, the following dbt features are not supported for dynamic tables:
+
+- [Model contracts](/docs/collaborate/govern/model-contracts)
+- [Copy grants configuration](/reference/resource-configs/snowflake-configs#copying-grants)
+
+#### Changing materialization to and from "dynamic_table"
+
+Version `1.6.x` does not support altering the materialization from a non-dynamic table to a dynamic table, or vice versa.
+Re-running with the `--full-refresh` flag does not resolve this either.
+The workaround is to manually drop the existing model in the warehouse prior to calling `dbt run`.
+This only needs to be done once for the conversion.
+
+For example, assume the model below, `my_model`, has already been materialized to the underlying data platform via `dbt run`.
+If the model config is updated to `materialized="dynamic_table"`, dbt will return an error.
+The workaround is to execute `DROP TABLE my_model` on the data warehouse before trying the model again.
+
+```jinja
+{{ config(
+    materialized="table" # or any model type (e.g. view, incremental)
+) }}
+```
+
+## Temporary tables
+
+Incremental table merges for Snowflake prefer to utilize a `view` rather than a `temporary table`. The reasoning is to avoid the database write step that a temporary table would initiate, which saves compile time.
+
+However, some situations remain where a temporary table would achieve results faster or more safely. The `tmp_relation_type` configuration enables you to opt in to temporary tables for incremental builds. This is defined as part of the model configuration.
+
+To guarantee accuracy, an incremental model using the `delete+insert` strategy with a `unique_key` defined requires a temporary table; trying to change this to a view will result in an error.
+
+Defined in the project YAML:
+
+```yaml
+name: my_project
+
+...
+
+models:
+  <resource-path>:
+    +tmp_relation_type: table | view ## If not defined, view is the default.
+```
+
+In the configuration format for the model SQL file:
+
+```jinja
+{{ config(
+    tmp_relation_type="table | view", ## If not defined, view is the default.
+) }}
+```
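+
+For instance, here is a sketch of an incremental model opting in to a temporary table; the model body and `order_id` key are hypothetical:
+
+```sql
+-- opt in to a temporary table (instead of the default view) for the incremental build
+{{ config(
+    materialized="incremental",
+    unique_key="order_id",
+    tmp_relation_type="table",
+) }}
+
+select * from {{ ref('stg_orders') }}
+```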
+
 ## Transient tables
 
 Snowflake supports the creation of [transient tables](https://docs.snowflake.net/manuals/user-guide/tables-temp-transient.html). Snowflake does not preserve a history for these tables, which can result in a measurable reduction of your Snowflake storage costs. Transient tables participate in time travel to a limited degree, with a retention period of 1 day by default and no fail-safe period. Weigh these tradeoffs when deciding whether or not to configure your dbt models as `transient`. **By default, all Snowflake tables created by dbt are `transient`.**
@@ -299,46 +690,7 @@ models:
-## Temporary tables
-
-Incremental table merges for Snowflake prefer to utilize a `view` rather than a `temporary table`. The reasoning is to avoid the database write step that a temporary table would initiate and save compile time.
-However, some situations remain where a temporary table would achieve results faster or more safely. The `tmp_relation_type` configuration enables you to opt in to temporary tables for incremental builds. This is defined as part of the model configuration.
-
-To guarantee accuracy, an incremental model using the `delete+insert` strategy with a `unique_key` defined requires a temporary table; trying to change this to a view will result in an error.
-
-Defined in the project YAML:
-
-```yaml
-name: my_project
-
-...
-
-models:
-  <resource-path>:
-    +tmp_relation_type: table | view
-
-```
-
-In the configuration format for the model SQL file:
-
-```yaml
-
-{{ config(
-    tmp_relation_type="table | view", ## If not defined, view is the default.
-) }}
-
-```
-
-## Dynamic tables
 
 The Snowflake adapter supports [dynamic tables](https://docs.snowflake.com/en/user-guide/dynamic-tables-about).
 This materialization is specific to Snowflake, which means that any model configuration that