Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add materialized view configuration for BigQuery #4578

Merged
merged 12 commits into from
Dec 13, 2023
186 changes: 186 additions & 0 deletions website/docs/reference/resource-configs/bigquery-configs.md
Original file line number Diff line number Diff line change
Expand Up @@ -718,3 +718,189 @@ Views with this configuration will be able to select from objects in `project_1.

The `grant_access_to` config is not thread-safe when multiple views need to be authorized for the same dataset. The initial `dbt run` operation after a new `grant_access_to` config is added should therefore be executed in a single thread. Subsequent runs using the same configuration will not attempt to re-apply existing access grants, and can make use of multiple threads.

<VersionBlock firstVersion="1.7">

## Materialized views

The BigQuery adapter supports [materialized views](https://cloud.google.com/bigquery/docs/materialized-views-intro)
with the following configuration parameters:

| Parameter | Type | Required | Default | Change Monitoring Support |
|--------------------------------------------------------|------------------------|----------|---------|---------------------------|
| `on_configuration_change` | `<string>` | no | `apply` | n/a |
| [`cluster_by`](#clustering-clause) | `[<string>]` | no | `none` | drop/create |
| [`partition_by`](#partition-clause) | `{<dictionary>}` | no | `none` | drop/create |
| [`enable_refresh`](#auto-refresh) | `<boolean>` | no | `true` | alter |
| [`refresh_interval_minutes`](#auto-refresh) | `<float>` | no | `30` | alter |
| [`max_staleness`](#auto-refresh) (in Preview) | `<interval>` | no | `none` | alter |
| `description` | `<string>` | no | `none` | alter |
| [`labels`](#specifying-labels) | `{<string>: <string>}` | no | `none` | alter |
| [`hours_to_expiration`](#controlling-table-expiration) | `<integer>` | no | `none` | alter |
| [`kms_key_name`](#using-kms-encryption) | `<string>` | no | `none` | alter |

<Tabs
groupId="config-languages"
defaultValue="project-yaml"
values={[
{ label: 'Project file', value: 'project-yaml', },
{ label: 'Property file', value: 'property-yaml', },
{ label: 'Config block', value: 'config', },
]
}>


<TabItem value="project-yaml">

<File name='dbt_project.yml'>
directly to the materialized view in place.
mikealfare marked this conversation as resolved.
Show resolved Hide resolved

```yaml
models:
[<resource-path>](/reference/resource-configs/resource-path):
[+](/reference/resource-configs/plus-prefix)[materialized](/reference/resource-configs/materialized): materialized_view
[+](/reference/resource-configs/plus-prefix)on_configuration_change: apply | continue | fail
[+](/reference/resource-configs/plus-prefix)[cluster_by](#clustering-clause): <field-name> | [<field-name>]
[+](/reference/resource-configs/plus-prefix)[partition_by](#partition-clause):
- field: <field-name>
- data_type: timestamp | date | datetime | int64
# only if `data_type` is not 'int64'
- granularity: hour | day | month | year
# only if `data_type` is 'int64'
- range:
- start: <integer>
- end: <integer>
- interval: <integer>
[+](/reference/resource-configs/plus-prefix)[enable_refresh](#auto-refresh): true | false
[+](/reference/resource-configs/plus-prefix)[refresh_interval_minutes](#auto-refresh): <float>
[+](/reference/resource-configs/plus-prefix)[max_staleness](#auto-refresh): <interval>
[+](/reference/resource-configs/plus-prefix)description: <string>
[+](/reference/resource-configs/plus-prefix)[labels](#specifying-labels): {<label-name>: <label-value>}
[+](/reference/resource-configs/plus-prefix)[hours_to_expiration](#acontrolling-table-expiration): <integer>
[+](/reference/resource-configs/plus-prefix)[kms_key_name](##using-kms-encryption): <path-to-key>
```

</File>

</TabItem>


<TabItem value="property-yaml">

<File name='models/properties.yml'>

```yaml
version: 2

models:
- name: [<model-name>]
config:
[materialized](/reference/resource-configs/materialized): materialized_view
on_configuration_change: apply | continue | fail
[cluster_by](#clustering-clause): <field-name> | [<field-name>]
[partition_by](#partition-clause):
- field: <field-name>
- data_type: timestamp | date | datetime | int64
# only if `data_type` is not 'int64'
- granularity: hour | day | month | year
# only if `data_type` is 'int64'
- range:
- start: <integer>
- end: <integer>
- interval: <integer>
[enable_refresh](#auto-refresh): true | false
[refresh_interval_minutes](#auto-refresh): <float>
[max_staleness](#auto-refresh): <interval>
description: <string>
[labels](#specifying-labels): {<label-name>: <label-value>}
[hours_to_expiration](#acontrolling-table-expiration): <integer>
[kms_key_name](##using-kms-encryption): <path-to-key>
```

</File>

</TabItem>


<TabItem value="config">

<File name='models/<model_name>.sql'>

```jinja
{{ config(
[materialized](/reference/resource-configs/materialized)='materialized_view',
on_configuration_change="apply" | "continue" | "fail",
[cluster_by](#clustering-clause)="<field-name>" | ["<field-name>"],
[partition_by](#partition-clause)={
"field": "<field-name>",
"data_type": "timestamp" | "date" | "datetime" | "int64",

# only if `data_type` is not 'int64'
"granularity": "hour" | "day" | "month" | "year,

# only if `data_type` is 'int64'
"range": {
"start": <integer>,
"end": <integer>,
"interval": <integer>,
}
},

# auto-refresh options
[enable_refresh](#auto-refresh)= true | false,
[refresh_interval_minutes](#auto-refresh)=<float>,
[max_staleness](#auto-refresh)="<interval>",

# additional options
description="<description>",
[labels](#specifying-labels)={
"<label-name>": "<label-value>",
},
[hours_to_expiration](#acontrolling-table-expiration)=<integer>,
[kms_key_name](##using-kms-encryption)="<path_to_key>",
) }}
```

</File>

</TabItem>

</Tabs>

Many of these parameters correspond to their table counterparts and have been linked above.
The set of parameters which are unique to materialized views covers auto-refresh functionality, which is covered [below](#auto-refresh).
matthewshaver marked this conversation as resolved.
Show resolved Hide resolved

Find more information about these parameters in the BigQuery docs:
- [CREATE MATERIALIZED VIEW statement](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#create_materialized_view_statement)
- [materialized_view_option_list](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#materialized_view_option_list)

### Auto-refresh

| Parameter | Type | Required | Default | Change Monitoring Support |
|------------------------------|--------------|----------|---------|---------------------------|
| `enable_refresh` | `<boolean>` | no | `true` | alter |
| `refresh_interval_minutes` | `<float>` | no | `30` | alter |
| `max_staleness` (in Preview) | `<interval>` | no | `none` | alter |

BigQuery supports [automatic refresh](https://cloud.google.com/bigquery/docs/materialized-views-manage#automatic_refresh) configuration for materialized views.
By default, a materialized view will automatically refresh within 5 minutes of changes in the base table, but no more frequently than every 30 minutes.
matthewshaver marked this conversation as resolved.
Show resolved Hide resolved
BigQuery only officially supports configuration of the frequency (the "30 minutes" part);
matthewshaver marked this conversation as resolved.
Show resolved Hide resolved
however, there is a feature in preview that allows for configuration of the staleness (the "5 minutes" part).
matthewshaver marked this conversation as resolved.
Show resolved Hide resolved
dbt will monitor these parameters for changes and apply them using an `ALTER` statement.

Find more information about these parameters in the BigQuery docs:
- [materialized_view_option_list](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#materialized_view_option_list)
- [max_staleness](https://cloud.google.com/bigquery/docs/materialized-views-create#max_staleness)

### Limitations

As with most data platforms, there are limitations associated with materialized views. Some worth noting include:

- Materialized view SQL has a [limited feature set](https://cloud.google.com/bigquery/docs/materialized-views-create#supported-mvs)
matthewshaver marked this conversation as resolved.
Show resolved Hide resolved
- Materialized view SQL cannot be updated; the materialized view must go through a `--full-refresh` (DROP/CREATE)
matthewshaver marked this conversation as resolved.
Show resolved Hide resolved
- The `partition_by` clause on a materialized view must match that of the underlying base table
matthewshaver marked this conversation as resolved.
Show resolved Hide resolved
- While materialized views can have descriptions, materialized view *columns* cannot
matthewshaver marked this conversation as resolved.
Show resolved Hide resolved
- Recreating/dropping the base table requires recreating/dropping the materialized view
matthewshaver marked this conversation as resolved.
Show resolved Hide resolved

Find more information about materialized view limitations in Google's BigQuery [docs](https://cloud.google.com/bigquery/docs/materialized-views-intro#limitations).

</VersionBlock>
Loading