Merge branch 'current' into add-env-img
mirnawong1 authored Oct 28, 2024
2 parents 611f9ad + 757b242 commit 78e9d7f
Showing 9 changed files with 147 additions and 15 deletions.
2 changes: 1 addition & 1 deletion website/docs/docs/build/data-tests.md
@@ -68,7 +68,7 @@ having total_amount < 0

The name of this test is the name of the file: `assert_total_payment_amount_is_positive`.

To add a data test to your project, add a `.yml` file to your `tests` directory, for example, `tests/schema.yml` with the following content:
To add a description to a singular test in your project, add a `.yml` file to your `tests` directory, for example, `tests/schema.yml` with the following content:

<File name='tests/schema.yml'>

3 changes: 2 additions & 1 deletion website/docs/docs/build/packages.md
@@ -20,9 +20,10 @@ In dbt, libraries like these are called _packages_. dbt's packages are so powerf
* Models to understand [Redshift](https://hub.getdbt.com/dbt-labs/redshift/latest/) privileges.
* Macros to work with data loaded by [Stitch](https://hub.getdbt.com/dbt-labs/stitch_utils/latest/).

dbt _packages_ are in fact standalone dbt projects, with models and macros that tackle a specific problem area. As a dbt user, by adding a package to your project, the package's models and macros will become part of your own project. This means:
dbt _packages_ are in fact standalone dbt projects, with models, macros, and other resources that tackle a specific problem area. As a dbt user, by adding a package to your project, all of the package's resources will become part of your own project. This means:
* Models in the package will be materialized when you `dbt run`.
* You can use `ref` in your own models to refer to models from the package.
* You can use `source` to refer to sources in the package.
* You can use macros in the package in your own project.
* It's important to note that defining and installing dbt packages is different from [defining and installing Python packages](/docs/build/python-models#using-pypi-packages)
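
As a minimal sketch of what "adding a package" looks like in practice, a `packages.yml` file at the root of a dbt project might resemble the following. The package name and pinned version are illustrative assumptions, not taken from this page; substitute whichever package and version you need from the hub.

```yaml
# packages.yml (project root)
# Illustrative only: the package and version below are assumptions for this sketch.
packages:
  - package: dbt-labs/dbt_utils
    version: 1.3.0
```

After running `dbt deps` to install it, the package's resources can be referenced from your own project, for example with the two-argument form `ref('package_name', 'model_name')`.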

2 changes: 1 addition & 1 deletion website/docs/docs/cloud/manage-access/audit-log.md
@@ -32,7 +32,7 @@ On the audit log page, you will see a list of various events and their associate

### Event details

Click the event card to see the details about the activity that triggered the event. This view provides important details, including when it happened and what type of event was triggered. For example, if someone changes the settings for a job, you can use the event details to see which job was changed (type of event: `job_definition.Changed`), by whom (person who triggered the event: `actor`), and when (time it was triggered: `created_at_utc`). For types of events and their descriptions, see [Events in audit log](#events-in-audit-log).
Click the event card to see the details about the activity that triggered the event. This view provides important details, including when it happened and what type of event was triggered. For example, if someone changes the settings for a job, you can use the event details to see which job was changed (type of event: `job_definition.Changed`), by whom (person who triggered the event: `actor`), and when (time it was triggered: `created_at_utc`). For types of events and their descriptions, see [Events in audit log](#audit-log-events).

The event details provide the key factors of an event:

6 changes: 6 additions & 0 deletions website/docs/docs/get-started-dbt.md
@@ -64,6 +64,12 @@ Learn more about [dbt Cloud features](/docs/cloud/about-cloud/dbt-cloud-feature
link="https://docs.getdbt.com/guides/starburst-galaxy"
icon="starburst"/>

<Card
title="Quickstart for dbt Cloud and Teradata"
body="Discover and use dbt Cloud with Teradata to enhance your data transformation workflows."
link="https://docs.getdbt.com/guides/teradata"
icon="teradata"/>

</div>

## dbt Core
9 changes: 8 additions & 1 deletion website/docs/reference/global-configs/behavior-changes.md
@@ -4,6 +4,8 @@ id: "behavior-changes"
sidebar: "Behavior changes"
---

import StateModified from '/snippets/_state-modified-compare.md';

Most flags exist to configure runtime behaviors with multiple valid choices. The right choice may vary based on the environment, user preference, or the specific invocation.

Another category of flags provides existing projects with a migration window for runtime behaviors that are changing in newer releases of dbt. These flags help us achieve a balance between these goals, which can otherwise be in tension, by:
@@ -83,13 +85,18 @@ Set the `skip_nodes_if_on_run_start_fails` flag to `True` to skip all selected r

### Source definitions for state:modified

:::info

<StateModified features={'/snippets/_state-modified-compare.md'}/>

:::

The flag is `False` by default.

Set `state_modified_compare_more_unrendered_values` to `True` to reduce false positives during `state:modified` checks (especially when configs differ by target environment like `prod` vs. `dev`).

Setting the flag to `True` changes the `state:modified` comparison from using rendered values to unrendered values instead. It accomplishes this by persisting `unrendered_config` during model parsing and `unrendered_database` and `unrendered_schema` configs during source parsing.
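
For illustration, a minimal sketch of enabling this flag in `dbt_project.yml`, assuming the usual placement of behavior change flags under the top-level `flags:` key:

```yaml
# dbt_project.yml
flags:
  # Compare unrendered config values during state:modified checks
  state_modified_compare_more_unrendered_values: true
```

Because the comparison depends on how the state artifacts were produced, the state directory you compare against should be built with the flag already set.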


### Package override for built-in materialization

Setting the `require_explicit_package_overrides_for_builtin_materializations` flag to `True` prevents this automatic override.
@@ -2,6 +2,8 @@
title: "Caveats to state comparison"
---

import StateModified from '/snippets/_state-modified-compare.md';

The [`state:` selection method](/reference/node-selection/methods#the-state-method) is a powerful feature, with a lot of underlying complexity. Below are a handful of considerations when setting up automated jobs that leverage state comparison.

### Seeds
@@ -48,6 +50,8 @@ dbt test -s "state:modified" --exclude "test_name:relationships"

To reduce false positives during `state:modified` selection due to env-aware logic, you can set the `state_modified_compare_more_unrendered_values` [behavior flag](/reference/global-configs/behavior-changes#behavior-change-flags) to `True`.

<StateModified features={'/snippets/_state-modified-compare.md'}/>

</VersionBlock>

<VersionBlock lastVersion="1.8">
109 changes: 108 additions & 1 deletion website/docs/reference/resource-configs/databricks-configs.md
@@ -65,6 +65,107 @@ We do not yet have a PySpark API to set tblproperties at table creation, so this

</VersionBlock>

<VersionBlock firstVersion="1.9">

### Python submission methods

In dbt v1.9 and higher, or in [Versionless](/docs/dbt-versions/versionless-cloud) dbt Cloud, you can use these four options for `submission_method`:

* `all_purpose_cluster`: Executes the Python model either directly using the [Command API](https://docs.databricks.com/api/workspace/commandexecution) or by uploading a notebook and creating a one-off job run.
* `job_cluster`: Creates a new job cluster to execute an uploaded notebook as a one-off job run.
* `serverless_cluster`: Uses a [serverless cluster](https://docs.databricks.com/en/jobs/run-serverless-jobs.html) to execute an uploaded notebook as a one-off job run.
* `workflow_job`: Creates or updates a reusable workflow and an uploaded notebook, for execution on all-purpose, job, or serverless clusters.
:::caution
This approach gives you maximum flexibility, but will create persistent artifacts in Databricks (the workflow) that users could run outside of dbt.
:::

We are currently in a transitional period in which the old submission methods (which were grouped by compute) do not map cleanly onto the logically distinct submission methods (command, job run, workflow).

As such, the supported config matrix is somewhat complicated:

| Config | Use | Default | `all_purpose_cluster`* | `job_cluster` | `serverless_cluster` | `workflow_job` |
| --------------------- | -------------------------------------------------------------------- | ------------------ | ---------------------- | ------------- | -------------------- | -------------- |
| `create_notebook` | if false, use Command API, otherwise upload notebook and use job run | `false` |||||
| `timeout` | maximum time to wait for command/job to run | `0` (No timeout) |||||
| `job_cluster_config` | configures a [new cluster](https://docs.databricks.com/api/workspace/jobs/submit#tasks-new_cluster) for running the model | `{}` |||||
| `access_control_list` | directly configures [access control](https://docs.databricks.com/api/workspace/jobs/submit#access_control_list) for the job | `{}` |||||
| `packages` | list of packages to install on the executing cluster | `[]` |||||
| `index_url` | url to install `packages` from | `None` (uses pypi) |||||
| `additional_libs` | directly configures [libraries](https://docs.databricks.com/api/workspace/jobs/submit#tasks-libraries) | `[]` |||||
| `python_job_config` | additional configuration for jobs/workflows (see table below) | `{}` |||||
| `cluster_id` | id of existing all purpose cluster to execute against | `None` |||||
| `http_path` | path to existing all purpose cluster to execute against | `None` |||||

\* Only `timeout` and `cluster_id`/`http_path` are supported when `create_notebook` is false
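
As a rough sketch of the simplest case described in the footnote above, a model that runs through the Command API on an existing all-purpose cluster might be configured as follows. The model name and cluster ID are hypothetical placeholders, and the timeout unit is assumed here to be seconds.

```yaml
models:
  - name: my_python_model # hypothetical model name
    config:
      submission_method: all_purpose_cluster
      # false (the default) means the Command API is used instead of a notebook job run
      create_notebook: false
      # placeholder: ID of an existing all-purpose cluster
      cluster_id: "<existing-cluster-id>"
      # assumption: value is interpreted as seconds; 0 means no timeout
      timeout: 3600
```

With `create_notebook: false`, the other options in the table (such as `packages` or `job_cluster_config`) are not expected to apply, per the footnote.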

With the introduction of the `workflow_job` submission method, we chose to group further configuration of Python model submission under a top-level config named `python_job_config`. This keeps the configuration options for jobs and workflows namespaced so that they do not interfere with other model configs, which lets us be much more flexible about what is supported for job execution.

The support matrix for this feature is divided into `workflow_job` and all others (assuming `all_purpose_cluster` with `create_notebook` set to `true`).
Each config option listed must be nested under `python_job_config`:

| Config | Use | Default | `workflow_job` | All others |
| -------------------------- | ----------------------------------------------------------------------------------------------------------------------- | ------- | -------------- | ---------- |
| `name` | The name to give (or used to look up) the created workflow | `None` |||
| `grants` | A simplified way to specify access control for the workflow | `{}` |||
| `existing_job_id` | Id to use to look up the created workflow (in place of `name`) | `None` |||
| `post_hook_tasks` | [Tasks](https://docs.databricks.com/api/workspace/jobs/create#tasks) to include after the model notebook execution | `[]` |||
| `additional_task_settings` | Additional [task config](https://docs.databricks.com/api/workspace/jobs/create#tasks) to include in the model task | `{}` |||
| [Other job run settings](https://docs.databricks.com/api/workspace/jobs/submit) | Config will be copied into the request, outside of the model task | `None` |||
| [Other workflow settings](https://docs.databricks.com/api/workspace/jobs/create) | Config will be copied into the request, outside of the model task | `None` |||

This example uses the new configuration options in the previous table:

<File name='schema.yml'>

```yaml
models:
  - name: my_model
    config:
      submission_method: workflow_job

      # Define a job cluster to create for running this workflow
      # Alternately, could specify cluster_id to use an existing cluster, or provide neither to use a serverless cluster
      job_cluster_config:
        spark_version: "15.3.x-scala2.12"
        node_type_id: "rd-fleet.2xlarge"
        runtime_engine: "{{ var('job_cluster_defaults.runtime_engine') }}"
        data_security_mode: "{{ var('job_cluster_defaults.data_security_mode') }}"
        autoscale: { "min_workers": 1, "max_workers": 4 }

      python_job_config:
        # These settings are passed in, as is, to the request
        email_notifications: { on_failure: ["[email protected]"] }
        max_retries: 2

        name: my_workflow_name

        # Override settings for your model's dbt task. For instance, you can
        # change the task key
        additional_task_settings: { "task_key": "my_dbt_task" }

        # Define tasks to run before/after the model
        # This example assumes you have already uploaded a notebook to /my_notebook_path to perform optimize and vacuum
        post_hook_tasks:
          [
            {
              "depends_on": [{ "task_key": "my_dbt_task" }],
              "task_key": "OPTIMIZE_AND_VACUUM",
              "notebook_task":
                { "notebook_path": "/my_notebook_path", "source": "WORKSPACE" },
            },
          ]

        # Simplified structure, rather than having to specify permission separately for each user
        grants:
          view: [{ "group_name": "marketing-team" }]
          run: [{ "user_name": "[email protected]" }]
          manage: []
```
</File>
</VersionBlock>
## Incremental models
dbt-databricks plugin leans heavily on the [`incremental_strategy` config](/docs/build/incremental-strategy). This config tells the incremental materialization how to build models in runs beyond their first. It can be set to one of four values:
@@ -556,9 +657,15 @@ Databricks adapter ... using compute resource <name of compute>.

Materializing a python model requires execution of SQL as well as python.
Specifically, if your python model is incremental, the current execution pattern involves executing python to create a staging table that is then merged into your target table using SQL.
<VersionBlock lastVersion="1.8">
The python code needs to run on an all purpose cluster, while the SQL code can run on an all purpose cluster or a SQL Warehouse.
</VersionBlock>
<VersionBlock firstVersion="1.9">
The python code needs to run on an all purpose cluster (or serverless cluster, see [Python Submission Methods](#python-submission-methods)), while the SQL code can run on an all purpose cluster or a SQL Warehouse.
</VersionBlock>
When you specify your `databricks_compute` for a python model, you are currently only specifying which compute to use when running the model-specific SQL.
If you wish to use a different compute for executing the python itself, you must specify an alternate `http_path` in the config for the model. Please note that declaring a separate SQL compute and a python compute for your python dbt models is optional. If you wish to do this:
If you wish to use a different compute for executing the python itself, you must specify an alternate compute in the config for the model.
For example:

<File name="model.py">

3 changes: 3 additions & 0 deletions website/snippets/_state-modified-compare.md
@@ -0,0 +1,3 @@
You need to build the state directory using dbt v1.9 or higher, or [Versionless](/docs/dbt-versions/versionless-cloud) dbt Cloud, and you need to set `state_modified_compare_more_unrendered_values` to `true` within your dbt_project.yml.

If the state directory was built with an older dbt version or if the `state_modified_compare_more_unrendered_values` behavior change flag was either not set or set to `false`, you need to rebuild the state directory to avoid false positives during state comparison with `state:modified`.
24 changes: 14 additions & 10 deletions website/snippets/core-versions-table.md
@@ -2,15 +2,19 @@

| dbt Core | Initial release | Support level and end date |
|:-------------------------------------------------------------:|:---------------:|:-------------------------------------:|
| [**v1.8**](/docs/dbt-versions/core-upgrade/upgrading-to-v1.8) | May 9 2024 | <b>Active &mdash; May 8, 2025</b> |
| [**v1.7**](/docs/dbt-versions/core-upgrade/upgrading-to-v1.7) | Nov 2, 2023 | Critical &mdash; Nov 1, 2024 |
| [**v1.6**](/docs/dbt-versions/core-upgrade/upgrading-to-v1.6) | Jul 31, 2023 | End of Life* ⚠️ |
| [**v1.5**](/docs/dbt-versions/core-upgrade/upgrading-to-v1.5) | Apr 27, 2023 | End of Life* ⚠️ |
| [**v1.4**](/docs/dbt-versions/core-upgrade/Older%20versions/upgrading-to-v1.4) | Jan 25, 2023 | End of Life* ⚠️ |
| [**v1.3**](/docs/dbt-versions/core-upgrade/Older%20versions/upgrading-to-v1.3) | Oct 12, 2022 | End of Life* ⚠️ |
| [**v1.2**](/docs/dbt-versions/core-upgrade/Older%20versions/upgrading-to-v1.2) | Jul 26, 2022 | End of Life* ⚠️ |
| [**v1.1**](/docs/dbt-versions/core-upgrade/Older%20versions/upgrading-to-v1.1) | Apr 28, 2022 | End of Life* ⚠️ |
| [**v1.0**](/docs/dbt-versions/core-upgrade/Older%20versions/upgrading-to-v1.0) | Dec 3, 2021 | End of Life* ⚠️ |
| [**v1.8**](/docs/dbt-versions/core-upgrade/upgrading-to-v1.8) | May 9 2024 | <b>Active Support &mdash; May 8, 2025</b> |
| [**v1.7**](/docs/dbt-versions/core-upgrade/upgrading-to-v1.7) | Nov 2, 2023 | <div align="left">**dbt Core and dbt Cloud Developer & Team customers:** Critical Support until Nov 1, 2024 <br /> **dbt Cloud Enterprise customers:** Critical Support until further notice <sup>1</sup></div> |
| [**v1.6**](/docs/dbt-versions/core-upgrade/upgrading-to-v1.6) | Jul 31, 2023 | End of Life ⚠️ |
| [**v1.5**](/docs/dbt-versions/core-upgrade/upgrading-to-v1.5) | Apr 27, 2023 | End of Life ⚠️ |
| [**v1.4**](/docs/dbt-versions/core-upgrade/Older%20versions/upgrading-to-v1.4) | Jan 25, 2023 | End of Life ⚠️ |
| [**v1.3**](/docs/dbt-versions/core-upgrade/Older%20versions/upgrading-to-v1.3) | Oct 12, 2022 | End of Life ⚠️ |
| [**v1.2**](/docs/dbt-versions/core-upgrade/Older%20versions/upgrading-to-v1.2) | Jul 26, 2022 | End of Life ⚠️ |
| [**v1.1**](/docs/dbt-versions/core-upgrade/Older%20versions/upgrading-to-v1.1) | Apr 28, 2022 | End of Life ⚠️ |
| [**v1.0**](/docs/dbt-versions/core-upgrade/Older%20versions/upgrading-to-v1.0) | Dec 3, 2021 | End of Life ⚠️ |
| **v0.X** ⛔️ | (Various dates) | Deprecated ⛔️ | Deprecated ⛔️ |

_*All versions of dbt Core since v1.0 are available in dbt Cloud until further notice. Versions that are EOL do not receive any fixes. For the best support, we recommend upgrading to a version released within the past 12 months._
All functionality in dbt Core since the v1.7 release is available in dbt Cloud, early and continuously, by selecting ["Versionless"](https://docs.getdbt.com/docs/dbt-versions/versionless-cloud).

<sup>1</sup> Starting in November 2024, "Versionless" will be required for the Developer and Teams plans on dbt Cloud. After that point, accounts on older versions will be migrated to "Versionless."

For customers of dbt Cloud Enterprise, dbt v1.7 will continue to be available as an option while dbt Labs rolls out a mechanism for "extended" upgrades. In the meantime, dbt Labs strongly recommends migrating any environments that are still running on older unsupported versions to "Versionless" dbt or dbt v1.7.
