From 869a56d0977712da238ac240e014f883a678f7b4 Mon Sep 17 00:00:00 2001
From: Ben Cassell <98852248+benc-db@users.noreply.github.com>
Date: Thu, 30 Nov 2023 12:58:50 -0800
Subject: [PATCH 1/7] Update databricks-configs.md to cover compute per model
---
.../resource-configs/databricks-configs.md | 144 ++++++++++++++++++
1 file changed, 144 insertions(+)
diff --git a/website/docs/reference/resource-configs/databricks-configs.md b/website/docs/reference/resource-configs/databricks-configs.md
index a3b00177967..8b09eb5326c 100644
--- a/website/docs/reference/resource-configs/databricks-configs.md
+++ b/website/docs/reference/resource-configs/databricks-configs.md
@@ -361,6 +361,150 @@ insert into analytics.replace_where_incremental
+
+
+## Selecting compute per model
+
+Beginning in version 1.7.2, you can assign which compute to use on a per-model basis.
+To take advantage of this capability, you will need to add compute blocks to your profile:
+
+
+
+```yaml
+
+<profile-name>:
+  target: <target-name> # this is the default target
+  outputs:
+    <target-name>:
+      type: databricks
+      catalog: [optional catalog name if you are using Unity Catalog]
+      schema: [schema name] # Required
+      host: [yourorg.databrickshost.com] # Required
+
+      ### This path is used as the default compute
+      http_path: [/sql/your/http/path] # Required
+
+      ### New compute section
+      compute:
+
+        ### Name that you will use to refer to an alternate compute
+        AltCompute:
+          http_path: ['/sql/your/http/path'] # Required of each alternate compute
+
+        ### A third named compute, use whatever name you like
+        Compute2:
+          http_path: ['/some/other/path'] # Required of each alternate compute
+      ...
+
+    <target-name>: # additional targets
+      ...
+      ### For each target, you need to define the same compute,
+      ### but you can specify different paths
+      compute:
+
+        ### Name that you will use to refer to an alternate compute
+        Compute1:
+          http_path: ['/sql/your/http/path'] # Required of each alternate compute
+
+        ### A third named compute, use whatever name you like
+        Compute2:
+          http_path: ['/some/other/path'] # Required of each alternate compute
+      ...
+
+```
+
+
+
+The new compute section is a map of user-chosen names to objects with an `http_path` property.
+Each compute is keyed by a name which is used in the model definition/configuration to indicate which compute you wish to use for that model/selection of models.
+
+:::note
+
+You need to use the same set of names for compute across your outputs, though you may supply different http_paths, allowing you to use different computes in different deployment scenarios.
+
+:::
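+
+For example (a sketch with hypothetical target names and warehouse paths), a `dev` and a `prod` target could each define `Compute1` while pointing it at different warehouses:
+
+```yaml
+outputs:
+  dev:
+    ...
+    compute:
+      Compute1:
+        http_path: ['/sql/1.0/warehouses/dev-warehouse-id']
+  prod:
+    ...
+    compute:
+      Compute1:
+        http_path: ['/sql/1.0/warehouses/prod-warehouse-id']
+```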
+
+### Specifying the compute for models
+
+As with many other configuration options, you can specify the compute for a model in multiple ways, using `databricks_compute`.
+In your `dbt_project.yml`, the selected compute can be specified for all the models in a given directory:
+
+
+
+```yaml
+
+...
+
+models:
+ +databricks_compute: "Compute1" # use the `Compute1` warehouse/cluster for all models in the project...
+ my_project:
+ clickstream:
+ +databricks_compute: "Compute2" # ...except for the models in the `clickstream` folder, which will use `Compute2`.
+
+snapshots:
+ +databricks_compute: "Compute1" # all Snapshot models are configured to use `Compute1`.
+
+```
+
+
+
+For an individual model, the compute can be specified in the model config in your schema file.
+
+
+
+```yaml
+
+models:
+ - name: table_model
+ config:
+ databricks_compute: Compute1
+ columns:
+ - name: id
+ data_type: int
+
+```
+
+
+
+
+Alternatively, the compute can be specified in the `config` block of a model's SQL file.
+
+
+
+```sql
+
+{{
+ config(
+ materialized='table',
+ databricks_compute='Compute1'
+ )
+}}
+select * from {{ ref('seed') }}
+
+```
+
+
+
+:::note
+
+In the absence of a specified compute, we will default to the compute specified by `http_path` at the top level of the output section in your profile.
+This is also the compute that will be used for tasks not associated with a particular model, such as gathering metadata for all tables in a schema.
+
+:::
+
+To validate that the specified compute is being used, look for lines in your `dbt.log` like:
+
+```
+Databricks adapter ... using default compute resource.
+```
+
+or
+
+```
+Databricks adapter ... using compute resource <name>.
+```
+
+
## Persisting model descriptions
From 51ffdc274a6308e044e07cb0ae741ac83482c993 Mon Sep 17 00:00:00 2001
From: Ben Cassell <98852248+benc-db@users.noreply.github.com>
Date: Thu, 30 Nov 2023 13:30:49 -0800
Subject: [PATCH 2/7] Update
website/docs/reference/resource-configs/databricks-configs.md
Co-authored-by: Amy Chen <46451573+amychen1776@users.noreply.github.com>
---
website/docs/reference/resource-configs/databricks-configs.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/website/docs/reference/resource-configs/databricks-configs.md b/website/docs/reference/resource-configs/databricks-configs.md
index 8b09eb5326c..e4027edaf2e 100644
--- a/website/docs/reference/resource-configs/databricks-configs.md
+++ b/website/docs/reference/resource-configs/databricks-configs.md
@@ -388,7 +388,7 @@ To take advantage of this capability, you will need to add compute blocks to you
compute:
### Name that you will use to refer to an alternate compute
- AltCompute:
+ Compute1:
       http_path: ['/sql/your/http/path'] # Required of each alternate compute
### A third named compute, use whatever name you like
From 7389ad9fd15212192a79c6a92cc9e7c325fb6825 Mon Sep 17 00:00:00 2001
From: Ben Cassell <98852248+benc-db@users.noreply.github.com>
Date: Thu, 30 Nov 2023 13:31:49 -0800
Subject: [PATCH 3/7] Update
website/docs/reference/resource-configs/databricks-configs.md
Co-authored-by: Amy Chen <46451573+amychen1776@users.noreply.github.com>
---
.../docs/reference/resource-configs/databricks-configs.md | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/website/docs/reference/resource-configs/databricks-configs.md b/website/docs/reference/resource-configs/databricks-configs.md
index e4027edaf2e..bb226045788 100644
--- a/website/docs/reference/resource-configs/databricks-configs.md
+++ b/website/docs/reference/resource-configs/databricks-configs.md
@@ -423,7 +423,14 @@ Each compute is keyed by a name which is used in the model definition/configurat
You need to use the same set of names for compute across your outputs, though you may supply different http_paths, allowing you to use different computes in different deployment scenarios.
:::
+To configure this inside of dbt Cloud, use the [extended attributes feature](/docs/dbt-cloud-environments#extended-attributes-) on the desired environments. You can input them as such.
+```yaml
+compute:
+  Compute1:
+    http_path: ['/some/other/path']
+  Compute2:
+    http_path: ['/some/other/path']
### Specifying the compute for models
As with many other configuration options, you can specify the compute for a model in multiple ways, using `databricks_compute`.
From 2434bda1646fcf63bf81d7bb61d8bdcee67323a9 Mon Sep 17 00:00:00 2001
From: Ben Cassell <98852248+benc-db@users.noreply.github.com>
Date: Thu, 30 Nov 2023 13:32:03 -0800
Subject: [PATCH 4/7] Update
website/docs/reference/resource-configs/databricks-configs.md
Co-authored-by: Amy Chen <46451573+amychen1776@users.noreply.github.com>
---
website/docs/reference/resource-configs/databricks-configs.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/website/docs/reference/resource-configs/databricks-configs.md b/website/docs/reference/resource-configs/databricks-configs.md
index bb226045788..800ba7430f0 100644
--- a/website/docs/reference/resource-configs/databricks-configs.md
+++ b/website/docs/reference/resource-configs/databricks-configs.md
@@ -416,7 +416,7 @@ To take advantage of this capability, you will need to add compute blocks to you
The new compute section is a map of user-chosen names to objects with an `http_path` property.
-Each compute is keyed by a name which is used in the model definition/configuration to indicate which compute you wish to use for that model/selection of models.
+Each compute is keyed by a name which is used in the model definition/configuration to indicate which compute you wish to use for that model/selection of models. We recommend choosing a name that is easily recognized as to what compute resources you're using, such as what the compute resource is named inside of the Databricks UI.
:::note
From 99a0345d161a6a956669cc1f60f20370937179a8 Mon Sep 17 00:00:00 2001
From: Ben Cassell <98852248+benc-db@users.noreply.github.com>
Date: Thu, 30 Nov 2023 16:29:52 -0800
Subject: [PATCH 5/7] Update databricks-configs.md - Adding python discussion.
---
.../resource-configs/databricks-configs.md | 25 ++++++++++++++++++-
1 file changed, 24 insertions(+), 1 deletion(-)
diff --git a/website/docs/reference/resource-configs/databricks-configs.md b/website/docs/reference/resource-configs/databricks-configs.md
index 800ba7430f0..f218cec3172 100644
--- a/website/docs/reference/resource-configs/databricks-configs.md
+++ b/website/docs/reference/resource-configs/databricks-configs.md
@@ -363,7 +363,7 @@ insert into analytics.replace_where_incremental
-## Selecting compute per model
+### Selecting compute per model
Beginning in version 1.7.2, you can assign which compute to use on a per-model basis.
To take advantage of this capability, you will need to add compute blocks to your profile:
@@ -511,6 +511,29 @@ or
Databricks adapter ... using compute resource .
```
+### Selecting compute for a Python model
+
+Materializing a Python model requires execution of SQL as well as Python.
+Specifically, if your Python model is incremental, the current execution pattern involves executing Python to create a staging table that is then merged into your target table using SQL.
+The Python code needs to run on an all-purpose cluster, while the SQL code can run on an all-purpose cluster or a SQL Warehouse.
+When you specify your `databricks_compute` for a Python model, you are currently only specifying which compute to use when running the model-specific SQL.
+If you wish to use a different compute for executing the Python itself, you must specify an alternate `http_path` in the config for the model:
+
+
+
+```python
+
+def model(dbt, session):
+    dbt.config(
+        http_path="sql/protocolv1/..."
+    )
+    # For illustration: return a dataframe, e.g. an upstream ref
+    return dbt.ref("seed")
+
+```
+
+
+
+If your default compute is a SQL Warehouse, you will need to specify an all-purpose cluster `http_path` in this way.
+
## Persisting model descriptions
From c129674e9d9f0eca1421bc7f35228c5e5911f7dc Mon Sep 17 00:00:00 2001
From: Ben Cassell <98852248+benc-db@users.noreply.github.com>
Date: Thu, 30 Nov 2023 16:44:04 -0800
Subject: [PATCH 6/7] Update databricks-configs.md - clean up
---
.../resource-configs/databricks-configs.md | 16 ++++++++++++----
1 file changed, 12 insertions(+), 4 deletions(-)
diff --git a/website/docs/reference/resource-configs/databricks-configs.md b/website/docs/reference/resource-configs/databricks-configs.md
index f218cec3172..49aa4dd3a84 100644
--- a/website/docs/reference/resource-configs/databricks-configs.md
+++ b/website/docs/reference/resource-configs/databricks-configs.md
@@ -363,9 +363,11 @@ insert into analytics.replace_where_incremental
-### Selecting compute per model
+## Selecting compute per model
-Beginning in version 1.7.2, you can assign which compute to use on a per-model basis.
+Beginning in version 1.7.2, you can assign which compute resource to use on a per-model basis.
+For SQL models, you can select a SQL Warehouse (serverless or provisioned) or an all-purpose cluster.
+For details on how this feature interacts with python models, see [Specifying compute for Python models](#specifying-compute-for-python-models).
To take advantage of this capability, you will need to add compute blocks to your profile:
@@ -423,14 +425,20 @@ Each compute is keyed by a name which is used in the model definition/configurat
You need to use the same set of names for compute across your outputs, though you may supply different http_paths, allowing you to use different computes in different deployment scenarios.
:::
-To configure this inside of dbt Cloud, use the [extended attributes feature](/docs/dbt-cloud-environments#extended-attributes-) on the desired environments. You can input them as such.
+
+To configure this inside of dbt Cloud, use the [extended attributes feature](/docs/dbt-cloud-environments#extended-attributes-) on the desired environments.
+You can input like so:
```yaml
+
 compute:
   Compute1:
     http_path: ['/some/other/path']
   Compute2:
     http_path: ['/some/other/path']
+
+```
+
### Specifying the compute for models
As with many other configuration options, you can specify the compute for a model in multiple ways, using `databricks_compute`.
@@ -511,7 +519,7 @@ or
Databricks adapter ... using compute resource .
```
-### Selecting compute for a Python model
+### Specifying compute for Python models
Materializing a Python model requires execution of SQL as well as Python.
Specifically, if your Python model is incremental, the current execution pattern involves executing Python to create a staging table that is then merged into your target table using SQL.
From 1d6bf4997b88f7e99ef1c6fb10cd125a3370545f Mon Sep 17 00:00:00 2001
From: Ben Cassell <98852248+benc-db@users.noreply.github.com>
Date: Thu, 30 Nov 2023 16:48:30 -0800
Subject: [PATCH 7/7] Update databricks-configs.md - minor touch ups
---
.../docs/reference/resource-configs/databricks-configs.md | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/website/docs/reference/resource-configs/databricks-configs.md b/website/docs/reference/resource-configs/databricks-configs.md
index 49aa4dd3a84..8426846997c 100644
--- a/website/docs/reference/resource-configs/databricks-configs.md
+++ b/website/docs/reference/resource-configs/databricks-configs.md
@@ -418,7 +418,8 @@ To take advantage of this capability, you will need to add compute blocks to you
The new compute section is a map of user chosen names to objects with an http_path property.
-Each compute is keyed by a name which is used in the model definition/configuration to indicate which compute you wish to use for that model/selection of models. We recommend choosing a name that is easily recognized as to what compute resources you're using, such as what the compute resource is named inside of the Databricks UI.
+Each compute is keyed by a name which is used in the model definition/configuration to indicate which compute you wish to use for that model/selection of models.
+We recommend choosing a name that is easily recognized as the compute resource you're using, such as the name of the compute resource inside the Databricks UI.
:::note
@@ -426,8 +427,7 @@ You need to use the same set of names for compute across your outputs, though yo
:::
-To configure this inside of dbt Cloud, use the [extended attributes feature](/docs/dbt-cloud-environments#extended-attributes-) on the desired environments.
-You can input like so:
+To configure this inside of dbt Cloud, use the [extended attributes feature](/docs/dbt-cloud-environments#extended-attributes-) on the desired environments:
```yaml