From 869a56d0977712da238ac240e014f883a678f7b4 Mon Sep 17 00:00:00 2001 From: Ben Cassell <98852248+benc-db@users.noreply.github.com> Date: Thu, 30 Nov 2023 12:58:50 -0800 Subject: [PATCH 1/7] Update databricks-configs.md to cover compute per model --- .../resource-configs/databricks-configs.md | 144 ++++++++++++++++++ 1 file changed, 144 insertions(+) diff --git a/website/docs/reference/resource-configs/databricks-configs.md b/website/docs/reference/resource-configs/databricks-configs.md index a3b00177967..8b09eb5326c 100644 --- a/website/docs/reference/resource-configs/databricks-configs.md +++ b/website/docs/reference/resource-configs/databricks-configs.md @@ -361,6 +361,150 @@ insert into analytics.replace_where_incremental + + +## Selecting compute per model + +Beginning in version 1.7.2, you can assign which compute to use on a per-model basis. +To take advantage of this capability, you will need to add compute blocks to your profile: + + + +```yaml + +: + target: # this is the default target + outputs: + : + type: databricks + catalog: [optional catalog name if you are using Unity Catalog] + schema: [schema name] # Required + host: [yourorg.databrickshost.com] # Required + + ### This path is used as the default compute + http_path: [/sql/your/http/path] # Required + + ### New compute section + compute: + + ### Name that you will use to refer to an alternate compute + AltCompute: + http_path: [‘/sql/your/http/path’] # Required of each alternate compute + + ### A third named compute, use whatever name you like + Compute2: + http_path: [‘/some/other/path’] # Required of each alternate compute + ... + + : # additional targets + ... 
+ ### For each target, you need to define the same compute, + ### but you can specify different paths + compute: + + ### Name that you will use to refer to an alternate compute + Compute1: + http_path: [‘/sql/your/http/path’] # Required of each alternate compute + + ### A third named compute, use whatever name you like + Compute2: + http_path: [‘/some/other/path’] # Required of each alternate compute + ... + +``` + + + +The new compute section is a map of user chosen names to objects with an http_path property. +Each compute is keyed by a name which is used in the model definition/configuration to indicate which compute you wish to use for that model/selection of models. + +:::note + +You need to use the same set of names for compute across your outputs, though you may supply different http_paths, allowing you to use different computes in different deployment scenarios. + +::: + +### Specifying the compute for models + +As with many other configuration options, you can specify the compute for a model in multiple ways, using `databricks_compute`. +In your `dbt_project.yml`, the selected compute can be specified for all the models in a given directory: + + + +```yaml + +... + +models: + +databricks_compute: "Compute1" # use the `Compute1` warehouse/cluster for all models in the project... + my_project: + clickstream: + +databricks_compute: "Compute2" # ...except for the models in the `clickstream` folder, which will use `Compute2`. + +snapshots: + +databricks_compute: "Compute1" # all Snapshot models are configured to use `Compute1`. + +``` + + + +For an individual model, the compute can be specified in the model config in your schema file. + + + +```yaml + +models: + - name: table_model + config: + databricks_compute: Compute1 + columns: + - name: id + data_type: int + +``` + + + + +Alternatively, the compute can be specified in the config block of a model's SQL file.
+ + + +```sql + +{{ + config( + materialized='table', + databricks_compute='Compute1' + ) +}} +select * from {{ ref('seed') }} + +``` + + + +:::note + +In the absence of a specified compute, we will default to the compute specified by http_path in the top level of the output section in your profile. +This is also the compute that will be used for tasks not associated with a particular model, such as gathering metadata for all tables in a schema. + +::: + +To validate that the specified compute is being used, look for lines in your dbt.log like: + +``` +Databricks adapter ... using default compute resource. +``` + +or + +``` +Databricks adapter ... using compute resource . +``` + + ## Persisting model descriptions From 51ffdc274a6308e044e07cb0ae741ac83482c993 Mon Sep 17 00:00:00 2001 From: Ben Cassell <98852248+benc-db@users.noreply.github.com> Date: Thu, 30 Nov 2023 13:30:49 -0800 Subject: [PATCH 2/7] Update website/docs/reference/resource-configs/databricks-configs.md Co-authored-by: Amy Chen <46451573+amychen1776@users.noreply.github.com> --- website/docs/reference/resource-configs/databricks-configs.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/reference/resource-configs/databricks-configs.md b/website/docs/reference/resource-configs/databricks-configs.md index 8b09eb5326c..e4027edaf2e 100644 --- a/website/docs/reference/resource-configs/databricks-configs.md +++ b/website/docs/reference/resource-configs/databricks-configs.md @@ -388,7 +388,7 @@ To take advantage of this capability, you will need to add compute blocks to you compute: ### Name that you will use to refer to an alternate compute - AltCompute: + Compute1: http_path: [‘/sql/your/http/path’] # Required of each alternate compute ### A third named compute, use whatever name you like From 7389ad9fd15212192a79c6a92cc9e7c325fb6825 Mon Sep 17 00:00:00 2001 From: Ben Cassell <98852248+benc-db@users.noreply.github.com> Date: Thu, 30 Nov 2023 13:31:49 -0800 Subject: [PATCH 
3/7] Update website/docs/reference/resource-configs/databricks-configs.md Co-authored-by: Amy Chen <46451573+amychen1776@users.noreply.github.com> --- .../docs/reference/resource-configs/databricks-configs.md | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/website/docs/reference/resource-configs/databricks-configs.md b/website/docs/reference/resource-configs/databricks-configs.md index e4027edaf2e..bb226045788 100644 --- a/website/docs/reference/resource-configs/databricks-configs.md +++ b/website/docs/reference/resource-configs/databricks-configs.md @@ -423,7 +423,14 @@ Each compute is keyed by a name which is used in the model definition/configurat You need to use the same set of names for compute across your outputs, though you may supply different http_paths, allowing you to use different computes in different deployment scenarios. ::: +To configure this inside of dbt Cloud, use the [extended attributes feature](/docs/dbt-cloud-environments#extended-attributes-) on the desired environments. You can input them as such. ```yaml +compute: + Compute1: + http_path: ['/some/other/path'] + Compute2: + http_path: ['/some/other/path'] ### Specifying the compute for models As with many other configuration options, you can specify the compute for a model in multiple ways, using `databricks_compute`.
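The requirement that every output declare the same set of compute names can be checked before running dbt. The following is a minimal sketch, not part of dbt or the Databricks adapter: it assumes the profile has already been parsed from YAML into a plain Python dict, and the helper names, the output names (`dev`, `prod`), and all paths are hypothetical.

```python
from typing import Dict, Set


def compute_names_by_output(profile: dict) -> Dict[str, Set[str]]:
    """Map each output (target) name to the set of compute names it declares."""
    outputs = profile.get("outputs", {})
    return {
        name: set((cfg.get("compute") or {}).keys())
        for name, cfg in outputs.items()
    }


def mismatched_outputs(profile: dict) -> Dict[str, Set[str]]:
    """Return, per output, the compute names declared elsewhere but missing here."""
    declared = compute_names_by_output(profile)
    # Union of every compute name used by any output in the profile.
    all_names = set().union(*declared.values())
    return {
        output: all_names - names
        for output, names in declared.items()
        if all_names - names
    }


# Hypothetical profile contents, already parsed from YAML into a dict.
example_profile = {
    "target": "dev",
    "outputs": {
        "dev": {
            "type": "databricks",
            "http_path": "/sql/dev/default",
            "compute": {
                "Compute1": {"http_path": "/sql/dev/one"},
                "Compute2": {"http_path": "/sql/dev/two"},
            },
        },
        "prod": {
            "type": "databricks",
            "http_path": "/sql/prod/default",
            # Compute2 is missing here, so models that reference it
            # would have no matching compute in this output.
            "compute": {
                "Compute1": {"http_path": "/sql/prod/one"},
            },
        },
    },
}

print(mismatched_outputs(example_profile))
```

An empty result means every output declares the same compute names. A non-empty result lists, for each output, the names that a `databricks_compute` setting could reference but that the output does not define.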
From 2434bda1646fcf63bf81d7bb61d8bdcee67323a9 Mon Sep 17 00:00:00 2001 From: Ben Cassell <98852248+benc-db@users.noreply.github.com> Date: Thu, 30 Nov 2023 13:32:03 -0800 Subject: [PATCH 4/7] Update website/docs/reference/resource-configs/databricks-configs.md Co-authored-by: Amy Chen <46451573+amychen1776@users.noreply.github.com> --- website/docs/reference/resource-configs/databricks-configs.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/reference/resource-configs/databricks-configs.md b/website/docs/reference/resource-configs/databricks-configs.md index bb226045788..800ba7430f0 100644 --- a/website/docs/reference/resource-configs/databricks-configs.md +++ b/website/docs/reference/resource-configs/databricks-configs.md @@ -416,7 +416,7 @@ To take advantage of this capability, you will need to add compute blocks to you The new compute section is a map of user chosen names to objects with an http_path property. -Each compute is keyed by a name which is used in the model definition/configuration to indicate which compute you wish to use for that model/selection of models. +Each compute is keyed by a name which is used in the model definition/configuration to indicate which compute you wish to use for that model/selection of models. We recommend choosing a name that is easily recognized as to what compute resources you're using, such as what the compute resource is named inside of the Databricks UI. :::note From 99a0345d161a6a956669cc1f60f20370937179a8 Mon Sep 17 00:00:00 2001 From: Ben Cassell <98852248+benc-db@users.noreply.github.com> Date: Thu, 30 Nov 2023 16:29:52 -0800 Subject: [PATCH 5/7] Update databricks-configs.md - Adding python discussion. 
--- .../resource-configs/databricks-configs.md | 25 ++++++++++++++++++- 1 file changed, 24 insertions(+), 1 deletion(-) diff --git a/website/docs/reference/resource-configs/databricks-configs.md b/website/docs/reference/resource-configs/databricks-configs.md index 800ba7430f0..f218cec3172 100644 --- a/website/docs/reference/resource-configs/databricks-configs.md +++ b/website/docs/reference/resource-configs/databricks-configs.md @@ -363,7 +363,7 @@ insert into analytics.replace_where_incremental -## Selecting compute per model +### Selecting compute per model Beginning in version 1.7.2, you can assign which compute to use on a per-model basis. To take advantage of this capability, you will need to add compute blocks to your profile: @@ -511,6 +511,29 @@ or Databricks adapter ... using compute resource . ``` +### Selecting compute for a Python model + +Materializing a Python model requires execution of SQL as well as Python. +Specifically, if your Python model is incremental, the current execution pattern involves executing Python to create a staging table that is then merged into your target table using SQL. +The Python code needs to run on an all-purpose cluster, while the SQL code can run on an all-purpose cluster or a SQL Warehouse. +When you specify your `databricks_compute` for a Python model, you are currently only specifying which compute to use when running the model-specific SQL. +If you wish to use a different compute for executing the Python itself, you must specify an alternate `http_path` in the config for the model: + + + + ```python + +def model(dbt, session): + dbt.config( + http_path="sql/protocolv1/..." + ) + +``` + + + +If your default compute is a SQL Warehouse, you will need to specify an all-purpose cluster `http_path` in this way.
+ ## Persisting model descriptions From c129674e9d9f0eca1421bc7f35228c5e5911f7dc Mon Sep 17 00:00:00 2001 From: Ben Cassell <98852248+benc-db@users.noreply.github.com> Date: Thu, 30 Nov 2023 16:44:04 -0800 Subject: [PATCH 6/7] Update databricks-configs.md - clean up --- .../resource-configs/databricks-configs.md | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/website/docs/reference/resource-configs/databricks-configs.md b/website/docs/reference/resource-configs/databricks-configs.md index f218cec3172..49aa4dd3a84 100644 --- a/website/docs/reference/resource-configs/databricks-configs.md +++ b/website/docs/reference/resource-configs/databricks-configs.md @@ -363,9 +363,11 @@ insert into analytics.replace_where_incremental -### Selecting compute per model +## Selecting compute per model -Beginning in version 1.7.2, you can assign which compute to use on a per-model basis. +Beginning in version 1.7.2, you can assign which compute resource to use on a per-model basis. +For SQL models, you can select a SQL Warehouse (serverless or provisioned) or an all-purpose cluster. +For details on how this feature interacts with Python models, see [Specifying compute for Python models](#specifying-compute-for-python-models). To take advantage of this capability, you will need to add compute blocks to your profile: @@ -423,14 +425,20 @@ Each compute is keyed by a name which is used in the model definition/configurat You need to use the same set of names for compute across your outputs, though you may supply different http_paths, allowing you to use different computes in different deployment scenarios. ::: -To configure this inside of dbt Cloud, use the [extended attributes feature](/docs/dbt-cloud-environments#extended-attributes-) on the desired environments. You can input them as such. + +To configure this inside of dbt Cloud, use the [extended attributes feature](/docs/dbt-cloud-environments#extended-attributes-) on the desired environments.
+You can input like so: ```yaml + compute: Compute1: http_path: ['/some/other/path'] Compute2: http_path: ['/some/other/path'] + +``` + ### Specifying the compute for models As with many other configuration options, you can specify the compute for a model in multiple ways, using `databricks_compute`. @@ -511,7 +519,7 @@ or Databricks adapter ... using compute resource . ``` -### Selecting compute for a Python model +### Specifying compute for Python models Materializing a Python model requires execution of SQL as well as Python. Specifically, if your Python model is incremental, the current execution pattern involves executing Python to create a staging table that is then merged into your target table using SQL. From 1d6bf4997b88f7e99ef1c6fb10cd125a3370545f Mon Sep 17 00:00:00 2001 From: Ben Cassell <98852248+benc-db@users.noreply.github.com> Date: Thu, 30 Nov 2023 16:48:30 -0800 Subject: [PATCH 7/7] Update databricks-configs.md - minor touch ups --- .../docs/reference/resource-configs/databricks-configs.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/website/docs/reference/resource-configs/databricks-configs.md b/website/docs/reference/resource-configs/databricks-configs.md index 49aa4dd3a84..8426846997c 100644 --- a/website/docs/reference/resource-configs/databricks-configs.md +++ b/website/docs/reference/resource-configs/databricks-configs.md @@ -418,7 +418,8 @@ To take advantage of this capability, you will need to add compute blocks to you The new compute section is a map of user chosen names to objects with an http_path property. -Each compute is keyed by a name which is used in the model definition/configuration to indicate which compute you wish to use for that model/selection of models. We recommend choosing a name that is easily recognized as to what compute resources you're using, such as what the compute resource is named inside of the Databricks UI.
+Each compute is keyed by a name, which you use in the model configuration to indicate which compute to use for that model or selection of models. +We recommend choosing a name that makes the compute resource easy to recognize, such as its name in the Databricks UI. :::note @@ -426,8 +427,7 @@ You need to use the same set of names for compute across your outputs, though yo ::: -To configure this inside of dbt Cloud, use the [extended attributes feature](/docs/dbt-cloud-environments#extended-attributes-) on the desired environments. -You can input like so: +To configure this in dbt Cloud, use the [extended attributes feature](/docs/dbt-cloud-environments#extended-attributes-) on the desired environments: ```yaml