diff --git a/website/docs/reference/resource-configs/databricks-configs.md b/website/docs/reference/resource-configs/databricks-configs.md index a3b00177967..8426846997c 100644 --- a/website/docs/reference/resource-configs/databricks-configs.md +++ b/website/docs/reference/resource-configs/databricks-configs.md @@ -361,6 +361,188 @@ insert into analytics.replace_where_incremental + + +## Selecting compute per model + +Beginning in version 1.7.2, you can assign which compute resource to use on a per-model basis. +For SQL models, you can select a SQL Warehouse (serverless or provisioned) or an all purpose cluster. +For details on how this feature interacts with python models, see [Specifying compute for Python models](#specifying-compute-for-python-models). +To take advantage of this capability, you will need to add compute blocks to your profile: + + + +```yaml + +: + target: # this is the default target + outputs: + : + type: databricks + catalog: [optional catalog name if you are using Unity Catalog] + schema: [schema name] # Required + host: [yourorg.databrickshost.com] # Required + + ### This path is used as the default compute + http_path: [/sql/your/http/path] # Required + + ### New compute section + compute: + + ### Name that you will use to refer to an alternate compute + Compute1: + http_path: [‘/sql/your/http/path’] # Required of each alternate compute + + ### A third named compute, use whatever name you like + Compute2: + http_path: [‘/some/other/path’] # Required of each alternate compute + ... + + : # additional targets + ... + ### For each target, you need to define the same compute, + ### but you can specify different paths + compute: + + ### Name that you will use to refer to an alternate compute + Compute1: + http_path: [‘/sql/your/http/path’] # Required of each alternate compute + + ### A third named compute, use whatever name you like + Compute2: + http_path: [‘/some/other/path’] # Required of each alternate compute + ... + +``` + + + +The new compute section is a map of user chosen names to objects with an http_path property. +Each compute is keyed by a name which is used in the model definition/configuration to indicate which compute you wish to use for that model/selection of models. +We recommend choosing a name that is easily recognized as the compute resources you're using, such as the name of the compute resource inside the Databricks UI. + +:::note + +You need to use the same set of names for compute across your outputs, though you may supply different http_paths, allowing you to use different computes in different deployment scenarios. + +::: + +To configure this inside of dbt Cloud, use the [extended attributes feature](/docs/dbt-cloud-environments#extended-attributes-) on the desired environments: + +```yaml + +compute: + Compute1: + http_path:[`/some/other/path'] + Compute2: + http_path:[`/some/other/path'] + +``` + +### Specifying the compute for models + +As with many other configuaration options, you can specify the compute for a model in multiple ways, using `databricks_compute`. +In your `dbt_project.yml`, the selected compute can be specified for all the models in a given directory: + + + +```yaml + +... + +models: + +databricks_compute: "Compute1" # use the `Compute1` warehouse/cluster for all models in the project... + my_project: + clickstream: + +databricks_compute: "Compute2" # ...except for the models in the `clickstream` folder, which will use `Compute2`. + +snapshots: + +databricks_compute: "Compute1" # all Snapshot models are configured to use `Compute1`. + +``` + + + +For an individual model the compute can be specified in the model config in your schema file. + + + +```yaml + +models: + - name: table_model + config: + databricks_compute: Compute1 + columns: + - name: id + data_type: int + +``` + + + + +Alternatively the warehouse can be specified in the config block of a model's SQL file. + + + +```sql + +{{ + config( + materialized='table', + databricks_compute='Compute1' + ) +}} +select * from {{ ref('seed') }} + +``` + + + +:::note + +In the absence of a specified compute, we will default to the compute specified by http_path in the top level of the output section in your profile. +This is also the compute that will be used for tasks not associated with a particular model, such as gathering metadata for all tables in a schema. + +::: + +To validate that the specified compute is being used, look for lines in your dbt.log like: + +``` +Databricks adapter ... using default compute resource. +``` + +or + +``` +Databricks adapter ... using compute resource . +``` + +### Specifying compute for Python models + +Materializing a python model requires execution of SQL as well as python. +Specifically, if your python model is incremental, the current execution pattern involves executing python to create a staging table that is then merged into your target table using SQL. +The python code needs to run on an all purpose cluster, while the SQL code can run on an all purpose cluster or a SQL Warehouse. +When you specify your `databricks_compute` for a python model, you are currently only specifying which compute to use when running the model-specific SQL. +If you wish to use a different compute for executing the python itself, you must specify an alternate `http_path` in the config for the model: + + + + ```python + +def model(dbt, session): + dbt.config( + http_path="sql/protocolv1/..." + ) + +``` + + + +If your default compute is a SQL Warehouse, you will need to specify an all purpose cluster `http_path` in this way. + + ## Persisting model descriptions