Merge branch 'current' into patkearns10-sso-admin-password
matthewshaver authored Dec 6, 2023
2 parents 2a37313 + 5b8666d commit 9ec03ef
Showing 3 changed files with 199 additions and 1 deletion.
@@ -0,0 +1,16 @@
---
title: "Update: Extended attributes is GA"
description: "December 2023: The extended attributes feature is now GA in dbt Cloud. It enables you to override dbt adapter YAML attributes at the environment level."
sidebar_label: "Update: Extended attributes is GA"
sidebar_position: 10
tags: [Dec-2023]
date: 2023-12-06
---

The extended attributes feature in dbt Cloud is now GA! It allows for an environment-level override of any YAML attribute that a dbt adapter accepts in its `profiles.yml`. You can provide a YAML snippet to add or replace any [profile](/docs/core/connect-data-platform/profiles.yml) value.
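
As a hypothetical example, a snippet like the following could override a connection's catalog, schema, and thread count for one environment (the values shown are illustrative, not defaults):

```yaml
catalog: dev_catalog
schema: dev_schema
threads: 4
```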

To learn more, refer to [Extended attributes](/docs/dbt-cloud-environments#extended-attributes).

The **Extended Attributes** text box is available from your environment's settings page:

<Lightbox src="/img/docs/dbt-cloud/using-dbt-cloud/extended-attributes.jpg" width="85%" title="Example of the Extended Attributes text box" />
182 changes: 182 additions & 0 deletions website/docs/reference/resource-configs/databricks-configs.md
Original file line number Diff line number Diff line change
@@ -361,6 +361,188 @@ insert into analytics.replace_where_incremental
</TabItem>
</Tabs>

<VersionBlock firstVersion="1.7">

## Selecting compute per model

Beginning in version 1.7.2, you can assign which compute resource to use on a per-model basis.
For SQL models, you can select a SQL Warehouse (serverless or provisioned) or an all-purpose cluster.
For details on how this feature interacts with Python models, see [Specifying compute for Python models](#specifying-compute-for-python-models).
To take advantage of this capability, you will need to add compute blocks to your profile:

<File name='profile.yml'>

```yaml

<profile-name>:
  target: <target-name> # this is the default target
  outputs:
    <target-name>:
      type: databricks
      catalog: [optional catalog name if you are using Unity Catalog]
      schema: [schema name] # Required
      host: [yourorg.databrickshost.com] # Required

      ### This path is used as the default compute
      http_path: [/sql/your/http/path] # Required

      ### New compute section
      compute:

        ### Name that you will use to refer to an alternate compute
        Compute1:
          http_path: ['/sql/your/http/path'] # Required of each alternate compute

        ### A third named compute, use whatever name you like
        Compute2:
          http_path: ['/some/other/path'] # Required of each alternate compute
      ...

    <target-name>: # additional targets
      ...
      ### For each target, you need to define the same compute,
      ### but you can specify different paths
      compute:

        ### Name that you will use to refer to an alternate compute
        Compute1:
          http_path: ['/sql/your/http/path'] # Required of each alternate compute

        ### A third named compute, use whatever name you like
        Compute2:
          http_path: ['/some/other/path'] # Required of each alternate compute
      ...

```

</File>

The new `compute` section is a map of user-chosen names to objects with an `http_path` property.
Each compute is keyed by a name that you use in a model's definition or configuration to indicate which compute to use for that model (or selection of models).
We recommend choosing names that are easily recognized as the compute resources they refer to, such as the names of the compute resources in the Databricks UI.

:::note

You need to use the same set of compute names across all of your outputs, though you may supply different `http_path` values for each, allowing you to use different compute resources in different deployment scenarios.

:::
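
For instance, a profile with hypothetical `dev` and `prod` targets might define the same compute name but point it at different warehouses (all names and paths below are assumptions for illustration):

```yaml
my_profile:
  target: dev
  outputs:
    dev:
      type: databricks
      schema: analytics
      host: myorg.databrickshost.com
      http_path: /sql/warehouses/dev_default   # hypothetical default compute
      compute:
        Compute1:
          http_path: /sql/warehouses/dev_small  # hypothetical alternate compute
    prod:
      type: databricks
      schema: analytics
      host: myorg.databrickshost.com
      http_path: /sql/warehouses/prod_default  # hypothetical default compute
      compute:
        Compute1:
          http_path: /sql/warehouses/prod_large # same name, different path
```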

To configure this inside of dbt Cloud, use the [extended attributes feature](/docs/dbt-cloud-environments#extended-attributes) on the desired environments:

```yaml

compute:
  Compute1:
    http_path: ['/some/other/path']
  Compute2:
    http_path: ['/some/other/path']

```

### Specifying the compute for models

As with many other configuration options, you can specify the compute for a model in multiple ways, using `databricks_compute`.
In your `dbt_project.yml`, the selected compute can be specified for all the models in a given directory:

<File name='dbt_project.yml'>

```yaml

...

models:
  +databricks_compute: "Compute1" # use the `Compute1` warehouse/cluster for all models in the project...
  my_project:
    clickstream:
      +databricks_compute: "Compute2" # ...except for the models in the `clickstream` folder, which will use `Compute2`.

snapshots:
  +databricks_compute: "Compute1" # all Snapshot models are configured to use `Compute1`.

```

</File>

For an individual model, the compute can be specified in the model config in your schema file.

<File name='schema.yml'>

```yaml

models:
  - name: table_model
    config:
      databricks_compute: Compute1
    columns:
      - name: id
        data_type: int

```

</File>


Alternatively, the compute can be specified in the config block of a model's SQL file.

<File name='model.sql'>

```sql

{{
  config(
    materialized='table',
    databricks_compute='Compute1'
  )
}}

select * from {{ ref('seed') }}

```

</File>

:::note

In the absence of a specified compute, we will default to the compute specified by `http_path` at the top level of the output section in your profile.
This is also the compute that will be used for tasks not associated with a particular model, such as gathering metadata for all tables in a schema.

:::

To validate that the specified compute is being used, look for lines in your `dbt.log` like:

```
Databricks adapter ... using default compute resource.
```

or

```
Databricks adapter ... using compute resource <name of compute>.
```

### Specifying compute for Python models

Materializing a Python model requires execution of SQL as well as Python.
Specifically, if your Python model is incremental, the current execution pattern involves executing Python to create a staging table that is then merged into your target table using SQL.
The Python code needs to run on an all-purpose cluster, while the SQL code can run on an all-purpose cluster or a SQL Warehouse.
When you specify your `databricks_compute` for a Python model, you are currently only specifying which compute to use when running the model-specific SQL.
If you wish to use a different compute for executing the Python code itself, you must specify an alternate `http_path` in the config for the model:

<File name="model.py">

```python

def model(dbt, session):
    # Run the Python portion of this model on the cluster at this path
    dbt.config(
        http_path="sql/protocolv1/..."
    )

    # A model function must return a DataFrame; `table_model` here is the
    # example model defined earlier on this page
    return dbt.ref("table_model")

```

</File>

If your default compute is a SQL Warehouse, you will need to specify an all-purpose cluster `http_path` in this way.
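
Putting the two settings together, a sketch of a hypothetical incremental Python model might pin its SQL to a named compute from your profile while pointing the Python execution at an all-purpose cluster (the compute name, cluster path, and column names below are assumptions for illustration):

```python
import pyspark.sql.functions as F

def model(dbt, session):
    # `databricks_compute` selects the compute for the model's SQL (e.g., the
    # incremental merge); `http_path` selects the cluster that runs this Python
    # code. "Compute1" and the path below are hypothetical.
    dbt.config(
        materialized="incremental",
        unique_key="id",
        databricks_compute="Compute1",
        http_path="sql/protocolv1/o/0000000000000000/0000-000000-example1",
    )

    # Build on an upstream model and stamp each row with a load timestamp
    df = dbt.ref("table_model")
    return df.withColumn("loaded_at", F.current_timestamp())
```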

</VersionBlock>

## Persisting model descriptions

2 changes: 1 addition & 1 deletion website/docs/terms/data-wrangling.md
@@ -51,7 +51,7 @@ The cleaning stage involves using different functions so that the values in your
- Removing appropriate duplicates or nulls you found in the discovery process
- Eliminating unnecessary characters or spaces from values

Certain cleaning steps, like removing rows with null values, are helpful to do at the beginning of the process because removing nulls and duplicates from the start can increase the performance of your downstream models. In the cleaning step, it’s important to follow a standard for your transformations here. This means you should be following a consistent naming convention for your columns (especially for your <Term id="primary-key">primary keys</Term>) and casting to the same timezone and datatypes throughout your models. Examples include making sure all dates are in UTC time rather than source timezone-specific, all strings are in either lower or upper case, etc.

:::tip dbt to the rescue!
If you're struggling to do all the cleaning on your own, remember that dbt packages ([dbt expectations](https://github.com/calogica/dbt-expectations), [dbt_utils](https://hub.getdbt.com/dbt-labs/dbt_utils/latest/), and [re_data](https://www.getre.io/)) and their macros are also available to help you clean up your data.
