Merge branch 'current' into runleonarun-patch-12
mirnawong1 authored Dec 7, 2023
2 parents ea0f2d8 + 1a835bb commit d639b3a
Showing 86 changed files with 308 additions and 12 deletions.
2 changes: 1 addition & 1 deletion contributing/content-style-guide.md
@@ -284,7 +284,7 @@ If the list starts getting lengthy and dense, consider presenting the same conte

A bulleted list with introductory text:

- > A dbt project is a directory of `.sql` and .yml` files. The directory must contain at a minimum:
+ > A dbt project is a directory of `.sql` and `.yml` files. The directory must contain at a minimum:
>
> - Models: A model is a single `.sql` file. Each model contains a single `select` statement that either transforms raw data into a dataset that is ready for analytics or, more often, is an intermediate step in such a transformation.
> - A project file: A `dbt_project.yml` file, which configures and defines your dbt project.
2 changes: 1 addition & 1 deletion website/docs/docs/build/semantic-models.md
@@ -43,7 +43,7 @@ semantic_models:
  - name: the_name_of_the_semantic_model ## Required
    description: same as always ## Optional
    model: ref('some_model') ## Required
-   default: ## Required
+   defaults: ## Required
      agg_time_dimension: dimension_name ## Required if the model contains dimensions
    entities: ## Required
      - see more information in entities
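For reference, a complete semantic model with the corrected `defaults` key might look like the following sketch; the model and column names (such as `fct_orders`, `ordered_at`, and `order_total`) are hypothetical:

```yaml
semantic_models:
  - name: orders
    description: Order fact table at the grain of one row per order
    model: ref('fct_orders')
    defaults:
      agg_time_dimension: ordered_at  # time dimension that measures aggregate over by default
    entities:
      - name: order_id
        type: primary
    dimensions:
      - name: ordered_at
        type: time
        type_params:
          time_granularity: day
    measures:
      - name: order_total
        agg: sum
```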
8 changes: 5 additions & 3 deletions website/docs/docs/cloud/manage-access/sso-overview.md
@@ -57,8 +57,9 @@ Non-admin users that currently login with a password will no longer be able to d
### Security best practices

There are a few scenarios that might require you to log in with a password. We recommend these security best practices for the most common scenarios:
- * **Onboarding partners and contractors** - We highly recommend that you add partners and contractors to your Identity Provider. IdPs like Okta and Azure Active Directory (AAD) offer capabilities explicitly for temporary employees. We highly recommend that you reach out to your IT team to provision an SSO license for these situations. Using an IdP highly secure, reduces any breach risk, and significantly increases the security posture of your dbt Cloud environment.
- * **Identity Provider is down -** Account admins will continue to be able to log in with a password which would allow them to work with your Identity Provider to troubleshoot the problem.
+ * **Onboarding partners and contractors** — We highly recommend that you add partners and contractors to your Identity Provider. IdPs like Okta and Azure Active Directory (AAD) offer capabilities explicitly for temporary employees. Reach out to your IT team to provision an SSO license for these situations. Using an IdP is highly secure, reduces breach risk, and significantly strengthens the security posture of your dbt Cloud environment.
+ * **Identity Provider is down** — Account admins will continue to be able to log in with a password, which allows them to work with your Identity Provider to troubleshoot the problem.
+ * **Offboarding admins** — When offboarding admins, revoke access to dbt Cloud by deleting the user from your environment; otherwise, they can continue to use username/password credentials to log in.

### Next steps for non-admin users currently logging in with passwords

@@ -67,4 +67,5 @@ If you have any non-admin users logging into dbt Cloud with a password today:
1. Ensure that all users have a user account in your identity provider and are assigned to dbt Cloud so they won’t lose access.
2. Alert all dbt Cloud users that they will no longer be able to log in with a password unless they are already an Admin with a password.
3. We **DO NOT** recommend promoting any users to Admins just to preserve password-based logins because doing so reduces the security of your dbt Cloud environment.


1 change: 1 addition & 0 deletions website/docs/docs/cloud/secure/about-privatelink.md
@@ -23,3 +23,4 @@ dbt Cloud supports the following data platforms for use with the PrivateLink fea
- [Databricks](/docs/cloud/secure/databricks-privatelink)
- [Redshift](/docs/cloud/secure/redshift-privatelink)
- [Postgres](/docs/cloud/secure/postgres-privatelink)
- [VCS](/docs/cloud/secure/vcs-privatelink)
6 changes: 4 additions & 2 deletions website/docs/docs/collaborate/explore-projects.md
@@ -2,7 +2,7 @@
title: "Explore your dbt projects"
sidebar_label: "Explore dbt projects"
description: "Learn about dbt Explorer and how to interact with it to understand, improve, and leverage your data pipelines."
pagination_next: "docs/collaborate/explore-multiple-projects"
pagination_next: "docs/collaborate/model-performance"
pagination_prev: null
---

@@ -36,7 +36,7 @@ For a richer experience with dbt Explorer, you must:
- Run [dbt source freshness](/reference/commands/source#dbt-source-freshness) within a job in the environment to view source freshness data.
- Run [dbt snapshot](/reference/commands/snapshot) or [dbt build](/reference/commands/build) within a job in the environment to view snapshot details.

- Richer and more timely metadata will become available as dbt, the Discovery API, and the underlying dbt Cloud platform evolves.
+ Richer and more timely metadata will become available as dbt Core, the Discovery API, and the underlying dbt Cloud platform evolve.

## Explore your project's lineage graph {#project-lineage}

@@ -46,6 +46,8 @@ If you don't see the project lineage graph immediately, click **Render Lineage**

The nodes in the lineage graph represent the project’s resources and the edges represent the relationships between the nodes. Nodes are color-coded and include iconography according to their resource type.

By default, dbt Explorer shows the project's [applied state](/docs/dbt-cloud-apis/project-state#definition-logical-vs-applied-state-of-dbt-nodes) lineage. That is, it shows models that have been successfully built and are available to query, not just the models defined in the project.

To explore the lineage graphs of tests and macros, view [their resource details pages](#view-resource-details). By default, dbt Explorer excludes these resources from the full lineage graph unless a search query returns them as results.

To interact with the full lineage graph, you can:
41 changes: 41 additions & 0 deletions website/docs/docs/collaborate/model-performance.md
@@ -0,0 +1,41 @@
---
title: "Model performance"
sidebar_label: "Model performance"
description: "Learn about ."
---

dbt Explorer provides metadata on dbt Cloud runs for in-depth model performance and quality analysis. This feature assists in reducing infrastructure costs and saving time for data teams by highlighting where to fine-tune projects and deployments — such as model refactoring or job configuration adjustments.

<LoomVideo id='98f33b3b7a374df0b7c04747eae6ef44' />

:::tip Beta

The model performance beta feature is now available in dbt Explorer! Check it out!
:::

## The Performance overview page

You can pinpoint areas for performance enhancement by using the Performance overview page. This page presents a comprehensive analysis across all project models and displays the longest-running models, those most frequently executed, and the ones with the highest failure rates during runs or tests. Data can be segmented by environment and job type, which can offer insights into:

- Most executed models (total count).
- Models with the longest execution time (average duration).
- Models with the most failures, detailing run failures (percentage and count) and test failures (percentage and count).

Each data point links to individual models in Explorer.

<Lightbox src="/img/docs/collaborate/dbt-explorer/example-performance-overview-page.png" width="80%" title="Example of Performance overview page"/>

You can view historical metadata for up to the past three months. Select the time horizon using the filter, which defaults to a two-week lookback.

<Lightbox src="/img/docs/collaborate/dbt-explorer/ex-2-week-default.png" title="Example of dropdown"/>

## The Model performance tab

You can view trends in execution times, counts, and failures by using the Model performance tab for historical performance analysis. Daily execution data includes:

- Average model execution time.
- Model execution counts, including failures/errors (total sum).

Clicking on a data point reveals a table listing all job runs for that day, with each row providing a direct link to the details of a specific run.

<Lightbox src="/img/docs/collaborate/dbt-explorer/example-model-performance-tab.png" title="Example of the Model performance tab"/>
50 changes: 50 additions & 0 deletions website/docs/docs/collaborate/project-recommendations.md
@@ -0,0 +1,50 @@
---
title: "Project recommendations"
sidebar_label: "Project recommendations"
description: "dbt Explorer provides recommendations that you can take to improve the quality of your dbt project."
---

:::tip Beta

The project recommendations beta feature is now available in dbt Explorer! Check it out!

:::

dbt Explorer provides recommendations about your project from the `dbt_project_evaluator` [package](https://hub.getdbt.com/dbt-labs/dbt_project_evaluator/latest/) using metadata from the Discovery API.

Explorer also offers a global view, showing all the recommendations across the project for easy sorting and summarizing.

These recommendations provide insight into how you can build a better-documented, better-tested, and better-built project, leading to less confusion and more trust.
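The recommendations shown in dbt Explorer mirror the package's own checks. If you also want to run them locally or in CI, you can install the package in your project. This is a sketch; the version range is an assumption, so pin a release compatible with your dbt version:

```yaml
packages:
  - package: dbt-labs/dbt_project_evaluator
    version: [">=0.8.0", "<0.9.0"]  # assumed range; pin to the release you've tested
```

After running `dbt deps`, the package's checks typically surface when you run `dbt build --select package:dbt_project_evaluator`.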

The Recommendations overview page includes two top-level metrics measuring the test and documentation coverage of the models in your project.

- **Model test coverage** &mdash; The percent of models in your project (models not from a package or imported via dbt Mesh) with at least one dbt test configured on them.
- **Model documentation coverage** &mdash; The percent of models in your project (models not from a package or imported via dbt Mesh) with a description.

<Lightbox src="/img/docs/collaborate/dbt-explorer/example-recommendations-overview.png" width="80%" title="Example of the Recommendations overview page with project metrics and the recommendations for all resources in the project"/>
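A model counts toward both metrics once it has a description and at least one test configured. A minimal sketch, using hypothetical model and column names:

```yaml
models:
  - name: dim_customers
    description: One row per customer, with current profile attributes.
    columns:
      - name: customer_id
        description: Primary key for the customer.
        tests:
          - unique
          - not_null
```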

## List of rules

| Category | Name | Description | Package Docs Link |
| --- | --- | --- | --- |
| Modeling | Direct Join to Source | Model that joins both a model and source, indicating a missing staging model | [GitHub](https://dbt-labs.github.io/dbt-project-evaluator/0.8/rules/modeling/#direct-join-to-source) |
| Modeling | Duplicate Sources | More than one source node corresponds to the same data warehouse relation | [GitHub](https://dbt-labs.github.io/dbt-project-evaluator/0.8/rules/modeling/#duplicate-sources) |
| Modeling | Multiple Sources Joined | Models with more than one source parent, indicating lack of staging models | [GitHub](https://dbt-labs.github.io/dbt-project-evaluator/0.8/rules/modeling/#multiple-sources-joined) |
| Modeling | Root Model | Models with no parents, indicating potential hardcoded references and need for sources | [GitHub](https://dbt-labs.github.io/dbt-project-evaluator/0.8/rules/modeling/#root-models) |
| Modeling | Source Fanout | Sources with more than one model child, indicating a need for staging models | [GitHub](https://dbt-labs.github.io/dbt-project-evaluator/0.8/rules/modeling/#source-fanout) |
| Modeling | Unused Source | Sources that are not referenced by any resource | [GitHub](https://dbt-labs.github.io/dbt-project-evaluator/0.8/rules/modeling/#unused-sources) |
| Performance | Exposure Dependent on View | Exposures with at least one model parent materialized as a view, indicating potential query performance issues | [GitHub](https://dbt-labs.github.io/dbt-project-evaluator/0.8/rules/performance/#exposure-parents-materializations) |
| Testing | Missing Primary Key Test | Models with insufficient testing on the grain of the model. | [GitHub](https://dbt-labs.github.io/dbt-project-evaluator/0.8/rules/testing/#missing-primary-key-tests) |
| Documentation | Undocumented Models | Models without a model-level description | [GitHub](https://dbt-labs.github.io/dbt-project-evaluator/0.8/rules/documentation/#undocumented-models) |
| Documentation | Undocumented Source | Sources (collections of source tables) without descriptions | [GitHub](https://dbt-labs.github.io/dbt-project-evaluator/0.8/rules/documentation/#undocumented-sources) |
| Documentation | Undocumented Source Tables | Source tables without descriptions | [GitHub](https://dbt-labs.github.io/dbt-project-evaluator/0.8/rules/documentation/#undocumented-source-tables) |
| Governance | Public Model Missing Contract | Models with public access that do not have a model contract to ensure the data types | [GitHub](https://dbt-labs.github.io/dbt-project-evaluator/0.8/rules/governance/#public-models-without-contracts) |
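To make the rules concrete, the Direct Join to Source check flags models like the following hypothetical example, which joins a source directly alongside another model instead of going through a staging model:

```sql
-- Hypothetical model flagged by "Direct Join to Source":
-- it mixes a ref() to a model with a direct join to a source.
select
    customers.customer_id,
    orders.order_id,
    orders.order_total
from {{ ref('stg_customers') }} as customers
join {{ source('jaffle_shop', 'orders') }} as orders
    on orders.customer_id = customers.customer_id
```

The usual fix is to add a staging model (for example, `stg_orders`) that selects from the source, and join `ref('stg_orders')` here instead.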


## The Recommendations tab

Models, sources, and exposures each also have a Recommendations tab on their resource details page, with the specific recommendations that correspond to that resource:

<Lightbox src="/img/docs/collaborate/dbt-explorer/example-recommendations-tab.png" width="80%" title="Example of the Recommendations tab "/>


@@ -0,0 +1,16 @@
---
title: "Update: Extended attributes is GA"
description: "December 2023: The extended attributes feature is now GA in dbt Cloud. It enables you to override dbt adapter YAML attributes at the environment level."
sidebar_label: "Update: Extended attributes is GA"
sidebar_position: 10
tags: [Dec-2023]
date: 2023-12-06
---

The extended attributes feature in dbt Cloud is now GA! It allows for an environment level override on any YAML attribute that a dbt adapter accepts in its `profiles.yml`. You can provide a YAML snippet to add or replace any [profile](/docs/core/connect-data-platform/profiles.yml) value.
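For example, a hypothetical extended attributes snippet for a Snowflake connection might override the schema, warehouse, and thread count for a single environment; the exact keys depend on which attributes your adapter accepts in `profiles.yml`:

```yaml
# Hypothetical values; any attribute the adapter accepts in profiles.yml can appear here.
schema: dbt_staging
warehouse: transforming_wh
threads: 8
```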

To learn more, refer to [Extended attributes](/docs/dbt-cloud-environments#extended-attributes).

The **Extended Attributes** text box is available from your environment's settings page:

<Lightbox src="/img/docs/dbt-cloud/using-dbt-cloud/extended-attributes.jpg" width="85%" title="Example of the Extended Attributes text box" />
4 changes: 2 additions & 2 deletions website/docs/docs/deploy/retry-jobs.md
@@ -26,7 +26,7 @@ If your dbt job run completed with a status of **Error**, you can rerun it from
<Lightbox src="/img/docs/deploy/native-retry.gif" width="70%" title="Example of the Rerun options in dbt Cloud"/>

## Related content
- - [Retry a failed run for a job](/dbt-cloud/api-v2#/operations/Retry%20a%20failed%20run%20for%20a%20job) API endpoint
+ - [Retry a failed run for a job](/dbt-cloud/api-v2#/operations/Retry%20Failed%20Job) API endpoint
- [Run visibility](/docs/deploy/run-visibility)
- [Jobs](/docs/deploy/jobs)
- - [Job commands](/docs/deploy/job-commands)
+ - [Job commands](/docs/deploy/job-commands)
182 changes: 182 additions & 0 deletions website/docs/reference/resource-configs/databricks-configs.md
@@ -361,6 +361,188 @@ insert into analytics.replace_where_incremental
</TabItem>
</Tabs>

<VersionBlock firstVersion="1.7">

## Selecting compute per model

Beginning in version 1.7.2, you can assign which compute resource to use on a per-model basis.
For SQL models, you can select a SQL Warehouse (serverless or provisioned) or an all-purpose cluster.
For details on how this feature interacts with Python models, see [Specifying compute for Python models](#specifying-compute-for-python-models).
To take advantage of this capability, you will need to add compute blocks to your profile:

<File name='profile.yml'>

```yaml

<profile-name>:
  target: <target-name> # this is the default target
  outputs:
    <target-name>:
      type: databricks
      catalog: [optional catalog name if you are using Unity Catalog]
      schema: [schema name] # Required
      host: [yourorg.databrickshost.com] # Required

      ### This path is used as the default compute
      http_path: [/sql/your/http/path] # Required

      ### New compute section
      compute:

        ### Name that you will use to refer to an alternate compute
        Compute1:
          http_path: [/sql/your/http/path] # Required of each alternate compute

        ### A third named compute, use whatever name you like
        Compute2:
          http_path: [/some/other/path] # Required of each alternate compute
      ...

    <target-name>: # additional targets
      ...
      ### For each target, you need to define the same compute,
      ### but you can specify different paths
      compute:

        ### Name that you will use to refer to an alternate compute
        Compute1:
          http_path: [/sql/your/http/path] # Required of each alternate compute

        ### A third named compute, use whatever name you like
        Compute2:
          http_path: [/some/other/path] # Required of each alternate compute
      ...

```

</File>

The new `compute` section is a map of user-chosen names to objects with an `http_path` property.
Each compute is keyed by a name, which is used in the model definition/configuration to indicate which compute you wish to use for that model or selection of models.
We recommend choosing a name that is easily recognized as the compute resource you're using, such as the name of the compute resource inside the Databricks UI.

:::note

You need to use the same set of names for compute across your outputs, though you may supply different `http_path` values, allowing you to use different computes in different deployment scenarios.

:::

To configure this inside of dbt Cloud, use the [extended attributes feature](/docs/dbt-cloud-environments#extended-attributes-) on the desired environments:

```yaml

compute:
  Compute1:
    http_path: [/some/other/path]
  Compute2:
    http_path: [/some/other/path]

```

### Specifying the compute for models

As with many other configuration options, you can specify the compute for a model in multiple ways, using `databricks_compute`.
In your `dbt_project.yml`, the selected compute can be specified for all the models in a given directory:

<File name='dbt_project.yml'>

```yaml

...

models:
  +databricks_compute: "Compute1" # use the `Compute1` warehouse/cluster for all models in the project...
  my_project:
    clickstream:
      +databricks_compute: "Compute2" # ...except for the models in the `clickstream` folder, which will use `Compute2`.

snapshots:
  +databricks_compute: "Compute1" # all Snapshot models are configured to use `Compute1`.

```

</File>

For an individual model, the compute can be specified in the model config in your schema file.

<File name='schema.yml'>

```yaml

models:
  - name: table_model
    config:
      databricks_compute: Compute1
    columns:
      - name: id
        data_type: int

```

</File>


Alternatively, the compute can be specified in the config block of a model's SQL file.

<File name='model.sql'>

```sql

{{
  config(
    materialized='table',
    databricks_compute='Compute1'
  )
}}
select * from {{ ref('seed') }}

```

</File>

:::note

In the absence of a specified compute, we will default to the compute specified by http_path in the top level of the output section in your profile.
This is also the compute that will be used for tasks not associated with a particular model, such as gathering metadata for all tables in a schema.

:::

To validate that the specified compute is being used, look for lines in your dbt.log like:

```
Databricks adapter ... using default compute resource.
```

or

```
Databricks adapter ... using compute resource <name of compute>.
```

### Specifying compute for Python models

Materializing a Python model requires execution of SQL as well as Python.
Specifically, if your Python model is incremental, the current execution pattern involves executing Python to create a staging table that is then merged into your target table using SQL.
The Python code needs to run on an all-purpose cluster, while the SQL code can run on an all-purpose cluster or a SQL Warehouse.
When you specify your `databricks_compute` for a Python model, you are currently only specifying which compute to use when running the model-specific SQL.
If you wish to use a different compute for executing the Python code itself, you must specify an alternate `http_path` in the config for the model:

<File name="model.py">

```python

def model(dbt, session):
    dbt.config(
        http_path="sql/protocolv1/..."
    )

```

</File>

If your default compute is a SQL Warehouse, you will need to specify an all-purpose cluster `http_path` in this way.

</VersionBlock>

## Persisting model descriptions
