Merge branch 'current' into sl-deprecation
mirnawong1 authored Dec 11, 2023
2 parents d569754 + 297ea3f commit 09407b6
Showing 15 changed files with 239 additions and 18 deletions.
@@ -7,7 +7,7 @@ displayText: Materializations best practices
hoverSnippet: Read this guide to understand the incremental models you can create in dbt.
---

So far we’ve looked at tables and views, which map to the traditional objects in the data warehouse. As mentioned earlier, incremental models are a little different. This where we start to deviate from this pattern with more powerful and complex materializations.
So far we’ve looked at tables and views, which map to the traditional objects in the data warehouse. As mentioned earlier, incremental models are a little different. This is where we start to deviate from this pattern with more powerful and complex materializations.

- 📚 **Incremental models generate tables.** They physically persist the data itself to the warehouse, just piece by piece. What’s different is **how we build that table**.
- 💅 **Only apply our transformations to rows of data with new or updated information**, which maximizes efficiency.
@@ -53,7 +53,7 @@ where
updated_at > (select max(updated_at) from {{ this }})
```

Let’s break down that `where` clause a bit, because this where the action is with incremental models. Stepping through the code **_right-to-left_** we:
Let’s break down that `where` clause a bit, because this is where the action is with incremental models. Stepping through the code **_right-to-left_** we:

1. Get our **cutoff.**
1. Select the `max(updated_at)` timestamp — the **most recent record**
@@ -138,7 +138,7 @@ where
{% endif %}
```

Fantastic! We’ve got a working incremental model. On our first run, when there is no corresponding table in the warehouse, `is_incremental` will evaluate to false and we’ll capture the entire table. On subsequent runs is it will evaluate to true and we’ll apply our filter logic, capturing only the newer data.
Fantastic! We’ve got a working incremental model. On our first run, when there is no corresponding table in the warehouse, `is_incremental` will evaluate to false and we’ll capture the entire table. On subsequent runs it will evaluate to true and we’ll apply our filter logic, capturing only the newer data.
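For quick reference, here's the whole pattern assembled into a single model. This is a minimal sketch — the `stg_orders` staging model and `updated_at` column are illustrative stand-ins for your own models and timestamp column:

```sql
{{
  config(
    materialized='incremental'
  )
}}

select * from {{ ref('stg_orders') }}

{% if is_incremental() %}

-- only process rows newer than the most recent record already in this table
where updated_at > (select max(updated_at) from {{ this }})

{% endif %}
```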

### Late arriving facts

5 changes: 5 additions & 0 deletions website/docs/docs/build/metricflow-commands.md
@@ -556,3 +556,8 @@ Keep in mind that modifying your shell configuration files can have an impact on
</details>
<details>
<summary>Why is my query limited to 100 rows in the dbt Cloud CLI?</summary>
The default <code>limit</code> for queries issued from the dbt Cloud CLI is 100 rows. We set this default to prevent returning unnecessarily large data sets, as the dbt Cloud CLI is typically used to query the dbt Semantic Layer during development, not for production reporting or accessing large data sets. For most workflows, you only need to return a subset of the data.<br /><br />
However, you can change this limit if needed by setting the <code>--limit</code> option in your query. For example, to return 1000 rows, you can run <code>dbt sl list metrics --limit 1000</code>.
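
The same <code>--limit</code> option also applies to querying. For example, a sketch where the metric and dimension names are illustrative:

```shell
# Return up to 1,000 rows for an example metric grouped by time
dbt sl query --metrics order_total --group-by metric_time --limit 1000
```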
</details>
2 changes: 1 addition & 1 deletion website/docs/docs/cloud/cloud-cli-installation.md
@@ -26,7 +26,7 @@ dbt commands are run against dbt Cloud's infrastructure and benefit from:
The dbt Cloud CLI is available in all [deployment regions](/docs/cloud/about-cloud/regions-ip-addresses) and for both multi-tenant and single-tenant accounts (Azure single-tenant not supported at this time).

- Ensure you are using dbt version 1.5 or higher. Refer to [dbt Cloud versions](/docs/dbt-versions/upgrade-core-in-cloud) to upgrade.
- Note that SSH tunneling for [Postgres and Redshift](/docs/cloud/connect-data-platform/connect-redshift-postgresql-alloydb) connections and [Single sign-on (SSO)](/docs/cloud/manage-access/sso-overview) doesn't support the dbt Cloud CLI yet.
- Note that SSH tunneling for [Postgres and Redshift](/docs/cloud/connect-data-platform/connect-redshift-postgresql-alloydb) connections doesn't support the dbt Cloud CLI yet.

## Install dbt Cloud CLI

16 changes: 11 additions & 5 deletions website/docs/docs/cloud/manage-access/audit-log.md
@@ -34,7 +34,7 @@ On the audit log page, you will see a list of various events and their associate

Click the event card to see the details about the activity that triggered the event. This view provides important details, including when it happened and what type of event was triggered. For example, if someone changes the settings for a job, you can use the event details to see which job was changed (type of event: `v1.events.job_definition.Changed`), by whom (person who triggered the event: `actor`), and when (time it was triggered: `created_at_utc`). For types of events and their descriptions, see [Events in audit log](#events-in-audit-log).

The event details provides the key factors of an event:
The event details provide the key factors of an event:

| Name | Description |
| -------------------- | --------------------------------------------- |
@@ -160,16 +160,22 @@ The audit log supports various events for different objects in dbt Cloud. You wi
You can search the audit log to find a specific event or actor, which is limited to the ones listed in [Events in audit log](#events-in-audit-log). The audit log lists historical events spanning the last 90 days. You can search for an actor or event using the search bar, and then narrow your results using the time window.


<Lightbox src="/img/docs/dbt-cloud/dbt-cloud-enterprise/audit-log-search.png" width="85%" title="Use search bar to find content in the audit log"/>
<Lightbox src="/img/docs/dbt-cloud/dbt-cloud-enterprise/audit-log-search.png" width="95%" title="Use search bar to find content in the audit log"/>


## Exporting logs

You can use the audit log to export all historical audit results for security, compliance, and analysis purposes:

- For events within 90 days &mdash; dbt Cloud will automatically display the 90-day selectable date range. Select **Export Selection** to download a CSV file of all the events that occurred in your organization within 90 days.
- For events beyond 90 days &mdash; Select **Export All**. The Account Admin will receive an email link to download a CSV file of all the events that occurred in your organization.
- **For events within 90 days** &mdash; dbt Cloud will automatically display the 90-day selectable date range. Select **Export Selection** to download a CSV file of all the events that occurred in your organization within 90 days.

<Lightbox src="/img/docs/dbt-cloud/dbt-cloud-enterprise/audit-log-section.jpg" width="85%" title="View audit log export options"/>
- **For events beyond 90 days** &mdash; Select **Export All**. The Account Admin will receive an email link to download a CSV file of all the events that occurred in your organization.

<Lightbox src="/img/docs/dbt-cloud/dbt-cloud-enterprise/audit-log-section.jpg" width="95%" title="View audit log export options"/>

### Azure Single-tenant

For users deployed in [Azure single tenant](/docs/cloud/about-cloud/tenancy), the **Export All** button isn't available. Instead, you can use specific APIs to access all events:

- [Get recent audit log events CSV](/dbt-cloud/api-v3#/operations/Get%20Recent%20Audit%20Log%20Events%20CSV) &mdash; This API returns all events in a single CSV without pagination.
- [List recent audit log events](/dbt-cloud/api-v3#/operations/List%20Recent%20Audit%20Log%20Events) &mdash; This API returns a limited number of events at a time, which means you will need to paginate the results.
8 changes: 5 additions & 3 deletions website/docs/docs/cloud/manage-access/sso-overview.md
@@ -57,8 +57,9 @@ Non-admin users that currently login with a password will no longer be able to d
### Security best practices

There are a few scenarios that might require you to log in with a password. We recommend these security best practices for the two most common scenarios:
* **Onboarding partners and contractors** - We highly recommend that you add partners and contractors to your Identity Provider. IdPs like Okta and Azure Active Directory (AAD) offer capabilities explicitly for temporary employees. We highly recommend that you reach out to your IT team to provision an SSO license for these situations. Using an IdP highly secure, reduces any breach risk, and significantly increases the security posture of your dbt Cloud environment.
* **Identity Provider is down -** Account admins will continue to be able to log in with a password which would allow them to work with your Identity Provider to troubleshoot the problem.
* **Onboarding partners and contractors** &mdash; We highly recommend that you add partners and contractors to your Identity Provider. IdPs like Okta and Azure Active Directory (AAD) offer capabilities explicitly for temporary employees. We recommend that you reach out to your IT team to provision an SSO license for these situations. Using an IdP is highly secure, reduces breach risk, and significantly increases the security posture of your dbt Cloud environment.
* **Identity Provider is down** &mdash; Account admins can continue to log in with a password, which allows them to work with your Identity Provider to troubleshoot the problem.
* **Offboarding admins** &mdash; When offboarding admins, revoke access to dbt Cloud by deleting the user from your environment; otherwise, they can continue to use username/password credentials to log in.

### Next steps for non-admin users currently logging in with passwords

@@ -67,4 +68,5 @@ If you have any non-admin users logging into dbt Cloud with a password today:
1. Ensure that all users have a user account in your identity provider and are assigned the dbt Cloud application so they won’t lose access.
2. Alert all dbt Cloud users that they won’t be able to use a password for logging in anymore unless they are already an Admin with a password.
3. We **DO NOT** recommend promoting any users to Admins just to preserve password-based logins because doing so reduces the security of your dbt Cloud environment.


4 changes: 3 additions & 1 deletion website/docs/docs/dbt-support.md
@@ -17,7 +17,9 @@ If you're developing on the command line (CLI) and have questions or need some h

## dbt Cloud support

The global dbt Support team is available to dbt Cloud customers by email or in-product live chat. We want to help you work through implementing and utilizing dbt Cloud at your organization. Have a question you can't find an answer to in [our docs](https://docs.getdbt.com/) or [the Community Forum](https://discourse.getdbt.com/)? Our Support team is here to `dbt help` you!
The global dbt Support team is available to dbt Cloud customers by [email](mailto:support@getdbt.com) or using the in-product live chat (💬).

We want to help you work through implementing and utilizing dbt Cloud at your organization. Have a question you can't find an answer to in [our docs](https://docs.getdbt.com/) or [the Community Forum](https://discourse.getdbt.com/)? Our Support team is here to `dbt help` you!

- **Enterprise plans** &mdash; Priority [support](#severity-level-for-enterprise-support), options for custom support coverage hours, implementation assistance, dedicated management, and dbt Labs security reviews depending on price point.
- **Developer and Team plans** &mdash; 24x5 support (no service level agreement (SLA); [contact Sales](https://www.getdbt.com/pricing/) for Enterprise plan inquiries).
@@ -0,0 +1,16 @@
---
title: "Update: Extended attributes is GA"
description: "December 2023: The extended attributes feature is now GA in dbt Cloud. It enables you to override dbt adapter YAML attributes at the environment level."
sidebar_label: "Update: Extended attributes is GA"
sidebar_position: 10
tags: [Dec-2023]
date: 2023-12-06
---

The extended attributes feature in dbt Cloud is now GA! It allows for an environment-level override of any YAML attribute that a dbt adapter accepts in its `profiles.yml`. You can provide a YAML snippet to add or replace any [profile](/docs/core/connect-data-platform/profiles.yml) value.
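
For example, a snippet like the following could be pasted into the text box to override connection details for a single environment. The attribute names and values here are illustrative — use whatever attributes your adapter accepts in `profiles.yml`:

```yaml
# Hypothetical overrides for a development environment
schema: dbt_dev_overrides
threads: 8
```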

To learn more, refer to [Extended attributes](/docs/dbt-cloud-environments#extended-attributes).

The **Extended Attributes** text box is available from your environment's settings page:

<Lightbox src="/img/docs/dbt-cloud/using-dbt-cloud/extended-attributes.jpg" width="85%" title="Example of the Extended Attributes text box" />
2 changes: 1 addition & 1 deletion website/docs/guides/bigquery-qs.md
@@ -78,7 +78,7 @@ In order to let dbt connect to your warehouse, you'll need to generate a keyfile
- Click **Next** to create a new service account.
2. Create a service account for your new project from the [Service accounts page](https://console.cloud.google.com/projectselector2/iam-admin/serviceaccounts?supportedpurview=project). For more information, refer to [Create a service account](https://developers.google.com/workspace/guides/create-credentials#create_a_service_account) in the Google Cloud docs. As an example for this guide, you can:
- Type `dbt-user` as the **Service account name**
- From the **Select a role** dropdown, choose **BigQuery Admin** and click **Continue**
- From the **Select a role** dropdown, choose **BigQuery Job User** and **BigQuery Data Editor** roles and click **Continue**
- Leave the **Grant users access to this service account** fields blank
- Click **Done**
3. Create a service account key for your new project from the [Service accounts page](https://console.cloud.google.com/iam-admin/serviceaccounts?walkthrough_id=iam--create-service-account-keys&start_index=1#step_index=1). For more information, refer to [Create a service account key](https://cloud.google.com/iam/docs/creating-managing-service-account-keys#creating) in the Google Cloud docs. When downloading the JSON file, make sure to use a filename you can easily remember. For example, `dbt-user-creds.json`. For security reasons, dbt Labs recommends that you protect this JSON file like you would your identity credentials; for example, don't check the JSON file into your version control software.
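
If you'd rather script steps 2 and 3, a rough gcloud CLI equivalent looks like the following sketch. The project ID, service account name, and key filename are placeholders — adjust them for your project:

```shell
# Create the service account (placeholder project ID)
gcloud iam service-accounts create dbt-user --project my-gcp-project

# Grant the BigQuery Job User and BigQuery Data Editor roles
gcloud projects add-iam-policy-binding my-gcp-project \
  --member="serviceAccount:dbt-user@my-gcp-project.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"
gcloud projects add-iam-policy-binding my-gcp-project \
  --member="serviceAccount:dbt-user@my-gcp-project.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataEditor"

# Download a JSON key; protect it like any other credential
gcloud iam service-accounts keys create dbt-user-creds.json \
  --iam-account="dbt-user@my-gcp-project.iam.gserviceaccount.com"
```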
2 changes: 1 addition & 1 deletion website/docs/guides/manual-install-qs.md
@@ -16,7 +16,7 @@ When you use dbt Core to work with dbt, you will be editing files locally using

* To use dbt Core, it's important that you know some basics of the Terminal. In particular, you should understand `cd`, `ls`, and `pwd` to navigate through the directory structure of your computer easily (see the short example after this list).
* Install dbt Core using the [installation instructions](/docs/core/installation-overview) for your operating system.
* Complete [Setting up (in BigQuery)](/guides/bigquery?step=2) and [Loading data (BigQuery)](/guides/bigquery?step=3).
* Complete appropriate Setting up and Loading data steps in the Quickstart for dbt Cloud series. For example, for BigQuery, complete [Setting up (in BigQuery)](/guides/bigquery?step=2) and [Loading data (BigQuery)](/guides/bigquery?step=3).
* [Create a GitHub account](https://github.com/join) if you don't already have one.
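
A quick refresher on those Terminal commands, using an example folder name:

```shell
pwd               # print which directory you're currently in
ls                # list the files and folders in that directory
cd dbt-projects   # move into a folder named dbt-projects
```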

### Create a starter project
3 changes: 3 additions & 0 deletions website/docs/reference/artifacts/dbt-artifacts.md
@@ -48,3 +48,6 @@ In the manifest, the `metadata` may also include:
#### Notes:
- The structure of dbt artifacts is canonized by [JSON schemas](https://json-schema.org/), which are hosted at **schemas.getdbt.com**.
- Artifact versions may change in any minor version of dbt (`v1.x.0`). Each artifact is versioned independently.

## Related docs
- [Other artifact](/reference/artifacts/other-artifacts) files, such as `index.html` or `graph_summary.json`.
2 changes: 1 addition & 1 deletion website/docs/reference/artifacts/other-artifacts.md
@@ -21,7 +21,7 @@ This file is used to store a compressed representation of files dbt has parsed.

**Produced by:** commands supporting [node selection](/reference/node-selection/syntax)

Stores the networkx representation of the dbt resource DAG.
Stores the network representation of the dbt resource DAG.
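
If you want to inspect this file yourself, a rough sketch for loading it follows. This assumes the file is a standard Python pickle of the graph object — an internal format that can change between dbt versions, so treat it as exploratory only:

```python
# Exploratory sketch only: graph.gpickle is an internal dbt artifact and its format may change
import pickle

with open("target/graph.gpickle", "rb") as f:
    graph = pickle.load(f)

# If the loaded object is a networkx-style graph, its nodes are dbt resource unique IDs
print(len(graph.nodes))
```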

### graph_summary.json

2 changes: 1 addition & 1 deletion website/docs/reference/dbt-jinja-functions/return.md
@@ -1,6 +1,6 @@
---
title: "About return function"
sidebar_variable: "return"
sidebar_label: "return"
id: "return"
description: "Read this guide to understand the return Jinja function in dbt."
---
182 changes: 182 additions & 0 deletions website/docs/reference/resource-configs/databricks-configs.md
@@ -361,6 +361,188 @@ insert into analytics.replace_where_incremental
</TabItem>
</Tabs>

<VersionBlock firstVersion="1.7">

## Selecting compute per model

Beginning in version 1.7.2, you can assign which compute resource to use on a per-model basis.
For SQL models, you can select a SQL Warehouse (serverless or provisioned) or an all-purpose cluster.
For details on how this feature interacts with Python models, see [Specifying compute for Python models](#specifying-compute-for-python-models).
To take advantage of this capability, you will need to add compute blocks to your profile:

<File name='profile.yml'>

```yaml

<profile-name>:
  target: <target-name> # this is the default target
  outputs:
    <target-name>:
      type: databricks
      catalog: [optional catalog name if you are using Unity Catalog]
      schema: [schema name] # Required
      host: [yourorg.databrickshost.com] # Required

      ### This path is used as the default compute
      http_path: [/sql/your/http/path] # Required

      ### New compute section
      compute:

        ### Name that you will use to refer to an alternate compute
        Compute1:
          http_path: ['/sql/your/http/path'] # Required of each alternate compute

        ### A third named compute, use whatever name you like
        Compute2:
          http_path: ['/some/other/path'] # Required of each alternate compute
      ...

    <target-name>: # additional targets
      ...
      ### For each target, you need to define the same compute,
      ### but you can specify different paths
      compute:

        ### Name that you will use to refer to an alternate compute
        Compute1:
          http_path: ['/sql/your/http/path'] # Required of each alternate compute

        ### A third named compute, use whatever name you like
        Compute2:
          http_path: ['/some/other/path'] # Required of each alternate compute
      ...

```

</File>

The new `compute` section is a map of user-chosen names to objects with an `http_path` property.
Each compute is keyed by a name that is used in the model definition/configuration to indicate which compute you wish to use for that model/selection of models.
We recommend choosing a name that is easily recognized as the compute resource you're using, such as the name of the compute resource inside the Databricks UI.

:::note

You need to use the same set of compute names across your outputs, though you may supply different `http_path` values, allowing you to use different computes in different deployment scenarios.

:::

To configure this inside dbt Cloud, use the [extended attributes feature](/docs/dbt-cloud-environments#extended-attributes) on the desired environments:

```yaml

compute:
  Compute1:
    http_path: [/some/other/path]
  Compute2:
    http_path: [/some/other/path]

```

### Specifying the compute for models

As with many other configuration options, you can specify the compute for a model in multiple ways, using `databricks_compute`.
In your `dbt_project.yml`, the selected compute can be specified for all the models in a given directory:

<File name='dbt_project.yml'>

```yaml

...

models:
  +databricks_compute: "Compute1" # use the `Compute1` warehouse/cluster for all models in the project...
  my_project:
    clickstream:
      +databricks_compute: "Compute2" # ...except for the models in the `clickstream` folder, which will use `Compute2`.

snapshots:
  +databricks_compute: "Compute1" # all Snapshot models are configured to use `Compute1`.

```

</File>

For an individual model, the compute can be specified in the model config in your schema file.

<File name='schema.yml'>

```yaml

models:
  - name: table_model
    config:
      databricks_compute: Compute1
    columns:
      - name: id
        data_type: int

```

</File>


Alternatively, the compute can be specified in the config block of a model's SQL file.

<File name='model.sql'>

```sql

{{
  config(
    materialized='table',
    databricks_compute='Compute1'
  )
}}

select * from {{ ref('seed') }}

```

</File>

:::note

In the absence of a specified compute, we will default to the compute specified by `http_path` in the top level of the output section in your profile.
This is also the compute that will be used for tasks not associated with a particular model, such as gathering metadata for all tables in a schema.

:::

To validate that the specified compute is being used, look for lines in your `dbt.log` like:

```
Databricks adapter ... using default compute resource.
```

or

```
Databricks adapter ... using compute resource <name of compute>.
```

### Specifying compute for Python models

Materializing a Python model requires execution of SQL as well as Python.
Specifically, if your Python model is incremental, the current execution pattern involves executing Python to create a staging table that is then merged into your target table using SQL.
The Python code needs to run on an all-purpose cluster, while the SQL code can run on an all-purpose cluster or a SQL Warehouse.
When you specify your `databricks_compute` for a Python model, you are currently only specifying which compute to use when running the model-specific SQL.
If you wish to use a different compute for executing the Python code itself, you must specify an alternate `http_path` in the config for the model:

<File name="model.py">

```python

def model(dbt, session):
    dbt.config(
        http_path="sql/protocolv1/..."
    )

```

</File>

If your default compute is a SQL Warehouse, you will need to specify an all-purpose cluster `http_path` in this way.

</VersionBlock>

## Persisting model descriptions

2 changes: 1 addition & 1 deletion website/docs/terms/data-wrangling.md
@@ -51,7 +51,7 @@ The cleaning stage involves using different functions so that the values in your
- Removing appropriate duplicates or nulls you found in the discovery process
- Eliminating unnecessary characters or spaces from values

Certain cleaning steps, like removing rows with null values, are helpful to do at the beginning of the process because removing nulls and duplicates from the start can increase the performance of your downstream models. In the cleaning step, it’s important to follow a standard for your transformations here. This means you should be following a consistent naming convention for your columns (especially for your <Term id="primary-key">primary keys</Term>) and casting to the same timezone and datatypes throughout your models. Examples include making sure all dates are in UTC time rather than source timezone-specific, all string in either lower or upper case, etc.
Certain cleaning steps, like removing rows with null values, are helpful to do at the beginning of the process because removing nulls and duplicates from the start can increase the performance of your downstream models. In the cleaning step, it’s important to follow a standard for your transformations here. This means you should be following a consistent naming convention for your columns (especially for your <Term id="primary-key">primary keys</Term>) and casting to the same timezone and datatypes throughout your models. Examples include making sure all dates are in UTC time rather than source timezone-specific, all strings are in either lower or upper case, etc.

:::tip dbt to the rescue!
If you're struggling to do all the cleaning on your own, remember that dbt packages ([dbt expectations](https://github.com/calogica/dbt-expectations), [dbt_utils](https://hub.getdbt.com/dbt-labs/dbt_utils/latest/), and [re_data](https://www.getre.io/)) and their macros are also available to help you clean up your data.
5 changes: 5 additions & 0 deletions website/snippets/_sl-faqs.md
@@ -10,6 +10,11 @@

As we refine MetricFlow’s API layers, some users may find it easier to set up their own custom service layers for managing query requests. This is not currently recommended, as the API boundaries around MetricFlow are not sufficiently well-defined for broad-based community use.

- **Why is my query limited to 100 rows in the dbt Cloud CLI?**
- The default `limit` for queries issued from the dbt Cloud CLI is 100 rows. We set this default to prevent returning unnecessarily large data sets, as the dbt Cloud CLI is typically used to query the dbt Semantic Layer during development, not for production reporting or accessing large data sets. For most workflows, you only need to return a subset of the data.

However, you can change this limit if needed by setting the `--limit` option in your query. For example, to return 1000 rows, you can run `dbt sl list metrics --limit 1000`.

- **Can I reference MetricFlow queries inside dbt models?**
- dbt relies on Jinja macros to compile SQL, while MetricFlow is Python-based and does direct SQL rendering targeted at a specific dialect. MetricFlow does not support pass-through rendering of Jinja macros, so we can’t easily reference MetricFlow queries inside of dbt models.
