Skip to content

Commit

Permalink
Merge branch 'current' into quickstarts-q32023-update
Browse files Browse the repository at this point in the history
  • Loading branch information
runleonarun committed Oct 27, 2023
2 parents badaed7 + 35a5949 commit c9422a4
Show file tree
Hide file tree
Showing 20 changed files with 38 additions and 1,136 deletions.
8 changes: 4 additions & 4 deletions website/blog/2021-11-29-dbt-airflow-spiritual-alignment.md
Original file line number Diff line number Diff line change
Expand Up @@ -144,22 +144,22 @@ An analyst will be in the dark when attempting to debug this, and will need to r
This can be perfectly ok, in the event your data team is structured for data engineers to exclusively own dbt modeling duties, but that’s a quite uncommon org structure pattern from what I’ve seen. And if you have easy solutions for this analyst-blindness problem, I’d love to hear them.

Once the data has been ingested, dbt Core can be used to model it for consumption. Most of the time, users choose to either:
Use the dbt CLI+ [BashOperator](https://registry.astronomer.io/providers/apache-airflow/modules/bashoperator) with Airflow (If you take this route, you can use an external secrets manager to manage credentials externally), or
Use the dbt Core CLI+ [BashOperator](https://registry.astronomer.io/providers/apache-airflow/modules/bashoperator) with Airflow (If you take this route, you can use an external secrets manager to manage credentials externally), or
Use the [KubernetesPodOperator](https://registry.astronomer.io/providers/kubernetes/modules/kubernetespodoperator) for each dbt job, as data teams have at places like [Gitlab](https://gitlab.com/gitlab-data/analytics/-/blob/master/dags/transformation/dbt_trusted_data.py#L72) and [Snowflake](https://www.snowflake.com/blog/migrating-airflow-from-amazon-ec2-to-kubernetes/).

Both approaches are equally valid; the right one will depend on the team and use case at hand.

| | Dependency management | Overhead | Flexibility | Infrastructure Overhead |
|---|---|---|---|---|
| dbt CLI + BashOperator | Medium | Low | Medium | Low |
| dbt Core CLI + BashOperator | Medium | Low | Medium | Low |
| Kubernetes Pod Operator | Very Easy | Medium | High | Medium |
| | | | | |

If you have DevOps resources available to you, and your team is comfortable with concepts like Kubernetes pods and containers, you can use the KubernetesPodOperator to run each job in a Docker image so that you never have to think about Python dependencies. Furthermore, you’ll create a library of images containing your dbt models that can be run on any containerized environment. However, setting up development environments, CI/CD, and managing the arrays of containers can mean a lot of overhead for some teams. Tools like the [astro-cli](https://github.com/astronomer/astro-cli) can make this easier, but at the end of the day, there’s no getting around the need for Kubernetes resources for the Gitlab approach.

If you’re just looking to get started or just don’t want to deal with containers, using the BashOperator to call the dbt CLI can be a great way to begin scheduling your dbt workloads with Airflow.
If you’re just looking to get started or just don’t want to deal with containers, using the BashOperator to call the dbt Core CLI can be a great way to begin scheduling your dbt workloads with Airflow.

It’s important to note that whichever approach you choose, this is just a first step; your actual production needs may have more requirements. If you need granularity and dependencies between your dbt models, like the team at [Updater does, you may need to deconstruct the entire dbt DAG in Airflow.](https://www.astronomer.io/guides/airflow-dbt#use-case-2-dbt-airflow-at-the-model-level) If you’re okay managing some extra dependencies, but want to maximize control over what abstractions you expose to your end users, you may want to use the [GoCardlessProvider](https://github.com/gocardless/airflow-dbt), which wraps the BashOperator and dbt CLI.
It’s important to note that whichever approach you choose, this is just a first step; your actual production needs may have more requirements. If you need granularity and dependencies between your dbt models, like the team at [Updater does, you may need to deconstruct the entire dbt DAG in Airflow.](https://www.astronomer.io/guides/airflow-dbt#use-case-2-dbt-airflow-at-the-model-level) If you’re okay managing some extra dependencies, but want to maximize control over what abstractions you expose to your end users, you may want to use the [GoCardlessProvider](https://github.com/gocardless/airflow-dbt), which wraps the BashOperator and dbt Core CLI.

#### Rerunning jobs from failure

Expand Down
2 changes: 1 addition & 1 deletion website/blog/2022-02-23-founding-an-AE-team-smartsheet.md
Original file line number Diff line number Diff line change
Expand Up @@ -114,7 +114,7 @@ In the interest of getting a proof of concept out the door (I highly favor focus

- Our own Dev, Prod & Publish databases
- Our own code repository which we managed independently
- dbt CLI
- dbt Core CLI
- Virtual Machine running dbt on a schedule

None of us had used dbt before, but we’d heard amazing things about it. We hotly debated the choice between dbt and building our own lightweight stack, and looking back now, I couldn’t be happier with choosing dbt. While there was a learning curve that slowed us down initially, we’re now seeing the benefit of that decision. Onboarding new analysts is a breeze and much of the functionality we need is pre-built. The more we use the tool, the faster we are at using it and the more value we’re gaining from the product.
Expand Down
6 changes: 3 additions & 3 deletions website/blog/2022-07-26-pre-commit-dbt.md
Original file line number Diff line number Diff line change
Expand Up @@ -112,7 +112,7 @@ The last step of our flow is to make those pre-commit checks part of the day-to-

Adding periodic pre-commit checks can be done in 2 different ways, through CI (Continuous Integration) actions, or as git hooks when running dbt locally

#### a) Adding pre-commit-dbt to the CI flow (works for dbt Cloud and dbt CLI users)
#### a) Adding pre-commit-dbt to the CI flow (works for dbt Cloud and dbt Core users)

The example below will assume GitHub actions as the CI engine but similar behavior could be achieved in any other CI tool.

Expand Down Expand Up @@ -237,9 +237,9 @@ With that information, I could now go back to dbt, document my model customers a

We could set up rules that prevent any change to be merged if the GitHub action fails. Alternatively, this action step can be defined as merely informational.

#### b) Installing the pre-commit git hooks (for dbt CLI users)
#### b) Installing the pre-commit git hooks (for dbt Core users)

If we develop locally with the dbt CLI, we could also execute `pre-commit install` to install the git hooks. What it means then is that every time we want to commit code in git, the pre-commit hooks will run and will prevent us from committing if any step fails.
If we develop locally with the dbt Core CLI, we could also execute `pre-commit install` to install the git hooks. What it means then is that every time we want to commit code in git, the pre-commit hooks will run and will prevent us from committing if any step fails.

If we want to commit code without performing all the steps of the pre-hook we could use the environment variable SKIP or the git flag `--no-verify` as described [in the documentation](https://pre-commit.com/#temporarily-disabling-hooks). (e.g. we might want to skip the auto `dbt docs generate` locally to prevent it from running at every commit and rely on running it manually from time to time)

Expand Down
2 changes: 1 addition & 1 deletion website/docs/dbt-cli/cli-overview.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@ title: "CLI overview"
description: "Run your dbt project from the command line."
---

dbt Core ships with a command-line interface (CLI) for running your dbt project. The dbt CLI is free to use and available as an [open source project](https://github.com/dbt-labs/dbt-core).
dbt Core ships with a command-line interface (CLI) for running your dbt project. dbt Core and its CLI are free to use and available as an [open source project](https://github.com/dbt-labs/dbt-core).

When using the command line, you can run commands and do other work from the current or _working directory_ on your computer. Before running the dbt project from the command line, make sure the working directory is your dbt project directory. For more details, see "[Creating a dbt project](/docs/build/projects)."

Expand Down
2 changes: 1 addition & 1 deletion website/docs/docs/build/jinja-macros.md
Original file line number Diff line number Diff line change
Expand Up @@ -76,7 +76,7 @@ You can recognize Jinja based on the delimiters the language uses, which we refe

When used in a dbt model, your Jinja needs to compile to a valid query. To check what SQL your Jinja compiles to:
* **Using dbt Cloud:** Click the compile button to see the compiled SQL in the Compiled SQL pane
* **Using the dbt CLI:** Run `dbt compile` from the command line. Then open the compiled SQL file in the `target/compiled/{project name}/` directory. Use a split screen in your code editor to keep both files open at once.
* **Using dbt Core:** Run `dbt compile` from the command line. Then open the compiled SQL file in the `target/compiled/{project name}/` directory. Use a split screen in your code editor to keep both files open at once.

### Macros
[Macros](/docs/build/jinja-macros) in Jinja are pieces of code that can be reused multiple times – they are analogous to "functions" in other programming languages, and are extremely useful if you find yourself repeating code across multiple models. Macros are defined in `.sql` files, typically in your `macros` directory ([docs](/reference/project-configs/macro-paths)).
Expand Down
2 changes: 1 addition & 1 deletion website/docs/docs/build/tests.md
Original file line number Diff line number Diff line change
Expand Up @@ -163,7 +163,7 @@ Done. PASS=2 WARN=0 ERROR=0 SKIP=0 TOTAL=2
```
3. Check out the SQL dbt is running by either:
* **dbt Cloud:** checking the Details tab.
* **dbt CLI:** checking the `target/compiled` directory
* **dbt Core:** checking the `target/compiled` directory


**Unique test**
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -144,7 +144,7 @@ To complete setup, follow the steps below in the dbt Cloud application.
| ----- | ----- |
| **Log in with** | Azure AD Single Tenant |
| **Client&nbspID** | Paste the **Application (client) ID** recorded in the steps above |
| **Client Secret** | Paste the **Client Secret** (remember to use the Secret Value instead of the Secret ID) recorded in the steps above |
| **Client&nbsp;Secret** | Paste the **Client Secret** (remember to use the Secret Value instead of the Secret ID) recorded in the steps above; <br />**Note:** When the client secret expires, an Azure AD admin will have to generate a new one to be pasted into dbt Cloud for uninterrupted application access. |
| **Tenant&nbsp;ID** | Paste the **Directory (tenant ID)** recorded in the steps above |
| **Domain** | Enter the domain name for your Azure directory (such as `fishtownanalytics.com`). Only use the primary domain; this won't block access for other domains. |
| **Slug** | Enter your desired login slug. Users will be able to log into dbt Cloud by navigating to `https://YOUR_ACCESS_URL/enterprise-login/LOGIN-SLUG`, replacing `YOUR_ACCESS_URL` with the [appropriate Access URL](/docs/cloud/manage-access/sso-overview#auth0-multi-tenant-uris) for your region and plan. Login slugs must be unique across all dbt Cloud accounts, so pick a slug that uniquely identifies your company. |
Expand Down
4 changes: 2 additions & 2 deletions website/docs/docs/deploy/deployment-tools.md
Original file line number Diff line number Diff line change
Expand Up @@ -108,11 +108,11 @@ If your organization is using [Prefect](https://www.prefect.io/), the way you wi

## Dagster

If your organization is using [Dagster](https://dagster.io/), you can use the [dagster_dbt](https://docs.dagster.io/_apidocs/libraries/dagster-dbt) library to integrate dbt commands into your pipelines. This library supports the execution of dbt through dbt Cloud, dbt CLI and the dbt RPC server. Running dbt from Dagster automatically aggregates metadata about your dbt runs. Refer to the [example pipeline](https://dagster.io/blog/dagster-dbt) for details.
If your organization is using [Dagster](https://dagster.io/), you can use the [dagster_dbt](https://docs.dagster.io/_apidocs/libraries/dagster-dbt) library to integrate dbt commands into your pipelines. This library supports the execution of dbt through dbt Cloud, dbt Core, and the dbt RPC server. Running dbt from Dagster automatically aggregates metadata about your dbt runs. Refer to the [example pipeline](https://dagster.io/blog/dagster-dbt) for details.

## Kestra

If your organization uses [Kestra](http://kestra.io/), you can leverage the [dbt plugin](https://kestra.io/plugins/plugin-dbt) to orchestrate dbt Cloud and dbt Core jobs. Kestra's user interface (UI) has built-in [Blueprints](https://kestra.io/docs/user-interface-guide/blueprints), providing ready-to-use workflows. Navigate to the Blueprints page in the left navigation menu and [select the dbt tag](https://demo.kestra.io/ui/blueprints/community?selectedTag=36) to find several examples of scheduling dbt CLI commands and dbt Cloud jobs as part of your data pipelines. After each scheduled or ad-hoc workflow execution, the Outputs tab in the Kestra UI allows you to download and preview all dbt build artifacts. The Gantt and Topology view additionally render the metadata to visualize dependencies and runtimes of your dbt models and tests. The dbt Cloud task provides convenient links to easily navigate between Kestra and dbt Cloud UI.
If your organization uses [Kestra](http://kestra.io/), you can leverage the [dbt plugin](https://kestra.io/plugins/plugin-dbt) to orchestrate dbt Cloud and dbt Core jobs. Kestra's user interface (UI) has built-in [Blueprints](https://kestra.io/docs/user-interface-guide/blueprints), providing ready-to-use workflows. Navigate to the Blueprints page in the left navigation menu and [select the dbt tag](https://demo.kestra.io/ui/blueprints/community?selectedTag=36) to find several examples of scheduling dbt Core commands and dbt Cloud jobs as part of your data pipelines. After each scheduled or ad-hoc workflow execution, the Outputs tab in the Kestra UI allows you to download and preview all dbt build artifacts. The Gantt and Topology view additionally render the metadata to visualize dependencies and runtimes of your dbt models and tests. The dbt Cloud task provides convenient links to easily navigate between Kestra and dbt Cloud UI.

## Automation servers

Expand Down
2 changes: 1 addition & 1 deletion website/docs/faqs/Project/which-schema.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@ id: which-schema
---
By default, dbt builds models in your target schema. To change your target schema:
* If you're developing in **dbt Cloud**, these are set for each user when you first use a development environment.
* If you're developing with the **dbt CLI**, this is the `schema:` parameter in your `profiles.yml` file.
* If you're developing with **dbt Core**, this is the `schema:` parameter in your `profiles.yml` file.

If you wish to split your models across multiple schemas, check out the docs on [using custom schemas](/docs/build/custom-schemas).

Expand Down
2 changes: 1 addition & 1 deletion website/docs/faqs/Runs/checking-logs.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ To check out the SQL that dbt is running, you can look in:

* dbt Cloud:
* Within the run output, click on a model name, and then select "Details"
* dbt CLI:
* dbt Core:
* The `target/compiled/` directory for compiled `select` statements
* The `target/run/` directory for compiled `create` statements
* The `logs/dbt.log` file for verbose logging.
2 changes: 1 addition & 1 deletion website/docs/faqs/Runs/failed-tests.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ To debug a failing test, find the SQL that dbt ran by:
* dbt Cloud:
* Within the test output, click on the failed test, and then select "Details"

* dbt CLI:
* dbt Core:
* Open the file path returned as part of the error message.
* Navigate to the `target/compiled/schema_tests` directory for all compiled test queries

Expand Down
2 changes: 1 addition & 1 deletion website/docs/guides/advanced/using-jinja.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ If you'd like to work through this query, add [this CSV](https://github.com/dbt-

While working through the steps of this model, we recommend that you have your compiled SQL open as well, to check what your Jinja compiles to. To do this:
* **Using dbt Cloud:** Click the compile button to see the compiled SQL in the right hand pane
* **Using the dbt CLI:** Run `dbt compile` from the command line. Then open the compiled SQL file in the `target/compiled/{project name}/` directory. Use a split screen in your code editor to keep both files open at once.
* **Using dbt Core:** Run `dbt compile` from the command line. Then open the compiled SQL file in the `target/compiled/{project name}/` directory. Use a split screen in your code editor to keep both files open at once.

## Write the SQL without Jinja
Consider a data model in which an `order` can have many `payments`. Each `payment` may have a `payment_method` of `bank_transfer`, `credit_card` or `gift_card`, and therefore each `order` can have multiple `payment_methods`
Expand Down
Loading

0 comments on commit c9422a4

Please sign in to comment.