Skip to content

Commit

Permalink
Update project-dependencies.md (#4278)
Browse files Browse the repository at this point in the history
adding info about dependencies .yml and packages.yml to clarify
difference btw the two. also updating lifecycle status

[ ] tech review by @jtcohen6 
[ ] docs review
[ ] launch oct 17th at 9am pst
  • Loading branch information
mirnawong1 authored Oct 17, 2023
2 parents 59a3c5b + d817abf commit 0b6b599
Show file tree
Hide file tree
Showing 2 changed files with 51 additions and 13 deletions.
13 changes: 10 additions & 3 deletions website/docs/docs/build/packages.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@ title: "Packages"
id: "packages"
---

## What is a package?

Software engineers frequently modularize code into libraries. These libraries help programmers operate with leverage: they can spend more time focusing on their unique business logic, and less time implementing code that someone else has already spent the time perfecting.

In dbt, libraries like these are called _packages_. dbt's packages are so powerful because so many of the analytic problems we encountered are shared across organizations, for example:
Expand All @@ -22,13 +22,19 @@ dbt _packages_ are in fact standalone dbt projects, with models and macros that
* Models in the package will be materialized when you `dbt run`.
* You can use `ref` in your own models to refer to models from the package.
* You can use macros in the package in your own project.
* It's important to note that defining and installing dbt packages is different from [defining and installing Python packages](/docs/build/python-models#using-pypi-packages)

:::note Using Python packages

Defining and installing dbt packages is different from [defining and installing Python packages](/docs/build/python-models#using-pypi-packages).
:::info `dependencies.yml` has replaced `packages.yml`
Starting from dbt v1.6, `dependencies.yml` has replaced `packages.yml`. This file can now contain both types of dependencies: "package" and "project" dependencies.
- "Package" dependencies lets you add source code from someone else's dbt project into your own, like a library.
- "Project" dependencies provide a different way to build on top of someone else's work in dbt. Refer to [Project dependencies](/docs/collaborate/govern/project-dependencies) for more info.
-
You can rename `packages.yml` to `dependencies.yml`, _unless_ you need to use Jinja within your packages specification. This could be necessary, for example, if you want to add an environment variable with a git token in a private git package specification.

:::


## How do I add a package to my project?
1. Add a file named <VersionBlock firstVersion="1.6"> `dependencies.yml` or </VersionBlock> `packages.yml` to your dbt project. This should be at the same level as your `dbt_project.yml` file.
2. Specify the package(s) you wish to add using one of the supported syntaxes, for example:
Expand Down Expand Up @@ -366,3 +372,4 @@ packages:
```

</File>

51 changes: 41 additions & 10 deletions website/docs/docs/collaborate/govern/project-dependencies.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,16 +6,14 @@ description: "Reference public models across dbt projects"
pagination_next: null
---

:::caution Closed Beta - dbt Cloud Enterprise
"Project" dependencies and cross-project `ref` are features of dbt Cloud Enterprise, currently in Closed Beta. To access these features while they are in beta, please contact your account team at dbt Labs.
:::info Available in Public Preview for dbt Cloud Enterprise accounts

**Prerequisites:** In order to add project dependencies and resolve cross-project `ref`, you must:
- Have the feature enabled (speak to your account team)
- Use dbt v1.6 for **both** the upstream ("producer") project and the downstream ("consumer") project.
- Have a deployment environment in the upstream ("producer") project [that is set to be your production environment](/docs/deploy/deploy-environments#set-as-production-environment-beta)
- Have a successful run of the upstream ("producer") project
Project dependencies and cross-project `ref` are features available in [dbt Cloud Enterprise](https://www.getdbt.com/pricing), currently in [Public Preview](/docs/dbt-versions/product-lifecycles#dbt-cloud).

Enterprise users can use these features by designating a [public model](/docs/collaborate/govern/model-access) and adding a [cross-project ref](#how-to-use-ref).
:::


For a long time, dbt has supported code reuse and extension by installing other projects as [packages](/docs/build/packages). When you install another project as a package, you are pulling in its full source code, and adding it to your own. This enables you to call macros and run models defined in that other project.

While this is a great way to reuse code, share utility macros, and establish a starting point for common transformations, it's not a great way to enable collaboration across teams and at scale, especially at larger organizations.
Expand All @@ -24,6 +22,33 @@ This year, dbt Labs is introducing an expanded notion of `dependencies` across m
- **Packages** &mdash; Familiar and pre-existing type of dependency. You take this dependency by installing the package's full source code (like a software library).
- **Projects** &mdash; A _new_ way to take a dependency on another project. Using a metadata service that runs behind the scenes, dbt Cloud resolves references on-the-fly to public models defined in other projects. You don't need to parse or run those upstream models yourself. Instead, you treat your dependency on those models as an API that returns a dataset. The maintainer of the public model is responsible for guaranteeing its quality and stability.


Starting in dbt v1.6 or higher, `packages.yml` has been renamed to `dependencies.yml`. However, if you need use Jinja within your packages config, such as an environment variable for your private package, you need to keep using `packages.yml` for your packages for now. Refer to [FAQs](#faqs) for more info.

## Prerequisites

In order to add project dependencies and resolve cross-project `ref`, you must:
- Use dbt v1.6 or higher for **both** the upstream ("producer") project and the downstream ("consumer") project.
- Have a deployment environment in the upstream ("producer") project [that is set to be your production environment](/docs/deploy/deploy-environments#set-as-production-environment-beta)
- Have a successful run of the upstream ("producer") project
- Have a multi-tenant or single-tenant [dbt Cloud Enterprise](https://www.getdbt.com/pricing) account (Azure ST is not supported but coming soon)

<!-- commenting out in case we can repurpose content
### About dependencies.yml
There are some important differences between using a `dependencies.yml` compared to a `packages.yml` file:
- `dependencies.yml`
- Primarily designed for dbt Mesh and cross-project reference workflow.
- Supports both Projects and non-private dbt packages (private packages aren't supported yet).
- Helps maintain your project's organization by allowing you to specify hub packages like `dbt_utils`, reducing the need for multiple YAML files.
- Does not support conditional configuration using Jinja-in-yaml
- `packages.yml`
- Does not contribute to the dbt Mesh workflow.
- Serves as a list of dbt Packages (such as dbt projects) that you want to download into your root or parent dbt project.
- Can only include packages, including private packages (doesn't support Projects)
-->
## Example

As an example, let's say you work on the Marketing team at the Jaffle Shop. The name of your team's project is `jaffle_marketing`:
Expand All @@ -37,7 +62,7 @@ name: jaffle_marketing
</File>
As part of your modeling of marketing data, you need to take a dependency on two other projects:
- `dbt_utils` as a [package](#packages-use-case): An collection of utility macros that you can use while writing the SQL for your own models. This package is, open-source public, and maintained by dbt Labs.
- `dbt_utils` as a [package](#packages-use-case): A collection of utility macros that you can use while writing the SQL for your own models. This package is, open-source public, and maintained by dbt Labs.
- `jaffle_finance` as a [project use-case](#projects-use-case): Data models about the Jaffle Shop's revenue. This project is private and maintained by your colleagues on the Finance team. You want to select from some of this project's final models, as a starting point for your own work.

<File name="dependencies.yml">
Expand Down Expand Up @@ -67,7 +92,7 @@ When you're building on top of another team's work, resolving the references in
- You don't need to mirror any conditional configuration of the upstream project such as `vars`, environment variables, or `target.name`. You can reference them directly wherever the Finance team is building their models in production. Even if the Finance team makes changes like renaming the model, changing the name of its schema, or [bumping its version](/docs/collaborate/govern/model-versions), your `ref` would still resolve successfully.
- You eliminate the risk of accidentally building those models with `dbt run` or `dbt build`. While you can select those models, you can't actually build them. This prevents unexpected warehouse costs and permissions issues. This also ensures proper ownership and cost allocation for each team's models.

### Usage
### How to use ref

**Writing `ref`:** Models referenced from a `project`-type dependency must use [two-argument `ref`](/reference/dbt-jinja-functions/ref#two-argument-variant), including the project name:

Expand All @@ -88,6 +113,8 @@ with monthly_revenue as (

**Cycle detection:** Currently, "project" dependencies can only go in one direction, meaning that the `jaffle_finance` project could not add a new model that depends, in turn, on `jaffle_marketing.roi_by_channel`. dbt will check for cycles across projects and raise errors if any are detected. We are considering support for this pattern in the future, whereby dbt would still check for node-level cycles while allowing cycles at the project level.

For more guidance on how to use dbt Mesh, refer to the dedicated [dbt Mesh guide](/guides/best-practices/how-we-mesh/mesh-1-intro).

### Comparison

If you were to instead install the `jaffle_finance` project as a `package` dependency, you would instead be pulling down its full source code and adding it to your runtime environment. This means:
Expand All @@ -100,7 +127,7 @@ There are a few cases where installing another internal project as a package can
- Unified deployments &mdash; In a production environment, if the central data platform team of Jaffle Shop wanted to schedule the deployment of models across both `jaffle_finance` and `jaffle_marketing`, they could use dbt's [selection syntax](/reference/node-selection/syntax) to create a new "passthrough" project that installed both projects as packages.
- Coordinated changes &mdash; In development, if you wanted to test the effects of a change to a public model in an upstream project (`jaffle_finance.monthly_revenue`) on a downstream model (`jaffle_marketing.roi_by_channel`) _before_ introducing changes to a staging or production environment, you can install the `jaffle_finance` package as a package within `jaffle_marketing`. The installation can point to a specific git branch, however, if you find yourself frequently needing to perform end-to-end testing across both projects, we recommend you re-examine if this represents a stable interface boundary.

These are the exceptions, rather than the rule. Installing another team's project as a package adds complexity, latency, and risk of unnecessary costs. By defining clear interface boundaries across teams, by serving one team's public models as "APIs" to another, and by enabling practitioners to develop with a more narrowly-defined scope, we can enable more people to contribute, with more confidence, while requiring less context upfront.
These are the exceptions, rather than the rule. Installing another team's project as a package adds complexity, latency, and risk of unnecessary costs. By defining clear interface boundaries across teams, by serving one team's public models as "APIs" to another, and by enabling practitioners to develop with a more narrowly defined scope, we can enable more people to contribute, with more confidence, while requiring less context upfront.

## FAQs

Expand All @@ -109,3 +136,7 @@ These are the exceptions, rather than the rule. Installing another team's projec

If you're using private packages with the [git token method](/docs/build/packages#git-token-method), you must define them in the `packages.yml` file instead of the `dependencies.yml` file. This is because conditional rendering (like Jinja-in-yaml) is not supported.
</details>


## Related docs
- [dbt Mesh guide](/guides/best-practices/how-we-mesh/mesh-1-intro)

0 comments on commit 0b6b599

Please sign in to comment.