Explorer update: column level lineage #4767

Closed
wants to merge 12 commits into from
53 changes: 53 additions & 0 deletions website/docs/docs/collaborate/column-level-lineage.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
---
title: "Column level lineage"
description: "Use column level lineage in dbt Explorer to trace the provenance of each column in the resources of your dbt project."
---

dbt Explorer now offers column level lineage (CLL) for the resources in your dbt project. Analytics engineers can quickly and easily gain insight into the provenance of their data products at a more granular level. For each column in a resource (model, source, or snapshot) in a dbt project, Explorer provides end-to-end lineage for the data in that column given how it's used.

Column level lineage is available to dbt Cloud Enterprise accounts that can use Explorer. It’s also available through the Discovery API.

:::tip Beta
Column-level lineage is now available in beta. Check it out! We'd love to [know what you think](https://docs.google.com/forms/d/e/1FAIpQLSdpCbVkGY9QwfExFonpWE4DTOKi3fQxBGLD0wwKYpkMjgcE7g/viewform)!
Contributor Author

i didn't use "open beta" here, aligned with the betas for Project recs and Model perf pages. lemme know if this doesn't work tho! happy to change

:::

## Access the column level lineage

There is no additional setup required for column level lineage if your account is on an Enterprise plan that can use Explorer. You can access the column level lineage by expanding the column card in the **Columns** tab of an Explorer [resource details page](/docs/collaborate/explore-projects#view-resource-details) for a model, source, or snapshot.

dbt updates the lineage after each run that's executed in the production environment. You must make sure that `docs generate` is running within at least one job in the environment. Refer to [Generating metadata](/docs/collaborate/explore-projects#generate-metadata) for more details.

<Lightbox src="/img/docs/collaborate/dbt-explorer/example-cll.png" title="Example of the Columns tab and where to expand for the CLL"/>

<LoomVideo id='278c948ba387457884cc6b9545793685' />
Contributor Author

@nghi-ly nghi-ly Jan 18, 2024

@dave-connors-3 : i'm using the loom video from the notion draft. we can update the link if you end up rerecording it, np

Contributor

@dave-connors-3 feel free to re-record, maybe on Fri or Mon? i'd like to see your take, and we may want to wait for more UX improvements to land


## Column level lineage use cases {#use-cases}
Contributor Author

@nghi-ly nghi-ly Jan 19, 2024

updated heading title so it shows up higher (ie "weighted" more) in our search list as well as having a tidy URL. here's a comparison where the "distinct" example is at the top of the search list:

Screenshot 2024-01-19 at 8 51 40 AM
Screenshot 2024-01-19 at 8 52 27 AM


Learn more about why and how you can use column level lineage in these sections.

### Root cause analysis

When there is an unexpected breakage in a data pipeline, column level lineage can be a valuable tool to understand the exact point in the pipeline where the error took place. For example, a failing data test on a particular column in your dbt model might've stemmed from an untested column upstream. Using CLL can help quickly identify and fix breakages when they happen.

### Impact analysis

During development, analytics engineers can use column level lineage to understand the full scope of the impact of their proposed changes. This knowledge empowers them to create higher quality pull requests that require less rework, as they can anticipate and preempt issues that would've been unchecked without column level insights.

### Collaboration and efficiency

When exploring your data products, column level lineage helps analytics engineers and data analysts more easily navigate and understand the origin and usage of their data, so they can make better decisions with higher confidence.

## Caveats

Column level lineage relies on SQL parsing. Errors can occur when parsing fails or a column's origin is unknown (like with JSON unpacking, lateral joins, and so forth). In these cases, lineage may be incomplete and dbt Cloud will provide a warning about it in the column lineage. To review the error details, open the [full lineage graph](/docs/collaborate/explore-projects#project-lineage) and select the node to open the column’s details panel.

<Lightbox src="/img/docs/collaborate/dbt-explorer/example-parsing-error-pill.png" width="90%" title="Example of warning in the full lineage graph"/>

Possible error cases are:

- **Parsing error** &mdash; Error occurs when the SQL is ambiguous or too complex for parsing. An example of an ambiguous parsing scenario is a _complex_ lateral join.
- **Python error** &mdash; Error occurs when a Python model is used within the lineage. Due to the nature of Python models, it's not possible to parse and determine the lineage.
- **Unknown error** &mdash; Error occurs when the lineage can't be determined for an unknown reason. An example of this would be if a dbt best practice is not being followed, like using hardcoded table names instead of `ref` statements.
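
As a rough illustration of the hardcoded table name case, the sketch below shows the same hypothetical query written both ways. The model, schema, and column names (`stg_orders`, `analytics`, `order_id`, `amount`) are assumptions made up for this example, not names from any real project.

```sql
-- Option A (hypothetical): hardcoded relation name.
-- dbt has no ref to resolve here, so it can't record a dependency
-- on the upstream model, and lineage for these columns may be incomplete.
select
    order_id,
    amount
from analytics.stg_orders

-- Option B (hypothetical): the same query using ref.
-- dbt resolves the relation and records the dependency on stg_orders.
select
    order_id,
    amount
from {{ ref('stg_orders') }}
```

Only one of the two options would appear in a given model file; they are shown together here for comparison.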
Contributor Author

@nghi-ly nghi-ly Jan 18, 2024

"improper use of dbt" read a bit strong to me so trying this wording instead. this might be a bit of a mouthful tho




16 changes: 8 additions & 8 deletions website/docs/docs/collaborate/project-recommendations.md
@@ -3,20 +3,20 @@ title: "Project recommendations"
sidebar_label: "Project recommendations"
description: "dbt Explorer provides recommendations that you can take to improve the quality of your dbt project."
---

-:::tip Beta
-
-The project recommendations beta feature is now available in dbt Explorer! Check it out!
-
-:::
Contributor Author

callouts at the top of pages affect SEO so as a best practice we're moving them to a lower location on the page but making sure they're still before the fold


dbt Explorer provides recommendations about your project from the `dbt_project_evaluator` [package](https://hub.getdbt.com/dbt-labs/dbt_project_evaluator/latest/) using metadata from the Discovery API.

Explorer also offers a global view, showing all the recommendations across the project for easy sorting and summarizing.

These recommendations provide insight into how you can build a more well documented, well tested, and well built project, leading to less confusion and more trust.

-The Recommendations overview page includes two top-level metrics measuring the test and documentation coverage of the models in your project.
+:::tip Beta
+
+The project recommendations beta feature is now available in dbt Explorer! Check it out!
+
+:::
+
+The **Recommendations** overview page includes two top-level metrics measuring the test and documentation coverage of the models in your project.

- **Model test coverage** &mdash; The percent of models in your project (models not from a package or imported via dbt Mesh) with at least one dbt test configured on them.
- **Model documentation coverage** &mdash; The percent of models in your project (models not from a package or imported via dbt Mesh) with a description.
@@ -43,7 +43,7 @@ The Recommendations overview page includes two top-level metrics measuring the t

## The Recommendations tab

-Models, sources and exposures each also have a Recommendations tab on their resource details page, with the specific recommendations that correspond to that resource:
+Models, sources and exposures each also have a **Recommendations** tab on their resource details page, with the specific recommendations that correspond to that resource:

<Lightbox src="/img/docs/collaborate/dbt-explorer/example-recommendations-tab.png" width="80%" title="Example of the Recommendations tab "/>

1 change: 1 addition & 0 deletions website/sidebars.js
@@ -425,6 +425,7 @@ const sidebarSettings = {
link: { type: "doc", id: "docs/collaborate/explore-projects" },
items: [
"docs/collaborate/explore-projects",
+"docs/collaborate/column-level-lineage",
"docs/collaborate/model-performance",
"docs/collaborate/project-recommendations",
"docs/collaborate/explore-multiple-projects",
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.