Explorer update: column level lineage #4767
Changes from all commits
8bafd09
ce03b45
d8f0f09
5b586ac
4e428aa
db71dbc
5072c9b
ddd78ab
91c33c3
1d542a1
55c5ad3
29bfa9a
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,53 @@ | ||
--- | ||
title: "Column level lineage" | ||
description: "dbt Explorer provides column level lineage for the models, sources, and snapshots in your dbt project." | ||
--- | ||
|
||
dbt Explorer now offers column level lineage (CLL) for the resources in your dbt project. Analytics engineers can quickly and easily gain insight into the provenance of their data products at a more granular level. For each column in a resource (model, source, or snapshot) in a dbt project, Explorer provides end-to-end lineage for the data in that column given how it's used. | ||
|
||
Column level lineage is available to dbt Cloud Enterprise accounts that can use Explorer. It’s also available through the Discovery API. | ||
|
||
:::tip Beta | ||
Column-level lineage is now available in beta. Check it out! We'd love to [know what you think](https://docs.google.com/forms/d/e/1FAIpQLSdpCbVkGY9QwfExFonpWE4DTOKi3fQxBGLD0wwKYpkMjgcE7g/viewform)! | ||
::: | ||
|
||
## Access the column level lineage | ||
|
||
There is no additional setup required for column level lineage if your account is on an Enterprise plan that can use Explorer. You can access the column level lineage by expanding the column card in the **Columns** tab of an Explorer [resource details page](/docs/collaborate/explore-projects#view-resource-details) for a model, source, or snapshot. | ||
|
||
dbt updates the lineage after each run that's executed in the production environment. You must make sure that `docs generate` is running within at least one job in the environment. Refer to [Generating metadata](/docs/collaborate/explore-projects#generate-metadata) for more details. | ||
|
||
<Lightbox src="/img/docs/collaborate/dbt-explorer/example-cll.png" title="Example of the Columns tab and where to expand for the CLL"/> | ||
|
||
<LoomVideo id='278c948ba387457884cc6b9545793685' /> | ||
@dave-connors-3: i'm using the loom video from the notion draft. we can update the link if you end up rerecording it, np
@dave-connors-3 feel free to re-record, maybe on Fri or Mon? i'd like to see your take, and we may want to wait for more UX improvements to land |
||
|
||
## Column level lineage use cases {#use-cases} | ||
||
|
||
Learn more about why and how you can use column level lineage in these sections. | ||
|
||
### Root cause analysis | ||
|
||
When there is an unexpected breakage in a data pipeline, column level lineage can be a valuable tool to understand the exact point in the pipeline where the error took place. For example, a failing data test on a particular column in your dbt model might've stemmed from an untested column upstream. Using CLL can help quickly identify and fix breakages when they happen. | ||
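The root cause workflow described above amounts to walking a column's lineage upstream and looking for columns with no tests. The following is a minimal sketch of that idea, not how Explorer or the Discovery API represents lineage: the `UPSTREAM` mapping, the `TESTED` set, and all model and column names are hypothetical, made up for illustration.

```python
from collections import deque

# Hypothetical column-level lineage: each (model, column) pair maps to the
# upstream (model, column) pairs it is derived from.
UPSTREAM = {
    ("fct_orders", "amount_usd"): [("stg_orders", "amount"), ("stg_rates", "usd_rate")],
    ("stg_orders", "amount"): [("raw_orders", "amount")],
    ("stg_rates", "usd_rate"): [("raw_rates", "rate")],
}

# Hypothetical set of columns that have at least one data test configured.
TESTED = {
    ("fct_orders", "amount_usd"),
    ("stg_orders", "amount"),
    ("raw_orders", "amount"),
}


def untested_upstream(column, upstream=UPSTREAM, tested=TESTED):
    """Return untested columns upstream of `column`, nearest first."""
    seen = {column}
    queue = deque([column])
    suspects = []
    # Breadth-first search so that closer ancestors are reported first.
    while queue:
        current = queue.popleft()
        for parent in upstream.get(current, []):
            if parent in seen:
                continue
            seen.add(parent)
            queue.append(parent)
            if parent not in tested:
                suspects.append(parent)
    return suspects
```

With this sample data, calling `untested_upstream(("fct_orders", "amount_usd"))` returns the two untested columns on the rates branch, which is where you would likely start investigating a failing test on `amount_usd`.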
|
||
### Impact analysis | ||
|
||
During development, analytics engineers can use column level lineage to understand the full scope of the impact of their proposed changes. This knowledge empowers them to create higher quality pull requests that require less rework, as they can anticipate and preempt issues that would've been unchecked without column level insights. | ||
|
||
### Collaboration and efficiency | ||
|
||
When exploring your data products, column lineage allows analytics engineers and data analysts to more easily navigate and understand the origin and usage of their data, enabling them to make better decisions with higher confidence. | ||
|
||
## Caveats | ||
|
||
Column level lineage relies on SQL parsing. Errors can occur when parsing fails or a column's origin is unknown (like with JSON unpacking, lateral joins, and so forth). In these cases, lineage may be incomplete and dbt Cloud will provide a warning about it in the column lineage. To review the error details, open the [full lineage graph](/docs/collaborate/explore-projects#project-lineage) and select the node to open the column’s details panel. | ||
|
||
<Lightbox src="/img/docs/collaborate/dbt-explorer/example-parsing-error-pill.png" width="90%" title="Example of warning in the full lineage graph"/> | ||
|
||
Possible error cases are: | ||
|
||
- **Parsing error** — Error occurs when the SQL is ambiguous or too complex for parsing. An example of an ambiguous parsing scenario is a _complex_ lateral join. | ||
- **Python error** — Error occurs when a Python model is used within the lineage. Due to the nature of Python models, it's not possible to parse and determine the lineage. | ||
- **Unknown error** — Error occurs when the lineage can't be determined for an unknown reason. An example of this would be if a dbt best practice is not being followed, like using hardcoded table names instead of `ref` statements. | ||
"improper use of dbt" read a bit strong to me so trying this wording instead. this might be a bit of a mouthful tho |
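To make the "hardcoded table names instead of `ref` statements" case concrete, here is a small sketch that flags literal `schema.table` references in model SQL. It is a naive regex heuristic written for illustration only, and it is not how dbt Cloud detects or parses lineage. The function name and the sample SQL are hypothetical, and the pattern can produce false positives (for example, `extract(year from t.created_at)`).

```python
import re

# Matches `from` or `join` followed by a dotted literal relation name such as
# `analytics.stg_orders`. Jinja calls like `{{ ref('stg_orders') }}` start
# with `{{`, and bare CTE names have no dot, so neither is matched.
HARDCODED_RELATION = re.compile(
    r"\b(?:from|join)\s+([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)+)",
    re.IGNORECASE,
)


def find_hardcoded_relations(sql):
    """Return dotted relation names that follow `from` or `join` in `sql`."""
    return HARDCODED_RELATION.findall(sql)


# A model that follows the best practice: relations come from ref/source.
sql_with_refs = (
    "select * from {{ ref('stg_orders') }} o "
    "join {{ source('raw', 'customers') }} c on o.customer_id = c.id"
)

# The same query with hardcoded relation names.
sql_hardcoded = (
    "select * from analytics.stg_orders o "
    "join raw.customers c on o.customer_id = c.id"
)
```

Running `find_hardcoded_relations` on `sql_with_refs` returns an empty list, while `sql_hardcoded` yields both literal relation names, each a candidate to replace with `ref` or `source`.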
||
|
||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3,20 +3,20 @@ title: "Project recommendations" | |
sidebar_label: "Project recommendations" | ||
description: "dbt Explorer provides recommendations that you can take to improve the quality of your dbt project." | ||
--- | ||
|
||
:::tip Beta | ||
|
||
The project recommendations beta feature is now available in dbt Explorer! Check it out! | ||
|
||
::: | ||
callouts at the top of pages affect SEO so as a best practice we're moving them to a lower location on the page but making sure they're still before the fold |
||
|
||
dbt Explorer provides recommendations about your project from the `dbt_project_evaluator` [package](https://hub.getdbt.com/dbt-labs/dbt_project_evaluator/latest/) using metadata from the Discovery API. | ||
|
||
Explorer also offers a global view, showing all the recommendations across the project for easy sorting and summarizing. | ||
|
||
These recommendations provide insight into how you can build a more well documented, well tested, and well built project, leading to less confusion and more trust. | ||
|
||
The Recommendations overview page includes two top-level metrics measuring the test and documentation coverage of the models in your project. | ||
:::tip Beta | ||
|
||
The project recommendations beta feature is now available in dbt Explorer! Check it out! | ||
|
||
::: | ||
|
||
The **Recommendations** overview page includes two top-level metrics measuring the test and documentation coverage of the models in your project. | ||
|
||
- **Model test coverage** — The percent of models in your project (models not from a package or imported via dbt Mesh) with at least one dbt test configured on them. | ||
- **Model documentation coverage** — The percent of models in your project (models not from a package or imported via dbt Mesh) with a description. | ||
|
@@ -43,7 +43,7 @@ The Recommendations overview page includes two top-level metrics measuring the t | |
|
||
## The Recommendations tab | ||
|
||
Models, sources and exposures each also have a Recommendations tab on their resource details page, with the specific recommendations that correspond to that resource: | ||
Models, sources and exposures each also have a **Recommendations** tab on their resource details page, with the specific recommendations that correspond to that resource: | ||
|
||
<Lightbox src="/img/docs/collaborate/dbt-explorer/example-recommendations-tab.png" width="80%" title="Example of the Recommendations tab "/> | ||
|
||
|
i didn't use "open beta" here, aligned with the betas for Project recs and Model perf pages. lemme know if this doesn't work tho! happy to change