Beta explorer cll (#4903)
## What are you changing in this pull request and why?

Public beta for Explorer's column-level lineage feature

## Checklist
- [x] Review the [Content style
guide](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/content-style-guide.md)
so my content adheres to these guidelines.
- [x] For [docs
versioning](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#about-versioning),
review how to [version a whole
page](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#adding-a-new-version)
and [version a block of
content](https://github.com/dbt-labs/docs.getdbt.com/blob/current/contributing/single-sourcing-content.md#versioning-blocks-of-content).
- [ ] Needs review from product team

Adding pages:
- [x] Add/remove page in `website/sidebars.js`
- [x] Provide a unique filename for new pages
nghi-ly authored Feb 13, 2024
2 parents b9334e4 + b3d7c9e commit afc94f3
Showing 8 changed files with 218 additions and 12 deletions.
49 changes: 49 additions & 0 deletions .github/ISSUE_TEMPLATE/internal-orch-team.yml
@@ -0,0 +1,49 @@
name: Orchestration team - Request changes to docs
description: File a docs update request that is not already tracked in Orch team's Release Plans (Notion database).
labels: ["content","internal-orch-team"]
body:
  - type: markdown
    attributes:
      value: |
        * You can ask questions or submit ideas for the dbt docs in [Issues](https://github.com/dbt-labs/docs-internal/issues/new/choose)
        * Before you file an issue read the [Contributing guide](https://github.com/dbt-labs/docs-internal#contributing).
        * Check to make sure someone hasn't already opened a similar [issue](https://github.com/dbt-labs/docs-internal/issues).
  - type: checkboxes
    id: contributions
    attributes:
      label: Contributions
      description: Please read the contribution docs before opening an issue or pull request.
      options:
        - label: I have read the contribution docs, and understand what's expected of me.

  - type: textarea
    attributes:
      label: Link to the page on docs.getdbt.com requiring updates
      description: Please link to the page or pages you'd like to see improved.
    validations:
      required: true

  - type: textarea
    attributes:
      label: What part(s) of the page would you like to see updated?
      description: |
        - Give as much detail as you can to help us understand the change you want to see.
        - Why should the docs be changed? What use cases does it support?
        - What is the expected outcome?
    validations:
      required: true

  - type: textarea
    attributes:
      label: Reviewers/Stakeholders/SMEs
      description: List the reviewers, stakeholders, and subject matter experts (SMEs) to collaborate with for the docs update.
    validations:
      required: true

  - type: textarea
    attributes:
      label: Related Jira tickets
      description: Add any other context or screenshots about the feature request here.
    validations:
      required: false
111 changes: 111 additions & 0 deletions .github/workflows/repo-sync.yml
@@ -0,0 +1,111 @@
name: Repo Sync

# **What it does**: Syncs docs.getdbt.com public repo into the docs private repo
# This GitHub Actions workflow keeps the `current` branch of those two repos in sync.
# **Why we have it**: To keep the open-source repository up-to-date
# while still having an internal repository for sensitive work.
# For more details, see https://github.com/repo-sync/repo-sync#how-it-works

on:
  schedule:
    - cron: '0 6,12,18 * * *' # Run at 6:00 AM, 12:00 PM, and 6:00 PM UTC

jobs:
  repo-sync:
    permissions:
      contents: write
      pull-requests: write
    name: Repo Sync
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          # Use the INTERMEDIATE_BRANCH as the checkout reference
          ref: ${{ secrets.INTERMEDIATE_BRANCH }}
          token: ${{ secrets.GITHUB_TOKEN }}
          # Fetch all history for all branches and tags
          fetch-depth: 0

      # Sync the source repo to the destination branch using repo-sync/github-sync
      - uses: repo-sync/github-sync@v2
        name: Sync repo to branch
        with:
          # Source repository to sync from
          source_repo: ${{ secrets.SOURCE_REPO }}
          # Source branch to sync from
          source_branch: current
          # Destination branch to sync to
          destination_branch: ${{ secrets.INTERMEDIATE_BRANCH }}
          github_token: ${{ secrets.WORKFLOW_TOKEN }}

      - name: Ship pull request
        uses: actions/github-script@v6
        with:
          github-token: ${{ secrets.WORKFLOW_TOKEN }}
          result-encoding: string
          script: |
            const {owner, repo} = context.repo;
            const head = '${{ secrets.INTERMEDIATE_BRANCH }}';
            const base = 'current'
            async function closePullRequest(prNumber) {
              console.log('closing PR', prNumber)
              await github.rest.pulls.update({
                owner,
                repo,
                pull_number: prNumber,
                state: 'closed'
              });
              console.log('closed PR', prNumber)
            }
            console.log('Creating new PR')
            let pull, pull_number
            try {
              const response = await github.rest.pulls.create({
                owner,
                repo,
                head,
                base,
                title: 'REPO SYNC - Public to Private',
                body: 'This is an automated pull request to sync changes between the public and private repos.',
              });
              pull = response.data
              pull_number = pull.number
              console.log('Created pull request successfully', pull.html_url)
            } catch (err) {
              // Don't error/alert if there are no commits to sync
              if (err.message?.includes('No commits')) {
                console.log(err.message)
                return
              }
              throw err
            }
            const { data: prFiles } = await github.rest.pulls.listFiles({ owner, repo, pull_number })
            if (prFiles.length) {
              console.log(prFiles.length, 'files have changed')
            } else {
              console.log('No files changed, closing')
              await closePullRequest(pull_number)
              return
            }
            console.log('Checking for merge conflicts')
            // Note: GitHub computes mergeability asynchronously, so mergeable_state
            // can still be 'unknown' on the object returned by pulls.create.
            if (pull.mergeable_state === 'dirty') {
              console.log('Pull request has a conflict', pull.html_url)
              await closePullRequest(pull_number)
              throw new Error('PR has a conflict, please resolve manually')
            }
            console.log('No detected merge conflicts')

            console.log('Merging the PR')
            await github.rest.pulls.merge({
              owner,
              repo,
              pull_number,
              merge_method: 'merge',
            })
            console.log('Merged the PR successfully')
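
The branch order of the "Ship pull request" step above can be summarized as a small pure function. This is only a sketch for reading the script: `decideSyncAction`, its input fields, and the returned labels are hypothetical names and are not part of the workflow itself.

```javascript
// Hypothetical helper that mirrors the order of checks in the github-script step.
// The input shape ({ createError, changedFiles, mergeableState }) is an assumption
// made for illustration only.
function decideSyncAction({ createError, changedFiles, mergeableState }) {
  // pulls.create failed: a "No commits" message is tolerated, anything else is rethrown
  if (createError) {
    return createError.includes('No commits') ? 'skip' : 'fail';
  }
  // A pull request with no changed files is closed without merging
  if (changedFiles === 0) return 'close';
  // The script treats a 'dirty' mergeable_state as a merge conflict
  if (mergeableState === 'dirty') return 'close-and-fail';
  // Otherwise the pull request is merged
  return 'merge';
}

console.log(decideSyncAction({ changedFiles: 3, mergeableState: 'clean' }));
```

Keeping the decision separate from the API calls like this would make the branching easy to check without touching GitHub, though the workflow as committed inlines everything in one script.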
56 changes: 56 additions & 0 deletions website/docs/docs/collaborate/column-level-lineage.md
@@ -0,0 +1,56 @@
---
title: "Column-level lineage"
description: "Use dbt Explorer's column-level lineage to gain insights about your data at a granular level."
---

# Column-level lineage <Lifecycle status='beta' />

dbt Explorer now offers column-level lineage (CLL) for the resources in your dbt project. Analytics engineers can use it to trace the provenance of their data products at a more granular level. For each column in a resource (model, source, or snapshot) in a dbt project, Explorer provides end-to-end lineage for the data in that column based on how it's used.

CLL is available to dbt Cloud Enterprise accounts that can use Explorer. It’s also available through the [Discovery API](/docs/dbt-cloud-apis/discovery-api).

:::tip Check out our beta
Explorer's CLL is currently available as a [public beta](/docs/dbt-versions/product-lifecycles#dbt-cloud) for Enterprise plan accounts. Please check it out!
:::

## Access the column-level lineage

There is no additional setup required for CLL if your account is on an Enterprise plan that can use Explorer. You can access the CLL by expanding the column card in the **Columns** tab of an Explorer [resource details page](/docs/collaborate/explore-projects#view-resource-details) for a model, source, or snapshot.

dbt updates the lineage after each run that's executed in the production environment. You must make sure that `dbt docs generate` runs within at least one job in the environment. Refer to [Generating metadata](/docs/collaborate/explore-projects#generate-metadata) for more details.
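
As a rough illustration of the requirement above, the following sketch checks whether at least one job in an environment includes a `dbt docs generate` step. The job shape (`name` and `steps`) and the function name `generatesDocs` are assumptions made for this example and don't correspond to a specific dbt Cloud API response.

```javascript
// Hypothetical job shape: { name: string, steps: string[] }.
// Returns true if any job has a step that starts with `dbt docs generate`.
function generatesDocs(jobs) {
  return jobs.some((job) =>
    job.steps.some((step) => /^dbt\s+docs\s+generate\b/.test(step.trim()))
  );
}

// Example data, invented for illustration
const productionJobs = [
  { name: 'Nightly build', steps: ['dbt build'] },
  { name: 'Docs refresh', steps: ['dbt build', 'dbt docs generate'] },
];

console.log(generatesDocs(productionJobs));
```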

<Lightbox src="/img/docs/collaborate/dbt-explorer/example-cll.png" width="40%" title="Example of the Columns tab and where to expand for the CLL"/>

<LoomVideo id='3040bf2a2ade45eca7942a7aed6b730c' />

## Column-level lineage use cases {#use-cases}

Learn more about why and how you can use CLL in the following sections.

### Root cause analysis

When there is an unexpected breakage in a data pipeline, column-level lineage can be a valuable tool to understand the exact point where the error occurred in the pipeline. For example, a failing data test on a particular column in your dbt model might've stemmed from an untested column upstream. Using CLL can help quickly identify and fix breakages when they happen.
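
To make the idea concrete, here is a minimal sketch of walking a column's lineage upstream to find untested columns. The `parents` map, the `tested` set, and the column names are invented for illustration. This isn't how Explorer stores or exposes lineage.

```javascript
// Hypothetical lineage: maps a column to the columns it is derived from.
const parents = {
  'orders.total': ['stg_orders.amount', 'stg_orders.tax'],
  'stg_orders.amount': ['raw_orders.amount'],
  'stg_orders.tax': ['raw_orders.tax'],
};

// Hypothetical set of columns that have at least one data test.
const tested = new Set(['orders.total', 'stg_orders.amount', 'raw_orders.amount']);

// Depth-first walk over all ancestors of `column`, collecting the untested ones.
function untestedAncestors(column, parents, tested) {
  const seen = new Set();
  const stack = [...(parents[column] ?? [])];
  const result = [];
  while (stack.length > 0) {
    const current = stack.pop();
    if (seen.has(current)) continue; // a column can be reachable by several paths
    seen.add(current);
    if (!tested.has(current)) result.push(current);
    stack.push(...(parents[current] ?? []));
  }
  return result.sort();
}

console.log(untestedAncestors('orders.total', parents, tested));
// → [ 'raw_orders.tax', 'stg_orders.tax' ]
```

In this example, a failing test on `orders.total` would point you to the untested `tax` columns upstream as the first places to look.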

### Impact analysis

During development, analytics engineers can use column-level lineage to understand the full scope of the impact of their proposed changes. This knowledge empowers them to create higher quality pull requests that require fewer edits, as they can anticipate and preempt issues that would've been unchecked without column-level insights.
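
The same idea works in the downstream direction. The following sketch lists every column that depends, directly or indirectly, on a column you plan to change. The `children` map and the column names are hypothetical and only serve to illustrate the traversal.

```javascript
// Hypothetical lineage: maps a column to the columns derived from it.
const children = {
  'stg_customers.email': ['dim_customers.email', 'int_contacts.email'],
  'int_contacts.email': ['mart_outreach.contact_email'],
};

// Breadth-first walk that collects every downstream column exactly once.
function downstreamColumns(column, children) {
  const affected = new Set();
  const queue = [column];
  while (queue.length > 0) {
    const current = queue.shift();
    for (const child of children[current] ?? []) {
      if (!affected.has(child)) {
        affected.add(child);
        queue.push(child);
      }
    }
  }
  return [...affected];
}

console.log(downstreamColumns('stg_customers.email', children));
// → [ 'dim_customers.email', 'int_contacts.email', 'mart_outreach.contact_email' ]
```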

### Collaboration and efficiency

When exploring your data products, column lineage helps analytics engineers and data analysts understand the origin and usage of their data, so they can make better decisions with higher confidence.

## Caveats

Column-level lineage relies on SQL parsing. Errors can occur when parsing fails or a column's origin is unknown (for example, with JSON unpacking or lateral joins). In these cases, lineage may be incomplete and dbt Cloud will display a warning in the column lineage.

<Lightbox src="/img/docs/collaborate/dbt-explorer/example-parsing-error-pill.png" title="Example of warning in the full lineage graph"/>

To review the error details:
1. Click the **Expand** icon in the upper right corner to open the column's lineage graph.
1. Select the node to open the column’s details panel.

Possible error cases are:

- **Parsing error** &mdash; Error occurs when the SQL is ambiguous or too complex for parsing. An example of an ambiguous parsing scenario is a _complex_ lateral join.
- **Python error** &mdash; Error occurs when a Python model is used within the lineage. Due to the nature of Python models, it's not possible to parse and determine the lineage.
- **Unknown error** &mdash; Error occurs when the lineage can't be determined for an unknown reason. For example, this can happen when a dbt best practice isn't followed, such as using hardcoded table names instead of `ref` statements.
7 changes: 1 addition & 6 deletions website/docs/docs/collaborate/model-performance.md
@@ -1,18 +1,13 @@
---
title: "Model performance"
sidebar_label: "Model performance"
-description: "Learn about ."
+description: "Learn about the performance of your models so you can make improvements to save time and money."
---

dbt Explorer provides metadata on dbt Cloud runs for in-depth model performance and quality analysis. This feature assists in reducing infrastructure costs and saving time for data teams by highlighting where to fine-tune projects and deployments &mdash; such as model refactoring or job configuration adjustments.

<LoomVideo id='98f33b3b7a374df0b7c04747eae6ef44' />

:::tip Beta

The model performance beta feature is now available in dbt Explorer! Check it out!
:::

## The Performance overview page

You can pinpoint areas for performance enhancement by using the Performance overview page. This page presents a comprehensive analysis across all project models and displays the longest-running models, those most frequently executed, and the ones with the highest failure rates during runs/tests. Data can be segmented by environment and job type which can offer insights into:
6 changes: 0 additions & 6 deletions website/docs/docs/collaborate/project-recommendations.md
@@ -3,12 +3,6 @@ title: "Project recommendations"
sidebar_label: "Project recommendations"
description: "dbt Explorer provides recommendations that you can take to improve the quality of your dbt project."
---

:::tip Beta

The project recommendations beta feature is now available in dbt Explorer! Check it out!

:::

dbt Explorer provides recommendations about your project from the `dbt_project_evaluator` [package](https://hub.getdbt.com/dbt-labs/dbt_project_evaluator/latest/) using metadata from the Discovery API.

1 change: 1 addition & 0 deletions website/sidebars.js
@@ -426,6 +426,7 @@ const sidebarSettings = {
link: { type: "doc", id: "docs/collaborate/explore-projects" },
items: [
"docs/collaborate/explore-projects",
"docs/collaborate/column-level-lineage",
"docs/collaborate/model-performance",
"docs/collaborate/project-recommendations",
"docs/collaborate/explore-multiple-projects",