Update Advanced CI overview and related pages #6033

Merged
merged 17 commits into from
Sep 13, 2024
Changes from 12 commits
27 changes: 27 additions & 0 deletions website/docs/docs/deploy/about-ci.md
@@ -0,0 +1,27 @@
---
title: "About continuous integration (CI) in dbt Cloud"
sidebar_label: "About continuous integration"
pagination_prev: null
pagination_next: "docs/deploy/continuous-integration"
hide_table_of_contents: true
---

Use [CI jobs](/docs/deploy/ci-jobs) in dbt Cloud to automatically test code changes before merging them to production. You can also [enable Advanced CI features](/docs/dbt-cloud-environments#account-access-to-advanced-ci-features) for these jobs to check whether your code changes produce the data changes you expect, by reviewing the comparison differences dbt provides.

Refer to the guide [Get started with continuous integration tests](/guides/set-up-ci?step=1) for more information.

<div className="grid--2-col" >

<Card
title="Continuous integration"
body="Set up CI checks to test every single change prior to deploying the code to production."
link="/docs/deploy/continuous-integration"
icon="dbt-bit"/>

<Card
title="Advanced CI (beta)"
body="Compare the differences between what's in the production environment and the pull request before merging those changes, ensuring that you're always shipping trusted data products."
link="/docs/deploy/advanced-ci"
icon="dbt-bit"/>

</div><br />
41 changes: 31 additions & 10 deletions website/docs/docs/deploy/advanced-ci.md
@@ -5,24 +5,45 @@ sidebar_label: "Advanced CI"
description: "Advanced CI enables developers to compare changes by demonstrating the changes the code produces."
---

Advanced CI helps developers answer the question, “Will this PR build the correct changes in production?” By demonstrating the data changes that code changes produce, users can ensure they always ship trusted data products as they develop.
# Advanced CI <Lifecycle status="preview" />

Customers control what data to use and may implement synthetic data if pre-production or development data is heavily regulated or sensitive. The data selected by users is cached on dbt Labs' systems for up to 30 days. dbt Labs does not access Advanced CI cached data for its benefit, and the data is only used to provide services to clients as they direct. This caching optimizes compute usage so that the entire comparison is not rerun against the data warehouse each time the **Compare** tab is viewed.
[Continuous integration workflows](/docs/deploy/continuous-integration) help improve the governance and quality of your data. For these CI jobs, you can also use Advanced CI features, such as [compare changes](#compare-changes), which detail the differences between what's currently in your production environment and the pull request's latest commit, giving you observability into how your code changes affect the data. By analyzing the data changes that code changes produce, you can ensure you're always shipping trustworthy data products as you develop.

## Data caching
:::tip Preview feature
The compare changes feature is currently available as a [preview](/docs/dbt-versions/product-lifecycles#dbt-cloud) in dbt Cloud. dbt Labs plans to provide additional Advanced CI features in the near future. More info coming soon.

When you run Advanced CI (by enabling **Compare changes**), dbt Cloud stores a cache of no more than 100 records for each modified model. By caching this data, users can view the examples of changed data without rerunning the comparison against the data warehouse every time. To display the changes, dbt Cloud uses a cached version of a sample of data records. These data records are queried from the database using the connection configuration (such as user, role, service account, and so on.) set in the CI job's environment.
:::

<Lightbox src="/img/docs/deploy/compare-changes.png" width="60%" title="The compare tab of the CI job in dbt Cloud" />
## Compare changes feature {#compare-changes}

The cache is encrypted, stored in Amazon S3 or Azure blob storage in your account’s region, and automatically deleted after 30 days. No data is retained on dbt Labs' systems beyond this period. Users accessing a CI run that is more than 30 days old will not be able to see the comparison; instead, they will see a message indicating that the data has expired. No other third-party subcontractor(s), aside from the storage subcontractor(s), has access to the cached data.
For [CI jobs](/docs/deploy/ci-jobs) that have the **Run compare changes** option enabled, dbt Cloud compares the changes between the last applied state of the production environment (defaulting to deferral for lower compute costs) and the latest changes from the pull request, whenever a pull request is opened or new commits are pushed.

<Lightbox src="/img/docs/deploy/compare-expired.png" width="60%" title="The compare tab once the results have expired" />
dbt reports the comparison differences in:

- **dbt Cloud** &mdash; Shows the changes (if any) to the data's primary keys, rows, and columns in the [Compare tab](/docs/deploy/run-visibility#compare-tab) from the [Job run details](/docs/deploy/run-visibility#job-run-details) page.
- **The pull request from your Git provider** &mdash; Shows a summary of the changes as a Git comment.

<Lightbox src="/img/docs/dbt-cloud/example-ci-compare-changes-tab.png" width="85%" title="Example of the Compare tab" />
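
To make the idea of a primary-key-based comparison concrete, the following is a minimal sketch of classifying records as added, removed, or modified between two versions of a model. It is an illustration only and not dbt Cloud's implementation; the `compare_rows` function, the `id` key, and the sample rows are all hypothetical.

```python
from typing import Any

Row = dict[str, Any]


def compare_rows(prod: list[Row], pr: list[Row], key: str) -> dict[str, list[Any]]:
    """Classify primary-key values as added, removed, or modified."""
    # Index both versions of the model by primary key (assumed unique).
    prod_by_key = {row[key]: row for row in prod}
    pr_by_key = {row[key]: row for row in pr}

    # Keys only in the pull request version were added.
    added = sorted(k for k in pr_by_key if k not in prod_by_key)
    # Keys only in the production version were removed.
    removed = sorted(k for k in prod_by_key if k not in pr_by_key)
    # Keys in both versions whose column values differ were modified.
    modified = sorted(
        k for k in pr_by_key
        if k in prod_by_key and pr_by_key[k] != prod_by_key[k]
    )
    return {"added": added, "removed": removed, "modified": modified}


# Hypothetical sample data for a model keyed on "id".
prod_rows = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}]
pr_rows = [{"id": 2, "amount": 25}, {"id": 3, "amount": 30}]
diff = compare_rows(prod_rows, pr_rows, key="id")
```

A unique key is what makes the record-level classification possible here: without one, there is no reliable way to pair a production row with its counterpart in the pull request.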

## About the cached data

When [comparing changes](#compare-changes), dbt Cloud stores a cache of no more than 100 records for each modified model. By caching this data, you can view the examples of changed data without rerunning the comparison against the data warehouse every time (optimizing for lower compute costs). To display the changes, dbt Cloud uses a cached version of a sample of the data records. These data records are queried from the database using the connection configuration (such as user, role, service account, and so on) that's set in the CI job's environment.

You control what data to use. This may include synthetic data if pre-production or development data is heavily regulated or sensitive.

- The selected data is cached on dbt Labs' systems for up to 30 days. No data is retained on dbt Labs' systems beyond this period.
- The cache is encrypted and stored in Amazon S3 or Azure Blob Storage in your account’s region.
- dbt Labs will not access cached data from Advanced CI for its own benefit; the data is only used to provide services as directed by you.
- Third-party subcontractors, other than storage subcontractors, will not have access to the cached data.

If you access a CI job run that's more than 30 days old, you will not be able to see the comparison results. Instead, a message will appear indicating that the data has expired.

<Lightbox src="/img/docs/deploy/compare-expired.png" width="60%" title="Example of message about expired data in the Compare tab" />
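
The two limits described in this section (at most 100 cached records per modified model, and a 30-day retention period) can be sketched as follows. This is a hedged illustration, not dbt Cloud's code: the function names are hypothetical, taking the first 100 records is an assumption because the sampling method isn't documented here, and so is treating a cache as expired only once it is strictly older than 30 days.

```python
from datetime import datetime, timedelta, timezone

MAX_CACHED_RECORDS = 100              # per modified model, as stated above
CACHE_RETENTION = timedelta(days=30)  # retention period, as stated above


def sample_for_cache(records: list[dict]) -> list[dict]:
    """Keep at most MAX_CACHED_RECORDS records (assumption: the first ones)."""
    return records[:MAX_CACHED_RECORDS]


def is_expired(cached_at: datetime, now: datetime) -> bool:
    """Return True once the cache is older than the retention period."""
    return now - cached_at > CACHE_RETENTION


# Hypothetical usage: a model with 250 changed records, cached at a fixed time.
cached = sample_for_cache([{"id": i} for i in range(250)])
cached_at = datetime(2024, 9, 1, tzinfo=timezone.utc)
```

Calling `is_expired(cached_at, now)` with a `now` that is 31 days later returns `True`, which corresponds to the expired-data message shown in the Compare tab.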

## Connection permissions

The **Compare changes** feature uses the same credentials as your CI job, as defined in your CI job’s environment. Since all users will be able to view the comparison results and the cached data, the account administrator must ensure that client CI credentials are appropriately restricted.
The compare changes feature uses the same credentials as the CI job, as defined in the CI job’s environment. Because all users in the customer's account can view the comparison results and the cached data, the dbt Cloud administrator must ensure that client CI credentials are appropriately restricted.

In particular, if you use dynamic data masking in your data warehouse, the cached data will no longer be dynamically masked in the Advanced CI output, depending on the permissions of the users who view it. We recommend limiting user access to unmasked data or considering using synthetic data for the Advanced CI testing functionality.
If you use dynamic data masking in your data warehouse, the cached data will no longer be dynamically masked in the Advanced CI output according to the permissions of the users who view it. dbt Labs recommends limiting user access to unmasked data or using synthetic data for the Advanced CI testing functionality.

<Lightbox src="/img/docs/deploy/compare-credentials.png" width="60%" title="The credentials in the user settings" />
<Lightbox src="/img/docs/deploy/compare-credentials.png" width="60%" title="Example of credentials in the user settings" />
10 changes: 6 additions & 4 deletions website/docs/docs/deploy/ci-jobs.md
@@ -12,8 +12,10 @@ dbt Labs recommends that you create your CI job in a dedicated dbt Cloud [deploy

### Prerequisites
- You have a dbt Cloud account.
- For the [concurrent CI checks](/docs/deploy/continuous-integration#concurrent-ci-checks) and [smart cancellation of stale builds](/docs/deploy/continuous-integration#smart-cancellation) features, your dbt Cloud account must be on the [Team or Enterprise plan](https://www.getdbt.com/pricing/).
- For the [compare changes](/docs/deploy/continuous-integration#compare-changes) feature, your dbt Cloud account must have access to Advanced CI. Please ask your [dbt Cloud administrator to enable](/docs/dbt-cloud-environments#account-access-to-advanced-ci-features) this for you.
- CI features:
- For both the [concurrent CI checks](/docs/deploy/continuous-integration#concurrent-ci-checks) and [smart cancellation of stale builds](/docs/deploy/continuous-integration#smart-cancellation) features, your dbt Cloud account must be on the [Team or Enterprise plan](https://www.getdbt.com/pricing/).
- [Advanced CI](/docs/deploy/advanced-ci) features:<Lifecycle status="preview" />
- For the [compare changes](/docs/deploy/advanced-ci#compare-changes) feature, your dbt Cloud account must have access to Advanced CI. Please ask your [dbt Cloud administrator to enable](/docs/dbt-cloud-environments#account-access-to-advanced-ci-features) this for you.
- Set up a [connection with your Git provider](/docs/cloud/git/git-configuration-in-dbt-cloud). This integration lets dbt Cloud run jobs on your behalf for job triggering.
- If you're using a native [GitLab](/docs/cloud/git/connect-gitlab) integration, you need a paid or self-hosted account that includes support for GitLab webhooks and [project access tokens](https://docs.gitlab.com/ee/user/project/settings/project_access_tokens.html). If you're using GitLab Free, merge requests will trigger CI jobs but CI job status updates (success or failure of the job) will not be reported back to GitLab.

@@ -33,7 +35,7 @@ To make CI job creation easier, many options on the **CI job** page are set to d

1. Options in the **Execution settings** section:
- **Commands** &mdash; By default, it includes the `dbt build --select state:modified+` command. This informs dbt Cloud to build only new or changed models and their downstream dependents. Importantly, state comparison can only happen when there is a deferred environment selected to compare state to. Click **Add command** to add more [commands](/docs/deploy/job-commands) that you want to be invoked when this job runs.
- **Run compare changes**<Lifecycle status="beta" /> &mdash; Enable this option to compare the last applied state of the production environment (if one exists) with the latest changes from the pull request, and identify what those differences are. To enable record-level comparison and primary key analysis, you must add a [primary key constraint](/reference/resource-properties/constraints) or [uniqueness test](/reference/resource-properties/data-tests#unique). Otherwise, you'll receive a "Primary key missing" error message in dbt Cloud.
- **Run compare changes**<Lifecycle status="preview" /> &mdash; Enable this option to compare the last applied state of the production environment (if one exists) with the latest changes from the pull request, and identify what those differences are. To enable record-level comparison and primary key analysis, you must add a [primary key constraint](/reference/resource-properties/constraints) or [uniqueness test](/reference/resource-properties/data-tests#unique). Otherwise, you'll receive a "Primary key missing" error message in dbt Cloud.

To review the comparison report, navigate to the [Compare tab](/docs/deploy/run-visibility#compare-tab) in the job run's details. A summary of the report is also available from the pull request in your Git provider (see the [CI report example](#example-ci-report)).
- **Compare changes against an environment (Deferral)** &mdash; By default, it’s set to the **Production** environment if you created one. This option allows dbt Cloud to check the state of the code in the PR against the code running in the deferred environment, so as to only check the modified code, instead of building the full table or the entire DAG.
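
The `state:modified+` selector mentioned in the **Commands** option can be pictured as a graph traversal: start from the modified models and walk to everything downstream of them. The following is a minimal sketch of that idea only. It is not how dbt implements selection, and the `children` mapping, the model names, and the `select_modified_plus` function are hypothetical.

```python
from collections import deque


def select_modified_plus(children: dict[str, list[str]], modified: set[str]) -> set[str]:
    """Return the modified models plus all of their downstream dependents."""
    selected = set(modified)
    queue = deque(modified)
    # Breadth-first walk from each modified model to its descendants.
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child not in selected:
                selected.add(child)
                queue.append(child)
    return selected


# Hypothetical DAG: each model maps to the models that depend on it.
dag = {
    "stg_orders": ["orders"],
    "stg_customers": ["customers"],
    "orders": ["revenue"],
    "customers": [],
    "revenue": [],
}
to_build = select_modified_plus(dag, {"stg_orders"})
```

In this example only `stg_orders`, `orders`, and `revenue` are selected; `stg_customers` and `customers` are untouched by the change, so they are not rebuilt.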
@@ -60,7 +62,7 @@ The following is an example of a CI check in a GitHub pull request. The green ch

<Lightbox src="/img/docs/dbt-cloud/using-dbt-cloud/example-github-pr.png" width="60%" title="Example of CI check in GitHub pull request"/>

### Example of CI report in pull request <Lifecycle status="beta" /> {#example-ci-report}
### Example of CI report in pull request <Lifecycle status="preview" /> {#example-ci-report}
The following is an example of a CI report in a GitHub pull request, which is shown when the **Run compare changes** option is enabled for the CI job. It displays a high-level summary of the models that changed from the pull request.

<Lightbox src="/img/docs/dbt-cloud/using-dbt-cloud/example-github-ci-report.png" width="75%" title="Example of CI report comment in GitHub pull request"/>
33 changes: 7 additions & 26 deletions website/docs/docs/deploy/continuous-integration.md
@@ -31,7 +31,11 @@ dbt Cloud deletes the temporary schema from your <Term id="data-warehouse" /> w

The [dbt Cloud scheduler](/docs/deploy/job-scheduler) executes CI jobs differently from other deployment jobs in these important ways:

<Expandable alt_header="Concurrent CI checks">
- **Concurrent CI checks** &mdash; CI runs triggered by the same dbt Cloud CI job execute concurrently (in parallel), when appropriate
- **Smart cancellation of stale builds** &mdash; Automatically cancels stale, in-flight CI runs when there are new commits to the PR
- **Run slot treatment** &mdash; CI runs don't consume a run slot

### Concurrent CI checks

When you have teammates collaborating on the same dbt project creating pull requests on the same dbt repository, the same CI job will get triggered. Since each run builds into a dedicated, temporary schema that’s tied to the pull request, dbt Cloud can safely execute CI runs _concurrently_ instead of _sequentially_ (differing from what is done with deployment dbt Cloud jobs). Because no one needs to wait for one CI run to finish before another one can start, with concurrent CI checks, your whole team can test and integrate dbt code faster.

@@ -41,35 +45,12 @@ Below describes the conditions when CI checks are run concurrently and when they
- CI runs with the _same_ PR number and _different_ commit SHAs execute serially because they’re building into the same schema. dbt Cloud will run the latest commit and cancel any older, stale commits. For details, refer to [Smart cancellation of stale builds](#smart-cancellation).
- CI runs with the same PR number and same commit SHA, originating from different dbt Cloud projects will execute jobs concurrently. This can happen when two CI jobs are set up in different dbt Cloud projects that share the same dbt repository.
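
The conditions above can be read as a small decision function. The sketch below is one literal reading of them, not dbt Cloud's scheduler logic; the `CIRun` class and `can_run_concurrently` function are hypothetical. The rule for different PR numbers is inferred from the statement that each run builds into a schema tied to its pull request, and the final fallback (same PR, same commit, same project) is an assumption because that case isn't described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CIRun:
    project_id: int
    pr_number: int
    commit_sha: str


def can_run_concurrently(a: CIRun, b: CIRun) -> bool:
    """Decide whether two CI runs may execute at the same time."""
    if a.pr_number != b.pr_number:
        # Different PRs build into different temporary schemas.
        return True
    if a.commit_sha != b.commit_sha:
        # Same PR, different commits: same schema, so the runs are serial.
        return False
    # Same PR and same commit: concurrent only across different projects.
    # Assumption: an identical run in the same project is not concurrent.
    return a.project_id != b.project_id


# Hypothetical runs used to exercise each rule.
run_a = CIRun(project_id=1, pr_number=10, commit_sha="abc123")
run_b = CIRun(project_id=1, pr_number=11, commit_sha="def456")
run_c = CIRun(project_id=1, pr_number=10, commit_sha="fff999")
run_d = CIRun(project_id=2, pr_number=10, commit_sha="abc123")
```

Here `run_a` and `run_b` can run concurrently (different PRs), `run_a` and `run_c` cannot (same PR, different commits), and `run_a` and `run_d` can (same PR and commit, different projects).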

</Expandable>

<Expandable alt_header="Smart cancellation of stale builds">
### Smart cancellation of stale builds

When you push a new commit to a PR, dbt Cloud enqueues a new CI run for the latest commit and cancels any CI run that is (now) stale and still in flight. This can happen when you’re pushing new commits while a CI build is still in process and not yet done. By cancelling runs in a safe and deliberate way, dbt Cloud helps improve productivity and reduce data platform spend on wasteful CI runs.

<Lightbox src="/img/docs/dbt-cloud/using-dbt-cloud/example-smart-cancel-job.png" width="70%" title="Example of an automatically canceled run"/>
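
The cancellation behavior described above can be sketched as follows: when a new commit arrives for a PR, any older run for that PR that is still in flight is cancelled, and a run for the new commit is enqueued. This is a simplified illustration under stated assumptions, not dbt Cloud's implementation; the `Run` class, its status strings, and `enqueue_ci_run` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Run:
    pr_number: int
    commit_sha: str
    # Hypothetical statuses: "queued", "running", "cancelled", "success".
    status: str = "queued"


def enqueue_ci_run(runs: list[Run], pr_number: int, commit_sha: str) -> Run:
    """Cancel stale in-flight runs for the PR, then enqueue the new commit."""
    for run in runs:
        in_flight = run.status in ("queued", "running")
        is_stale = run.pr_number == pr_number and run.commit_sha != commit_sha
        if in_flight and is_stale:
            run.status = "cancelled"
    new_run = Run(pr_number, commit_sha)
    runs.append(new_run)
    return new_run


# Hypothetical state: PR 7 has one running and one finished run; PR 8 is unrelated.
runs = [Run(7, "aaa", "running"), Run(7, "000", "success"), Run(8, "bbb", "running")]
latest = enqueue_ci_run(runs, 7, "ccc")
```

Only the in-flight run for the same PR is cancelled. The finished run and the run for a different PR are left alone, which is what keeps the cancellation from wasting completed work or affecting other pull requests.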

</Expandable>

<Expandable alt_header="Run slot treatment" lifecycle="team,enterprise">
### Run slot treatment <Lifecycle status="team,enterprise" />

CI runs don't consume run slots. This guarantees a CI check will never block a production run.

</Expandable>

<Expandable alt_header="Compare changes" lifecycle="beta" >

When a pull request is opened or new commits are pushed, dbt Cloud compares the changes between the last applied state of the production environment (defaulting to deferral for lower computation costs) and the latest changes from the pull request for CI jobs that have the **Run compare changes** option enabled. By analyzing these comparisons, you can gain a better understanding of how the data changes are affected by code changes to help ensure you always ship the correct changes to production and create trusted data products.

:::info Beta feature

The compare changes feature is currently in limited beta for select accounts. If you're interested in gaining access or learning more, please stay tuned for updates.

:::

dbt reports the comparison differences:

- **In dbt Cloud** &mdash; Shows the changes (if any) to the data's primary keys, rows, and columns. To learn more, refer to the [Compare tab](/docs/deploy/run-visibility#compare-tab) in the [Job run details](/docs/deploy/run-visibility#job-run-details).
- **In the pull request from your Git provider** &mdash; Shows a summary of the changes, as a git comment.

</Expandable>
4 changes: 2 additions & 2 deletions website/docs/docs/deploy/run-visibility.md
@@ -55,9 +55,9 @@ This provides a list of the artifacts generated by the job run. The files are sa

<Lightbox src="/img/docs/dbt-cloud/example-artifacts-tab.png" width="85%" title="Example of the Artifacts tab" />

### Compare tab <Lifecycle status="beta"/>
### Compare tab <Lifecycle status="preview"/>

The **Compare** tab is shown for [CI job runs](/docs/deploy/ci-jobs) with the **Run compare changes** setting enabled. It displays details about [the changes from the comparison dbt performed](/docs/deploy/continuous-integration#compare-changes) between what's in your production environment and the pull request. To help you better visualize the differences, dbt Cloud highlights changes to your models in red (deletions) and green (inserts).
The **Compare** tab is shown for [CI job runs](/docs/deploy/ci-jobs) with the **Run compare changes** setting enabled. It displays details about [the changes from the comparison dbt performed](/docs/deploy/advanced-ci#compare-changes) between what's in your production environment and the pull request. To help you better visualize the differences, dbt Cloud highlights changes to your models in red (deletions) and green (inserts).

From the **Modified** section, you can view the following:

9 changes: 5 additions & 4 deletions website/sidebars.js
@@ -466,11 +466,12 @@ const sidebarSettings = {
type: "category",
label: "Continuous integration",
collapsed: true,
link: { type: "doc", id: "docs/deploy/continuous-integration" },
link: { type: "doc", id: "docs/deploy/about-ci" },
items: [
"docs/deploy/continuous-integration",
"docs/deploy/advanced-ci",
],
"docs/deploy/about-ci",
"docs/deploy/continuous-integration",
"docs/deploy/advanced-ci",
],
},
"docs/deploy/continuous-deployment",
{