Merge branch 'add-event-time' into update-sources-snapshots
mirnawong1 authored Nov 19, 2024
2 parents d2bf5af + de8f752 commit 4b28bbc
Showing 4 changed files with 6 additions and 7 deletions.
2 changes: 1 addition & 1 deletion website/blog/2024-10-04-hybrid-mesh.md
@@ -59,7 +59,7 @@ This allows dbt Cloud to know about the contents and metadata of your project, w
- Note: If you have [environment variables](/docs/build/environment-variables) in your project, dbt Cloud environment variables must be prefixed with `DBT_` (including `DBT_ENV_CUSTOM_ENV_` or `DBT_ENV_SECRET`). Follow the instructions in [this guide](https://docs.getdbt.com/guides/core-to-cloud-1?step=8#environment-variables) to convert them for dbt Cloud.
- Each upstream Core project has to have a production [environment](/docs/dbt-cloud-environments) in dbt Cloud. Configure credentials and environment variables in dbt Cloud so that it resolves relation names to the same locations where your dbt Core workflows deploy those models.
- Set up a [merge job](/docs/deploy/merge-jobs) in a production environment to run `dbt parse`. This will enable connecting downstream projects in dbt Mesh by producing the necessary [artifacts](/reference/artifacts/dbt-artifacts) for cross-project referencing.
-- Note: Set up a regular job to run `dbt build` instead of using a merge job for `dbt parse`, and centralize your dbt orchestration by moving production runs to dbt Cloud. Check out [this guide](/guides/core-to-cloud-1?step=9) for more details on converting your production runs to dbt Cloud.
+- Optional: Set up a regular job to run `dbt build` instead of using a merge job for `dbt parse`, and centralize your dbt orchestration by moving production runs to dbt Cloud. Check out [this guide](/guides/core-to-cloud-1?step=9) for more details on converting your production runs to dbt Cloud.
- Optional: Set up a regular job (for example, daily) to run `source freshness` and `docs generate`. This will hydrate dbt Cloud with additional metadata and enable features in [dbt Explorer](/docs/collaborate/explore-projects) that will benefit both teams, including [Column-level lineage](/docs/collaborate/column-level-lineage).
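
To make the job setup above concrete, here's an illustrative sketch of the commands each job would run. This isn't a real dbt Cloud configuration file format (jobs are configured through the dbt Cloud UI or API), and the job names are hypothetical:

```yaml
# Illustrative only: the commands each dbt Cloud job runs.
# Job names are hypothetical; dbt Cloud jobs are configured in the UI or API.
merge_job:            # required: runs on merge, producing artifacts for cross-project referencing
  commands:
    - dbt parse
production_run:       # optional: replaces the merge job if orchestration moves to dbt Cloud
  commands:
    - dbt build
metadata_refresh:     # optional: regular (for example, daily) metadata hydration
  commands:
    - dbt source freshness
    - dbt docs generate
```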

### Step 3: Create and connect your downstream projects to your Core project using dbt Mesh
2 changes: 1 addition & 1 deletion website/docs/best-practices/how-we-structure/2-staging.md
@@ -223,4 +223,4 @@ This is a welcome change for many of us who have become used to applying the sam

:::info Development flow versus DAG order.
This guide follows the order of the DAG, so we can get a holistic picture of how these three primary layers build on each other towards fueling impactful data products. It’s important to note though that developing models does not typically move linearly through the DAG. Most commonly, we should start by mocking out a design in a spreadsheet so we know we’re aligned with our stakeholders on output goals. Then, we’ll want to write the SQL to generate that output, and identify what tables are involved. Once we have our logic and dependencies, we’ll make sure we’ve staged all the necessary atomic pieces into the project, then bring them together based on the logic we wrote to generate our mart. Finally, with a functioning model flowing in dbt, we can start refactoring and optimizing that mart. By splitting the logic up and moving parts back upstream into intermediate models, we ensure all of our models are clean and readable, the story of our DAG is clear, and we have more surface area to apply thorough testing.
-:::info
+:::
7 changes: 3 additions & 4 deletions website/docs/docs/deploy/advanced-ci.md
@@ -36,14 +36,13 @@ dbt reports the comparison differences in:

<Lightbox src="/img/docs/dbt-cloud/example-ci-compare-changes-tab.png" width="85%" title="Example of the Compare tab" />

-### Speeding up comparisons
-
+### Optimizing comparisons

When an [`event_time`](/reference/resource-configs/event-time) column is specified on your model, compare changes can optimize comparisons by using only the overlapping timeframe (the timeframe that exists in both the CI and production environments), helping you avoid incorrect row-count changes and return results faster.

This is useful in scenarios like:
-- **Subset of data in CI** &mdash; When CI builds only a [subset of data](/best-practices/best-practice-workflows#limit-the-data-processed-when-in-development) (like the most recent 7 days), compare changes might interpret the excluded data as "deleted rows." Configuring `event_time` allows you to avoid this issue by limiting comparisons to the overlapping timeframe, preventing false alerts about data deletions that are just filtered out in CI.
-- **Fresher data in CI than in production** &mdash; When your CI job includes fresher data than production (because it has run more recently), compare changes might flag the additional rows as "new" data, even though they’re just fresher data in CI. With `event_time` configured, the comparison only includes the shared timeframe and correctly reflects actual changes in the data.
+- **Subset of data in CI** &mdash; When CI builds only a [subset of data](/best-practices/best-practice-workflows#limit-the-data-processed-when-in-development) (like the most recent 7 days), compare changes would interpret the excluded data as "deleted rows." Configuring `event_time` allows you to avoid this issue by limiting comparisons to the overlapping timeframe, preventing false alerts about data deletions that are just filtered out in CI.
+- **Fresher data in CI than in production** &mdash; When your CI job includes fresher data than production (because it has run more recently), compare changes would flag the additional rows as "new" data, even though they’re just fresher data in CI. With `event_time` configured, the comparison only includes the shared timeframe and correctly reflects actual changes in the data.

<Lightbox src="/img/docs/deploy/apples_to_apples.png" width="90%" title="event_time ensures the same time-slice of data is accurately compared between your CI and production environments." />
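
As a minimal sketch of the configuration, assuming a hypothetical `orders` model with a `created_at` column that records when each order event occurred:

```yaml
# models/schema.yml — model and column names are hypothetical
models:
  - name: orders
    config:
      # event_time marks when each row's event occurred, letting compare changes
      # restrict both the CI and production sides to the overlapping timeframe
      event_time: created_at
```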

2 changes: 1 addition & 1 deletion website/docs/reference/resource-configs/event-time.md
@@ -152,7 +152,7 @@ Here are some examples of good and bad `event_time` columns:
- `_fivetran_synced` &mdash; This isn't the time that the event happened; it's the time that the event was ingested.
- `last_updated_at` &mdash; This isn't a good choice because its value keeps changing over time.

-`event_time` is required for [Incremental microbatch](/docs/build/incremental-microbatch) and highly recommended for [Advanced CI's compare changes](/docs/deploy/advanced-ci#speeding-up-comparisons) in CI/CD workflows, where it ensures the same time-slice of data is correctly compared between your CI and production environments.
+`event_time` is required for [Incremental microbatch](/docs/build/incremental-microbatch) and highly recommended for [Advanced CI's compare changes](/docs/deploy/advanced-ci#optimizing-comparisons) in CI/CD workflows, where it ensures the same time-slice of data is correctly compared between your CI and production environments.
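
For the microbatch case, a minimal sketch of the required configuration, assuming a hypothetical `sessions` model with a `session_started_at` timestamp:

```yaml
# models/schema.yml — illustrative microbatch configuration; names are hypothetical
models:
  - name: sessions
    config:
      materialized: incremental
      incremental_strategy: microbatch
      event_time: session_started_at  # when each session event occurred (required)
      batch_size: day                 # process one day of data per batch
      begin: "2024-01-01"             # earliest date to backfill from
```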

## Examples

