updating with internal changes
runleonarun committed Oct 1, 2024
1 parent 179e624 commit 5cf5f67
Showing 1 changed file with 43 additions and 43 deletions.
Changed file: website/docs/docs/dbt-versions/core-upgrade/06-upgrading-to-v1.9.md (43 additions, 43 deletions)

description: New features and changes in dbt Core v1.9
displayed_sidebar: "docs"
---

## Resources

- [dbt Core 1.9 changelog](https://github.com/dbt-labs/dbt-core/blob/1.9.latest/CHANGELOG.md)
- [dbt Core CLI Installation guide](/docs/core/installation-overview)
- [Cloud upgrade guide](/docs/dbt-versions/upgrade-dbt-version-in-cloud)

## What to know before upgrading

dbt Labs is committed to providing backward compatibility for all versions 1.x, except for any changes explicitly mentioned in this guide or as a [behavior change flag](/reference/global-configs/behavior-changes#behavior-change-flags). If you encounter an error upon upgrading, please let us know by [opening an issue](https://github.com/dbt-labs/dbt-core/issues/new).

dbt Cloud is now [versionless](/docs/dbt-versions/versionless-cloud). If you have selected "Versionless" in dbt Cloud, you already have access to all the features, fixes, and other functionality that is included in dbt Core v1.9.
For users of dbt Core, since v1.8 we recommend explicitly installing both `dbt-core` and `dbt-<youradapter>`. This may become required for a future version of dbt. For example:

```shell
python3 -m pip install dbt-core dbt-snowflake
```

## New and changed features and functionality

Features and functionality new in dbt v1.9.

### New microbatch `incremental_strategy`

Incremental models are, and have always been, a *performance optimization* for datasets that are too large to be dropped and recreated from scratch every time you do a `dbt run`.

Historically, managing incremental models involved several manual steps and responsibilities, including:

- Add a snippet of dbt code (in an `is_incremental()` block) that uses the already-existing table (`this`) as a rough bookmark, so that only new data gets processed.
- Pick one of the strategies for smushing old and new data together (`append`, `delete+insert`, or `merge`).
- If anything goes wrong, or your schema changes, you can always "full-refresh", by running the same simple query that rebuilds the whole table from scratch.

While this works for many use cases, there’s a clear limitation with this approach: *Some datasets are just too big to fit into one query.*

Starting in Core 1.9, you can use the [new microbatch strategy](/docs/build/incremental-microbatch) to optimize your largest datasets -- **process your event data in discrete periods with their own SQL queries, rather than all at once.** The benefits include:

- Simplified query design: Write your model query for a single batch of data, with no manual filtering needed to determine "new" records. The `event_time`, `lookback`, and `batch_size` configurations generate the necessary filters for you, making the process more streamlined and reducing the details you need to manage.
- Independent batch processing: dbt automatically breaks down the data to load into smaller batches based on the specified `batch_size` and processes each batch independently, improving efficiency and reducing the risk of query timeouts. If some of your batches fail, you can use `dbt retry` to load only the failed batches.
- Targeted reprocessing: To load a *specific* batch or batches, you can use the CLI arguments `--event-time-start` and `--event-time-end`.

While microbatch is in "beta", this functionality is still gated behind an env var, which will change to a behavior flag when 1.9 is GA. To use microbatch (a brief model sketch follows below):

- Set `DBT_EXPERIMENTAL_MICROBATCH` to `true` in your project
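
To make this concrete, here is a minimal sketch of a microbatch model using the configurations described above. The model name, event-time column, and upstream ref (`events_daily`, `event_occurred_at`, `stg_events`) are illustrative rather than part of the release notes, and the upstream model would typically have its own `event_time` configured so dbt can filter it per batch.

```sql
-- models/events_daily.sql (illustrative name)
-- event_time: the column dbt uses to slice the data into batches
-- begin:      the earliest date to process on an initial build or full refresh
-- batch_size: one independent query per day of data
-- lookback:   reprocess the most recent batches to pick up late-arriving records
{{
    config(
        materialized='incremental',
        incremental_strategy='microbatch',
        event_time='event_occurred_at',
        begin='2024-01-01',
        batch_size='day',
        lookback=3
    )
}}

select * from {{ ref('stg_events') }}
```

To reprocess a specific window, the CLI arguments noted above can bound the run, for example: `dbt run --select events_daily --event-time-start "2024-09-01" --event-time-end "2024-09-04"`.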

### Snapshots improvements

Beginning in dbt Core 1.9, we've streamlined snapshot configuration and added a handful of new configurations to make dbt **snapshots easier to configure, run, and customize.** These improvements include:

- New snapshot specification: Snapshots can now be configured in a YAML file, which provides a cleaner and more consistent setup (sketched after this list).
- New `snapshot_meta_column_names` config: Allows you to customize the names of meta fields (for example, `dbt_valid_from`, `dbt_valid_to`, etc.) that dbt automatically adds to snapshots. This increases flexibility to tailor metadata to your needs.
- `target_schema` is now optional for snapshots: When omitted, snapshots will use the schema defined for the current environment.
- Standard `schema` and `database` configs supported: Snapshots will now be consistent with other dbt resources. You can specify where environment-aware snapshots should be stored.
- Warning for incorrect `updated_at` data type: To ensure data integrity, you'll see a warning if the `updated_at` field specified in the snapshot configuration is not the proper data type or timestamp.
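
As a rough sketch of the new YAML-only specification with these configs, assuming illustrative source, schema, and column names:

```yaml
# snapshots/orders_snapshot.yml (illustrative path)
snapshots:
  - name: orders_snapshot
    relation: source('jaffle_shop', 'orders')
    config:
      schema: snapshots              # optional; omit to use the environment's schema
      unique_key: order_id
      strategy: timestamp
      updated_at: updated_at         # should be a timestamp column
      snapshot_meta_column_names:    # rename dbt's meta columns if needed
        dbt_valid_from: valid_from
        dbt_valid_to: valid_to
```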

### `state:modified` improvements

We’ve made a number of improvements to `state:modified` behaviors to help reduce the risk of false positives/negatives, including:

- Added environment-aware enhancements for environments where the logic purposefully differs (for example, materializing as a table in `prod` but a `view` in dev).
- Enhanced performance so that models that use `var` or `env_var` are included in `state:modified`.
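
For context, these selectors are typically exercised by comparing the current project against the artifacts of a previous (for example, production) run; a brief sketch, where the `--state` path is illustrative:

```shell
# List resources whose definitions have changed relative to a prior manifest
dbt ls --select "state:modified" --state ./prod-run-artifacts
```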

### Managing changes to legacy behaviors

dbt Core v1.9 has introduced flags for [managing changes to legacy behaviors](/reference/global-configs/behavior-changes). You may opt into recently introduced changes (disabled by default), or opt out of mature changes (enabled by default), by setting `True` / `False` values, respectively, for `flags` in `dbt_project.yml`.

You can read more about each of these behavior changes in the following links:

- (Introduced, disabled by default) [`state_modified_compare_more_unrendered_values` and `state_modified_compare_vars`](/reference/global-configs/behavior-changes#behavior-change-flags).
- (Introduced, disabled by default) New [`skip_nodes_if_on_run_start_fails` project config flag](/reference/global-configs/behavior-changes#behavior-change-flags). If the flag is set and **any** `on-run-start` hook fails, all selected nodes are marked as skipped.
  - `on-run-start/end` hooks are **always** run, regardless of whether they passed or failed last time.
- [Removing a contracted model by deleting, renaming, or disabling it](/docs/collaborate/govern/model-contracts#how-are-breaking-changes-handled) will return an error (versioned models) or warning (unversioned models).
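
As a minimal sketch, opting into the newly introduced flags above happens in `dbt_project.yml`; the values shown are examples, not recommendations:

```yaml
# dbt_project.yml
flags:
  # Introduced in 1.9 and disabled by default; set to True to opt in
  state_modified_compare_more_unrendered_values: True
  state_modified_compare_vars: True
  skip_nodes_if_on_run_start_fails: True
```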

## Adapter specific features and functionalities

TBD

## Quick hits

We also made some quality-of-life improvements in Core 1.9, enabling you to:

- Document [singular data tests](/docs/build/data-tests#document-singular-tests).
- Use `ref` and `source` in foreign key constraints.
- Use `dbt test` with the [`--resource-type` / `--exclude-resource-type`](/reference/global-configs/resource-type) flag, making it possible to include or exclude data tests (`test`) or unit tests (`unit_test`).
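
To illustrate the foreign key item above, here is a hedged sketch of the model YAML; the model and column names are illustrative, and it assumes the `to` / `to_columns` keys for foreign key constraints:

```yaml
# models/schema.yml (illustrative)
models:
  - name: orders
    columns:
      - name: customer_id
        data_type: int
        constraints:
          - type: foreign_key
            to: ref('customers')     # resolves to the customers model's relation
            to_columns: [id]
```

Similarly, the new test flag works on the command line, for example `dbt test --exclude-resource-type unit_test` to run only data tests.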
