diff --git a/website/blog/2024-10-04-iceberg-is-an-implementation-detail.md b/website/blog/2024-10-04-iceberg-is-an-implementation-detail.md new file mode 100644 index 00000000000..eca0a411dad --- /dev/null +++ b/website/blog/2024-10-04-iceberg-is-an-implementation-detail.md @@ -0,0 +1,82 @@ +--- +title: "Iceberg Is An Implementation Detail" +description: "This blog post covers Iceberg table support and why it both matters and doesn't" +slug: iceberg-is-an-implementation-detail + +authors: [amy_chen] + +tags: [table formats, iceberg] +hide_table_of_contents: false + +date: 2024-10-04 +is_featured: false +--- + +If you haven’t paid attention to the data industry news cycle, you might have missed the recent excitement centered around an open table format called Apache Iceberg™. It’s one of many open table formats like Delta Lake, Hudi, and Hive. These formats are changing the way data is stored and metadata is accessed. They are groundbreaking in many ways. + +But I have to be honest: **I don’t care**. But not for the reasons you think. + +## What is Iceberg? + +To have this conversation, we need to start with the same foundational understanding of Iceberg. Apache Iceberg is a high-performance open table format developed for modern data lakes. It was designed for large-scale datasets, and within the project, there are many ways to interact with it. When people talk about Iceberg, they often mean multiple components, including but not limited to: + +1. Iceberg Table Format - an open-source table format designed for large-scale data. Tables materialized in the Iceberg table format are stored on a user’s own infrastructure, such as an S3 bucket. +2. Iceberg Data Catalog - an open-source metadata management system that tracks the schema, partitions, and versions of Iceberg tables. +3. Iceberg REST Protocol (also called the Iceberg REST API) - the protocol that lets engines speak to Iceberg-compatible catalogs. + +If you have been in the industry for a while, you also know that everything I just wrote above about Iceberg could just as easily be written about `Hive`, `Hudi`, or `Delta`. This is because they were all designed to solve essentially the same problem. Ryan Blue (creator of Iceberg) and Michael Armbrust (creator of Delta Lake) recently sat down for this [fantastic chat](https://vimeo.com/1012543474) and made two points that resonated with me: + +- “We never intended for people to pay attention to this area. It’s something we wanted to fix, but people should be able to not pay attention and just work with their data. Storage systems should just work.” +- “We solve the same challenges with different approaches.” + +At the same time, the industry is converging on Apache Iceberg. [Iceberg has the highest availability of read and write support](https://medium.com/sundeck/2024-lakehouse-format-rundown-7edd75015428). + + + + +Snowflake launched Iceberg support in 2022. Databricks launched Iceberg support via Uniform last year. Microsoft announced Fabric support for Iceberg in September 2024 at Fabric Con. **Customers are demanding interoperability, and vendors are listening**. + +Why does this matter? Industry standardization benefits customers. When the industry standardizes, customers gain the gift of flexibility. Everyone has a preferred way of working, and with standardization, they can always bring their preferred tools to their organization’s data. + +## Just another implementation detail + +I’m not saying open table formats aren't important.
The metadata management and performance benefits make them very meaningful and worth paying attention to. Our users are already excited to use them to create data lakes, save on storage costs, add more abstraction between storage and compute, and more. + +But when building data models or focusing on delivering business value through analytics, my primary concern is not *how* the data is stored—it's *how* I can leverage it to generate insights and drive decisions. The analytics development lifecycle is hard enough without having to take every detail into account. dbt abstracts the underlying platform and lets me focus on writing SQL and orchestrating my transformations. The fact that I don’t need to think about how tables are stored or optimized is a feature—I just need to know that when I reference `dim_customers` or `fct_sales`, the correct data is there and ready to use. **It should just work.** + +## Sometimes the details do matter + +While table formats are an implementation detail for data transformation, Iceberg can still impact dbt developers when the implementation details aren’t seamless. Currently, getting started with Iceberg requires a significant amount of upfront configuration and integration work beyond just creating tables. + +One of the biggest hurdles is managing Iceberg’s metadata layer. This metadata often needs to be synced with external catalogs, which requires careful setup and ongoing maintenance to prevent inconsistencies. Permissions and access controls add another layer of complexity—because multiple engines can access Iceberg tables, you have to ensure that all systems have the correct access to both the data files and the metadata catalog. Currently, setting up integrations between these engines is also far from seamless; while some engines natively support Iceberg, others require brittle workarounds to ensure the metadata is synced correctly. This fragmented landscape means you could end up with a web of interconnected components. + +## Fixing it + +**Today, we announced official support for the Iceberg table format in dbt.** With dbt supporting the Iceberg table format, it’s one less thing you have to worry about on your journey to adopting Iceberg. + +With support for the Iceberg table format, it is now easier to convert dbt models that use proprietary table formats to Iceberg by updating your configuration. After you have set up your external storage for Iceberg and connected it to your platforms, you will be able to jump into your dbt model and update the configuration to look something like this: + + + +Iceberg support is available on these adapters: + +- Athena +- Databricks +- Snowflake +- Spark +- Starburst/Trino +- Dremio + +As with any open-source project, Iceberg support grew organically, so the implementations vary across adapters. However, this will change in the coming months as we converge on one dbt standard. This way, no matter which adapter you jump into, the configuration will always be the same. + +## dbt as the abstraction layer + +dbt is about more than abstracting away the DDL to create and manage objects. It’s also about providing an opinionated approach to managing and optimizing your data. That remains true for our strategy around Iceberg support. + +In our dbt-snowflake implementation, we have already started to [enforce best practices centered around how to manage the base location](https://docs.getdbt.com/reference/resource-configs/snowflake-configs#base-location) to ensure you don’t accidentally create technical debt and that your Iceberg implementation scales over time. And we aren’t done yet.
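+To make that concrete, here is a minimal sketch of what an Iceberg materialization can look like on Snowflake, using the Snowflake-specific options described in the docs linked above. It assumes an external volume named `my_external_volume` already exists in Snowflake, an adapter version with Iceberg support, and placeholder model names; the exact parameters vary by adapter, so treat this as an illustration rather than the definitive configuration:
+
+```sql
+-- models/marts/fct_sales.sql (hypothetical model)
+{{
+  config(
+    materialized = 'table',
+    table_format = 'iceberg',                -- request an Iceberg table instead of the platform default
+    external_volume = 'my_external_volume',  -- assumed: an external volume you already created in Snowflake
+    base_location_subpath = 'marts'          -- optional; dbt manages the rest of the base location for you
+  )
+}}
+
+select * from {{ ref('stg_sales') }}
+```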
+ +That said, while we can create the models, there is a *lot* of initial work to get to that stage. dbt developers must still consider the implementation, like how their external volume has been set up or where dbt can access the metadata. We have to make this better. + +Given the friction of getting launched on Iceberg, over the coming months, we will enable more capabilities to empower users to adopt Iceberg. It should be easier to read from foreign Iceberg catalogs. It should be easier to mount your volume. It should be easier to manage refreshes. And you should also trust that permissions and governance are consistently enforced. + +And this work doesn’t stop at Iceberg. The framework we are building is also compatible with other table formats, ensuring that whatever table format works for you is supported on dbt. This way — dbt users can also stop caring about table formats. **It’s just another implementation detail.** diff --git a/website/blog/authors.yml b/website/blog/authors.yml index afaa238d2e5..271130a477d 100644 --- a/website/blog/authors.yml +++ b/website/blog/authors.yml @@ -1,7 +1,7 @@ --- amy_chen: image_url: /img/blog/authors/achen.png - job_title: Product Ecosystem Manager + job_title: Product Manager links: - icon: fa-linkedin url: https://www.linkedin.com/in/yuanamychen/ diff --git a/website/docs/docs/build/incremental-microbatch.md b/website/docs/docs/build/incremental-microbatch.md index b4b5406127e..2bfc07d8e2e 100644 --- a/website/docs/docs/build/incremental-microbatch.md +++ b/website/docs/docs/build/incremental-microbatch.md @@ -30,6 +30,15 @@ A `sessions` model is aggregating and enriching data that comes from two other m The `page_view_start` column in `page_views` is configured as that model's `event_time`. The `customers` model does not configure an `event_time`. Therefore, each batch of `sessions` will filter `page_views` to the equivalent time-bounded batch, and it will not filter `customers` (a full scan for every batch). + + +```yaml +models: + - name: page_views + config: + event_time: page_view_start +``` + We run the `sessions` model on October 1, 2024, and then again on October 2. It produces the following queries: diff --git a/website/docs/docs/build/metricflow-time-spine.md b/website/docs/docs/build/metricflow-time-spine.md index 2965b623f13..e4a93aa217e 100644 --- a/website/docs/docs/build/metricflow-time-spine.md +++ b/website/docs/docs/build/metricflow-time-spine.md @@ -21,41 +21,47 @@ To see the generated SQL for the metric and dimension types that use time spine ## Configuring time spine in YAML -- The [`models` key](/reference/model-properties) for the time spine must be in your `models/` directory. -- Each time spine is a normal dbt model with extra configurations that tell dbt and MetricFlow how to use specific columns by defining their properties. -- You likely already have a calendar table in your project which you can use. If you don't, review the [example time-spine tables](#example-time-spine-tables) for sample code. -- You add the configurations under the `time_spine` key for that [model's properties](/reference/model-properties), just as you would add a description or tests. + Time spine models are normal dbt models with extra configurations that tell dbt and MetricFlow how to use specific columns by defining their properties. Add the [`models` key](/reference/model-properties) for the time spine in your `models/` directory. 
If your project already includes a calendar table or date dimension, you can configure that table as a time spine. Otherwise, review the [example time-spine tables](#example-time-spine-tables) to create one. + + Some things to note when configuring time spine models: + +- Add the configurations under the `time_spine` key for that [model's properties](/reference/model-properties), just as you would add a description or tests. - You only need to configure time-spine models that the Semantic Layer should recognize. - At a minimum, define a time-spine table for a daily grain. - You can optionally define additional time-spine tables for different granularities, like hourly. Review the [granularity considerations](#granularity-considerations) when deciding which tables to create. - If you're looking to specify the grain of a time dimension so that MetricFlow can transform the underlying column to the required granularity, refer to the [Time granularity documentation](/docs/build/dimensions?dimension=time_gran) -For example, given the following directory structure, you can create two time spine configurations, `time_spine_hourly` and `time_spine_daily`. MetricFlow supports granularities ranging from milliseconds to years. Refer to the [Dimensions page](/docs/build/dimensions?dimension=time_gran#time) (time_granularity tab) to find the full list of supported granularities. - :::tip -Previously, you had to create a model called `metricflow_time_spine` in your dbt project. Now, if your project already includes a date dimension or time spine table, you can simply configure MetricFlow to use that table by updating the `model` setting in the Semantic Layer. - -If you don’t have a date dimension table, you can still create one by using the following code snippet to build your time spine model. +If you previously used a model called `metricflow_time_spine`, you no longer need to create this specific model. You can now configure MetricFlow to use any date dimension or time spine table already in your project by updating the `model` setting in the Semantic Layer. +If you don’t have a date dimension table, you can still create one by using the code snippet in the [next section](#creating-a-time-spine-table) to build your time spine model. ::: - +### Creating a time spine table + +MetricFlow supports granularities ranging from milliseconds to years. Refer to the [Dimensions page](/docs/build/dimensions?dimension=time_gran#time) (time_granularity tab) to find the full list of supported granularities. + +To create a time spine table from scratch, you can do so by adding the following code to your dbt project. +This example creates a time spine at an hourly grain and a daily grain: `time_spine_hourly` and `time_spine_daily`. ```yaml -[models:](/reference/model-properties) - - name: time_spine_hourly - description: "my favorite time spine" +[models:](/reference/model-properties) +# Hourly time spine + - name: time_spine_hourly + description: my favorite time spine time_spine: - standard_granularity_column: date_hour # column for the standard grain of your table, must be date time type." + standard_granularity_column: date_hour # column for the standard grain of your table, must be date time type. 
custom_granularities: - name: fiscal_year column_name: fiscal_year_column columns: - name: date_hour granularity: hour # set granularity at column-level for standard_granularity_column + +# Daily time spine - name: time_spine_daily time_spine: standard_granularity_column: date_day # column for the standard grain of your table @@ -66,6 +72,8 @@ If you don’t have a date dimension table, you can still create one by using th + + @@ -91,30 +99,14 @@ models: -For an example project, refer to our [Jaffle shop](https://github.com/dbt-labs/jaffle-sl-template/blob/main/models/marts/_models.yml) example. - - - -- The previous configuration demonstrates a time spine model called `time_spine_daily`. It sets the time spine configurations under the `time_spine` key. -- The `standard_granularity_column` is the column that maps to one of our [standard granularities](/docs/build/dimensions?dimension=time_gran). The grain of this column must be finer or equal in size to the granularity of all custom granularity columns in the same model. In this case, it's hourly. -- It needs to reference a column defined under the `columns` key, in this case, `date_hour`. -- MetricFlow will use the `standard_granularity_column` as the join key when joining the time spine table to other source table. -- Here, the granularity of the `standard_granularity_column` is set at the column level, in this case, `hour`. - -Additionally, [the `custom_granularities` field](#custom-calendar), (available in dbt v1.9 and higher) lets you specify non-standard time periods like `fiscal_year` or `retail_month` that your organization may use. +- This example configuration shows a time spine model called `time_spine_hourly` and `time_spine_daily`. It sets the time spine configurations under the `time_spine` key. +- The `standard_granularity_column` is the column that maps to one of our [standard granularities](/docs/build/dimensions?dimension=time_gran). This column must be set under the `columns` key and should have a grain that is finer or equal to any custom granularity columns defined in the same model. + - It needs to reference a column defined under the `columns` key, in this case, `date_hour` and `date_day`, respectively. + - It sets the granularity at the column-level using the `granularity` key, in this case, `hour` and `day`, respectively. +- MetricFlow will use the `standard_granularity_column` as the join key when joining the time spine table to another source table. +- [The `custom_granularities` field](#custom-calendar), (available in Versionless and dbt v1.9 and higher) lets you specify non-standard time periods like `fiscal_year` or `retail_month` that your organization may use. - - - - -If you need to create a time spine table from scratch, you can do so by adding the following code to your dbt project. -The example creates a time spine at a daily grain and an hourly grain. A few things to note when creating time spine models: -* MetricFlow will use the time spine with the largest compatible granularity for a given query to ensure the most efficient query possible. For example, if you have a time spine at a monthly grain, and query a dimension at a monthly grain, MetricFlow will use the monthly time spine. If you only have a daily time spine, MetricFlow will use the daily time spine and date_trunc to month. -* You can add a time spine for each granularity you intend to use if query efficiency is more important to you than configuration time, or storage constraints. 
For most engines, the query performance difference should be minimal and transforming your time spine to a coarser grain at query time shouldn't add significant overhead to your queries. -* We recommend having a time spine at the finest grain used in any of your dimensions to avoid unexpected errors. i.e., if you have dimensions at an hourly grain, you should have a time spine at an hourly grain. - - -Now, break down the configuration above. It's pointing to a model called `time_spine_daily`, and all the configuration is colocated with the rest of the [model's properties](/reference/model-properties). It sets the time spine configurations under the `time_spine` key. The `standard_granularity_column` is the lowest grain of the table, in this case, it's hourly. It needs to reference a column defined under the columns key, in this case, `date_hour`. Use the `standard_granularity_column` as the join key for the time spine table when joining tables in MetricFlow. Here, the granularity of the `standard_granularity_column` is set at the column level, in this case, `hour`. +For an example project, refer to our [Jaffle shop](https://github.com/dbt-labs/jaffle-sl-template/blob/main/models/marts/_models.yml) example. ### Considerations when choosing which granularities to create{#granularity-considerations} @@ -302,9 +294,9 @@ and date_hour < dateadd(day, 30, current_timestamp()) -Being able to configure custom calendars, such as a fiscal calendar, is available in [dbt Cloud Versionless](/docs/dbt-versions/upgrade-dbt-version-in-cloud#versionless) or dbt Core [v1.9 and above](/docs/dbt-versions/core). +The ability to configure custom calendars, such as a fiscal calendar, is available in [dbt Cloud Versionless](/docs/dbt-versions/upgrade-dbt-version-in-cloud#versionless) or dbt Core [v1.9 and higher](/docs/dbt-versions/core). -To access this feature, [upgrade to Versionless](/docs/dbt-versions/versionless-cloud) or dbt Core v1.9 and above. +To access this feature, [upgrade to Versionless](/docs/dbt-versions/versionless-cloud) or your dbt Core version to v1.9 or higher. @@ -337,6 +329,6 @@ models: #### Coming soon -Note that features like calculating offsets and period-over-period will be supported soon. +Note that features like calculating offsets and period-over-period will be supported soon! diff --git a/website/docs/docs/build/ratio-metrics.md b/website/docs/docs/build/ratio-metrics.md index cc1d13b7835..fdaeb878450 100644 --- a/website/docs/docs/build/ratio-metrics.md +++ b/website/docs/docs/build/ratio-metrics.md @@ -24,6 +24,8 @@ Ratio allows you to create a ratio between two metrics. You simply specify a num The following displays the complete specification for ratio metrics, along with an example. + + ```yaml metrics: - name: The metric name # Required @@ -40,11 +42,19 @@ metrics: filter: Filter for the denominator # Optional alias: Alias for the denominator # Optional ``` + For advanced data modeling, you can use `fill_nulls_with` and `join_to_timespine` to [set null metric values to zero](/docs/build/fill-nulls-advanced), ensuring numeric values for every data row. ## Ratio metrics example +These examples demonstrate how to create ratio metrics in your model. They cover basic and advanced use cases, including applying filters to the numerator and denominator metrics. 
+ +#### Example 1 +This example is a basic ratio metric that calculates the ratio of food orders to total orders: + + + ```yaml metrics: - name: food_order_pct @@ -55,6 +65,30 @@ metrics: numerator: food_orders denominator: orders ``` + + +#### Example 2 +This example is a ratio metric that calculates the ratio of food orders to total orders, with a filter and alias applied to the numerator. Note that in order to add these attributes, you'll need to use an explicit key for the name attribute too. + + + +```yaml +metrics: + - name: food_order_pct + description: "The food order count as a ratio of the total order count, filtered by location" + label: Food order ratio by location + type: ratio + type_params: + numerator: + name: food_orders + filter: location = 'New York' + alias: ny_food_orders + denominator: + name: orders + filter: location = 'New York' + alias: ny_orders +``` + ## Ratio metrics using different semantic models @@ -109,6 +143,8 @@ on Users can define constraints on input metrics for a ratio metric by applying a filter directly to the input metric, like so: + + ```yaml metrics: - name: frequent_purchaser_ratio @@ -123,6 +159,7 @@ metrics: denominator: name: distinct_purchasers ``` + Note the `filter` and `alias` parameters for the metric referenced in the numerator. - Use the `filter` parameter to apply a filter to the metric it's attached to. diff --git a/website/docs/docs/cloud-integrations/semantic-layer/excel.md b/website/docs/docs/cloud-integrations/semantic-layer/excel.md index 31a028f3d81..c80040dce01 100644 --- a/website/docs/docs/cloud-integrations/semantic-layer/excel.md +++ b/website/docs/docs/cloud-integrations/semantic-layer/excel.md @@ -16,10 +16,11 @@ The dbt Semantic Layer offers a seamless integration with Excel Online and Deskt - You must have a dbt Cloud Team or Enterprise [account](https://www.getdbt.com/pricing). Suitable for both Multi-tenant and Single-tenant deployment. - Single-tenant accounts should contact their account representative for necessary setup and enablement. -import SLCourses from '/snippets/_sl-course.md'; +:::tip - +📹 For on-demand video learning, explore the [Querying the Semantic Layer with Excel](https://learn.getdbt.com/courses/querying-the-semantic-layer-with-excel) course to learn how to query metrics with Excel. 
+::: ## Installing the add-on diff --git a/website/docs/docs/cloud/about-cloud/about-dbt-cloud.md b/website/docs/docs/cloud/about-cloud/about-dbt-cloud.md index 02f950111ea..d7afd424fc4 100644 --- a/website/docs/docs/cloud/about-cloud/about-dbt-cloud.md +++ b/website/docs/docs/cloud/about-cloud/about-dbt-cloud.md @@ -24,7 +24,7 @@ dbt Cloud's [flexible plans](https://www.getdbt.com/pricing/) and features make diff --git a/website/docs/docs/cloud/connect-data-platform/about-connections.md b/website/docs/docs/cloud/connect-data-platform/about-connections.md index 8bec408af2e..6f2f140b724 100644 --- a/website/docs/docs/cloud/connect-data-platform/about-connections.md +++ b/website/docs/docs/cloud/connect-data-platform/about-connections.md @@ -18,6 +18,7 @@ dbt Cloud can connect with a variety of data platform providers including: - [PostgreSQL](/docs/cloud/connect-data-platform/connect-redshift-postgresql-alloydb) - [Snowflake](/docs/cloud/connect-data-platform/connect-snowflake) - [Starburst or Trino](/docs/cloud/connect-data-platform/connect-starburst-trino) +- [Teradata](/docs/cloud/connect-data-platform/connect-teradata) You can connect to your database in dbt Cloud by clicking the gear in the top right and selecting **Account Settings**. From the Account Settings page, click **+ New Project**. diff --git a/website/docs/docs/cloud/dbt-assist-data.md b/website/docs/docs/cloud/dbt-assist-data.md deleted file mode 100644 index ad32c304ca8..00000000000 --- a/website/docs/docs/cloud/dbt-assist-data.md +++ /dev/null @@ -1,29 +0,0 @@ ---- -title: "dbt Assist privacy and data" -sidebar_label: "dbt Assist privacy" -description: "dbt Assist’s powerful AI feature helps you deliver data that works." ---- - -# dbt Assist privacy and data - -dbt Labs is committed to protecting your privacy and data. This page provides information about how dbt Labs handles your data when you use dbt Assist. - -#### Is my data used by dbt Labs to train AI models? - -No, dbt Assist does not use client warehouse data to train any AI models. It uses API calls to an AI provider. - -#### Does dbt Labs share my personal data with third parties - -dbt Labs only shares client personal information as needed to perform the services, under client instructions, or for legal, tax, or compliance reasons. - -#### Does dbt Assist store or use personal data? - -The user clicks the AI assist button, and the user does not otherwise enter data. - -#### Does dbt Assist access my warehouse data? - -dbt Assist utilizes metadata, including column names, model SQL, the model's name, and model documentation. The row-level data from the warehouse is never used or sent to a third-party provider. Such output must be double-checked by the user for completeness and accuracy. - -#### Can dbt Assist data be deleted upon client written request? - -dbt Assist data, aside from usage data, does not persist on dbt Labs systems. Usage data is retained by dbt Labs. dbt Labs does not have possession of any personal or sensitive data. To the extent client identifies personal or sensitive information uploaded by or on behalf of client to dbt Labs systems, such data can be deleted within 30 days of written request. 
diff --git a/website/docs/docs/cloud/dbt-assist.md b/website/docs/docs/cloud/dbt-assist.md deleted file mode 100644 index bb8cabaff2b..00000000000 --- a/website/docs/docs/cloud/dbt-assist.md +++ /dev/null @@ -1,25 +0,0 @@ ---- -title: "About dbt Assist" -sidebar_label: "About dbt Assist" -description: "dbt Assist’s powerful AI co-pilot feature helps you deliver data that works." -pagination_next: "docs/cloud/enable-dbt-assist" -pagination_prev: null ---- - -# About dbt Assist - -dbt Assist is a powerful artificial intelligence (AI) co-pilot feature that helps automate development in dbt Cloud, allowing you to focus on delivering data that works. dbt Assist’s AI co-pilot generates [documentation](/docs/build/documentation), [semantic models](/docs/build/semantic-models), and [tests](/docs/build/data-tests) for your SQL models directly in the dbt Cloud IDE, with a click of a button, and helps you accomplish more in less time. - -:::tip Beta feature -dbt Assist is an AI tool meant to _help_ developers generate documentation, semantic models, and tests in dbt Cloud. It's available in beta, in the dbt Cloud IDE only. - -To use dbt Assist, you must have an active [dbt Cloud Enterprise account](https://www.getdbt.com/pricing) and agree to use dbt Labs' OpenAI key. [Register your interest](https://docs.google.com/forms/d/e/1FAIpQLScPjRGyrtgfmdY919Pf3kgqI5E95xxPXz-8JoVruw-L9jVtxg/viewform) to join the private beta or reach out to your account team to begin this process. -::: - - - -## Feedback - -Please note: Always review AI-generated code and content as it may produce incorrect results. dbt Assist features and/or functionality may be added or eliminated as part of the beta trial. - -To give feedback, please reach out to your dbt Labs account team. We appreciate your feedback and suggestions as we improve dbt Assist. diff --git a/website/docs/docs/cloud/dbt-cloud-ide/develop-in-the-cloud.md b/website/docs/docs/cloud/dbt-cloud-ide/develop-in-the-cloud.md index 37f39f6dff8..398b0cff2a1 100644 --- a/website/docs/docs/cloud/dbt-cloud-ide/develop-in-the-cloud.md +++ b/website/docs/docs/cloud/dbt-cloud-ide/develop-in-the-cloud.md @@ -13,7 +13,7 @@ The dbt Cloud integrated development environment (IDE) is a single web-based int The dbt Cloud IDE offers several [keyboard shortcuts](/docs/cloud/dbt-cloud-ide/keyboard-shortcuts) and [editing features](/docs/cloud/dbt-cloud-ide/ide-user-interface#editing-features) for faster and efficient development and governance: - Syntax highlighting for SQL — Makes it easy to distinguish different parts of your code, reducing syntax errors and enhancing readability. -- AI co-pilot — Use [dbt Assist](/docs/cloud/dbt-assist), a powerful AI co-pilot feature, to generate documentation, semantic models, and tests for your dbt SQL models. +- AI copilot — Use [dbt Copilot](/docs/cloud/dbt-copilot), a powerful AI engine that can generate documentation, tests, and semantic models for your dbt SQL models. - Auto-completion — Suggests table names, arguments, and column names as you type, saving time and reducing typos. - Code [formatting and linting](/docs/cloud/dbt-cloud-ide/lint-format) — Helps standardize and fix your SQL code effortlessly. - Navigation tools — Easily move around your code, jump to specific lines, find and replace text, and navigate between project files. 
@@ -55,7 +55,7 @@ To understand how to navigate the IDE and its user interface elements, refer to | [**Keyboard shortcuts**](/docs/cloud/dbt-cloud-ide/keyboard-shortcuts) | You can access a variety of [commands and actions](/docs/cloud/dbt-cloud-ide/keyboard-shortcuts) in the IDE by choosing the appropriate keyboard shortcut. Use the shortcuts for common tasks like building modified models or resuming builds from the last failure. | | **IDE version control** | The IDE version control section and git button allow you to apply the concept of [version control](/docs/collaborate/git/version-control-basics) to your project directly into the IDE.

- Create or change branches, execute git commands using the git button.
- Commit or revert individual files by right-clicking the edited file
- [Resolve merge conflicts](/docs/collaborate/git/merge-conflicts)
- Link to the repo directly by clicking the branch name
- Edit, format, or lint files and execute dbt commands in your primary protected branch, and commit to a new branch.
- Use Git diff view to view what has been changed in a file before you make a pull request.
- From dbt version 1.6 and higher, use the **Prune branches** [button](/docs/cloud/dbt-cloud-ide/ide-user-interface#prune-branches-modal) to delete local branches that have been deleted from the remote repository, keeping your branch management tidy. | | **Preview and Compile button** | You can [compile or preview](/docs/cloud/dbt-cloud-ide/ide-user-interface#console-section) code, a snippet of dbt code, or one of your dbt models after editing and saving. | -| [**dbt Assist**](/docs/cloud/dbt-assist) | A powerful AI co-pilot feature that generates documentation, semantic models, and tests for your dbt SQL models. Available for dbt Cloud Enterprise plans. | +| [**dbt Copilot**](/docs/cloud/dbt-copilot) | A powerful AI engine that can generate documentation, tests, and semantic models for your dbt SQL models. Available for dbt Cloud Enterprise plans. | | **Build, test, and run button** | Build, test, and run your project with a button click or by using the Cloud IDE command bar. | **Command bar** | You can enter and run commands from the command bar at the bottom of the IDE. Use the [rich model selection syntax](/reference/node-selection/syntax) to execute [dbt commands](/reference/dbt-commands) directly within dbt Cloud. You can also view the history, status, and logs of previous runs by clicking History on the left of the bar. | **Drag and drop** | Drag and drop files located in the file explorer, and use the file breadcrumb on the top of the IDE for quick, linear navigation. Access adjacent files in the same file by right-clicking on the breadcrumb file. @@ -130,7 +130,7 @@ Nice job, you're ready to start developing and building models 🎉! - Starting from dbt v1.6, leverage [environments variables](/docs/build/environment-variables#special-environment-variables) to dynamically use the Git branch name. For example, using the branch name as a prefix for a development schema. - Run [MetricFlow commands](/docs/build/metricflow-commands) to create and manage metrics in your project with the [dbt Semantic Layer](/docs/use-dbt-semantic-layer/dbt-sl). -- **Generate your YAML configurations with dbt Assist** — [dbt Assist](/docs/cloud/dbt-assist) is a powerful artificial intelligence (AI) co-pilot feature that helps automate development in dbt Cloud. It generates documentation, semantic models, and tests for your dbt SQL models directly in the dbt Cloud IDE, with a click of a button, and helps you accomplish more in less time. Available for dbt Cloud Enterprise plans. +- **Generate your YAML configurations with dbt Copilot** — [dbt Copilot](/docs/cloud/dbt-copilot) is a powerful artificial intelligence (AI) feature that helps automate development in dbt Cloud. It can generate documentation, tests, and semantic models for your dbt SQL models directly in the dbt Cloud IDE, with a click of a button, and helps you accomplish more in less time. Available for dbt Cloud Enterprise plans. - **Build and view your project's docs** — The dbt Cloud IDE makes it possible to [build and view](/docs/collaborate/build-and-view-your-docs) documentation for your dbt project while your code is still in development. With this workflow, you can inspect and verify what your project's generated documentation will look like before your changes are released to production. 
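+As an illustration of the environment-variable tip above, here is a hedged sketch of a `generate_schema_name` macro that prefixes development schemas with the current Git branch. It assumes the `DBT_CLOUD_GIT_BRANCH` special environment variable described in the linked environment-variables page (available in dbt Cloud IDE development sessions on dbt v1.6 and higher) and a development target named `dev`; adjust both to match your project:
+
+```sql
+-- macros/generate_schema_name.sql (sketch)
+{% macro generate_schema_name(custom_schema_name, node) -%}
+    {%- set default_schema = target.schema -%}
+    {%- if target.name == 'dev' -%}
+        {#- In development, prefix the schema with a sanitized branch name, e.g. feature_orders_dbt_achen -#}
+        {%- set branch = env_var('DBT_CLOUD_GIT_BRANCH', 'no_branch') | replace('/', '_') | replace('-', '_') -%}
+        {{ branch ~ '_' ~ default_schema }}
+    {%- elif custom_schema_name is none -%}
+        {{ default_schema }}
+    {%- else -%}
+        {{ default_schema }}_{{ custom_schema_name | trim }}
+    {%- endif -%}
+{%- endmacro %}
+```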
diff --git a/website/docs/docs/cloud/dbt-copilot-data.md b/website/docs/docs/cloud/dbt-copilot-data.md new file mode 100644 index 00000000000..b55681542e3 --- /dev/null +++ b/website/docs/docs/cloud/dbt-copilot-data.md @@ -0,0 +1,29 @@ +--- +title: "dbt Copilot privacy and data" +sidebar_label: "dbt Copilot privacy" +description: "dbt Copilot is a powerful AI engine to help you deliver data that works." +--- + +# dbt Copilot privacy and data + +dbt Labs is committed to protecting your privacy and data. This page provides information about how the dbt Copilot AI engine handles your data. + +#### Is my data used by dbt Labs to train AI models? + +No, dbt Copilot does not use client warehouse data to train any AI models. It uses API calls to an AI provider. + +#### Does dbt Labs share my personal data with third parties + +dbt Labs only shares client personal information as needed to perform the services, under client instructions, or for legal, tax, or compliance reasons. + +#### Does dbt Copilot store or use personal data? + +The user clicks the dbt Copilot button, and the user does not otherwise enter data. + +#### Does dbt Copilot access my warehouse data? + +dbt Copilot utilizes metadata, including column names, model SQL, the model's name, and model documentation. The row-level data from the warehouse is never used or sent to a third-party provider. Such output must be double-checked by the user for completeness and accuracy. + +#### Can dbt Copilot data be deleted upon client written request? + +The data from using dbt Copilot, aside from usage data, _doesn't_ persist on dbt Labs systems. Usage data is retained by dbt Labs. dbt Labs doesn't have possession of any personal or sensitive data. To the extent client identifies personal or sensitive information uploaded by or on behalf of client to dbt Labs systems, such data can be deleted within 30 days of written request. diff --git a/website/docs/docs/cloud/dbt-copilot.md b/website/docs/docs/cloud/dbt-copilot.md new file mode 100644 index 00000000000..42a05dd91ba --- /dev/null +++ b/website/docs/docs/cloud/dbt-copilot.md @@ -0,0 +1,25 @@ +--- +title: "About dbt Copilot" +sidebar_label: "About dbt Copilot" +description: "dbt Copilot is a powerful AI engine designed to accelerate your analytics workflows throughout your entire ADLC." +pagination_next: "docs/cloud/enable-dbt-copilot" +pagination_prev: null +--- + +# About dbt Copilot + +dbt Copilot is a powerful artificial intelligence (AI) engine that's fully integrated into your dbt Cloud experience and designed to accelerate your analytics workflows. dbt Copilot embeds AI-driven assistance across every stage of the analytics development life cycle (ADLC), empowering data practitioners to deliver data products faster, improve data quality, and enhance data accessibility. With automatic code generation, you can let the AI engine generate the [documentation](/docs/build/documentation), [tests](/docs/build/data-tests), and [semantic models](/docs/build/semantic-models) for you. + +:::tip Beta feature +dbt Copilot is designed to _help_ developers generate documentation, tests, and semantic models in dbt Cloud. It's available in beta, in the dbt Cloud IDE only. + +To use dbt Copilot, you must have an active [dbt Cloud Enterprise account](https://www.getdbt.com/pricing) and agree to use dbt Labs' OpenAI key. 
[Register your interest](https://docs.google.com/forms/d/e/1FAIpQLScPjRGyrtgfmdY919Pf3kgqI5E95xxPXz-8JoVruw-L9jVtxg/viewform) to join the private beta or reach out to your Account team to begin this process. +::: + + + +## Feedback + +Please note: Always review AI-generated code and content as it may produce incorrect results. The features and/or functionality of dbt Copilot may be added or eliminated as part of the beta trial. + +To give feedback, please contact your dbt Labs account team. We appreciate your feedback and suggestions as we improve dbt Copilot. diff --git a/website/docs/docs/cloud/enable-dbt-assist.md b/website/docs/docs/cloud/enable-dbt-assist.md deleted file mode 100644 index 9432f858001..00000000000 --- a/website/docs/docs/cloud/enable-dbt-assist.md +++ /dev/null @@ -1,35 +0,0 @@ ---- -title: "Enable dbt Assist" -sidebar_label: "Enable dbt Assist" -description: "Enable dbt Assist in dbt Cloud and leverage AI to speed up your development." ---- - -# Enable dbt Assist - -This page explains how to enable dbt Assist in dbt Cloud to leverage AI to speed up your development and allow you to focus on delivering quality data. - -## Prerequisites - -- Available in the dbt Cloud IDE only. -- Must have an active [dbt Cloud Enterprise account](https://www.getdbt.com/pricing). -- Development environment be ["Versionless"](/docs/dbt-versions/upgrade-dbt-version-in-cloud#versionless). -- Current dbt Assist deployments use a central OpenAI API key managed by dbt Labs. In the future, you may provide your own key for Azure OpenAI or OpenAI. -- Accept and sign legal agreements. Reach out to your account team to begin this process. - -## Enable dbt Assist - -dbt Assist will only be available at an account level after your organization has signed the legal requirements. It will be disabled by default. Your dbt Cloud Admin(s) will enable it by following these steps: - -1. Navigate to **Account Settings** in the navigation menu. - -2. Under **Settings**, confirm the account you're enabling. - -3. Click **Edit** in the top right corner. - -4. To turn on dbt Assist, toggle the **Enable account access to AI-powered features** switch to the right. The toggle will slide to the right side, activating dbt Assist. - -5. Click **Save** and you should now have dbt Assist AI enabled to use. - -Note: To disable (only after enabled), repeat steps 1 to 3, toggle off in step 4, and repeat step 5. - - diff --git a/website/docs/docs/cloud/enable-dbt-copilot.md b/website/docs/docs/cloud/enable-dbt-copilot.md new file mode 100644 index 00000000000..23c253ecf7a --- /dev/null +++ b/website/docs/docs/cloud/enable-dbt-copilot.md @@ -0,0 +1,35 @@ +--- +title: "Enable dbt Copilot" +sidebar_label: "Enable dbt Copilot" +description: "Enable the dbt Copilot AI engine in dbt Cloud to speed up your development." +--- + +# Enable dbt Copilot + +This page explains how to enable the dbt Copilot engine in dbt Cloud, leveraging AI to speed up your development and allowing you to focus on delivering quality data. + +## Prerequisites + +- Available in the dbt Cloud IDE only. +- Must have an active [dbt Cloud Enterprise account](https://www.getdbt.com/pricing). +- Development environment has been upgraded to ["Versionless"](/docs/dbt-versions/upgrade-dbt-version-in-cloud#versionless). +- Current dbt Copilot deployments use a central OpenAI API key managed by dbt Labs. In the future, you may provide your own key for Azure OpenAI or OpenAI. +- Accept and sign legal agreements. Reach out to your Account team to begin this process. 
+ +## Enable dbt Copilot + +dbt Copilot is only available at an account level after your organization has signed the legal requirements. It's disabled by default. A dbt Cloud admin(s) can enable it by following these steps: + +1. Navigate to **Account settings** in the navigation menu. + +2. Under **Settings**, confirm the account you're enabling. + +3. Click **Edit** in the top right corner. + +4. Enable the **Enable account access to AI-powered features** option. + +5. Click **Save**. You should now have the dbt Copilot AI engine enabled for use. + +Note: To disable (only after enabled), repeat steps 1 to 3, toggle off in step 4, and repeat step 5. + + \ No newline at end of file diff --git a/website/docs/docs/cloud/use-dbt-assist.md b/website/docs/docs/cloud/use-dbt-assist.md deleted file mode 100644 index 888d5107999..00000000000 --- a/website/docs/docs/cloud/use-dbt-assist.md +++ /dev/null @@ -1,20 +0,0 @@ ---- -title: "Use dbt Assist" -sidebar_label: "Use dbt Assist" -description: "Use dbt Assist to generate documentation, semantic models, and tests from scratch, giving you the flexibility to modify or fix generated code." ---- - -# Use dbt Assist - -Use dbt Assist to generate documentation, semantic models, and tests from scratch, giving you the flexibility to modify or fix generated code. - -To access and use dbt Assist: - -1. Navigate to the dbt Cloud IDE and select a SQL model file under the **File Explorer**. -2. In the **Console** section (under the **File Editor**), select the **dbt Assist** to view the available AI options. -3. Select the available options to generate the YAML config: **Generate Documentation**, **Generate Tests**, or **Generate Semantic Model**. - - To generate multiple YAML configs for the same model, click each option separately. dbt Assist intelligently saves the YAML config in the same file. -4. Verify the AI-generated code. Update or fix the code if needed. -5. Click **Save** to save the code. You should see the file changes under the **Version control** section. - - diff --git a/website/docs/docs/cloud/use-dbt-copilot.md b/website/docs/docs/cloud/use-dbt-copilot.md new file mode 100644 index 00000000000..30def967f96 --- /dev/null +++ b/website/docs/docs/cloud/use-dbt-copilot.md @@ -0,0 +1,22 @@ +--- +title: "Use dbt Copilot" +sidebar_label: "Use dbt Copilot" +description: "Use the dbt Copilot AI engine to generate documentation, tests, and semantic models from scratch, giving you the flexibility to modify or fix generated code." +--- + +# Use dbt Copilot + +Use dbt Copilot to generate documentation, tests, and semantic models from scratch, giving you the flexibility to modify or fix generated code. To access and use this AI engine: + +1. Navigate to the dbt Cloud IDE and select a SQL model file under the **File Explorer**. + +2. In the **Console** section (under the **File Editor**), click **dbt Copilot** to view the available AI options. + +3. Select the available options to generate the YAML config: **Generate Documentation**, **Generate Tests**, or **Generate Semantic Model**. + - To generate multiple YAML configs for the same model, click each option separately. dbt Copilot intelligently saves the YAML config in the same file. + +4. Verify the AI-generated code. You can update or fix the code as needed. + +5. Click **Save As**. You should see the file changes under the **Version control** section. 
+ + diff --git a/website/docs/docs/core/connect-data-platform/athena-setup.md b/website/docs/docs/core/connect-data-platform/athena-setup.md index 9780e86de88..825d3071ad2 100644 --- a/website/docs/docs/core/connect-data-platform/athena-setup.md +++ b/website/docs/docs/core/connect-data-platform/athena-setup.md @@ -7,7 +7,7 @@ meta: github_repo: 'dbt-labs/dbt-athena' pypi_package: 'dbt-athena-community' min_core_version: 'v1.3.0' - cloud_support: Not Supported + cloud_support: Supported min_supported_version: 'engine version 2 and 3' slack_channel_name: '#db-athena' slack_channel_link: 'https://getdbt.slack.com/archives/C013MLFR7BQ' diff --git a/website/docs/docs/core/connect-data-platform/azuresynapse-setup.md b/website/docs/docs/core/connect-data-platform/azuresynapse-setup.md index 8a4d6b61004..0a0347df9ea 100644 --- a/website/docs/docs/core/connect-data-platform/azuresynapse-setup.md +++ b/website/docs/docs/core/connect-data-platform/azuresynapse-setup.md @@ -7,7 +7,7 @@ meta: github_repo: 'Microsoft/dbt-synapse' pypi_package: 'dbt-synapse' min_core_version: 'v0.18.0' - cloud_support: Not Supported + cloud_support: Supported min_supported_version: 'Azure Synapse 10' slack_channel_name: '#db-synapse' slack_channel_link: 'https://getdbt.slack.com/archives/C01DRQ178LQ' diff --git a/website/docs/docs/dbt-versions/release-notes.md b/website/docs/docs/dbt-versions/release-notes.md index 9030ca8e722..fc8d0265072 100644 --- a/website/docs/docs/dbt-versions/release-notes.md +++ b/website/docs/docs/dbt-versions/release-notes.md @@ -37,7 +37,7 @@ Release notes are grouped by month for both multi-tenant and virtual private clo ## September 2024 -- **New**: Use dbt Assist's co-pilot feature to generate semantic model for your models, now available in beta. dbt Assist automatically generates documentation, tests, and now semantic models based on the data in your model, . To learn more, refer to [dbt Assist](/docs/cloud/dbt-assist). +- **New**: Use the dbt Copilot AI engine to generate semantic model for your models, now available in beta. dbt Copilot automatically generates documentation, tests, and now semantic models based on the data in your model, . To learn more, refer to [dbt Copilot](/docs/cloud/dbt-copilot). - **New**: Use the new recommended syntax for [defining `foreign_key` constraints](/reference/resource-properties/constraints) using `refs`, available in dbt Cloud Versionless. This will soon be released in dbt Core v1.9. This new syntax will capture dependencies and works across different environments. - **Enhancement**: You can now run [Semantic Layer commands](/docs/build/metricflow-commands) commands in the [dbt Cloud IDE](/docs/cloud/dbt-cloud-ide/develop-in-the-cloud). The supported commands are `dbt sl list`, `dbt sl list metrics`, `dbt sl list dimension-values`, `dbt sl list saved-queries`, `dbt sl query`, `dbt sl list dimensions`, `dbt sl list entities`, and `dbt sl validate`. - **New**: Microsoft Excel, a dbt Semantic Layer integration, is now generally available. The integration allows you to connect to Microsoft Excel to query metrics and collaborate with your team. Available for [Excel Desktop](https://pages.store.office.com/addinsinstallpage.aspx?assetid=WA200007100&rs=en-US&correlationId=4132ecd1-425d-982d-efb4-de94ebc83f26) or [Excel Online](https://pages.store.office.com/addinsinstallpage.aspx?assetid=WA200007100&rs=en-US&correlationid=4132ecd1-425d-982d-efb4-de94ebc83f26&isWac=True). 
For more information, refer to [Microsoft Excel](/docs/cloud-integrations/semantic-layer/excel). @@ -108,7 +108,7 @@ Release notes are grouped by month for both multi-tenant and virtual private clo The following features are new or enhanced as part of our [dbt Cloud Launch Showcase](https://www.getdbt.com/resources/webinars/dbt-cloud-launch-showcase) event on May 14th, 2024: -- **New:** [dbt Assist](/docs/cloud/dbt-assist) is a powerful AI feature helping you generate documentation and tests, saving you time as you deliver high-quality data. Available in private beta for a subset of dbt Cloud Enterprise users and in the dbt Cloud IDE. [Register your interest](https://docs.google.com/forms/d/e/1FAIpQLScPjRGyrtgfmdY919Pf3kgqI5E95xxPXz-8JoVruw-L9jVtxg/viewform) to join the private beta. +- **New:** [dbt Copilot](/docs/cloud/dbt-copilot) is a powerful AI engine helping you generate documentation, tests, and semantic models, saving you time as you deliver high-quality data. Available in private beta for a subset of dbt Cloud Enterprise users and in the dbt Cloud IDE. [Register your interest](https://docs.google.com/forms/d/e/1FAIpQLScPjRGyrtgfmdY919Pf3kgqI5E95xxPXz-8JoVruw-L9jVtxg/viewform) to join the private beta. - **New:** The new low-code editor, now in private beta, enables less SQL-savvy analysts to create or edit dbt models through a visual, drag-and-drop experience inside of dbt Cloud. These models compile directly to SQL and are indistinguishable from other dbt models in your projects: they are version-controlled, can be accessed across projects in dbt Mesh, and integrate with dbt Explorer and the Cloud IDE. [Register your interest](https://docs.google.com/forms/d/e/1FAIpQLScPjRGyrtgfmdY919Pf3kgqI5E95xxPXz-8JoVruw-L9jVtxg/viewform) to join the private beta. diff --git a/website/sidebars.js b/website/sidebars.js index 44dceb466dd..3b93c19616a 100644 --- a/website/sidebars.js +++ b/website/sidebars.js @@ -290,13 +290,13 @@ const sidebarSettings = { "docs/cloud/dbt-cloud-ide/lint-format", { type: "category", - label: "dbt Assist", - link: { type: "doc", id: "docs/cloud/dbt-assist" }, + label: "dbt Copilot", + link: { type: "doc", id: "docs/cloud/dbt-copilot" }, items: [ - "docs/cloud/dbt-assist", - "docs/cloud/enable-dbt-assist", - "docs/cloud/use-dbt-assist", - "docs/cloud/dbt-assist-data", + "docs/cloud/dbt-copilot", + "docs/cloud/enable-dbt-copilot", + "docs/cloud/use-dbt-copilot", + "docs/cloud/dbt-copilot-data", ], }, ], diff --git a/website/snippets/_adapters-trusted.md b/website/snippets/_adapters-trusted.md index 3594f050897..6fc3b2b2f8f 100644 --- a/website/snippets/_adapters-trusted.md +++ b/website/snippets/_adapters-trusted.md @@ -14,7 +14,7 @@ @@ -94,8 +94,8 @@ diff --git a/website/snippets/_sl-course.md b/website/snippets/_sl-course.md index 6be9ec7e959..1400be91f37 100644 --- a/website/snippets/_sl-course.md +++ b/website/snippets/_sl-course.md @@ -3,7 +3,7 @@ Explore our [dbt Semantic Layer on-demand course](https://learn.getdbt.com/courses/semantic-layer) to learn how to define and query metrics in your dbt project. -Additionally, dive into mini-courses for querying the dbt Semantic Layer in your favorite tools: [Tableau](https://courses.getdbt.com/courses/tableau-querying-the-semantic-layer), [Hex](https://courses.getdbt.com/courses/hex-querying-the-semantic-layer), and [Mode](https://courses.getdbt.com/courses/mode-querying-the-semantic-layer). 
+Additionally, dive into mini-courses for querying the dbt Semantic Layer in your favorite tools: [Tableau](https://courses.getdbt.com/courses/tableau-querying-the-semantic-layer), [Excel](https://learn.getdbt.com/courses/querying-the-semantic-layer-with-excel), [Hex](https://courses.getdbt.com/courses/hex-querying-the-semantic-layer), and [Mode](https://courses.getdbt.com/courses/mode-querying-the-semantic-layer). diff --git a/website/static/img/blog/2024-10-04-iceberg-blog/2024-10-03-iceberg-support.png b/website/static/img/blog/2024-10-04-iceberg-blog/2024-10-03-iceberg-support.png new file mode 100644 index 00000000000..2b99378fa84 Binary files /dev/null and b/website/static/img/blog/2024-10-04-iceberg-blog/2024-10-03-iceberg-support.png differ diff --git a/website/static/img/blog/2024-10-04-iceberg-blog/iceberg_materialization.png b/website/static/img/blog/2024-10-04-iceberg-blog/iceberg_materialization.png new file mode 100644 index 00000000000..c20e7855858 Binary files /dev/null and b/website/static/img/blog/2024-10-04-iceberg-blog/iceberg_materialization.png differ diff --git a/website/static/img/docs/dbt-cloud/cloud-ide/dbt-copilot-doc.gif b/website/static/img/docs/dbt-cloud/cloud-ide/dbt-copilot-doc.gif new file mode 100644 index 00000000000..cca8db37a0a Binary files /dev/null and b/website/static/img/docs/dbt-cloud/cloud-ide/dbt-copilot-doc.gif differ diff --git a/website/vercel.json b/website/vercel.json index e882b50d2fc..4f5a92ccb08 100644 --- a/website/vercel.json +++ b/website/vercel.json @@ -2,6 +2,26 @@ "cleanUrls": true, "trailingSlash": false, "redirects": [ + { + "source": "/docs/cloud/dbt-assist-data", + "destination": "/docs/cloud/dbt-copilot-data", + "permanent": true + }, + { + "source": "/docs/cloud/use-dbt-assist", + "destination": "/docs/cloud/use-dbt-copilot", + "permanent": true + }, + { + "source": "/docs/cloud/enable-dbt-assist", + "destination": "/docs/cloud/enable-dbt-copilot", + "permanent": true + }, + { + "source": "/docs/cloud/dbt-assist", + "destination": "/docs/cloud/dbt-copilot", + "permanent": true + }, { "source": "/faqs/Troubleshooting/access_token_error", "destination": "/faqs/Troubleshooting/auth-expired-error",