diff --git a/contributing/content-style-guide.md b/contributing/content-style-guide.md index 4ebbf83bf5f..58f5ba2b21c 100644 --- a/contributing/content-style-guide.md +++ b/contributing/content-style-guide.md @@ -519,6 +519,7 @@ enter (in the command line) | type (in the command line) email | e-mail on dbt | on a remote server person, human | client, customer +plan(s), account | organization, customer press (a key) | hit, tap recommended limit | soft limit sign in | log in, login @@ -529,6 +530,15 @@ dbt Cloud CLI | CLI, dbt CLI dbt Core | CLI, dbt CLI +Note, let's make sure we're talking to our readers and keep them close to the content and documentation (second person). + +For example, to explain that a feature is available on a particular dbt Cloud plan, you can use: +- “XYZ is available on Enterprise plans” +- “If you're on an Enterprise plan, you can access XYZ..” +- "Enterprise plans can access XYZ..." to keep users closer to the documentation. + +This will signal users to check their plan or account status independently. + ## Links Links embedded in the documentation are about trust. Users trust that we will lead them to sites or pages related to their reading content. In order to maintain that trust, it's important that links are transparent, up-to-date, and lead to legitimate resources. diff --git a/website/blog/2023-08-01-announcing-materialized-views.md b/website/blog/2023-08-01-announcing-materialized-views.md index eb9716e73a5..3b801c7c719 100644 --- a/website/blog/2023-08-01-announcing-materialized-views.md +++ b/website/blog/2023-08-01-announcing-materialized-views.md @@ -21,7 +21,7 @@ and updates on how to test MVs. The year was 2020. I was a kitten-only household, and dbt Labs was still Fishtown Analytics. A enterprise customer I was working with, Jetblue, asked me for help running their dbt models every 2 minutes to meet a 5 minute SLA. -After getting over the initial terror, we talked through the use case and soon realized there was a better option. Together with my team, I created [lambda views](https://discourse.getdbt.com/t/how-to-create-near-real-time-models-with-just-dbt-sql/1457%20?) to meet the need. +After getting over the initial terror, we talked through the use case and soon realized there was a better option. Together with my team, I created [lambda views](https://discourse.getdbt.com/t/how-to-create-near-real-time-models-with-just-dbt-sql/1457) to meet the need. Flash forward to 2023. I’m writing this as my giant dog snores next to me (don’t worry the cats have multiplied as well). Jetblue has outgrown lambda views due to performance constraints (a view can only be so performant) and we are at another milestone in dbt’s journey to support streaming. What. a. time. @@ -32,8 +32,8 @@ Today we are announcing that we now support Materialized Views in dbt. 
So, what Materialized views are now an out of the box materialization in your dbt project once you upgrade to the latest version of dbt v1.6 on these following adapters: - [dbt-postgres](/reference/resource-configs/postgres-configs#materialized-views) -- [dbt-redshift](reference/resource-configs/redshift-configs#materialized-views) -- [dbt-snowflake](reference/resource-configs/snowflake-configs#dynamic-tables) +- [dbt-redshift](/reference/resource-configs/redshift-configs#materialized-views) +- [dbt-snowflake](/reference/resource-configs/snowflake-configs#dynamic-tables) - [dbt-databricks](/reference/resource-configs/databricks-configs#materialized-views-and-streaming-tables) - [dbt-materialize*](/reference/resource-configs/materialize-configs#incremental-models-materialized-views) - [dbt-trino*](/reference/resource-configs/trino-configs#materialized-view) @@ -227,4 +227,4 @@ Depending on how you orchestrate your materialized views, you can either run the ## Conclusion -Well, I’m excited for everyone to remove the lines in your packages.yml that installed your experimental package (at least if you’re using it for MVs) and start to get your hands dirty. We are still new in our journey and I look forward to hearing all the things you are creating and how we can better our best practices in this. \ No newline at end of file +Well, I’m excited for everyone to remove the lines in your packages.yml that installed your experimental package (at least if you’re using it for MVs) and start to get your hands dirty. We are still new in our journey and I look forward to hearing all the things you are creating and how we can better our best practices in this. diff --git a/website/blog/2023-12-15-serverless-free-tier-data-stack-with-dlt-and-dbt-core.md b/website/blog/2023-12-15-serverless-free-tier-data-stack-with-dlt-and-dbt-core.md new file mode 100644 index 00000000000..7e63b6e1c6d --- /dev/null +++ b/website/blog/2023-12-15-serverless-free-tier-data-stack-with-dlt-and-dbt-core.md @@ -0,0 +1,160 @@ +--- +title: Serverless, free-tier data stack with dlt + dbt core. +description: "In this article, Euan shares his personal project to fetch property price data during his and his partner's house-hunting process, and how he created a serverless free-tier data stack by using Google Cloud Functions to run data ingestion tool dlt alongside dbt for transformation." +slug: serverless-dlt-dbt-stack + +authors: [euan_johnston] + +hide_table_of_contents: false + +date: 2023-01-15 +is_featured: false +--- + + + +## The problem, the builder and tooling + +**The problem**: My partner and I are considering buying a property in Portugal. There is no reference data for the real estate market here - how many houses are being sold, for what price? Nobody knows except the property office and maybe the banks, and they don’t readily divulge this information. The only data source we have is Idealista, which is a portal where real estate agencies post ads. + +Unfortunately, there are significantly fewer properties than ads - it seems many real estate companies re-post the same ad that others do, with intentionally different data and often misleading bits of info. The real estate agencies do this so the interested parties reach out to them for clarification, and from there they can start a sales process. At the same time, the website with the ads is incentivised to allow this to continue as they get paid per ad, not per property. 
+ +**The builder:** I’m a data freelancer who deploys end to end solutions, so when I have a data problem, I cannot just let it go. + +**The tools:** I want to be able to run my project on [Google Cloud Functions](https://cloud.google.com/functions) due to the generous free tier. [dlt](https://dlthub.com/) is a new Python library for declarative data ingestion which I have wanted to test for some time. Finally, I will use dbt Core for transformation. + +## The starting point + +If I want to have reliable information on the state of the market I will need to: + +- Grab the messy data from Idealista and historize it. +- Deduplicate existing listings. +- Try to infer what listings sold for how much. + +Once I have deduplicated listings with some online history, I can get an idea: + +- How expensive which properties are. +- How fast they get sold, hopefully a signal of whether they are “worth it” or not. + +## Towards a solution + +The solution has pretty standard components: + +- An EtL pipeline. The little t stands for normalisation, such as transforming strings to dates or unpacking nested structures. This is handled by dlt functions written in Python. +- A transformation layer taking the source data loaded by my dlt functions and creating the tables necessary, handled by dbt. +- Due to the complexity of deduplication, I needed to add a human element to confirm the deduplication in Google Sheets. + +These elements are reflected in the diagram below and further clarified in greater detail later in the article: + + + +### Ingesting the data + +For ingestion, I use a couple of sources: + +First, I ingest home listings from the Idealista API, accessed through [API Dojo's freemium wrapper](https://rapidapi.com/apidojo/api/idealista2). The dlt pipeline I created for ingestion is in [this repo](https://github.com/euanjohnston-dev/Idealista_pipeline). + +After an initial round of transformation (described in the next section), the deduplicated data is loaded into BigQuery where I can query it from the Google Sheets client and manually review the deduplication. + +When I'm happy with the results, I use the [ready-made dlt Sheets source connector](https://dlthub.com/docs/dlt-ecosystem/verified-sources/google_sheets) to pull the data back into BigQuery, [as defined here](https://github.com/euanjohnston-dev/gsheets_check_pipeline). + +### Transforming the data + +For transforming I use my favorite solution, dbt Core. For running and orchestrating dbt on Cloud Functions, I am using dlt’s dbt Core runner. The benefit of the runner in this context is that I can re-use the same credential setup, instead of creating a separate profiles.yml file. + +This is the package I created: + +### Production-readying the pipeline + +To make the pipeline more “production ready”, I made some improvements: + +- Using a credential store instead of hard-coding passwords, in this case Google Secret Manager. +- Be notified when the pipeline runs and what the outcome is. For this I sent data to Slack via a dlt decorator that posts the error on failure and the metadata on success. + +```python +from dlt.common.runtime.slack import send_slack_message + +def notify_on_completion(hook): + def decorator(func): + def wrapper(*args, **kwargs): + try: + load_info = func(*args, **kwargs) + message = f"Function {func.__name__} completed successfully. Load info: {load_info}" + send_slack_message(hook, message) + return load_info + except Exception as e: + message = f"Function {func.__name__} failed. 
Error: {str(e)}" + send_slack_message(hook, message) + raise + return wrapper + return decorator +``` + +## The outcome + +The outcome was first and foremost a visualisation highlighting the unique properties available in my specific area of search. The map shown on the left of the page gives a live overview of location, number of duplicates (bubble size) and price (bubble colour) which can amongst other features be filtered using the sliders on the right. This represents a much better decluttered solution from which to observe the actual inventory available. + + + +Further charts highlight additional metrics which – now that deduplication is complete – can be accurately measured including most importantly, the development over time of “average price/square metre” and those properties which have been inferred to have been sold. + +### Next steps + +This version was very much about getting a base from which to analyze the properties for my own personal use case. + +In terms of further development which could take place, I have had interest from people to run the solution on their own specific target area. + +For this to work at scale I would need a more robust method to deal with duplicate attribution, which is a difficult problem as real estate agencies intentionally change details like number of rooms or surface area. + +Perhaps this is a problem ML or GPT could solve equally well as a human, given the limited options available. + +## Learnings and conclusion + +The data problem itself was an eye opener into the real-estate market. It’s a messy market full of unknowns and noise, which adds a significant purchase risk to first time buyers. + +Tooling wise, it was surprising how quick it was to set everything up. dlt integrates well with dbt and enables fast and simple data ingestion, making this project simpler than I thought it would be. + +### dlt + +Good: + +- As a big fan of dbt I love how seamlessly the two solutions complement one another. dlt handles the data cleaning and normalisation automatically so I can focus on curating and modelling it in dbt. While the automatic unpacking leaves some small adjustments for the analytics engineer, it’s much better than cleaning and typing json in the database or in custom python code. +- When creating my first dummy pipeline I used duckdb. It felt like a great introduction into how simple it is to get started and provided a solid starting block before developing something for the cloud. + +Bad: + +- I did have a small hiccup with the google sheets connector assuming an oauth authentication over my desired sdk but this was relatively easy to rectify. (explicitly stating GcpServiceAccountCredentials in the init.py file for the source). +- Using both a verified source in the gsheets connector and building my own from Rapid API endpoints seemed equally intuitive. However I would have wanted more documentation on how to run these 2 pipelines in the same script with the dbt pipeline. + +### dbt + +No surprises there. I developed the project locally, and to deploy to cloud functions I injected credentials to dbt via the dlt runner. This meant I could re-use the setup I did for the other dlt pipelines. 
+ +```python +def dbt_run(): + # make an authenticated connection with dlt to the dwh + pipeline = dlt.pipeline( + pipeline_name='dbt_pipeline', + destination='bigquery', # credentials read from env + dataset_name='dbt' + ) + # make a venv in case we have lib conflicts between dlt and current env + venv = dlt.dbt.get_venv(pipeline) + # package the pipeline, dbt package and env + dbt = dlt.dbt.package(pipeline, "dbt/property_analytics", venv=venv) + # and run it + models = dbt.run_all() + # show outcome + for m in models: + print(f"Model {m.model_name} materialized in {m.time} with status {m.status} and message {m.message}" +``` + +### Cloud functions + +While I had used cloud functions before, I had never previously set them up for dbt and I was able to easily follow dlt’s docs to run the pipelines there. Cloud functions is a great solution to cheaply run small scale pipelines and my running cost of the project is a few cents a month. If the insights drawn from the project help us save even 1% of a house price, the project will have been a success. + +### To sum up + +dlt feels like the perfect solution for anyone who has scratched the surface of python development. To be able to have schemas ready for transformation in such a short space of time is truly… transformational. As a freelancer, being able to accelerate the development of pipelines is a huge benefit within companies who are often frustrated with the amount of time it takes to start ‘showing value’. + +I’d welcome the chance to discuss what’s been built to date or collaborate on any potential further development in the comments below. diff --git a/website/blog/authors.yml b/website/blog/authors.yml index a3548575b6e..4aa33773988 100644 --- a/website/blog/authors.yml +++ b/website/blog/authors.yml @@ -187,6 +187,16 @@ emily_riederer: - icon: fa-readme url: https://emilyriederer.com +euan_johnston: + image_url: /img/blog/authors/ejohnston.png + job_title: Freelance Business Intelligence manager + name: Euan Johnston + links: + - icon: fa-linkedin + url: https://www.linkedin.com/in/euan-johnston-610a05a8/ + - icon: fa-github + url: https://github.com/euanjohnston-dev + grace_goheen: image_url: /img/blog/authors/grace-goheen.jpeg job_title: Analytics Engineer diff --git a/website/docs/best-practices/materializations/materializations-guide-2-available-materializations.md b/website/docs/best-practices/materializations/materializations-guide-2-available-materializations.md index 9910e5f8269..1096c07cde7 100644 --- a/website/docs/best-practices/materializations/materializations-guide-2-available-materializations.md +++ b/website/docs/best-practices/materializations/materializations-guide-2-available-materializations.md @@ -9,12 +9,12 @@ hoverSnippet: Read this guide to understand the different types of materializati Views and tables and incremental models, oh my! In this section we’ll start getting our hands dirty digging into the three basic materializations that ship with dbt. They are considerably less scary and more helpful than lions, tigers, or bears — although perhaps not as cute (can data be cute? We at dbt Labs think so). We’re going to define, implement, and explore: -- 🔍 **views** -- ⚒️ **tables** -- 📚 **incremental model** +- 🔍 [**views**](/docs/build/materializations#view) +- ⚒️ [**tables**](/docs/build/materializations#table) +- 📚 [**incremental model**](/docs/build/materializations#incremental) :::info -👻 There is a fourth default materialization available in dbt called **ephemeral materialization**. 
It is less broadly applicable than the other three, and better deployed for specific use cases that require weighing some tradeoffs. We chose to leave it out of this guide and focus on the three materializations that will power 99% of your modeling needs. +👻 There is a fourth default materialization available in dbt called [**ephemeral materialization**](/docs/build/materializations#ephemeral). It is less broadly applicable than the other three, and better deployed for specific use cases that require weighing some tradeoffs. We chose to leave it out of this guide and focus on the three materializations that will power 99% of your modeling needs. ::: **Views and Tables are the two basic categories** of object that we can create across warehouses. They exist natively as types of objects in the warehouse, as you can see from this screenshot of Snowflake (depending on your warehouse the interface will look a little different). **Incremental models** and other materializations types are a little bit different. They tell dbt to **construct tables in a special way**. diff --git a/website/docs/docs/build/about-metricflow.md b/website/docs/docs/build/about-metricflow.md index ea2efcabf06..77b19a02d79 100644 --- a/website/docs/docs/build/about-metricflow.md +++ b/website/docs/docs/build/about-metricflow.md @@ -63,6 +63,7 @@ Metrics, which is a key concept, are functions that combine measures, constraint MetricFlow supports different metric types: +- [Conversion](/docs/build/conversion) — Helps you track when a base event and a subsequent conversion event occurs for an entity within a set time period. - [Cumulative](/docs/build/cumulative) — Aggregates a measure over a given window. - [Derived](/docs/build/derived) — An expression of other metrics, which allows you to do calculations on top of metrics. - [Ratio](/docs/build/ratio) — Create a ratio out of two measures, like revenue per customer. diff --git a/website/docs/docs/build/conversion-metrics.md b/website/docs/docs/build/conversion-metrics.md new file mode 100644 index 00000000000..2238655fbe0 --- /dev/null +++ b/website/docs/docs/build/conversion-metrics.md @@ -0,0 +1,352 @@ +--- +title: "Conversion metrics" +id: conversion +description: "Use Conversion metrics to measure conversion events." +sidebar_label: Conversion +tags: [Metrics, Semantic Layer] +--- + +Conversion metrics allow you to define when a base event and a subsequent conversion event happen for a specific entity within some time range. + +For example, using conversion metrics allows you to track how often a user (entity) completes a visit (base event) and then makes a purchase (conversion event) within 7 days (time window). You would need to add a time range and an entity to join. + +Conversion metrics are different from [ratio metrics](/docs/build/ratio) because you need to include an entity in the pre-aggregated join. + +## Parameters + +The specification for conversion metrics is as follows: + +| Parameter | Description | Type | Required/Optional | +| --- | --- | --- | --- | +| `name` | The name of the metric. | String | Required | +| `description` | The description of the metric. | String | Optional | +| `type` | The type of metric (such as derived, ratio, and so on.). In this case, set as 'conversion' | String | Required | +| `label` | Displayed value in downstream tools. | String | Required | +| `type_params` | Specific configurations for each metric type. | List | Required | +| `conversion_type_params` | Additional configuration specific to conversion metrics. 
| List | Required | +| `entity` | The entity for each conversion event. | Entity | Required | +| `calculation` | Method of calculation. Either `conversion_rate` or `conversions`. Defaults to `conversion_rate`. | String | Optional | +| `base_measure` | The base conversion event measure. | Measure | Required | +| `conversion_measure` | The conversion event measure. | Measure | Required | +| `window` | The time window for the conversion event, such as 7 days, 1 week, 3 months. Defaults to infinity. | String | Optional | +| `constant_properties` | List of constant properties. | List | Optional | +| `base_property` | The property from the base semantic model that you want to hold constant. | Entity or Dimension | Optional | +| `conversion_property` | The property from the conversion semantic model that you want to hold constant. | Entity or Dimension | Optional | + +The following code example displays the complete specification for conversion metrics and details how they're applied: + +```yaml +metrics: + - name: The metric name # Required + description: the metric description # Optional + type: conversion # Required + label: # Required + type_params: # Required + conversion_type_params: # Required + entity: ENTITY # Required + calculation: CALCULATION_TYPE # Optional. default: conversion_rate. options: conversions(buys) or conversion_rate (buys/visits), and more to come. + base_measure: MEASURE # Required + conversion_measure: MEASURE # Required + window: TIME_WINDOW # Optional. default: infinity. window to join the two events. Follows a similar format as time windows elsewhere (such as 7 days) + constant_properties: # Optional. List of constant properties default: None + - base_property: DIMENSION or ENTITY # Required. A reference to a dimension/entity of the semantic model linked to the base_measure + conversion_property: DIMENSION or ENTITY # Same as base above, but to the semantic model of the conversion_measure +``` + +## Conversion metric example + +The following example will measure conversions from website visits (`VISITS` table) to order completions (`BUYS` table) and calculate a conversion metric for this scenario step by step. + +Suppose you have two semantic models, `VISITS` and `BUYS`: + +- The `VISITS` table represents visits to an e-commerce site. +- The `BUYS` table represents someone completing an order on that site. + +The underlying tables look like the following: + +`VISITS`
+Contains user visits with `USER_ID` and `REFERRER_ID`. + +| DS | USER_ID | REFERRER_ID | +| --- | --- | --- | +| 2020-01-01 | bob | facebook | +| 2020-01-04 | bob | google | +| 2020-01-07 | bob | amazon | + +`BUYS`
+Records completed orders with `USER_ID` and `REFERRER_ID`. + +| DS | USER_ID | REFERRER_ID | +| --- | --- | --- | +| 2020-01-02 | bob | facebook | +| 2020-01-07 | bob | amazon | + +Next, define a conversion metric as follows: + +```yaml +- name: visit_to_buy_conversion_rate_7d + description: "Conversion rate from visiting to transaction in 7 days" + type: conversion + label: Visit to Buy Conversion Rate (7-day window) + type_params: + conversion_type_params: + base_measure: visits + conversion_measure: sellers + entity: user + window: 7 days +``` + +To calculate the conversion, link the `BUYS` event to the nearest `VISITS` event (or closest base event). The following steps explain this process in more detail: + +### Step 1: Join `VISITS` and `BUYS` + +This step joins the `BUYS` table to the `VISITS` table and gets all combinations of visits-buys events that match the join condition where buys occur within 7 days of the visit (any rows that have the same user and a buy happened at most 7 days after the visit). + +The SQL generated in these steps looks like the following: + +```sql +select + v.ds, + v.user_id, + v.referrer_id, + b.ds, + b.uuid, + 1 as buys +from visits v +inner join ( + select *, uuid_string() as uuid from buys -- Adds a uuid column to uniquely identify the different rows +) b +on +v.user_id = b.user_id and v.ds <= b.ds and v.ds > b.ds - interval '7 day' +``` + +The dataset returns the following (note that there are two potential conversion events for the first visit): + +| V.DS | V.USER_ID | V.REFERRER_ID | B.DS | UUID | BUYS | +| --- | --- | --- | --- | --- | --- | +| 2020-01-01 | bob | facebook | 2020-01-02 | uuid1 | 1 | +| 2020-01-01 | bob | facebook | 2020-01-07 | uuid2 | 1 | +| 2020-01-04 | bob | google | 2020-01-07 | uuid2 | 1 | +| 2020-01-07 | bob | amazon | 2020-01-07 | uuid2 | 1 | + +### Step 2: Refine with window function + +Instead of returning the raw visit values, use window functions to link conversions to the closest base event. You can partition by the conversion source and get the `first_value` ordered by `visit ds`, descending to get the closest base event from the conversion event: + +```sql +select + first_value(v.ds) over (partition by b.ds, b.user_id, b.uuid order by v.ds desc) as v_ds, + first_value(v.user_id) over (partition by b.ds, b.user_id, b.uuid order by v.ds desc) as user_id, + first_value(v.referrer_id) over (partition by b.ds, b.user_id, b.uuid order by v.ds desc) as referrer_id, + b.ds, + b.uuid, + 1 as buys +from visits v +inner join ( + select *, uuid_string() as uuid from buys +) b +on +v.user_id = b.user_id and v.ds <= b.ds and v.ds > b.ds - interval '7 day' + +``` + +The dataset returns the following: + +| V.DS | V.USER_ID | V.REFERRER_ID | B.DS | UUID | BUYS | +| --- | --- | --- | --- | --- | --- | +| 2020-01-01 | bob | facebook | 2020-01-02 | uuid1 | 1 | +| 2020-01-07 | bob | amazon | 2020-01-07 | uuid2 | 1 | +| 2020-01-07 | bob | amazon | 2020-01-07 | uuid2 | 1 | +| 2020-01-07 | bob | amazon | 2020-01-07 | uuid2 | 1 | + +This workflow links the two conversions to the correct visit events. Due to the join, you end up with multiple combinations, leading to fanout results. After applying the window function, duplicates appear. + +To resolve this and eliminate duplicates, use a distinct select. The UUID also helps identify which conversion is unique. The next steps provide more detail on how to do this. 
+ +### Step 3: Remove duplicates + +Instead of regular select used in the [Step 2](#step-2-refine-with-window-function), use a distinct select to remove the duplicates: + +```sql +select distinct + first_value(v.ds) over (partition by b.ds, b.user_id, b.uuid order by v.ds desc) as v_ds, + first_value(v.user_id) over (partition by b.ds, b.user_id, b.uuid order by v.ds desc) as user_id, + first_value(v.referrer_id) over (partition by b.ds, b.user_id, b.uuid order by v.ds desc) as referrer_id, + b.ds, + b.uuid, + 1 as buys +from visits v +inner join ( + select *, uuid_string() as uuid from buys +) b +on +v.user_id = b.user_id and v.ds <= b.ds and v.ds > b.ds - interval '7 day'; +``` + +The dataset returns the following: + +| V.DS | V.USER_ID | V.REFERRER_ID | B.DS | UUID | BUYS | +| --- | --- | --- | --- | --- | --- | +| 2020-01-01 | bob | facebook | 2020-01-02 | uuid1 | 1 | +| 2020-01-07 | bob | amazon | 2020-01-07 | uuid2 | 1 | + +You now have a dataset where every conversion is connected to a visit event. To proceed: + +1. Sum up the total conversions in the "conversions" table. +2. Combine this table with the "opportunities" table, matching them based on group keys. +3. Calculate the conversion rate. + +### Step 4: Aggregate and calculate + +Now that you’ve tied each conversion event to a visit, you can calculate the aggregated conversions and opportunities measures. Then, you can join them to calculate the actual conversion rate. The SQL to calculate the conversion rate is as follows: + +```sql +select + coalesce(subq_3.metric_time__day, subq_13.metric_time__day) as metric_time__day, + cast(max(subq_13.buys) as double) / cast(nullif(max(subq_3.visits), 0) as double) as visit_to_buy_conversion_rate_7d +from ( -- base measure + select + metric_time__day, + sum(visits) as mqls + from ( + select + date_trunc('day', first_contact_date) as metric_time__day, + 1 as visits + from visits + ) subq_2 + group by + metric_time__day +) subq_3 +full outer join ( -- conversion measure + select + metric_time__day, + sum(buys) as sellers + from ( + -- ... + -- The output of this subquery is the table produced in Step 3. The SQL is hidden for legibility. + -- To see the full SQL output, add --explain to your conversion metric query. + ) subq_10 + group by + metric_time__day +) subq_13 +on + subq_3.metric_time__day = subq_13.metric_time__day +group by + metric_time__day +``` + +### Additional settings + +Use the following additional settings to customize your conversion metrics: + +- **Null conversion values:** Set null conversions to zero using `fill_nulls_with`. +- **Calculation type:** Choose between showing raw conversions or conversion rate. +- **Constant property:** Add conditions for specific scenarios to join conversions on constant properties. + + + + +To return zero in the final data set, you can set the value of a null conversion event to zero instead of null. 
You can add the `fill_nulls_with` parameter to your conversion metric definition like this: + +```yaml +- name: vist_to_buy_conversion_rate_7_day_window + description: "Conversion rate from viewing a page to making a purchase" + type: conversion + label: Visit to Seller Conversion Rate (7 day window) + type_params: + conversion_type_params: + calculation: conversions + base_measure: visits + conversion_measure: + name: buys + fill_nulls_with: 0 + entity: user + window: 7 days + +``` + +This will return the following results: + + + + + + + +Use the conversion calculation parameter to either show the raw number of conversions or the conversion rate. The default value is the conversion rate. + +You can change the default to display the number of conversions by setting the `calculation: conversion` parameter: + +```yaml +- name: visit_to_buy_conversions_1_week_window + description: "Visit to Buy Conversions" + type: conversion + label: Visit to Buy Conversions (1 week window) + type_params: + conversion_type_params: + calculation: conversions + base_measure: visits + conversion_measure: + name: buys + fill_nulls_with: 0 + entity: user + window: 1 week +``` + + + + + +*Refer to [Amplitude's blog posts on constant properties](https://amplitude.com/blog/holding-constant) to learn about this concept.* + +You can add a constant property to a conversion metric to count only those conversions where a specific dimension or entity matches in both the base and conversion events. + +For example, if you're at an e-commerce company and want to answer the following question: +- _How often did visitors convert from `View Item Details` to `Complete Purchase` with the same product in each step?_
+ - This question is tricky to answer because users could have completed these two conversion milestones across many products. For example, they may have viewed a pair of shoes, then a T-shirt, and eventually checked out with a bow tie. This would still count as a conversion, even though the conversion event only happened for the bow tie. + +Back to the initial questions, you want to see how many customers viewed an item detail page and then completed a purchase for the _same_ product. + +In this case, you want to set `product_id` as the constant property. You can specify this in the configs as follows: + +```yaml +- name: view_item_detail_to_purchase_with_same_item + description: "Conversion rate for users who viewed the item detail page and purchased the item" + type: Conversion + label: View Item Detail > Purchase + type_params: + conversion_type_params: + calculation: conversions + base_measure: view_item_detail + conversion_measure: purchase + entity: user + window: 1 week + constant_properties: + - base_property: product + conversion_property: product +``` + +You will add an additional condition to the join to make sure the constant property is the same across conversions. + +```sql +select distinct + first_value(v.ds) over (partition by buy_source.ds, buy_source.user_id, buy_source.session_id order by v.ds desc rows between unbounded preceding and unbounded following) as ds, + first_value(v.user_id) over (partition by buy_source.ds, buy_source.user_id, buy_source.session_id order by v.ds desc rows between unbounded preceding and unbounded following) as user_id, + first_value(v.referrer_id) over (partition by buy_source.ds, buy_source.user_id, buy_source.session_id order by v.ds desc rows between unbounded preceding and unbounded following) as referrer_id, + buy_source.uuid, + 1 as buys +from {{ source_schema }}.fct_view_item_details v +inner join + ( + select *, {{ generate_random_uuid() }} as uuid from {{ source_schema }}.fct_purchases + ) buy_source +on + v.user_id = buy_source.user_id + and v.ds <= buy_source.ds + and v.ds > buy_source.ds - interval '7 day' + and buy_source.product_id = v.product_id --Joining on the constant property product_id + +``` + +
+
diff --git a/website/docs/docs/build/derived-metrics.md b/website/docs/docs/build/derived-metrics.md index fc7961bbe7f..7f01736d2b3 100644 --- a/website/docs/docs/build/derived-metrics.md +++ b/website/docs/docs/build/derived-metrics.md @@ -21,7 +21,7 @@ In MetricFlow, derived metrics are metrics created by defining an expression usi | `metrics` | The list of metrics used in the derived metrics. | Required | | `alias` | Optional alias for the metric that you can use in the expr. | Optional | | `filter` | Optional filter to apply to the metric. | Optional | -| `offset_window` | Set the period for the offset window, such as 1 month. This will return the value of the metric one month from the metric time. | Required | +| `offset_window` | Set the period for the offset window, such as 1 month. This will return the value of the metric one month from the metric time. | Optional | The following displays the complete specification for derived metrics, along with an example. @@ -37,7 +37,7 @@ metrics: - name: the name of the metrics. must reference a metric you have already defined # Required alias: optional alias for the metric that you can use in the expr # Optional filter: optional filter to apply to the metric # Optional - offset_window: set the period for the offset window, such as 1 month. This will return the value of the metric one month from the metric time. # Required + offset_window: set the period for the offset window, such as 1 month. This will return the value of the metric one month from the metric time. # Optional ``` ## Derived metrics example diff --git a/website/docs/docs/build/metrics-overview.md b/website/docs/docs/build/metrics-overview.md index b6ccc1c3b9c..ea602d0953f 100644 --- a/website/docs/docs/build/metrics-overview.md +++ b/website/docs/docs/build/metrics-overview.md @@ -14,7 +14,7 @@ The keys for metrics definitions are: | Parameter | Description | Type | | --------- | ----------- | ---- | | `name` | Provide the reference name for the metric. This name must be unique amongst all metrics. | Required | -| `description` | Provide the description for your metric. | Optional | +| `description` | Describe your metric. | Optional | | `type` | Define the type of metric, which can be `simple`, `ratio`, `cumulative`, or `derived`. | Required | | `type_params` | Additional parameters used to configure metrics. `type_params` are different for each metric type. | Required | | `config` | Provide the specific configurations for your metric. | Optional | @@ -48,12 +48,34 @@ This page explains the different supported metric types you can add to your dbt - [Ratio](#ratio-metrics) — Create a ratio out of two measures. --> +### Conversion metrics + +[Conversion metrics](/docs/build/conversion) help you track when a base event and a subsequent conversion event occurs for an entity within a set time period. + +```yaml +metrics: + - name: The metric name # Required + description: the metric description # Optional + type: conversion # Required + label: # Required + type_params: # Required + conversion_type_params: # Required + entity: ENTITY # Required + calculation: CALCULATION_TYPE # Optional. default: conversion_rate. options: conversions(buys) or conversion_rate (buys/visits), and more to come. + base_measure: MEASURE # Required + conversion_measure: MEASURE # Required + window: TIME_WINDOW # Optional. default: infinity. window to join the two events. Follows a similar format as time windows elsewhere (such as 7 days) + constant_properties: # Optional. 
List of constant properties default: None + - base_property: DIMENSION or ENTITY # Required. A reference to a dimension/entity of the semantic model linked to the base_measure + conversion_property: DIMENSION or ENTITY # Same as base above, but to the semantic model of the conversion_measure +``` + ### Cumulative metrics -[Cumulative metrics](/docs/build/cumulative) aggregate a measure over a given window. If no window is specified, the window would accumulate the measure over all time. **Note**, you will need to create the [time spine model](/docs/build/metricflow-time-spine) before you add cumulative metrics. +[Cumulative metrics](/docs/build/cumulative) aggregate a measure over a given window. If no window is specified, the window will accumulate the measure over all of the recorded time period. Note that you will need to create the [time spine model](/docs/build/metricflow-time-spine) before you add cumulative metrics. ```yaml -# Cumulative metrics aggregate a measure over a given window. The window is considered infinite if no window parameter is passed (accumulate the measure over all time) +# Cumulative metrics aggregate a measure over a given window. The window is considered infinite if no window parameter is passed (accumulate the measure over all of time) metrics: - name: wau_rolling_7 owners: @@ -66,6 +88,7 @@ metrics: window: 7 days ``` + ### Derived metrics [Derived metrics](/docs/build/derived) are defined as an expression of other metrics. Derived metrics allow you to do calculations on top of metrics. @@ -104,7 +127,7 @@ metrics: ### Ratio metrics -[Ratio metrics](/docs/build/ratio) involve a numerator metric and a denominator metric. A `constraint` string can be applied, to both numerator and denominator, or applied separately to the numerator or denominator. +[Ratio metrics](/docs/build/ratio) involve a numerator metric and a denominator metric. A `constraint` string can be applied to both the numerator and denominator or separately to the numerator or denominator. ```yaml # Ratio Metric @@ -170,9 +193,6 @@ You can set more metadata for your metrics, which can be used by other tools lat - **Description** — Write a detailed description of the metric. - - - ## Related docs - [Semantic models](/docs/build/semantic-models) diff --git a/website/docs/docs/cloud/cloud-cli-installation.md b/website/docs/docs/cloud/cloud-cli-installation.md index 7d459cdd91d..edf6511d4b8 100644 --- a/website/docs/docs/cloud/cloud-cli-installation.md +++ b/website/docs/docs/cloud/cloud-cli-installation.md @@ -150,7 +150,7 @@ If you already have dbt Core installed, the dbt Cloud CLI may conflict. Here are - **Prevent conflicts**
To use both the dbt Cloud CLI and dbt Core with `pip`, install them in separate virtual environments.<br />

- **Use both dbt Cloud CLI and dbt Core with brew or native installs**
If you use Homebrew, consider aliasing the dbt Cloud CLI as "dbt-cloud" to avoid conflicts. If your operating system runs into path conflicts, check the [FAQs](#faqs) for more details.<br />

-- **Reverting back to dbt Core from the dbt Cloud CLI**
+- **Reverting to dbt Core from the dbt Cloud CLI**
If you've already installed the dbt Cloud CLI and need to switch back to dbt Core:
- Uninstall the dbt Cloud CLI using the command: `pip uninstall dbt` - Reinstall dbt Core using the following command, replacing "adapter_name" with the appropriate adapter name: @@ -223,7 +223,7 @@ During the public preview period, we recommend updating before filing a bug repo -To update the dbt Cloud CLI, run `brew upgrade dbt`. (You can also use `brew install dbt`). +To update the dbt Cloud CLI, run `brew update` and then `brew upgrade dbt`. @@ -235,7 +235,7 @@ To update, follow the same process explained in [Windows](/docs/cloud/cloud-cli- -To update, follow the same process explained in [Windows](/docs/cloud/cloud-cli-installation?install=linux#install-dbt-cloud-cli) and replace the existing `dbt` executable with the new one. +To update, follow the same process explained in [Linux](/docs/cloud/cloud-cli-installation?install=linux#install-dbt-cloud-cli) and replace the existing `dbt` executable with the new one. @@ -251,10 +251,14 @@ To update: ## Using VS Code extensions -Visual Studio (VS) Code extensions enhance command line tools by adding extra functionalities. The dbt Cloud CLI is fully compatible with dbt Core, however it doesn't support some dbt Core APIs required by certain tools, for example VS Code extensions. +Visual Studio (VS) Code extensions enhance command line tools by adding extra functionalities. The dbt Cloud CLI is fully compatible with dbt Core, however, it doesn't support some dbt Core APIs required by certain tools, for example, VS Code extensions. -To use these extensions, such as dbt-power-user, with the dbt Cloud CLI, you can install it using Homebrew (along with dbt Core) and create an alias to run the dbt Cloud CLI as `dbt-cloud`. This allows dbt-power-user to continue to invoke dbt Core under the hood, alongside the dbt Cloud CLI. +You can use extensions like [dbt-power-user](https://www.dbt-power-user.com/) with the dbt Cloud CLI by following these steps: +- [Install](/docs/cloud/cloud-cli-installation?install=brew) it using Homebrew along with dbt Core. +- [Create an alias](#faqs) to run the dbt Cloud CLI as `dbt-cloud`. + +This setup allows dbt-power-user to continue to work with dbt Core in the background, alongside the dbt Cloud CLI. ## FAQs diff --git a/website/docs/docs/cloud/configure-cloud-cli.md b/website/docs/docs/cloud/configure-cloud-cli.md index d6fca00cf25..a442a6e6ad1 100644 --- a/website/docs/docs/cloud/configure-cloud-cli.md +++ b/website/docs/docs/cloud/configure-cloud-cli.md @@ -66,9 +66,8 @@ Once you install the dbt Cloud CLI, you need to configure it to connect to a dbt ```yaml # dbt_project.yml name: - version: - ... + # Your project configs... dbt-cloud: project-id: PROJECT_ID @@ -86,6 +85,7 @@ To set environment variables in the dbt Cloud CLI for your dbt project: 2. Then select **Profile Settings**, then **Credentials**. 3. Click on your project and scroll to the **Environment Variables** section. 4. Click **Edit** on the lower right and then set the user-level environment variables. + - Note, when setting up the [dbt Semantic Layer](/docs/use-dbt-semantic-layer/dbt-sl), using [environment variables](/docs/build/environment-variables) like `{{env_var('DBT_WAREHOUSE')}}` is not supported. You should use the actual credentials instead. 
## Use the dbt Cloud CLI diff --git a/website/docs/docs/cloud/manage-access/cloud-seats-and-users.md b/website/docs/docs/cloud/manage-access/cloud-seats-and-users.md index 63786f40bd8..f07720a9771 100644 --- a/website/docs/docs/cloud/manage-access/cloud-seats-and-users.md +++ b/website/docs/docs/cloud/manage-access/cloud-seats-and-users.md @@ -11,7 +11,7 @@ In dbt Cloud, _licenses_ are used to allocate users to your account. There are t - **Developer** — Granted access to the Deployment and [Development](/docs/cloud/dbt-cloud-ide/develop-in-the-cloud) functionality in dbt Cloud. - **Read-Only** — Intended to view the [artifacts](/docs/deploy/artifacts) created in a dbt Cloud account. Read-Only users can receive job notifications but not configure them. -- **IT** — Can manage users, groups, and licenses, among other permissions. IT users can receive job notifications but not configure them. Available on Enterprise and Team plans only. +- **IT** — Can manage users, groups, and licenses, among other permissions. IT users can receive job notifications but not configure them. Available on Enterprise and Team plans only. In Enterprise plans, the IT license type grants access equivalent to the ['Security admin' and 'Billing admin' roles](/docs/cloud/manage-access/enterprise-permissions#account-permissions-for-account-roles). The user's assigned license determines the specific capabilities they can access in dbt Cloud. @@ -29,6 +29,12 @@ The user's assigned license determines the specific capabilities they can access ## Licenses +:::tip Licenses or Permission sets + +The user's license type always overrides their assigned [Enterprise permission](/docs/cloud/manage-access/enterprise-permissions) set. This means that even if a user belongs to a dbt Cloud group with 'Account Admin' permissions, having a 'Read-Only' license would still prevent them from performing administrative actions on the account. + +::: + Each dbt Cloud plan comes with a base number of Developer, IT, and Read-Only licenses. You can add or remove licenses by modifying the number of users in your account settings. If you have a Developer plan account and want to add more people to your team, you'll need to upgrade to the Team plan. Refer to [dbt Pricing Plans](https://www.getdbt.com/pricing/) for more information about licenses available with each plan. diff --git a/website/docs/docs/cloud/manage-access/enterprise-permissions.md b/website/docs/docs/cloud/manage-access/enterprise-permissions.md index dcacda20deb..4ed7ab228e5 100644 --- a/website/docs/docs/cloud/manage-access/enterprise-permissions.md +++ b/website/docs/docs/cloud/manage-access/enterprise-permissions.md @@ -20,6 +20,11 @@ control (RBAC). The following roles and permission sets are available for assignment in dbt Cloud Enterprise accounts. They can be granted to dbt Cloud groups which are then in turn granted to users. A dbt Cloud group can be associated with more than one role and permission set. Roles with more access take precedence. +:::tip Licenses or Permission sets + +The user's [license](/docs/cloud/manage-access/seats-and-users) type always overrides their assigned permission set. This means that even if a user belongs to a dbt Cloud group with 'Account Admin' permissions, having a 'Read-Only' license would still prevent them from performing administrative actions on the account. 
+::: + ## How to set up RBAC Groups in dbt Cloud diff --git a/website/docs/docs/cloud/manage-access/set-up-bigquery-oauth.md b/website/docs/docs/cloud/manage-access/set-up-bigquery-oauth.md index 87018b14d56..f717bf3a5b1 100644 --- a/website/docs/docs/cloud/manage-access/set-up-bigquery-oauth.md +++ b/website/docs/docs/cloud/manage-access/set-up-bigquery-oauth.md @@ -77,4 +77,5 @@ Select **Allow**. This redirects you back to dbt Cloud. You should now be an aut ## FAQs - + + diff --git a/website/docs/docs/core/connect-data-platform/fabric-setup.md b/website/docs/docs/core/connect-data-platform/fabric-setup.md index deef1e04b22..5180d65ebb9 100644 --- a/website/docs/docs/core/connect-data-platform/fabric-setup.md +++ b/website/docs/docs/core/connect-data-platform/fabric-setup.md @@ -39,12 +39,15 @@ If you already have ODBC Driver 17 installed, then that one will work as well. #### Supported configurations -* The adapter is tested with Microsoft Fabric Synapse Data Warehouse. +* The adapter is tested with Microsoft Fabric Synapse Data Warehouses (also referred to as Warehouses). * We test all combinations with Microsoft ODBC Driver 17 and Microsoft ODBC Driver 18. * The collations we run our tests on are `Latin1_General_100_BIN2_UTF8`. The adapter support is not limited to the matrix of the above configurations. If you notice an issue with any other configuration, let us know by opening an issue on [GitHub](https://github.com/microsoft/dbt-fabric). +##### Unsupported configurations +SQL analytics endpoints are read-only and so are not appropriate for Transformation workloads, use a Warehouse instead. + ## Authentication methods & profile configuration ### Common configuration diff --git a/website/docs/docs/core/connect-data-platform/snowflake-setup.md b/website/docs/docs/core/connect-data-platform/snowflake-setup.md index 2ab5e64e36a..098b09d0219 100644 --- a/website/docs/docs/core/connect-data-platform/snowflake-setup.md +++ b/website/docs/docs/core/connect-data-platform/snowflake-setup.md @@ -101,7 +101,7 @@ Along with adding the `authenticator` parameter, be sure to run `alter account s To use key pair authentication, skip the `password` and provide a `private_key_path`. If needed, you can also add a `private_key_passphrase`. **Note**: Unencrypted private keys are accepted, so add a passphrase only if necessary. -Starting from [dbt v1.5.0](/docs/dbt-versions/core), you have the option to use a `private_key` string instead of a `private_key_path`. The `private_key` string should be in either Base64-encoded DER format, representing the key bytes, or a plain-text PEM format. Refer to [Snowflake documentation](https://docs.snowflake.com/developer-guide/python-connector/python-connector-example#using-key-pair-authentication-key-pair-rotation) for more info on how they generate the key. +Starting from [dbt v1.5.0](/docs/dbt-versions/core), you have the option to use a `private_key` string instead of a `private_key_path`. The `private_key` string should be in either Base64-encoded DER format, representing the key bytes, or a plain-text PEM format. Refer to [Snowflake documentation](https://docs.snowflake.com/en/user-guide/key-pair-auth) for more info on how they generate the key. 
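+
+For illustration only, a minimal `profiles.yml` target using key pair authentication might look like the following sketch. The account, user, role, warehouse, database, schema, and key path values are placeholders, so substitute your own; the `private_key_passphrase` line is only needed if your key is encrypted:
+
+```yaml
+my_snowflake_profile:
+  target: dev
+  outputs:
+    dev:
+      type: snowflake
+      account: abc12345.us-east-1   # placeholder account identifier
+      user: analytics_user          # placeholder user
+      role: transformer             # placeholder role
+      database: analytics
+      warehouse: transforming
+      schema: dbt_dev
+      threads: 4
+      # Key pair auth: omit password and point dbt at the private key instead
+      private_key_path: /path/to/rsa_key.p8
+      private_key_passphrase: "{{ env_var('SNOWFLAKE_PK_PASSPHRASE') }}"  # only for encrypted keys
+```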
diff --git a/website/docs/docs/dbt-cloud-apis/sl-graphql.md b/website/docs/docs/dbt-cloud-apis/sl-graphql.md index 56a4ac7ba59..e1caf6c70b8 100644 --- a/website/docs/docs/dbt-cloud-apis/sl-graphql.md +++ b/website/docs/docs/dbt-cloud-apis/sl-graphql.md @@ -22,7 +22,20 @@ GraphQL has several advantages, such as self-documenting, having a strong typing ## dbt Semantic Layer GraphQL API -The dbt Semantic Layer GraphQL API allows you to explore and query metrics and dimensions. Due to its self-documenting nature, you can explore the calls conveniently through the [schema explorer](https://semantic-layer.cloud.getdbt.com/api/graphql). +The dbt Semantic Layer GraphQL API allows you to explore and query metrics and dimensions. Due to its self-documenting nature, you can explore the calls conveniently through a schema explorer. + +The schema explorer URLs vary depending on your [deployment region](/docs/cloud/about-cloud/regions-ip-addresses). Use the following table to find the right link for your region: + +| Deployment type | Schema explorer URL | +| --------------- | ------------------- | +| North America multi-tenant | https://semantic-layer.cloud.getdbt.com/api/graphql | +| EMEA multi-tenant | https://semantic-layer.emea.dbt.com/api/graphql | +| APAC multi-tenant | https://semantic-layer.au.dbt.com/api/graphql | +| Single tenant | `https://YOUR_ACCESS_URL.semantic-layer/api/graphql`

 Replace `YOUR_ACCESS_URL` with your account-specific Access URL for your region and plan.|
| Multi-cell | `https://YOUR_ACCOUNT_PREFIX.semantic-layer.REGION.dbt.com/api/graphql`<br/>

 Replace `YOUR_ACCOUNT_PREFIX` with your specific account identifier and `REGION` with your deployment region, such as `us1.dbt.com`. |
+ +**Example** +- If your Single tenant access URL is `ABC123.getdbt.com`, your schema explorer URL will be `https://ABC123.getdbt.com.semantic-layer/api/graphql`. dbt Partners can use the Semantic Layer GraphQL API to build an integration with the dbt Semantic Layer. diff --git a/website/docs/docs/dbt-versions/release-notes/76-Oct-2023/sl-ga.md b/website/docs/docs/dbt-versions/release-notes/76-Oct-2023/sl-ga.md index a1b59aa6ec1..9d5b91fb191 100644 --- a/website/docs/docs/dbt-versions/release-notes/76-Oct-2023/sl-ga.md +++ b/website/docs/docs/dbt-versions/release-notes/76-Oct-2023/sl-ga.md @@ -18,7 +18,7 @@ It aims to bring the best of modeling and semantics to downstream applications b - Brand new [integrations](/docs/use-dbt-semantic-layer/avail-sl-integrations) such as Tableau, Google Sheets, Hex, Mode, and Lightdash. - New [Semantic Layer APIs](/docs/dbt-cloud-apis/sl-api-overview) using GraphQL and JDBC to query metrics and build integrations. - dbt Cloud [multi-tenant regional](/docs/cloud/about-cloud/regions-ip-addresses) support for North America, EMEA, and APAC. Single-tenant support coming soon. -- Use the APIs to call an export (a way to build tables in your data platform), then access them in your preferred BI tool. Starting from dbt v1.7 or higher, you will be able to schedule exports as part of your dbt job. +- Coming soon — Schedule exports (a way to build tables in your data platform) as part of your dbt Cloud job. Use the APIs to call an export, then access them in your preferred BI tool. diff --git a/website/docs/docs/running-a-dbt-project/using-threads.md b/website/docs/docs/running-a-dbt-project/using-threads.md index 5eede7abc27..af00dd9cc68 100644 --- a/website/docs/docs/running-a-dbt-project/using-threads.md +++ b/website/docs/docs/running-a-dbt-project/using-threads.md @@ -22,5 +22,5 @@ You will define the number of threads in your `profiles.yml` file (for dbt Core ## Related docs -- [About profiles.yml](https://docs.getdbt.com/reference/profiles.yml) +- [About profiles.yml](/docs/core/connect-data-platform/profiles.yml) - [dbt Cloud job scheduler](/docs/deploy/job-scheduler) diff --git a/website/docs/docs/use-dbt-semantic-layer/quickstart-sl.md b/website/docs/docs/use-dbt-semantic-layer/quickstart-sl.md index 665260ed9f4..11a610805a9 100644 --- a/website/docs/docs/use-dbt-semantic-layer/quickstart-sl.md +++ b/website/docs/docs/use-dbt-semantic-layer/quickstart-sl.md @@ -34,7 +34,7 @@ Use this guide to fully experience the power of the universal dbt Semantic Layer - [Define metrics](#define-metrics) in dbt using MetricFlow - [Test and query metrics](#test-and-query-metrics) with MetricFlow - [Run a production job](#run-a-production-job) in dbt Cloud -- [Set up dbt Semantic Layer](#setup) in dbt Cloud +- [Set up dbt Semantic Layer](#set-up-dbt-semantic-layer) in dbt Cloud - [Connect and query API](#connect-and-query-api) with dbt Cloud MetricFlow allows you to define metrics in your dbt project and query them whether in dbt Cloud or dbt Core with [MetricFlow commands](/docs/build/metricflow-commands). 
diff --git a/website/docs/reference/commands/debug.md b/website/docs/reference/commands/debug.md index 4ae5a1d2dd9..e1865ff1b67 100644 --- a/website/docs/reference/commands/debug.md +++ b/website/docs/reference/commands/debug.md @@ -7,7 +7,7 @@ id: "debug" `dbt debug` is a utility function to test the database connection and display information for debugging purposes, such as the validity of your project file and your installation of any requisite dependencies (like `git` when you run `dbt deps`). -*Note: Not to be confused with [debug-level logging](/reference/global-configs/about-global-configs#debug-level-logging) via the `--debug` option which increases verbosity. +*Note: Not to be confused with [debug-level logging](/reference/global-configs/logs#debug-level-logging) via the `--debug` option which increases verbosity. ### Example usage diff --git a/website/docs/reference/dbt-jinja-functions/builtins.md b/website/docs/reference/dbt-jinja-functions/builtins.md index edc5f34ffda..7d970b9d5e1 100644 --- a/website/docs/reference/dbt-jinja-functions/builtins.md +++ b/website/docs/reference/dbt-jinja-functions/builtins.md @@ -42,9 +42,9 @@ From dbt v1.5 and higher, use the following macro to extract user-provided argum -- call builtins.ref based on provided positional arguments {% set rel = None %} {% if packagename is not none %} - {% set rel = return(builtins.ref(packagename, modelname, version=version)) %} + {% set rel = builtins.ref(packagename, modelname, version=version) %} {% else %} - {% set rel = return(builtins.ref(modelname, version=version)) %} + {% set rel = builtins.ref(modelname, version=version) %} {% endif %} -- finally, override the database name with "dev" diff --git a/website/docs/reference/dbt-jinja-functions/cross-database-macros.md b/website/docs/reference/dbt-jinja-functions/cross-database-macros.md index 4df8275d4bd..334bcfe5760 100644 --- a/website/docs/reference/dbt-jinja-functions/cross-database-macros.md +++ b/website/docs/reference/dbt-jinja-functions/cross-database-macros.md @@ -30,6 +30,7 @@ Please make sure to take a look at the [SQL expressions section](#sql-expression - [type\_numeric](#type_numeric) - [type\_string](#type_string) - [type\_timestamp](#type_timestamp) + - [current\_timestamp](#current_timestamp) - [Set functions](#set-functions) - [except](#except) - [intersect](#intersect) @@ -76,6 +77,7 @@ Please make sure to take a look at the [SQL expressions section](#sql-expression - [type\_numeric](#type_numeric) - [type\_string](#type_string) - [type\_timestamp](#type_timestamp) + - [current\_timestamp](#current_timestamp) - [Set functions](#set-functions) - [except](#except) - [intersect](#intersect) @@ -316,6 +318,29 @@ This macro yields the database-specific data type for a `TIMESTAMP` (which may o TIMESTAMP ``` +### current_timestamp + +This macro returns the current date and time for the system. Depending on the adapter: + +- The result may be an aware or naive timestamp. +- The result may correspond to the start of the statement or the start of the transaction. 
+ + +**Args** +- None + +**Usage** +- You can use the `current_timestamp()` macro within your dbt SQL files like this: + +```sql +{{ dbt.current_timestamp() }} +``` +**Sample output (PostgreSQL)** + +```sql +now() +``` + ## Set functions ### except diff --git a/website/docs/reference/resource-configs/bigquery-configs.md b/website/docs/reference/resource-configs/bigquery-configs.md index 8f323bc4236..94d06311c55 100644 --- a/website/docs/reference/resource-configs/bigquery-configs.md +++ b/website/docs/reference/resource-configs/bigquery-configs.md @@ -596,9 +596,9 @@ with events as ( -#### Copying ingestion-time partitions +#### Copying partitions -If you have configured your incremental model to use "ingestion"-based partitioning (`partition_by.time_ingestion_partitioning: True`), you can opt to use a legacy mechanism for inserting and overwriting partitions. While this mechanism doesn't offer the same visibility and ease of debugging as the SQL `merge` statement, it can yield significant savings in time and cost for large datasets. Behind the scenes, dbt will add or replace each partition via the [copy table API](https://cloud.google.com/bigquery/docs/managing-tables#copy-table) and partition decorators. +If you are replacing entire partitions in your incremental runs, you can opt to do so with the [copy table API](https://cloud.google.com/bigquery/docs/managing-tables#copy-table) and partition decorators rather than a `merge` statement. While this mechanism doesn't offer the same visibility and ease of debugging as the SQL `merge` statement, it can yield significant savings in time and cost for large datasets because the copy table API does not incur any costs for inserting the data - it's equivalent to the `bq cp` gcloud command line interface (CLI) command. You can enable this by switching on `copy_partitions: True` in the `partition_by` configuration. This approach works only in combination with "dynamic" partition replacement. diff --git a/website/docs/reference/resource-properties/constraints.md b/website/docs/reference/resource-properties/constraints.md index 5ec12b100d7..8a8e46f2fa3 100644 --- a/website/docs/reference/resource-properties/constraints.md +++ b/website/docs/reference/resource-properties/constraints.md @@ -300,7 +300,7 @@ select
-BigQuery allows defining `not null` constraints. However, it does _not_ support or enforce the definition of unenforced constraints, such as `primary key`. +BigQuery allows defining and enforcing `not null` constraints, and defining (but _not_ enforcing) `primary key` and `foreign key` constraints (which can be used for query optimization). BigQuery does not support defining or enforcing other constraints. For more information, refer to [Platform constraint support](/docs/collaborate/govern/model-contracts#platform-constraint-support) Documentation: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language diff --git a/website/sidebars.js b/website/sidebars.js index 89b1e005a8c..0566ef8c3a6 100644 --- a/website/sidebars.js +++ b/website/sidebars.js @@ -324,6 +324,7 @@ const sidebarSettings = { link: { type: "doc", id: "docs/build/metrics-overview" }, items: [ "docs/build/metrics-overview", + "docs/build/conversion", "docs/build/cumulative", "docs/build/derived", "docs/build/ratio", diff --git a/website/snippets/_new-sl-setup.md b/website/snippets/_new-sl-setup.md index a02481db33d..a93f233d09c 100644 --- a/website/snippets/_new-sl-setup.md +++ b/website/snippets/_new-sl-setup.md @@ -1,14 +1,12 @@ You can set up the dbt Semantic Layer in dbt Cloud at the environment and project level. Before you begin: -- You must have a dbt Cloud Team or Enterprise account. Suitable for both Multi-tenant and Single-tenant deployment. - - Single-tenant accounts should contact their account representative for necessary setup and enablement. - You must be part of the Owner group, and have the correct [license](/docs/cloud/manage-access/seats-and-users) and [permissions](/docs/cloud/manage-access/self-service-permissions) to configure the Semantic Layer: * Enterprise plan — Developer license with Account Admin permissions. Or Owner with a Developer license, assigned Project Creator, Database Admin, or Admin permissions. * Team plan — Owner with a Developer license. - You must have a successful run in your new environment. :::tip -If you've configured the legacy Semantic Layer, it has been deprecated, and dbt Labs strongly recommends that you [upgrade your dbt version](/docs/dbt-versions/upgrade-core-in-cloud) to dbt version 1.6 or higher to use the latest dbt Semantic Layer. Refer to the dedicated [migration guide](/guides/sl-migration) for details. +If you've configured the legacy Semantic Layer, it has been deprecated. dbt Labs strongly recommends that you [upgrade your dbt version](/docs/dbt-versions/upgrade-core-in-cloud) to dbt version 1.6 or higher to use the latest dbt Semantic Layer. Refer to the dedicated [migration guide](/guides/sl-migration) for details. ::: 1. In dbt Cloud, create a new [deployment environment](/docs/deploy/deploy-environments#create-a-deployment-environment) or use an existing environment on dbt 1.6 or higher. @@ -20,7 +18,10 @@ If you've configured the legacy Semantic Layer, it has been deprecated, and dbt -4. In the **Set Up Semantic Layer Configuration** page, enter the credentials you want the Semantic Layer to use specific to your data platform. We recommend credentials have the least privileges required because your Semantic Layer users will be querying it in downstream applications. At a minimum, the Semantic Layer needs to have read access to the schema(s) that contains the dbt models that you used to build your semantic models. +4. 
In the **Set Up Semantic Layer Configuration** page, enter the credentials specific to your data platform that you want the Semantic Layer to use. + + - Use credentials with minimal privileges because downstream applications will query the Semantic Layer with them. At a minimum, the Semantic Layer needs read access to the schema(s) containing the dbt models used in your semantic models. + - Note: [environment variables](/docs/build/environment-variables) such as `{{env_var('DBT_WAREHOUSE')}}` aren't supported in the dbt Semantic Layer yet. You must use the actual credentials. @@ -28,13 +29,10 @@ If you've configured the legacy Semantic Layer, it has been deprecated, and dbt 6. After saving it, you'll be provided with the connection information that allows you to connect to downstream tools. If your tool supports JDBC, save the JDBC URL or individual components (like environment id and host). If it uses the GraphQL API, save the GraphQL API host information instead. - + 7. Save and copy your environment ID, service token, and host, which you'll need when using downstream tools. For more info on how to integrate with partner tools, refer to [Available integrations](/docs/use-dbt-semantic-layer/avail-sl-integrations). 8. Return to the **Project Details** page, then select **Generate Service Token**. You will need Semantic Layer Only and Metadata Only [service token](/docs/dbt-cloud-apis/service-tokens) permissions. - - -Great job, you've configured the Semantic Layer 🎉! - +Great job, you've configured the Semantic Layer 🎉! diff --git a/website/snippets/_sl-define-metrics.md b/website/snippets/_sl-define-metrics.md index af3ee9f297f..fe169b4a5b4 100644 --- a/website/snippets/_sl-define-metrics.md +++ b/website/snippets/_sl-define-metrics.md @@ -1,6 +1,6 @@ Now that you've created your first semantic model, it's time to define your first metric! You can define metrics with the dbt Cloud IDE or command line. -MetricFlow supports different metric types like [simple](/docs/build/simple), [ratio](/docs/build/ratio), [cumulative](/docs/build/cumulative), and [derived](/docs/build/derived). It's recommended that you read the [metrics overview docs](/docs/build/metrics-overview) before getting started. +MetricFlow supports different metric types like [conversion](/docs/build/conversion), [simple](/docs/build/simple), [ratio](/docs/build/ratio), [cumulative](/docs/build/cumulative), and [derived](/docs/build/derived). It's recommended that you read the [metrics overview docs](/docs/build/metrics-overview) before getting started. 1. You can define metrics in the same YAML files as your semantic models or create a new file. If you want to create your metrics in a new file, create another directory called `/models/metrics`. The file structure for metrics can become more complex from here if you need to further organize your metrics, for example, by data source or business line. 
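Building on the `_sl-define-metrics.md` snippet above, here is a minimal sketch of a metric defined in its own YAML file. The file path, metric name, and the measure it references are hypothetical; it assumes a measure named `order_total` already exists in one of your semantic models:

```yaml
# models/metrics/revenue_metrics.yml (hypothetical file and names, for illustration only)
metrics:
  - name: order_total
    label: Order total
    description: "Sum of all order totals."
    type: simple              # other supported types include conversion, ratio, cumulative, and derived
    type_params:
      measure: order_total    # assumes a measure with this name is defined in a semantic model
```

Once a metric like this parses, you can sanity-check it locally with a MetricFlow command such as `mf query --metrics order_total --group-by metric_time`.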
diff --git a/website/static/img/blog/authors/ejohnston.png b/website/static/img/blog/authors/ejohnston.png new file mode 100644 index 00000000000..09fc4ed7ba3 Binary files /dev/null and b/website/static/img/blog/authors/ejohnston.png differ diff --git a/website/static/img/blog/serverless-free-tier-data-stack-with-dlt-and-dbt-core/architecture_diagram.png b/website/static/img/blog/serverless-free-tier-data-stack-with-dlt-and-dbt-core/architecture_diagram.png new file mode 100644 index 00000000000..ad10d32c2e7 Binary files /dev/null and b/website/static/img/blog/serverless-free-tier-data-stack-with-dlt-and-dbt-core/architecture_diagram.png differ diff --git a/website/static/img/blog/serverless-free-tier-data-stack-with-dlt-and-dbt-core/map_screenshot.png b/website/static/img/blog/serverless-free-tier-data-stack-with-dlt-and-dbt-core/map_screenshot.png new file mode 100644 index 00000000000..da8309c2510 Binary files /dev/null and b/website/static/img/blog/serverless-free-tier-data-stack-with-dlt-and-dbt-core/map_screenshot.png differ diff --git a/website/static/img/docs/dbt-cloud/semantic-layer/conversion-metrics-fill-null.png b/website/static/img/docs/dbt-cloud/semantic-layer/conversion-metrics-fill-null.png new file mode 100644 index 00000000000..0fd5e206ba7 Binary files /dev/null and b/website/static/img/docs/dbt-cloud/semantic-layer/conversion-metrics-fill-null.png differ diff --git a/website/vercel.json b/website/vercel.json index b662e1c2144..1e4cc2fb021 100644 --- a/website/vercel.json +++ b/website/vercel.json @@ -2,6 +2,11 @@ "cleanUrls": true, "trailingSlash": false, "redirects": [ + { + "source": "/reference/profiles.yml", + "destination": "/docs/core/connect-data-platform/profiles.yml", + "permanent": true + }, { "source": "/docs/cloud/dbt-cloud-ide/dbt-cloud-tips", "destination": "/docs/build/dbt-tips",