From 04ca572349f46bfddfa9ceb2f841c87315b551f6 Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Thu, 19 Dec 2024 11:45:18 +0000 Subject: [PATCH 1/6] add more detail --- website/docs/docs/build/join-logic.md | 87 +++++++++++++++++++++++---- 1 file changed, 75 insertions(+), 12 deletions(-) diff --git a/website/docs/docs/build/join-logic.md b/website/docs/docs/build/join-logic.md index 99d63b38657..22249938b56 100644 --- a/website/docs/docs/build/join-logic.md +++ b/website/docs/docs/build/join-logic.md @@ -10,24 +10,24 @@ Joins are a powerful part of MetricFlow and simplify the process of making all v Joins use `entities` defined in your semantic model configs as the join keys between tables. Assuming entities are defined in the semantic model, MetricFlow creates a graph using the semantic models as nodes and the join paths as edges to perform joins automatically. MetricFlow chooses the appropriate join type and avoids fan-out or chasm joins with other tables based on the entity types. -
- What are fan-out or chasm joins? -
-
— Fan-out joins are when one row in a table is joined to multiple rows in another table, resulting in more output rows than input rows.

- — Chasm joins are when two tables have a many-to-many relationship through an intermediate table, and the join results in duplicate or missing data.
-
-
- + +- Fan-out joins are when one row in a table is joined to multiple rows in another table, resulting in more output rows than input rows. +- Chasm joins are when two tables have a many-to-many relationship through an intermediate table, and the join results in duplicate or missing data. + ## Types of joins :::tip Joins are auto-generated MetricFlow automatically generates the necessary joins to the defined semantic objects, eliminating the need for you to create new semantic models or configuration files. -This document explains the different types of joins that can be used with entities and how to query them using the CLI. +This section explains the different types of joins that can be used with entities and how to query them. ::: -MetricFlow primarily uses left joins for joins, and restricts the use of fan-out and chasm joins. Refer to the table below to identify which joins are or aren't allowed based on specific entity types to prevent the creation of risky joins. +- MetricFlow primarily uses left joins for joins. +- For queries that involve multiple `fct` models, MetricFlow uses full outer joins. +- It restricts the use of fan-out and chasm joins. + +Refer to the following table to identify which joins are or aren't allowed based on specific entity types to prevent the creation of risky joins. | entity type - Table A | entity type - Table B | Join type | |---------------------------|---------------------------|----------------------| @@ -39,9 +39,28 @@ MetricFlow primarily uses left joins for joins, and restricts the use of fan-out | Unique | Foreign | ❌ Fan-out (Not allowed) | | Foreign | Primary | ✅ Left | | Foreign | Unique | ✅ Left | -| Foreign | Foreign | ❌ Fan-out (Not allowed) | +| Foreign | Foreign | ❌ Fan-out (Not allowed) | + +This table primarily represents left joins unless otherwise specified. For scenarios involving multiple `fct` models, MetricFlow uses full outer joins. 
+ +### Explanation of joins + +- **Left joins** — MetricFlow defaults to left joins when joining `fct` and `dim` models. Left joins make sure all rows from the "base" table are retained, while matching rows are included from the joined table. +- **Full outer joins** — For queries that involve multiple `fct` models, MetricFlow uses full outer joins to ensure all data points are captured, even when some `dim` or `fct` models are missing in certain tables. + +Refer to [SQL examples](#sql-examples) for more information on how MetricFlow handles joins in practice. -### Example +### Semantic validation + +MetricFlow performs semantic validation by executing `explain` queries in the data platform to ensure that the generated SQL gets executed without errors. This validation includes: + +- Verifying that all referenced tables and columns exist. +- Ensuring the data platform supports SQL functions, such as `date_diff(x, y)`. +- Checking for ambiguous joins or paths in multi-hop joins. + +If validation fails, MetricFlow surfaces errors for users to address before executing the query. + +## Example The following example uses two semantic models with a common entity and shows a MetricFlow query that requires a join between the two semantic models. The two semantic models are: - `transactions` @@ -83,6 +102,50 @@ dbt sl query --metrics average_purchase_price --group-by metric_time,user_id__ty mf query --metrics average_purchase_price --group-by metric_time,user_id__type # In dbt Core ``` +#### SQL examples + +The following tabs provide SQL examples for both left joins and full outer joins, showing how MetricFlow handles these scenarios in practice. + + + + +Following the previous example using the `transactions` and `user_signup` semantic models, this shows a left join between those two semantic models. 
+ +```sql +select + transactions.user_id, + avg(transactions.purchase_price) as average_purchase_price, + user_signup.type +from transactions +left outer join user_signup + on transactions.user_id = user_signup.user_id +where transactions.purchase_price is not null +group by + transactions.user_id, + user_signup.type; +``` + + + + +If you have multiple `fct` models, such as `sales` and `returns`, MetricFlow uses full outer joins to ensure all data points are captured. + +This example shows a full outer join between the `sales` and `returns` semantic models. + +```sql +select + coalesce(sales.user_id, returns.user_id) as user_id, + sales.total_sales, + returns.total_returns +from sales +full outer join returns + on sales.user_id = returns.user_id +where sales.user_id is not null or returns.user_id is not null; +``` + + + + ## Multi-hop joins MetricFlow allows users to join measures and dimensions across a graph of entities by moving from one table to another within a graph. This is referred to as "multi-hop join". From 480ece44475117aaa3885a96276019782eb476d8 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Tue, 24 Dec 2024 10:09:51 +0000 Subject: [PATCH 2/6] Update website/docs/docs/build/join-logic.md Co-authored-by: Leona B. Campbell <3880403+runleonarun@users.noreply.github.com> --- website/docs/docs/build/join-logic.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/build/join-logic.md b/website/docs/docs/build/join-logic.md index 22249938b56..94cd6d226c2 100644 --- a/website/docs/docs/build/join-logic.md +++ b/website/docs/docs/build/join-logic.md @@ -109,7 +109,7 @@ The following tabs provide SQL examples for both left joins and full outer joins -Following the previous example using the `transactions` and `user_signup` semantic models, this shows a left join between those two semantic models. +This SQL shows a left join between the `transactions` and `user_signup` semantic models from the previous example.
```sql select From ed7e5ad9e6f5aa25b738695fcdd1ec026f40f3ad Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Tue, 24 Dec 2024 10:10:06 +0000 Subject: [PATCH 3/6] Update website/docs/docs/build/join-logic.md Co-authored-by: Leona B. Campbell <3880403+runleonarun@users.noreply.github.com> --- website/docs/docs/build/join-logic.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/build/join-logic.md b/website/docs/docs/build/join-logic.md index 94cd6d226c2..2499406c5fa 100644 --- a/website/docs/docs/build/join-logic.md +++ b/website/docs/docs/build/join-logic.md @@ -104,7 +104,7 @@ mf query --metrics average_purchase_price --group-by metric_time,user_id__type # #### SQL examples -The following tabs provide SQL examples for both left joins and full outer joins, showing how MetricFlow handles these scenarios in practice. +These SQL examples show how MetricFlow handles both left join and full outer join scenarios in practice: From b9237678930834250dad1573f1e00d8be7a25468 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Tue, 24 Dec 2024 10:10:54 +0000 Subject: [PATCH 4/6] Update website/docs/docs/build/join-logic.md Co-authored-by: Leona B. Campbell <3880403+runleonarun@users.noreply.github.com> --- website/docs/docs/build/join-logic.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/website/docs/docs/build/join-logic.md b/website/docs/docs/build/join-logic.md index 2499406c5fa..bc81b238166 100644 --- a/website/docs/docs/build/join-logic.md +++ b/website/docs/docs/build/join-logic.md @@ -27,7 +27,7 @@ This section explains the different types of joins that can be used with entitie - For queries that involve multiple `fct` models, MetricFlow uses full outer joins. - It restricts the use of fan-out and chasm joins. 
-Refer to the following table to identify which joins are or aren't allowed based on specific entity types to prevent the creation of risky joins. +The following table identifies which joins are allowed based on specific entity types to prevent the creation of risky joins. This table primarily represents left joins unless otherwise specified. For scenarios involving multiple `fct` models, MetricFlow uses full outer joins. | entity type - Table A | entity type - Table B | Join type | |---------------------------|---------------------------|----------------------| From f57aa914bd78fd2e8e071523cdd3ec985ef1ef80 Mon Sep 17 00:00:00 2001 From: Mirna Wong <89008547+mirnawong1@users.noreply.github.com> Date: Tue, 24 Dec 2024 10:11:08 +0000 Subject: [PATCH 5/6] Update website/docs/docs/build/join-logic.md Co-authored-by: Leona B. Campbell <3880403+runleonarun@users.noreply.github.com> --- website/docs/docs/build/join-logic.md | 1 - 1 file changed, 1 deletion(-) diff --git a/website/docs/docs/build/join-logic.md b/website/docs/docs/build/join-logic.md index bc81b238166..67122302e2c 100644 --- a/website/docs/docs/build/join-logic.md +++ b/website/docs/docs/build/join-logic.md @@ -41,7 +41,6 @@ The following table identifies which joins are allowed based on specific entity | Foreign | Unique | ✅ Left | | Foreign | Foreign | ❌ Fan-out (Not allowed) | -This table primarily represents left joins unless otherwise specified. For scenarios involving multiple `fct` models, MetricFlow uses full outer joins. 
### Explanation of joins From 13a1be35afdd53723b6ba8a244a5c46031876659 Mon Sep 17 00:00:00 2001 From: mirnawong1 Date: Tue, 24 Dec 2024 10:20:08 +0000 Subject: [PATCH 6/6] move bullets adn consolidate --- website/docs/docs/build/join-logic.md | 18 +++++++----------- 1 file changed, 7 insertions(+), 11 deletions(-) diff --git a/website/docs/docs/build/join-logic.md b/website/docs/docs/build/join-logic.md index 67122302e2c..d11c0248d25 100644 --- a/website/docs/docs/build/join-logic.md +++ b/website/docs/docs/build/join-logic.md @@ -23,9 +23,13 @@ MetricFlow automatically generates the necessary joins to the defined semantic o This section explains the different types of joins that can be used with entities and how to query them. ::: -- MetricFlow primarily uses left joins for joins. -- For queries that involve multiple `fct` models, MetricFlow uses full outer joins. -- It restricts the use of fan-out and chasm joins. +MetricFlow uses these join strategies: + +- MetricFlow primarily uses left joins when joining `fct` and `dim` models. Left joins keep every row from the "base" table and bring in columns from matching rows in the joined table. +- For queries that involve multiple `fct` models, MetricFlow uses full outer joins so that all data points are captured, even when a key value appears in only one of the `fct` models. +- MetricFlow restricts the use of fan-out and chasm joins. + +Refer to [SQL examples](#sql-examples) for more information on how MetricFlow handles joins in practice. The following table identifies which joins are allowed based on specific entity types to prevent the creation of risky joins. This table primarily represents left joins unless otherwise specified. For scenarios involving multiple `fct` models, MetricFlow uses full outer joins.
@@ -41,14 +45,6 @@ The following table identifies which joins are allowed based on specific entity | Foreign | Unique | ✅ Left | | Foreign | Foreign | ❌ Fan-out (Not allowed) | - -### Explanation of joins - -- **Left joins** — MetricFlow defaults to left joins when joining `fct` and `dim` models. Left joins make sure all rows from the "base" table are retained, while matching rows are included from the joined table. -- **Full outer joins** — For queries that involve multiple `fct` models, MetricFlow uses full outer joins to ensure all data points are captured, even when some `dim` or `fct` models are missing in certain tables. - -Refer to [SQL examples](#sql-examples) for more information on how MetricFlow handles joins in practice. - ### Semantic validation MetricFlow performs semantic validation by executing `explain` queries in the data platform to ensure that the generated SQL gets executed without errors. This validation includes:
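
The fan-out rule in the entity-type table changed by this patch series can be illustrated with a small experiment. The following is a minimal sketch using Python's built-in `sqlite3` module rather than MetricFlow itself. The `users` and `orders` tables, their columns, and their values are hypothetical. The sketch shows that joining onto a table whose key is unique (a primary entity) keeps one row per base row. Joining onto a table whose key repeats (a foreign entity) instead duplicates base rows and inflates aggregates.

```python
import sqlite3

# Hypothetical tables: `users` has one row per user_id (user_id acts as a
# primary entity); `orders` has many rows per user_id (a foreign entity).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table users (user_id integer primary key, credits integer);
    create table orders (order_id integer primary key, user_id integer);
    insert into users values (1, 100), (2, 50);
    insert into orders values (10, 1), (11, 1), (12, 2);
    """
)

# Baseline: total credits read straight from `users`.
baseline = conn.execute("select sum(credits) from users").fetchone()[0]

# Foreign -> primary: each order matches at most one user, so the join
# keeps exactly one row per order (no fan-out).
safe_rows = conn.execute(
    "select o.order_id, u.credits from orders o "
    "left join users u on o.user_id = u.user_id"
).fetchall()

# Primary -> foreign: user 1 matches two orders, so its credits appear
# twice and the sum is inflated (fan-out).
fanned_total = conn.execute(
    "select sum(u.credits) from users u "
    "left join orders o on u.user_id = o.user_id"
).fetchone()[0]

print(baseline, len(safe_rows), fanned_total)  # 150 3 250
```

The inflated sum (250 instead of 150) is the kind of silent error that the "❌ Fan-out (Not allowed)" rows in the table are meant to prevent.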
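
The difference between the left join used for `fct`-to-`dim` queries and the full outer join used across multiple `fct` models can be sketched the same way. This is an illustration under stated assumptions, not MetricFlow's generated SQL. The pre-aggregated `sales` and `returns` tables and their values are hypothetical. SQLite only supports `FULL OUTER JOIN` natively from version 3.39.0, so the sketch emulates it with a left join plus a `UNION ALL` of the unmatched `returns` rows.

```python
import sqlite3

# Hypothetical pre-aggregated fact tables, one row per user_id.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table sales (user_id integer, total_sales real);
    create table returns (user_id integer, total_returns real);
    insert into sales values (1, 100.0), (2, 40.0);
    insert into returns values (2, 5.0), (3, 12.0);
    """
)

# Left join: only users present in `sales` survive; user 3 (returns only) is lost.
left_rows = conn.execute(
    """
    select s.user_id, s.total_sales, r.total_returns
    from sales s
    left join returns r on s.user_id = r.user_id
    order by s.user_id
    """
).fetchall()

# Full outer join, emulated as a left join plus the unmatched `returns` rows.
# coalesce() takes the key from whichever side has it, so user_id is never null.
full_rows = conn.execute(
    """
    select coalesce(s.user_id, r.user_id) as user_id, s.total_sales, r.total_returns
    from sales s
    left join returns r on s.user_id = r.user_id
    union all
    select r.user_id, null, r.total_returns
    from returns r
    where r.user_id not in (select user_id from sales)
    order by user_id
    """
).fetchall()

print(left_rows)  # [(1, 100.0, None), (2, 40.0, 5.0)]
print(full_rows)  # [(1, 100.0, None), (2, 40.0, 5.0), (3, None, 12.0)]
```

User 3 exists only in `returns`, so the left join drops it while the full outer join keeps it. The `coalesce()` call keeps the key column populated no matter which side a row comes from.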
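
The semantic validation described in this patch series relies on the data platform compiling a query without running it. The sketch below is a rough stand-in for that idea and does not reflect how MetricFlow is implemented. It uses SQLite's `EXPLAIN`, which prepares a statement and reports a missing column, an unknown function, or an ambiguous column reference as an error. The table and column definitions are hypothetical, and `explain_error` is an illustrative helper name.

```python
import sqlite3
from typing import Optional

# Hypothetical schemas loosely modeled on the `transactions` and
# `user_signup` semantic models from the join-logic page.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table transactions (user_id integer, purchase_price real);
    create table user_signup (user_id integer, type text);
    """
)


def explain_error(sql: str) -> Optional[str]:
    """Compile `sql` with EXPLAIN (no table rows are read) and return the error text, if any."""
    try:
        conn.execute("explain " + sql).fetchall()
    except sqlite3.Error as exc:
        return str(exc)
    return None


checks = {
    "valid": (
        "select t.user_id, u.type, avg(t.purchase_price) "
        "from transactions t left join user_signup u on t.user_id = u.user_id "
        "group by t.user_id, u.type"
    ),
    # References a column that does not exist.
    "missing column": "select t.price from transactions t",
    # SQLite has no date_diff function, so compilation fails.
    "unsupported function": "select date_diff(1, 2) from transactions",
    # user_id exists in both tables and is not qualified.
    "ambiguous column": (
        "select user_id from transactions t "
        "join user_signup u on t.user_id = u.user_id"
    ),
}

results = {name: explain_error(sql) for name, sql in checks.items()}
for name, error in results.items():
    print(f"{name}: {error or 'ok'}")
```

Because these checks happen at compile time, an invalid query fails before any data is scanned, which mirrors the goal described for `explain`-based validation.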