Merge branch 'current' into 6715_improve_documentation_for_dbt_clean_…
…options
asarraf authored Dec 26, 2024
2 parents 934a1fb + 65a46f1 commit a6eef17
Showing 2 changed files with 96 additions and 21 deletions.
82 changes: 70 additions & 12 deletions website/docs/docs/build/join-logic.md
@@ -10,24 +10,28 @@ Joins are a powerful part of MetricFlow and simplify the process of making all v

Joins use `entities` defined in your semantic model configs as the join keys between tables. Assuming entities are defined in the semantic model, MetricFlow creates a graph using the semantic models as nodes and the join paths as edges to perform joins automatically. MetricFlow chooses the appropriate join type and avoids fan-out or chasm joins with other tables based on the entity types.

<details>
<summary>What are fan-out or chasm joins?</summary>
<div>
<div>&mdash; Fan-out joins are when one row in a table is joined to multiple rows in another table, resulting in more output rows than input rows.<br /><br />
&mdash; Chasm joins are when two tables have a many-to-many relationship through an intermediate table, and the join results in duplicate or missing data. </div>
</div>
</details>

<Expandable alt_header="What are fan-out or chasm joins?" >
- Fan-out joins are when one row in a table is joined to multiple rows in another table, resulting in more output rows than input rows.
- Chasm joins are when two tables have a many-to-many relationship through an intermediate table, and the join results in duplicate or missing data.
</Expandable>
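
To make the fan-out case concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `orders` and `order_items` tables and their values are hypothetical and are not part of the MetricFlow examples; the sketch only shows how joining a one-row-per-order table to a many-rows-per-order table inflates an aggregate.

```python
import sqlite3

# Hypothetical tables for illustration only: one row per order,
# and one or more item rows per order.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table orders (order_id integer primary key, amount integer);
    create table order_items (item_id integer primary key, order_id integer);
    insert into orders values (1, 100), (2, 50);
    insert into order_items values (10, 1), (11, 1), (12, 2);
    """
)

# Total computed on the base table alone.
true_total = conn.execute("select sum(amount) from orders").fetchone()[0]

# The same total after a join that fans out: order 1 matches two item rows,
# so its amount is counted twice.
joined_rows, fanned_total = conn.execute(
    """
    select count(*), sum(orders.amount)
    from orders
    left join order_items on orders.order_id = order_items.order_id
    """
).fetchone()

print(true_total, joined_rows, fanned_total)
```

The joined result has more rows than `orders`, and the summed amount no longer matches the base table. This is the kind of result MetricFlow avoids by restricting fan-out joins.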

## Types of joins

:::tip Joins are auto-generated
MetricFlow automatically generates the necessary joins to the defined semantic objects, eliminating the need for you to create new semantic models or configuration files.

This document explains the different types of joins that can be used with entities and how to query them using the CLI.
This section explains the different types of joins that can be used with entities and how to query them.
:::

MetricFlow primarily uses left joins for joins, and restricts the use of fan-out and chasm joins. Refer to the table below to identify which joins are or aren't allowed based on specific entity types to prevent the creation of risky joins.
MetricFlow uses these specific join strategies:

- Primarily uses left joins when joining `fct` and `dim` models. Left joins make sure all rows from the "base" table are retained, while matching rows are included from the joined table.
- For queries that involve multiple `fct` models, MetricFlow uses full outer joins to ensure all data points are captured, even when some `dim` or `fct` models are missing in certain tables.
- MetricFlow restricts the use of fan-out and chasm joins.

Refer to [SQL examples](#sql-examples) for more information on how MetricFlow handles joins in practice.

The following table identifies which joins are allowed based on specific entity types to prevent the creation of risky joins. This table primarily represents left joins unless otherwise specified. For scenarios involving multiple `fct` models, MetricFlow uses full outer joins.

| entity type - Table A | entity type - Table B | Join type |
|---------------------------|---------------------------|----------------------|
@@ -39,9 +43,19 @@ MetricFlow primarily uses left joins for joins, and restricts the use of fan-out
| Unique | Foreign | ❌ Fan-out (Not allowed) |
| Foreign | Primary | ✅ Left |
| Foreign | Unique | ✅ Left |
| Foreign | Foreign | ❌ Fan-out (Not allowed) |
| Foreign | Foreign | ❌ Fan-out (Not allowed) |

### Semantic validation

### Example
MetricFlow performs semantic validation by executing `explain` queries in the data platform to ensure that the generated SQL runs without errors. This validation includes:

- Verifying that all referenced tables and columns exist.
- Ensuring the data platform supports SQL functions, such as `date_diff(x, y)`.
- Checking for ambiguous joins or paths in multi-hop joins.

If validation fails, MetricFlow surfaces errors for users to address before executing the query.
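
The idea of validating SQL with an `explain` query can be sketched with Python's built-in `sqlite3` module. This is an illustration under stated assumptions, not MetricFlow's implementation: SQLite stands in for the data platform, and `validate_sql` is a hypothetical helper. The statement is compiled by the database but its rows are never fetched, so missing tables, missing columns, and unsupported functions surface as errors up front.

```python
import sqlite3


def validate_sql(conn, sql):
    """Hypothetical helper: return None if `sql` compiles, else the error text."""
    try:
        # `explain` asks SQLite to compile the statement and describe its plan
        # instead of running it against the data.
        conn.execute(f"explain {sql}")
    except sqlite3.Error as exc:
        return str(exc)
    return None


conn = sqlite3.connect(":memory:")
conn.execute("create table transactions (user_id integer, purchase_price real)")

ok = validate_sql(conn, "select user_id, purchase_price from transactions")
missing_column = validate_sql(conn, "select no_such_column from transactions")
missing_table = validate_sql(conn, "select 1 from no_such_table")
# SQLite has no date_diff function, so this fails at compile time.
unsupported_function = validate_sql(conn, "select date_diff(1, 2)")

for result in (ok, missing_column, missing_table, unsupported_function):
    print(result)
```

Each failing case returns an error message that can be shown to the user before any query is executed.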

## Example

The following example uses two semantic models with a common entity and shows a MetricFlow query that requires a join between the two semantic models. The two semantic models are:
- `transactions`
@@ -83,6 +97,50 @@ dbt sl query --metrics average_purchase_price --group-by metric_time,user_id__ty
mf query --metrics average_purchase_price --group-by metric_time,user_id__type # In dbt Core
```

#### SQL examples

These SQL examples show how MetricFlow handles both left join and full outer join scenarios in practice:

<Tabs>
<TabItem value="SQL example for left join">

Using the previous example for `transactions` and `user_signup` semantic models, this shows a left join between those two semantic models.

```sql
select
  transactions.user_id,
  user_signup.type,
  -- purchase_price is aggregated so the select list is valid with the group by below
  avg(transactions.purchase_price) as average_purchase_price
from transactions
left outer join user_signup
  on transactions.user_id = user_signup.user_id
where transactions.purchase_price is not null
group by
  transactions.user_id,
  user_signup.type;
```
</TabItem>

<TabItem value="SQL example for outer joins">

If you have multiple `fct` models, let's say `sales` and `returns`, MetricFlow uses full outer joins to ensure all data points are captured.

This example shows a full outer join between the `sales` and `returns` semantic models.

```sql
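select
  -- take the key from whichever side has the row, so returns-only rows keep their user_id
  coalesce(sales.user_id, returns.user_id) as user_id,
  sales.total_sales,
  returns.total_returns
from sales
full outer join returns
  on sales.user_id = returns.user_id
where sales.user_id is not null or returns.user_id is not null;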
```

</TabItem>
</Tabs>

## Multi-hop joins

MetricFlow allows users to join measures and dimensions across a graph of entities by moving from one table to another within the graph. This is referred to as a "multi-hop join".
35 changes: 26 additions & 9 deletions website/docs/docs/build/packages.md
@@ -165,27 +165,44 @@ dbt Cloud supports private packages from [supported](#prerequisites) Git repos l

#### Prerequisites

To use native private packages, you must have one of the following Git providers configured in the **Integrations** section of your **Account settings**:
- [GitHub](/docs/cloud/git/connect-github)
- [Azure DevOps](/docs/cloud/git/connect-azure-devops)
- Support for GitLab is coming soon.

- To use native private packages, you must have one of the following Git providers configured in the **Integrations** section of your **Account settings**:
- [GitHub](/docs/cloud/git/connect-github)
- [Azure DevOps](/docs/cloud/git/connect-azure-devops)
- Private packages only work within a single Azure DevOps project. If your repositories are in different projects within the same organization, you can't reference them in the `private` key at this time.
- For Azure DevOps, use the `org/repo` path (not the `org_name/project_name/repo_name` path) with the project tier inherited from the integrated source repository.
- Support for GitLab is coming soon.

#### Configuration

Use the `private` key in your `packages.yml` or `dependencies.yml` to clone package repos using your existing dbt Cloud Git integration without having to provision an access token or create a dbt Cloud environment variable:
Use the `private` key in your `packages.yml` or `dependencies.yml` to clone package repos using your existing dbt Cloud Git integration without having to provision an access token or create a dbt Cloud environment variable.


<File name="packages.yml">

```yaml
packages:
- private: dbt-labs/awesome_repo
- private: dbt-labs/awesome_repo # your-org/your-repo path
- package: normal packages
[...]
[...]
```
</File>

:::tip Azure DevOps considerations

- Private packages currently only work if the package repository is in the same Azure DevOps project as the source repo.
- Use the `org/repo` path (not the normal ADO `org_name/project_name/repo_name` path) in the `private` key.
- Repositories in different Azure DevOps projects aren't supported yet; support depends on a future update.

You can use private packages by specifying `org/repo` in the `private` key:

<File name="packages.yml">

```yaml
packages:
- private: my-org/my-repo # Works if your ADO source repo and package repo are in the same project
```
</File>
:::
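
As a rough sketch of the path rule above, the hypothetical helper below turns a full Azure DevOps path into the `org/repo` form expected by the `private` key. The function name and the sample paths are assumptions made for illustration and are not part of dbt; the helper cannot check that the package repo and the source repo are in the same project.

```python
def to_private_key(ado_path: str) -> str:
    """Hypothetical helper: reduce an ADO path to the `org/repo` form."""
    parts = [part for part in ado_path.strip("/").split("/") if part]
    if len(parts) == 2:
        # Already in org/repo form.
        return "/".join(parts)
    if len(parts) == 3:
        # org_name/project_name/repo_name: drop the project segment.
        org, _project, repo = parts
        return f"{org}/{repo}"
    raise ValueError(f"expected org/repo or org/project/repo, got {ado_path!r}")


print(to_private_key("my-org/my-project/my-repo"))
print(to_private_key("my-org/my-repo"))
```

Both calls return `my-org/my-repo`, which is the value to place under `private` in `packages.yml`.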

You can pin private packages similar to regular dbt packages:
