
add cumulative metrics granularity info #5688

Merged
merged 28 commits into from
Jul 3, 2024
Changes from 9 commits
acead33
add cumulative metrics info
mirnawong1 Jun 24, 2024
f0ec4cc
dbt sql
mirnawong1 Jun 24, 2024
7e76ad3
Merge branch 'current' into mwong-add-granularity-cumulative-metrics
mirnawong1 Jun 25, 2024
4ba8dfd
Merge branch 'current' into mwong-add-granularity-cumulative-metrics
mirnawong1 Jun 26, 2024
ee34960
Update website/docs/docs/build/cumulative-metrics.md
mirnawong1 Jun 26, 2024
f2d5ced
Update cumulative-metrics.md
mirnawong1 Jun 26, 2024
a34778e
updated to cumulative metrics docs
Jstein77 Jun 26, 2024
cb2986e
updated to cumulative metrics docs
Jstein77 Jun 26, 2024
5dc290d
Merge branch 'current' into mwong-add-granularity-cumulative-metrics
mirnawong1 Jun 27, 2024
f6a6b83
add updates
mirnawong1 Jun 27, 2024
befd1ae
Merge branch 'current' into mwong-add-granularity-cumulative-metrics
mirnawong1 Jun 27, 2024
9eec57b
add expandable and more context to example
mirnawong1 Jun 27, 2024
c0272c6
add context to window exmaples
mirnawong1 Jun 27, 2024
4313c7e
typos
mirnawong1 Jun 27, 2024
573403f
Merge branch 'current' into mwong-add-granularity-cumulative-metrics
mirnawong1 Jun 28, 2024
f04d77b
Update website/docs/docs/build/cumulative-metrics.md
mirnawong1 Jun 28, 2024
72aac6b
Update cumulative-metrics.md
mirnawong1 Jun 28, 2024
9b922f2
update table
mirnawong1 Jun 28, 2024
68be176
Update release-notes.md
mirnawong1 Jun 28, 2024
4ccf1f4
Update cumulative-metrics.md
mirnawong1 Jun 28, 2024
4227693
Update release-notes.md
mirnawong1 Jun 28, 2024
4aee683
Update cumulative-metrics.md
mirnawong1 Jun 28, 2024
37723c3
Merge branch 'current' into mwong-add-granularity-cumulative-metrics
mirnawong1 Jun 28, 2024
674e412
Merge branch 'current' into mwong-add-granularity-cumulative-metrics
mirnawong1 Jul 2, 2024
0ab1f22
Update website/docs/docs/build/cumulative-metrics.md
mirnawong1 Jul 3, 2024
cf6b231
Merge branch 'current' into mwong-add-granularity-cumulative-metrics
mirnawong1 Jul 3, 2024
fac61b0
Merge branch 'current' into mwong-add-granularity-cumulative-metrics
mirnawong1 Jul 3, 2024
89526aa
Merge branch 'current' into mwong-add-granularity-cumulative-metrics
mirnawong1 Jul 3, 2024
4 changes: 4 additions & 0 deletions website/docs/docs/build/conversion-metrics.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,10 @@ Conversion metrics are different from [ratio metrics](/docs/build/ratio) because

The specification for conversion metrics is as follows:

:::tip
Note that we use the double colon (::) to indicate that a parameter is nested within another parameter. For example, `query_params::metrics` means the `metrics` parameter is nested under `query_params`.
:::

| Parameter | Description | Type |
| --- | --- | --- |
| `name` | The name of the metric. | Required |
Expand Down
222 changes: 187 additions & 35 deletions website/docs/docs/build/cumulative-metrics.md
Expand Up @@ -6,9 +6,13 @@ sidebar_label: Cumulative
tags: [Metrics, Semantic Layer]
---

Cumulative metrics aggregate a measure over a given accumulation window. If no window is specified, the window is considered infinite and accumulates values over all time. You will need to create the [time spine model](/docs/build/metricflow-time-spine) before you add cumulative metrics.
Cumulative metrics aggregate a measure over a given accumulation window. If no window is specified, the window is considered infinite and accumulates values over all time. You will need to create a [time spine model](/docs/build/metricflow-time-spine) before you add cumulative metrics.

This metric is common for calculating things like weekly active users, or month-to-date revenue. You can use `fill_nulls_with` to [set null metric values to zero](/docs/build/fill-nulls-advanced), ensuring numeric values for every data row. The parameters, description, and type for cumulative metrics are:
Cumulative metrics are useful for calculating things like weekly active users, or month-to-date revenue. The parameters, description, and type for cumulative metrics are:

:::tip
Note that we use the double colon (::) to indicate that a parameter is nested within another parameter. For example, `query_params::metrics` means the `metrics` parameter is nested under `query_params`.
:::

| Parameter | Description | Type |
| --------- | ----------- | ---- |
Expand All @@ -17,12 +21,13 @@ This metric is common for calculating things like weekly active users, or month-
| `type` | The type of the metric (cumulative, derived, ratio, or simple). | Required |
| `label` | Required string that defines the display value in downstream tools. Accepts plain text, spaces, and quotes (such as `orders_total` or `"orders_total"`). | Required |
| `type_params` | The type parameters of the metric. | Required |
| `window` | The accumulation window, such as 1 month, 7 days, 1 year. This can't be used with `grain_to_date`. | Optional |
| `grain_to_date` | Sets the accumulation grain, such as month will accumulate data for one month. Then restart at the beginning of the next. This can't be used with `window`. | Optional |
| `measure` | A list of measure inputs | Required |
| `measure:name` | TThe measure you are referencing. | Optional |
| `measure:fill_nulls_with` | Set the value in your metric definition instead of null (such as zero).| Optional |
| `measure:join_to_timespine` | Boolean that indicates if the aggregated measure should be joined to the time spine table to fill in missing dates. Default `false`. | Optional |
| `type_params::cumulative_type_params::window` | The accumulation window, such as 1 month, 7 days, 1 year. This can't be used with `grain_to_date`. | Optional |
| `type_params::cumulative_type_params::grain_to_date` | Sets the accumulation grain, such as `month`, which accumulates data for one month and then restarts at the beginning of the next. This can't be used with `window`. | Optional |
| `type_params::cumulative_type_params::period_agg` | Specifies how to roll up the cumulative metric to another granularity. Options are `first`, `last`, `avg`. Defaults to `first` if no `window` is specified. | Optional |
| `type_params::measure` | A list of measure inputs. | Required |
| `measure::name` | The measure you are referencing. | Optional |
| `measure::fill_nulls_with` | Set the value in your metric definition instead of null (such as zero).| Optional |
| `measure::join_to_timespine` | Boolean that indicates if the aggregated measure should be joined to the time spine table to fill in missing dates. Default `false`. | Optional |

The following displays the complete specification for cumulative metrics, along with an example:

Expand All @@ -37,8 +42,10 @@ metrics:
name: The measure you are referencing. # Required
fill_nulls_with: Set the value in your metric definition instead of null (such as zero). # Optional
join_to_timespine: true/false # Boolean that indicates if the aggregated measure should be joined to the time spine table to fill in missing dates. Default `false`. # Optional
window: The accumulation window, such as 1 month, 7 days, 1 year. # Optional. It cannot be used with grain_to_date.
grain_to_date: Sets the accumulation grain, such as month will accumulate data for one month, then restart at the beginning of the next. # Optional. It cannot be used with window.
cumulative_type_params:
period_agg: first # Optional. Defaults to first. Accepted values: first|last|avg
window: The accumulation window, such as 1 month, 7 days, 1 year. # Optional. It cannot be used with grain_to_date.
grain_to_date: Sets the accumulation grain, such as month will accumulate data for one month, then restart at the beginning of the next. # Optional. It cannot be used with window.

```

Expand All @@ -50,33 +57,117 @@ Cumulative metrics measure data over a given window and consider the window infi

metrics:
- name: cumulative_order_total
label: Cumulative Order total (All-Time)
label: Cumulative order total (All-Time)
description: The cumulative value of all orders
type: cumulative
type_params:
measure:
name: order_total
fill_nulls_with: 0

- name: cumulative_order_total_l1m
label: Cumulative Order total (L1M)
description: Trailing 1-month cumulative order amount
label: Cumulative order total (L1M)
description: Trailing 1-month cumulative order total
type: cumulative
type_params:
measure:
name: order_total
fill_nulls_with: 0
window: 1 month
cumulative_type_params:
window: 1 month

- name: cumulative_order_total_mtd
label: Cumulative Order total (MTD)
label: Cumulative order total (MTD)
description: The month-to-date value of all orders
type: cumulative
type_params:
measure:
name: order_total
fill_nulls_with: 0
grain_to_date: month
cumulative_type_params:
grain_to_date: month
```

### Granularity options
Granularity options for cumulative metrics are slightly different from granularity for other metric types. Granularity for other metrics is implemented using the `date_trunc` function. However, cumulative values are not additive, so we can't simply use `date_trunc` to aggregate cumulative metrics.

Instead, we use the `first_value()`, `last_value()`, and `avg()` aggregation functions to aggregate cumulative metrics over the requested period. By default, we take the first value of the period. You can change this behavior using the `period_agg` parameter.

Let's walk through an example using the following configs:


```yaml
- name: cumulative_revenue
description: The cumulative revenue for all orders.
label: Cumulative Revenue (All Time)
type: cumulative
type_params:
measure: revenue
cumulative_type_params:
period_agg: first # Optional. Defaults to first. Accepted values: first|last|avg
```

`period_agg` is set to `first`, so we choose the first value in the selected granularity window. Let's say we query `cumulative_revenue` by week using the following query: `dbt sl query --metrics cumulative_revenue --group-by metric_time__week`.

This compiles to the following SQL. Note the use of the window function to select the first value:

Contributor Author


@Jstein77 i'm not sure i understand the diff btw first_value(), etc. here compared to first. but is this placed correctly

Contributor


It's because we use the first_value() function in the generated SQL. We can remove this if it's confusing.

```sql
-- Re-aggregate Metric via Group By
SELECT
metric_time__week
, metric_time__quarter
, revenue_all_time
FROM (
-- Window Function for Metric Re-aggregation
SELECT
metric_time__week
, metric_time__quarter
, FIRST_VALUE(revenue_all_time) OVER (
PARTITION BY
metric_time__week
, metric_time__quarter
ORDER BY metric_time__day
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
) AS revenue_all_time
FROM (
-- Join Self Over Time Range
-- Pass Only Elements: ['txn_revenue', 'metric_time__week', 'metric_time__quarter', 'metric_time__day']
-- Aggregate Measures
-- Compute Metrics via Expressions
SELECT
subq_11.metric_time__day AS metric_time__day
, subq_11.metric_time__week AS metric_time__week
, subq_11.metric_time__quarter AS metric_time__quarter
, SUM(revenue_src_28000.revenue) AS revenue_all_time
FROM (
-- Time Spine
SELECT
ds AS metric_time__day
, DATE_TRUNC('week', ds) AS metric_time__week
, DATE_TRUNC('quarter', ds) AS metric_time__quarter
FROM mf_time_spine subq_12
GROUP BY
ds
, DATE_TRUNC('week', ds)
, DATE_TRUNC('quarter', ds)
) subq_11
INNER JOIN
fct_revenue revenue_src_28000
ON
(
DATE_TRUNC('day', revenue_src_28000.created_at) <= subq_11.metric_time__day
)
GROUP BY
subq_11.metric_time__day
, subq_11.metric_time__week
, subq_11.metric_time__quarter
) subq_16
) subq_17
GROUP BY
metric_time__week
, metric_time__quarter
, revenue_all_time
```

For `last` and `avg`, we would replace the `first_value()` function with `last_value()` and `avg()` respectively.
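
The effect of `period_agg` can also be illustrated outside SQL. The following is a minimal Python sketch, not MetricFlow's implementation: the `reaggregate` and `week_start` helpers and the sample data are hypothetical, and weeks are assumed to start on Monday (as `DATE_TRUNC('week', ...)` does on many warehouses). It rolls daily cumulative values up to weeks using `first`, `last`, or `avg`.

```python
from datetime import date, timedelta
from statistics import mean


def week_start(day: date) -> date:
    """Truncate a date to the Monday of its week (assumed week start)."""
    return day - timedelta(days=day.weekday())


def reaggregate(daily_cumulative: dict, period_agg: str = "first") -> dict:
    """Roll daily cumulative values up to weeks using first, last, or avg."""
    by_week = {}
    for day in sorted(daily_cumulative):
        # Values are appended in date order, so index 0 is the first day of the week.
        by_week.setdefault(week_start(day), []).append(daily_cumulative[day])
    pick = {
        "first": lambda values: values[0],
        "last": lambda values: values[-1],
        "avg": mean,
    }[period_agg]
    return {week: pick(values) for week, values in by_week.items()}


# Hypothetical daily running totals: 10, 20, ..., 140 over two full weeks.
start = date(2024, 1, 1)  # a Monday
daily = {start + timedelta(days=i): 10.0 * (i + 1) for i in range(14)}

first = reaggregate(daily, "first")
last = reaggregate(daily, "last")
avg = reaggregate(daily, "avg")
```

With this sample data, the first week resolves to 10 for `first`, 70 for `last`, and 40 for `avg`, which shows why simply summing the daily values would overstate a cumulative metric.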

### Window options

This section details examples of when you specify and don't specify window options.
Expand All @@ -85,7 +176,7 @@ This section details examples of when you specify and don't specify window optio

<TabItem value="specified" label="Example of window specified">

If a window option is specified, the MetricFlow framework applies a sliding window to the underlying measure.
If a window option is specified, MetricFlow applies a sliding window to the underlying measure.

Suppose the underlying measure `customers` is configured to count the unique customers making orders at the Jaffle shop.

Expand All @@ -97,32 +188,32 @@ measures:

```

We can write a cumulative metric `weekly_customers` as such:
We can write a cumulative metric `weekly_customers` as such:

``` yaml
metrics:
- name: weekly_customers # Define the measure and the window.
type: cumulative
type_params:
measure: customers
window: 7 days # Setting the window to 7 days since we want to track weekly active
cumulative_type_params:
window: 7 days # Setting the window to 7 days since we want to track weekly active
period_agg: first # This will choose the first value of the granularity window when changing the granularity.
```

From the sample YAML above, note the following:
From the sample YAML above, note the following:

* `type`: Specify cumulative to indicate the type of metric.
* `type_params`: Specify the measure you want to aggregate as a cumulative metric. You have the option of specifying a `window`, or a `grain to date`.
* `type_params`: Configure the cumulative metric by providing a `measure`.
* `cumulative_type_params`: Optionally add a `period_agg` configuration and either a `window` or a `grain_to_date` (the latter two can't be used together).

For example, in the `weekly_customers` cumulative metric, MetricFlow takes a sliding 7-day window of relevant customers and applies a count distinct function.

If you omit the `window`, the measure will accumulate over all time. Otherwise, you can choose from granularities like day, week, quarter, or month, and describe the window using phrases like "7 days" or "1 month."

If you remove `window`, the measure will accumulate over all time.
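
As a rough illustration of the sliding window, here is a minimal Python sketch under stated assumptions: `sliding_distinct_count` and the sample orders are hypothetical, and the 7-day window is assumed to include the `as_of` day plus the six days before it. It is not the SQL that MetricFlow generates.

```python
from datetime import date, timedelta


def sliding_distinct_count(orders, as_of, window_days=7):
    """Count distinct customers with an order in the window ending on as_of."""
    window_start = as_of - timedelta(days=window_days - 1)
    return len({customer for day, customer in orders if window_start <= day <= as_of})


# Hypothetical (order_date, customer_id) rows.
orders = [
    (date(2024, 1, 1), "a"),
    (date(2024, 1, 3), "b"),
    (date(2024, 1, 3), "a"),
    (date(2024, 1, 9), "c"),
]

count_jan_7 = sliding_distinct_count(orders, date(2024, 1, 7))    # customers a and b
count_jan_9 = sliding_distinct_count(orders, date(2024, 1, 9))    # customers a, b, and c
count_jan_10 = sliding_distinct_count(orders, date(2024, 1, 10))  # only customer c
```

Customer `a` is counted once even though they ordered twice, and earlier orders drop out of the count as the window slides forward.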
</TabItem>

<TabItem value="notspecified" label="Example of window not specified">

You can use cumulative metrics without a window specified to obtain a running total. Suppose you have a log table with columns like:

Suppose you (a subscription-based company for the sake of this example) have an event-based log table with the following columns:

* `date`: a date column
Expand All @@ -132,7 +223,7 @@ Suppose you (a subscription-based company for the sake of this example) have an
* `event_type`: (integer) a column that populates with +1 to indicate an added subscription, or -1 to indicate a deleted subscription.
* `revenue`: (integer) a column that multiplies `event_type` and `subscription_revenue` to depict the amount of revenue added or lost for a specific date.

Using cumulative metrics without specifying a window, you can calculate running totals for metrics like the count of active subscriptions and revenue at any point in time. The following configuration YAML displays creating such cumulative metrics to obtain current revenue or the total number of active subscriptions as a cumulative sum:
Using cumulative metrics without specifying a window, you can calculate running totals for metrics like the count of active subscriptions and revenue at any point in time. The following YAML file shows how to create cumulative metrics that return current revenue and the total number of active subscriptions as cumulative sums:

```yaml
measures:
Expand Down Expand Up @@ -164,7 +255,7 @@ metrics:

</Tabs>
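
The running totals from the no-window example can be sketched in plain Python. This is only an illustration under assumptions: the event rows are made up, and `revenue` is derived as `event_type * subscription_revenue`, following the column descriptions in that example.

```python
from itertools import accumulate

# Hypothetical event log rows: (date, event_type, subscription_revenue).
events = [
    ("2024-01-01", +1, 50),
    ("2024-01-02", +1, 30),
    ("2024-01-03", -1, 50),
    ("2024-01-04", +1, 20),
]

# revenue = event_type * subscription_revenue, per the column description.
revenue_deltas = [event_type * amount for _, event_type, amount in events]

# With no window, the metric accumulates over all time (a running total).
active_subscriptions = list(accumulate(event_type for _, event_type, _ in events))
current_revenue = list(accumulate(revenue_deltas))
```

Here `active_subscriptions` ends up as `[1, 2, 1, 2]` and `current_revenue` as `[50, 80, 30, 50]`: each day's value reflects every event up to and including that day.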

### Grain to date
### Grain to date

You can choose to specify a grain to date in your cumulative metric configuration to accumulate a metric from the start of a grain (such as week, month, or year). When using a window, such as a month, MetricFlow will go back one full calendar month. However, grain to date will always start accumulating from the beginning of the grain, regardless of the latest date of data.

Expand All @@ -176,29 +267,90 @@ For example, let's consider an underlying measure of `order_total.`
agg: sum
```

We can compare the difference between a 1-month window and a monthly grain to date. The cumulative metric in a window approach applies a sliding window of 1 month, whereas the grain to date by month resets at the beginning of each month.
We can compare the difference between a 1-month window and a monthly grain to date.
- The cumulative metric in a window approach applies a sliding window of 1 month.
- The grain to date by month resets at the beginning of each month.
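
To make the contrast concrete, the following Python sketch compares the two approaches on made-up data. The helper names are hypothetical, and the trailing window is approximated as 30 days for simplicity, which is an assumption of this sketch rather than how MetricFlow resolves `1 month`.

```python
from datetime import date, timedelta


def trailing_window_total(daily_totals, as_of, window_days=30):
    """Sum values in a sliding window ending on as_of (30 days stands in for 1 month)."""
    start = as_of - timedelta(days=window_days - 1)
    return sum(value for day, value in daily_totals.items() if start <= day <= as_of)


def month_to_date_total(daily_totals, as_of):
    """Sum values from the first day of as_of's month through as_of."""
    start = as_of.replace(day=1)
    return sum(value for day, value in daily_totals.items() if start <= day <= as_of)


# Hypothetical data: an order total of 100 on every day from 2024-01-01 to 2024-02-29.
daily_totals = {date(2024, 1, 1) + timedelta(days=i): 100 for i in range(60)}

as_of = date(2024, 2, 3)
window_total = trailing_window_total(daily_totals, as_of)  # spans Jan 5 to Feb 3
mtd_total = month_to_date_total(daily_totals, as_of)       # spans Feb 1 to Feb 3
```

On February 3, the sliding window still covers most of January (3000), while the month-to-date value has reset and only covers three days (300).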

```yaml
metrics:
- name: cumulative_order_total_l1m #For this metric, we use a window of 1 month
- name: cumulative_order_total_l1m # For this metric, we use a window of 1 month
label: Cumulative Order total (L1M)
description: Trailing 1-month cumulative order amount
type: cumulative
type_params:
measure: order_total
window: 1 month
- name: cumulative_order_total_mtd #For this metric, we use a monthly grain-to-date
cumulative_type_params:
window: 1 month # Applies a sliding window of 1 month
- name: cumulative_order_total_mtd # For this metric, we use a monthly grain-to-date
label: Cumulative Order total (MTD)
description: The month-to-date value of all orders
type: cumulative
type_params:
measure: order_total
cumulative_type_params:
grain_to_date: month # Resets at the beginning of each month
period_agg: first # Optional. Defaults to first. Accepted values: first|last|avg
```


Cumulative metric with grain to date:

```yaml
- name: orders_last_month_to_date
label: Orders month to date
type: cumulative
type_params:
measure: order_count
cumulative_type_params:
grain_to_date: month
```

This compiles the following SQL code:

```sql
with staging as (
select
subq_3.date_day as metric_time__day,
date_trunc('week', subq_3.date_day) as metric_time__week,
sum(subq_1.order_count) as orders_last_month_to_date
from dbt_jstein.metricflow_time_spine subq_3
inner join (
select
date_trunc('day', ordered_at) as metric_time__day,
1 as order_count
from analytics.dbt_jstein.orders orders_src_10000
) subq_1
on (
subq_1.metric_time__day <= subq_3.date_day
) and (
subq_1.metric_time__day >= date_trunc('month', subq_3.date_day)
)
group by
subq_3.date_day,
date_trunc('week', subq_3.date_day)
)

select
*
from (
select
metric_time__week,
first_value(orders_last_month_to_date) over (partition by date_trunc('week', metric_time__day) order by metric_time__day) as cumulative_revenue
from
staging
)
group by
metric_time__week,
cumulative_revenue
order by
metric_time__week
```


### Implementation

To calculate the cumulative value of the metric over a given window we do a time range join to a timespine table using the primary time dimension as the join key. We use the accumulation window in the join to decide whether a record should be included on a particular day. The following SQL code produced from an example cumulative metric is provided for reference:
To calculate the cumulative value of the metric over a given window, we use a time range join to a timespine table using the primary time dimension as the join key. We use the accumulation window in the join to decide whether to include a record on a particular day. Refer to the following example cumulative metric SQL code:

``` sql
select
Expand Down
4 changes: 4 additions & 0 deletions website/docs/docs/build/simple.md
Expand Up @@ -11,6 +11,10 @@ Simple metrics are metrics that directly reference a single measure, without any

The parameters, description, and type for simple metrics are:

:::tip
Note that we use the double colon (::) to indicate that a parameter is nested within another parameter. For example, `query_params::metrics` means the `metrics` parameter is nested under `query_params`.
:::

| Parameter | Description | Type |
| --------- | ----------- | ---- |
| `name` | The name of the metric. | Required |
Expand Down