v0.1.0 (#5)
* feat: use relationship, occurance classes

* docs: update README

* feat: simplified interface

* fix: override_columns -> activity_schema_v2_column_mappings

* docs: update README
tnightengale authored Mar 8, 2023
1 parent 45ad426 commit b0953be
Showing 44 changed files with 608 additions and 310 deletions.
203 changes: 189 additions & 14 deletions README.md
@@ -1,10 +1,26 @@
# dbt-activity-schema <!-- omit in toc -->

A [dbt-Core](https://docs.getdbt.com/docs/introduction) [package](https://docs.getdbt.com/docs/build/packages#what-is-a-package) which contains macros to self-join an _activity stream_: the primary table in the [Activity Schema](https://github.com/ActivitySchema/ActivitySchema/blob/main/2.0.md) data modelling framework.
A [dbt-Core](https://docs.getdbt.com/docs/introduction)
[package](https://docs.getdbt.com/docs/build/packages#what-is-a-package) which
contains macros to create derived Datasets by self-joining an [Activity
Stream](https://github.com/ActivitySchema/ActivitySchema/blob/main/2.0.md#activity-stream),
the primary table in the [Activity
Schema](https://github.com/ActivitySchema/ActivitySchema/blob/main/2.0.md) data
modelling framework.

## Table of Contents <!-- omit in toc -->
- [Install](#install)
- [Usage](#usage)
- [Create a Dataset](#create-a-dataset)
- [Configure Columns](#configure-columns)
- [Required Columns](#required-columns)
- [Mapping Column Names](#mapping-column-names)
- [Included Dataset Columns](#included-dataset-columns)
- [Configure Appended Activity Column Names](#configure-appended-activity-column-names)
- [Macros](#macros)
- [dataset (source)](#dataset-source)
- [activity (source)](#activity-source)
- [Relationships](#relationships)
- [Contributions](#contributions)

## Install
@@ -13,26 +29,185 @@ Include in `packages.yml`:
```yaml
packages:
- git: "https://github.com/tnightengale/dbt-activity-schema"
revision: 0.0.1
revision: 0.1.0
```
For latest release, see
https://github.com/tnightengale/dbt-activity-schema/releases.
## Usage
Use the `dataset.sql` macro with the appropriate params to generate a self-joined dataset from the activity stream model in your project, eg:
```SQL
{{
dbt_activity_schema.dataset(
ref("example__activity_stream"),
dbt_activity_schema.primary_activity("All","bought something"),
[
dbt_activity_schema.append_activity("first_before", "visited page")
### Create a Dataset
Use the [dataset macro](#dataset-source) with the appropriate arguments to
derive a Dataset by self-joining the Activity Stream model in your project. The
[dataset macro](#dataset-source) compiles based on the provided [activity
macros](#activity-source) and [relationship macros](#relationships). It
can then be nested in a CTE in a dbt-Core model. Eg:
```c
// my_first_dataset.sql

with

dataset_cte as (
{{ dbt_activity_schema.dataset(
activity_stream_ref = ref("example__activity_stream"),

primary_activity = dbt_activity_schema.activity(
dbt_activity_schema.all_ever(), "bought something"),

appended_activities = [
dbt_activity_schema.activity(
dbt_activity_schema.first_before(), "visited page"),
dbt_activity_schema.activity(
dbt_activity_schema.first_after(), "bought item"),
]
)
}}
) }}
)

select * from dataset_cte

```
> Note: This package does not contain macros to create the Activity Stream
> model. It derives Dataset models on top of an existing Activity Stream model.
### Configure Columns
This package conforms to the [Activity Schema V2
Specification](https://github.com/ActivitySchema/ActivitySchema/blob/main/2.0.md#entity-table)
and, by default, it expects the columns in that spec to exist in the Activity Stream model.

#### Required Columns
For the critical joins in the [dataset macro](#dataset-source) to work as
expected, the following columns must exist:
- **`activity`**: A string or ID that identifies the action or fact
attributable to the `customer`.
- **`customer`**: The UUID of the entity or customer. Must be used across
activities.
- **`ts`**: The timestamp at which the activity occurred.
- **`activity_repeated_at`**: The timestamp of the next activity, per
customer. Create using a lead window function, partitioned by activity and
customer.
- **`activity_occurrence`**: The running count of the activity per customer.
Create using a rank window function, partitioned by activity and customer.
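
A minimal sketch of how the last two columns might be derived, using Python's built-in `sqlite3` module only so that the example is runnable. The `raw_events` table and its rows are hypothetical; this package does not build the Activity Stream, and a real dbt model would express the same window functions in the warehouse's SQL dialect.

```python
import sqlite3

# Hypothetical raw events; only the column names come from the spec.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    create table raw_events (customer text, activity text, ts text);
    insert into raw_events values
        ('c1', 'visited page',     '2023-01-01'),
        ('c1', 'bought something', '2023-01-02'),
        ('c1', 'visited page',     '2023-01-03'),
        ('c2', 'visited page',     '2023-01-05');
    """
)

# rank() gives the running count and lead() gives the next timestamp,
# both partitioned by customer and activity and ordered by ts.
rows = conn.execute(
    """
    select
        customer,
        activity,
        ts,
        rank() over (
            partition by customer, activity order by ts
        ) as activity_occurrence,
        lead(ts) over (
            partition by customer, activity order by ts
        ) as activity_repeated_at
    from raw_events
    order by customer, activity, ts
    """
).fetchall()

for row in rows:
    print(row)
```

Note that `rank()` assigns the same number to rows that tie on `ts`; `row_number()` is an alternative if a strictly increasing count is needed.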

#### Mapping Column Names
If the required columns exist conceptually under different names, they can be
aliased using the nested `activity_schema_v2_column_mappings` project var. Eg:

```yml
# dbt_project.yml

...

vars:
dbt_activity_schema:
activity_schema_v2_column_mappings:
# Activity Stream with required column names that
# differ from the V2 spec, mapped from their spec name.
customer: entity_uuid
ts: activity_occurred_at

...
```
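
The effect of the mapping can be sketched in plain Python. This is an illustration of the lookup idea only, not the package's implementation: `resolve_columns` is a hypothetical helper, and it assumes that any spec column without an entry in the mapping keeps its spec name.

```python
# The required spec column names listed above.
SPEC_COLUMNS = [
    "activity",
    "customer",
    "ts",
    "activity_repeated_at",
    "activity_occurrence",
]


def resolve_columns(mappings):
    """Map each spec column to its name in the Activity Stream model.

    Hypothetical helper: unmapped columns are assumed to keep their spec name.
    """
    return {spec: mappings.get(spec, spec) for spec in SPEC_COLUMNS}


# Same mapping as the dbt_project.yml example above.
resolved = resolve_columns(
    {"customer": "entity_uuid", "ts": "activity_occurred_at"}
)
print(resolved)
```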

#### Included Dataset Columns
The set of columns included in the compiled SQL of the [dataset
macro](#dataset-source) can be configured using the nested
`default_dataset_columns` project var. Eg:
```yml
# dbt_project.yml

...

vars:
dbt_activity_schema:
# List columns from the Activity Schema to include in the Dataset
default_dataset_columns:
- activity_id
- entity_uuid
- activity_occurred_at
- revenue_impact

...
```
See the signature in the macro for more details on each parameter.

These defaults can be overridden using the `override_columns` argument in the
[activity macro](#activity-source).

#### Configure Appended Activity Column Names
The naming convention of the columns in the activities passed to the
`appended_activities` argument can be configured by overriding the
[generate_appended_column_alias](./macros/utils/generate_appended_column_alias.sql)
macro. See the dbt docs on [overriding package
macros](https://docs.getdbt.com/reference/dbt-jinja-functions/dispatch#overriding-package-macros)
for more details.

## Macros
---
### dataset ([source](macros/dataset.sql))
Create a derived dataset using self-joins from an Activity Stream model.

**params:**
- **`activity_stream_ref (required)`** : [ref](https://docs.getdbt.com/reference/dbt-jinja-functions/ref)

The dbt `ref()` that points to the Activity Stream model.

- **`primary_activity (required)`** : [activity](#activity-source)

The primary activity of the derived dataset.

- **`appended_activities (optional)`** : List [ [activity](#activity-source) ]

The list of appended activities to self-join to the primary activity.

### activity ([source](macros/activity.sql))
Represents either the primary activity or one of the appended activities in a
dataset.

**params:**
- **`relationship (required)`** : [relationship](#relationships)

The relationship that defines how the activity is filtered or joined,
depending on whether it is provided to the `primary_activity` or
`appended_activities` argument in the dataset macro.

- **`activity_name (required)`** : str

The string identifier of the activity in the Activity Stream. Should match the
value in the `activity` column.

- **`override_columns (optional)`** : List [ str ]

List of columns to include for the activity. Setting this overrides the defaults configured
by the `default_dataset_columns` project var.

- **`additional_join_condition (optional)`** : str

A valid SQL boolean to condition the join of the appended activity. Can
optionally contain the python f-string placeholders `{primary}` and
`{appended}` in the string. These placeholders will be compiled by the
[dataset macro](./macros/dataset.sql) with the correct SQL aliases for the
joins between the primary activity and the appended activity.

Eg:
```python
"json_extract({primary}.feature_json, 'dim1') =
json_extract({appended}.feature_json, 'dim1')"
```
The `{primary}` and `{appended}` placeholders compile according to
the position of the activity in the `appended_activities` list
argument to `dataset.sql`.

Compiled:
```python
"json_extract(stream.feature_json, 'dim1') =
json_extract(stream_3.feature_json, 'dim1')"
```
Given that the appended activity was 3rd in the `appended_activities` list
argument.
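
The substitution can be sketched with Python's `str.format`. This is an illustration only: `compile_join_condition` is a hypothetical helper, and the `stream` / `stream_<n>` aliases are taken from the compiled example above rather than from the macro's source.

```python
def compile_join_condition(condition, appended_position):
    """Fill the {primary} and {appended} placeholders with SQL aliases.

    Hypothetical helper: the alias naming is inferred from the README
    example, where the 3rd appended activity compiles to `stream_3`.
    """
    return condition.format(
        primary="stream",
        appended=f"stream_{appended_position}",
    )


condition = (
    "json_extract({primary}.feature_json, 'dim1') = "
    "json_extract({appended}.feature_json, 'dim1')"
)

# The appended activity is 3rd in the `appended_activities` list.
compiled = compile_join_condition(condition, 3)
print(compiled)
```

In this sketch, a condition containing literal braces would need them doubled (`{{` and `}}`), because `str.format` treats single braces as placeholders.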

## Relationships
See the [relationships/](macros/relationships/) path for the most up-to-date
relationships and their documentation.

## Contributions
Contributions and feedback are welcome. Please create an issue if you'd like to contribute.
Contributions and feedback are welcome. Please create an issue if you'd like to
contribute.
23 changes: 9 additions & 14 deletions integration_tests/dbt_project.yml
@@ -20,20 +20,15 @@ models:
+format: csv

vars:
primary_activity_columns:
- activity_id
- customer
- ts
- activity
- anonymous_customer_id
- feature_json
- revenue_impact
- link
- activity_occurrence
- activity_repeated_at
appended_activity_columns:
- feature_json
- ts
dbt_activity_schema:
default_dataset_columns:
- activity_id
- entity_uuid
- ts
- revenue_impact
activity_schema_v2_column_mappings:
customer: entity_uuid
anonymous_customer_id: anonymous_entity_uuid

seeds:
dbt_activity_schema_integration_tests:
11 changes: 9 additions & 2 deletions integration_tests/models/first_after/dataset__first_after_1.sql
@@ -1,9 +1,16 @@
{{
dbt_activity_schema.dataset(
ref("input__first_after"),
dbt_activity_schema.primary_activity("All","signed up"),
dbt_activity_schema.activity(
dbt_activity_schema.all_ever(),
"signed up"
),
[
dbt_activity_schema.append_activity("first_after", "bought something")
dbt_activity_schema.activity(
dbt_activity_schema.first_after(),
"bought something",
["feature_json", "ts"]
)
]
)
}}
12 changes: 9 additions & 3 deletions integration_tests/models/first_after/dataset__first_after_2.sql
@@ -1,10 +1,16 @@
{{
dbt_activity_schema.dataset(
ref("input__first_after"),
dbt_activity_schema.primary_activity("All","visit page"),
dbt_activity_schema.activity(
dbt_activity_schema.all_ever(),
"visit page"
),
[
dbt_activity_schema.append_activity(
"first_after", "bought something")
dbt_activity_schema.activity(
dbt_activity_schema.first_after(),
"bought something",
["feature_json", "ts"]
)
]
)
}}
14 changes: 10 additions & 4 deletions integration_tests/models/first_after/dataset__first_after_3.sql
@@ -1,13 +1,19 @@
{{
dbt_activity_schema.dataset(
ref("input__first_after"),
dbt_activity_schema.primary_activity("All","signed up"),
dbt_activity_schema.activity(
dbt_activity_schema.all_ever(),
"signed up"
),
[
dbt_activity_schema.append_activity(
"first_after",
dbt_activity_schema.activity(
dbt_activity_schema.first_after(),
"visit page",
["feature_json", "activity_occurrence", "ts"],
feature_json_join_columns=["type"]
additional_join_condition="
json_extract({primary}.feature_json, 'type')
= json_extract({appended}.feature_json, 'type')
"
)
]
)
@@ -1,9 +1,13 @@
{{
dbt_activity_schema.dataset(
ref("example__activity_stream"),
dbt_activity_schema.primary_activity("All","bought something"),
dbt_activity_schema.activity(dbt_activity_schema.all_ever(), "bought something"),
[
dbt_activity_schema.append_activity("first_before", "visited page")
dbt_activity_schema.activity(
dbt_activity_schema.first_before(),
"visited page",
["feature_json", "ts"]
)
]
)
}}
8 changes: 6 additions & 2 deletions integration_tests/models/first_ever/dataset__first_ever.sql
@@ -1,9 +1,13 @@
{{
dbt_activity_schema.dataset(
ref("example__activity_stream"),
dbt_activity_schema.primary_activity("All","visited page"),
dbt_activity_schema.activity(dbt_activity_schema.all_ever(),"visited page"),
[
dbt_activity_schema.append_activity("first_ever", "signed up")
dbt_activity_schema.activity(
dbt_activity_schema.first_ever(),
"signed up",
["feature_json", "ts"]
)
]
)
}}
5 changes: 2 additions & 3 deletions integration_tests/models/last_after/dataset__last_after_1.sql
@@ -1,10 +1,9 @@
{{
dbt_activity_schema.dataset(
ref("input__last_after"),
dbt_activity_schema.primary_activity(1,"signed up"),
dbt_activity_schema.activity(dbt_activity_schema.nth_ever(1), "signed up"),
[
dbt_activity_schema.append_activity("last_after", "visit page")
dbt_activity_schema.activity(dbt_activity_schema.last_after(), "visit page")
]
)
}}
s
8 changes: 6 additions & 2 deletions integration_tests/models/last_before/dataset__last_before.sql
@@ -1,9 +1,13 @@
{{
dbt_activity_schema.dataset(
ref("example__activity_stream"),
dbt_activity_schema.primary_activity("All","bought something"),
dbt_activity_schema.activity(dbt_activity_schema.all_ever(),"bought something"),
[
dbt_activity_schema.append_activity("last_before", "visited page")
dbt_activity_schema.activity(
dbt_activity_schema.last_before(),
"visited page",
["feature_json", "ts"]
)
]
)
}}
8 changes: 6 additions & 2 deletions integration_tests/models/last_ever/dataset__last_ever.sql
@@ -1,9 +1,13 @@
{{
dbt_activity_schema.dataset(
ref("example__activity_stream"),
dbt_activity_schema.primary_activity("Last","visited page"),
dbt_activity_schema.activity(dbt_activity_schema.last_ever(),"visited page"),
[
dbt_activity_schema.append_activity("last_ever", "bought something")
dbt_activity_schema.activity(
dbt_activity_schema.last_ever(),
"bought something",
["feature_json", "ts"]
)
]
)
}}