Ele 1453 implement metrics collection #447

Merged
merged 30 commits into from
Aug 6, 2023
Merged
Show file tree
Hide file tree
Changes from 13 commits
Commits
Show all changes
30 commits
Select commit Hold shift + click to select a range
c69e0aa  POC Metrics collection. (elongl, Jul 20, 2023)
9f1702c  DbtProject.seed is not a context manager. (elongl, Jul 20, 2023)
3b89bc7  Added BigQuery materializations. (elongl, Jul 20, 2023)
ef355b2  Created a high-level function. (elongl, Jul 20, 2023)
088c075  Changed a macro. (elongl, Jul 23, 2023)
f819420  Added view materialization. (elongl, Jul 23, 2023)
9e9ce64  Simplified seed names. (elongl, Jul 23, 2023)
6e4d3cf  Merge branch 'master' into ele-1356-poc-metrics-collection (elongl, Jul 23, 2023)
3542290  Added view materialization. (elongl, Jul 23, 2023)
bdf3f13  Merged master. (elongl, Aug 1, 2023)
fe44a61  Using YAML selector. (elongl, Aug 1, 2023)
7ba7616  Pulled master. (elongl, Aug 1, 2023)
915ca16  Deleted redundant source table. (elongl, Aug 1, 2023)
456b849  Renamed a file, added 'updated_at'. (elongl, Aug 1, 2023)
efafcf7  Added a 'build' metric. (elongl, Aug 3, 2023)
2bf4de9  Added a feature flag to collect metrics. (elongl, Aug 3, 2023)
7e2f82c  Changed an error to a warning. (elongl, Aug 3, 2023)
7bb2bdb  Changed the 'full_table_name'. (elongl, Aug 3, 2023)
57b0426  Changed the 'full_table_name'. (elongl, Aug 3, 2023)
3cd6bb5  Not using YAML selector. (elongl, Aug 3, 2023)
2e3d04f  Added materializations to rest of adapters. (elongl, Aug 3, 2023)
4f7b75e  Merge branch 'master' into ele-1453-implement-metrics-collection (elongl, Aug 3, 2023)
22067b2  Moved feature flag. (elongl, Aug 3, 2023)
ce276b8  Revert "Moved feature flag." (elongl, Aug 3, 2023)
2b30eaa  Removed a redundant return. (elongl, Aug 3, 2023)
bd3d1fb  Keeping the default vars. (elongl, Aug 3, 2023)
1b9aa0a  Deleted 'view' materialization. (elongl, Aug 3, 2023)
2925c0d  Added a metric value for the 'build_timestamp' metric. (elongl, Aug 3, 2023)
acc8bae  Added metric count indicator. (elongl, Aug 3, 2023)
a35e04a  Merged master. (elongl, Aug 3, 2023)
@@ -0,0 +1,3 @@
{{ config(materialized="incremental") }}

select * from {{ source("test_data", "metrics_seed2") }}
@@ -0,0 +1,3 @@
{{ config(materialized="table") }}

select * from {{ source("test_data", "metrics_seed1") }}
@@ -0,0 +1,5 @@
{{ config(materialized="view") }}

select * from {{ source("test_data", "metrics_seed1") }}
union all
select * from {{ source("test_data", "metrics_seed2") }}
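The view model stacks both seeds with `union all`, which pairs columns by position rather than by name, so `updated_at` and `created_at` end up in one output column and the view's row count is the sum of the two seeds. A minimal sketch of that behavior, assuming an in-memory SQLite database as a stand-in for the warehouse (table names mirror the seeds; the rows are made up for illustration):

```python
import sqlite3

# In-memory SQLite stands in for the warehouse; names mirror the seeds above.
conn = sqlite3.connect(":memory:")
conn.execute("create table metrics_seed1 (updated_at text)")
conn.execute("create table metrics_seed2 (created_at text)")
conn.executemany("insert into metrics_seed1 values (?)", [("2023-08-01",), ("2023-08-02",)])
conn.executemany("insert into metrics_seed2 values (?)", [("2023-08-03",)] * 3)

# Same shape as the view model: UNION ALL matches columns by position.
conn.execute(
    """
    create view metrics_view as
    select * from metrics_seed1
    union all
    select * from metrics_seed2
    """
)

row_count = conn.execute("select count(*) as row_count from metrics_view").fetchone()[0]
# The combined column takes its name from the first select.
column_names = [d[0] for d in conn.execute("select * from metrics_view").description]
print(row_count, column_names)  # 5 ['updated_at']
```

This is also why the test expects `len(data1) + len(data2)` for the view.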
@@ -4,4 +4,5 @@ sources:
  - name: test_data
    schema: "{{ target.schema }}"
    tables:
      - name: dummy
      - name: metrics_seed1
      - name: metrics_seed2
3 changes: 3 additions & 0 deletions integration_tests/integration_tests/dbt_project/selectors.yml
@@ -4,3 +4,6 @@ selectors:

  - name: one
    definition: one

  - name: metrics
    definition: metrics_*
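The `metrics` selector relies on the `metrics_*` wildcard to pick up the three new models. As a rough mental model only, assuming the wildcard behaves like shell-style globbing on model names (dbt's real selector resolution covers more cases, such as paths and packages), Python's `fnmatch` shows which names it would catch. The model list is an assumption: the three models the test expects plus a model named `one`, inferred from the existing selector:

```python
from fnmatch import fnmatchcase

# Assumed model names: the three from this PR plus "one" from the existing selector.
model_names = ["one", "metrics_incremental", "metrics_table", "metrics_view"]

# Hypothetical approximation of the `metrics_*` selector definition.
selected = [name for name in model_names if fnmatchcase(name, "metrics_*")]
print(selected)  # ['metrics_incremental', 'metrics_table', 'metrics_view']
```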
38 changes: 38 additions & 0 deletions integration_tests/integration_tests/tests/test_metrics.py
@@ -0,0 +1,38 @@
import random
from datetime import datetime

from data_generator import DATE_FORMAT, generate_dates
from dbt_project import DbtProject


def test_metrics(dbt_project: DbtProject):
    data1 = [
        {"updated_at": date.strftime(DATE_FORMAT)}
        for date in generate_dates(base_date=datetime.now())
        for _ in range(random.randint(-5, 20))
    ]
    data2 = [
        {"created_at": date.strftime(DATE_FORMAT)}
        for date in generate_dates(base_date=datetime.now())
        for _ in range(random.randint(0, 20))
    ]
    dbt_project.seed(data1, "metrics_seed1")
    dbt_project.seed(data2, "metrics_seed2")
    dbt_project.dbt_runner.run(selector="metrics")

    remaining_models_to_row_count = {
        "metrics_view": len(data1) + len(data2),
        "metrics_table": len(data1),
        "metrics_incremental": len(data2),
    }
    for metric in dbt_project.read_table("data_monitoring_metrics"):
        for model_name, row_count in remaining_models_to_row_count.items():
            if (
                model_name in metric["full_table_name"]
                and metric["metric_name"] == "row_count"
            ):
                assert metric["metric_value"] == row_count
                remaining_models_to_row_count.pop(model_name)
                break

    assert not remaining_models_to_row_count
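The assertion loop pops each model once its `row_count` metric is found (popping is safe only because the loop breaks right after mutating the dict), and the final `assert not remaining_models_to_row_count` catches any model that produced no metric. The same matching logic, pulled into a standalone helper and run against hand-written rows; `unmatched_models` is a hypothetical name, the row keys mirror those the test reads, and the values are invented:

```python
def unmatched_models(metric_rows, expected_row_counts):
    """Return the models whose row_count metric was never found.

    Raises AssertionError if a matching metric carries the wrong value.
    """
    remaining = dict(expected_row_counts)  # copy so the caller's dict is untouched
    for metric in metric_rows:
        if metric["metric_name"] != "row_count":
            continue
        for model_name, row_count in remaining.items():
            if model_name in metric["full_table_name"]:
                assert metric["metric_value"] == row_count, (model_name, metric)
                # Safe: we stop iterating the dict immediately after mutating it.
                remaining.pop(model_name)
                break
    return remaining


rows = [
    {"full_table_name": '"db"."schema"."metrics_table"', "metric_name": "row_count", "metric_value": 7},
    {"full_table_name": '"db"."schema"."metrics_view"', "metric_name": "row_count", "metric_value": 10},
]
print(unmatched_models(rows, {"metrics_table": 7, "metrics_view": 10, "metrics_incremental": 3}))
# {'metrics_incremental': 3}
```

Because the match is a substring check on `full_table_name`, a model name that is a prefix of another (say `metrics` and `metrics_view`) could claim the wrong row; the three names in this test do not overlap that way.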
28 changes: 28 additions & 0 deletions macros/edr/materializations/model/common.sql
@@ -0,0 +1,28 @@
{% macro query_table_metrics() %}
    {% set query %}
        select count(*) as row_count
        from {{ this }}
    {% endset %}

    {% set metrics = [] %}
    {% for metric_column in elementary.run_query(query).columns %}
        {% set metric_name = metric_column.name %}
        {% set metric_value = metric_column[0] %}
        {% do metrics.append({
            "id": "{}.{}".format(invocation_id, this),
            "full_table_name": this | string,
            "column_name": none,
            "metric_name": metric_name,
            "metric_value": metric_value
        }) %}
    {% endfor %}
    {% do return(metrics) %}
{% endmacro %}

{% macro query_metrics() %}
    {% do return(elementary.query_table_metrics()) %}
{% endmacro %}

{% macro cache_metrics(metrics) %}
    {% do elementary.get_cache("tables").get("metrics").extend(metrics) %}
{% endmacro %}
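`query_table_metrics` runs a single aggregate query and turns each result column into one metric row, taking the column name as `metric_name` and the first row's value as `metric_value`. A Python sketch of that column-to-metric pivot, assuming SQLite in place of `elementary.run_query` and plain strings standing in for dbt's `invocation_id` and `this` (all three are substitutions for illustration only):

```python
import sqlite3


def query_table_metrics(conn, invocation_id, relation):
    """Sketch of the macro: one row of aggregates -> one metric dict per column."""
    # `relation` is a trusted table name here, as `this` is in the macro.
    cursor = conn.execute(f"select count(*) as row_count from {relation}")
    first_row = cursor.fetchone()
    metrics = []
    for index, column in enumerate(cursor.description):
        metrics.append(
            {
                # Same id scheme as the macro: "<invocation_id>.<relation>".
                "id": "{}.{}".format(invocation_id, relation),
                "full_table_name": relation,
                "column_name": None,
                "metric_name": column[0],
                "metric_value": first_row[index],
            }
        )
    return metrics


conn = sqlite3.connect(":memory:")
conn.execute("create table metrics_table (updated_at text)")
conn.executemany("insert into metrics_table values (?)", [("2023-08-01",)] * 4)
print(query_table_metrics(conn, "inv-123", "metrics_table"))
```

Adding another aggregate to the `select` would yield another metric without touching the loop; note, though, that every metric from the same table would then share the same `id`.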
13 changes: 13 additions & 0 deletions macros/edr/materializations/model/incremental.sql
@@ -0,0 +1,13 @@
{% materialization incremental, default %}
    {% set relations = dbt.materialization_incremental_default() %}
    {% set metrics = elementary.query_metrics() %}
    {% do elementary.cache_metrics(metrics) %}
    {% do return(relations) %}
{% endmaterialization %}

{% materialization incremental, adapter="bigquery" %}
    {% set relations = dbt.materialization_incremental_bigquery() %}
    {% set metrics = elementary.query_metrics() %}
    {% do elementary.cache_metrics(metrics) %}
    {% do return(relations) %}
{% endmaterialization %}
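Each override delegates to dbt's built-in materialization, then queries and caches metrics before returning the built relations unchanged. The same "run the original, then record a side effect" shape can be sketched in Python as a decorator; `metrics_cache`, `collect_metrics`, and the stand-in materialization function are hypothetical names, not dbt or Elementary APIs:

```python
import functools

# Hypothetical stand-in for elementary.get_cache("tables")["metrics"].
metrics_cache = []


def collect_metrics(query_metrics):
    """Wrap a materialization so metrics are cached after it builds the relation."""

    def decorator(materialize):
        @functools.wraps(materialize)
        def wrapper(*args, **kwargs):
            relations = materialize(*args, **kwargs)  # build first, as the macro does
            metrics_cache.extend(query_metrics())     # then query and cache metrics
            return relations                          # hand back the original result
        return wrapper
    return decorator


@collect_metrics(lambda: [{"metric_name": "row_count", "metric_value": 3}])
def materialization_incremental_default():
    # Hypothetical stand-in for dbt.materialization_incremental_default().
    return {"relations": ["metrics_incremental"]}


result = materialization_incremental_default()
print(result, metrics_cache)
```

Since metrics are appended only after the wrapped materialization returns, a build that raises never reaches the metrics step, which matches the ordering in the macro.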
13 changes: 13 additions & 0 deletions macros/edr/materializations/model/table.sql
@@ -0,0 +1,13 @@
{% materialization table, default %}
    {% set relations = dbt.materialization_table_default() %}
    {% set metrics = elementary.query_metrics() %}
    {% do elementary.cache_metrics(metrics) %}
    {% do return(relations) %}
{% endmaterialization %}

{% materialization table, adapter="bigquery" %}
    {% set relations = dbt.materialization_table_bigquery() %}
    {% set metrics = elementary.query_metrics() %}
    {% do elementary.cache_metrics(metrics) %}
    {% do return(relations) %}
{% endmaterialization %}
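Each file pairs a `default` variant with a `bigquery` one. Our understanding, stated as an assumption, is that dbt ranks materialization candidates by adapter specificity before locality, so a built-in BigQuery implementation would win over a package's `default` override unless the package also supplies a `bigquery` variant. A simplified, hypothetical model of that lookup (the `resolve` function and candidate dicts are illustrative, not dbt internals):

```python
# Simplified, hypothetical model of materialization lookup. Assumption:
# an adapter-specific candidate beats `default`, and only then does locality
# (root project > installed package > built-in) break ties.
LOCALITY = {"builtin": 0, "package": 1, "root": 2}


def resolve(candidates, name, adapter_type):
    matching = [
        c
        for c in candidates
        if c["name"] == name and c["adapter"] in (adapter_type, "default")
    ]
    # Sort key: (is adapter-specific, locality); the highest tuple wins.
    return max(matching, key=lambda c: (c["adapter"] != "default", LOCALITY[c["source"]]))


candidates = [
    {"name": "table", "adapter": "default", "source": "builtin"},
    {"name": "table", "adapter": "bigquery", "source": "builtin"},  # ships with dbt-bigquery
    {"name": "table", "adapter": "default", "source": "package"},   # like `table, default` here
]
on_bigquery_before = resolve(candidates, "table", "bigquery")

# Adding a package-level `table, adapter="bigquery"` variant, as this file does:
candidates.append({"name": "table", "adapter": "bigquery", "source": "package"})
on_bigquery_after = resolve(candidates, "table", "bigquery")
on_postgres = resolve(candidates, "table", "postgres")

print(on_bigquery_before["source"], on_bigquery_after["source"], on_postgres["source"])
# builtin package package
```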
13 changes: 13 additions & 0 deletions macros/edr/materializations/model/view.sql
@@ -0,0 +1,13 @@
{% materialization view, default %}
    {% set relations = dbt.materialization_view_default() %}
    {% set metrics = elementary.query_metrics() %}
    {% do elementary.cache_metrics(metrics) %}
    {% do return(relations) %}
{% endmaterialization %}

{% materialization view, adapter="bigquery" %}
    {% set relations = dbt.materialization_view_bigquery() %}
    {% set metrics = elementary.query_metrics() %}
    {% do elementary.cache_metrics(metrics) %}
    {% do return(relations) %}
{% endmaterialization %}
4 changes: 4 additions & 0 deletions macros/edr/system/hooks/on_run_end.sql
@@ -5,6 +5,10 @@
{{ return('') }}
{% endif %}

{% if flags.WHICH in ['run', 'build'] %}
{{ elementary.insert_metrics() }}
{% endif %}

{% if not elementary.get_config_var('disable_dbt_artifacts_autoupload') %}
{{ elementary.upload_dbt_artifacts() }}
{% endif %}
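The hook flushes cached metrics only for `run` and `build`, the commands that execute the overridden model materializations, and leaves the existing artifacts upload untouched. A small Python sketch of that branching, where `which` plays the role of `flags.WHICH` and `disable_artifacts_upload` that of the `disable_dbt_artifacts_autoupload` config var; the function name and step labels are hypothetical, and the early `return('')` above the hunk is left out:

```python
def on_run_end_steps(which, disable_artifacts_upload=False):
    """Hypothetical mirror of the hook's branching; returns the actions in order."""
    steps = []
    # Mirrors `{% if flags.WHICH in ['run', 'build'] %}`: these commands run the
    # overridden model materializations, so they are the ones that cache metrics.
    if which in ("run", "build"):
        steps.append("insert_metrics")
    # Mirrors the existing `disable_dbt_artifacts_autoupload` check.
    if not disable_artifacts_upload:
        steps.append("upload_dbt_artifacts")
    return steps


print(on_run_end_steps("build"))      # ['insert_metrics', 'upload_dbt_artifacts']
print(on_run_end_steps("test"))       # ['upload_dbt_artifacts']
print(on_run_end_steps("run", True))  # ['insert_metrics']
```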
12 changes: 12 additions & 0 deletions macros/edr/tests/on_run_end/insert_metrics.sql
@@ -0,0 +1,12 @@
{% macro insert_metrics() %}
    {% set metrics = elementary.get_cache("tables").get("metrics") %}
    {% set database_name, schema_name = elementary.get_package_database_and_schema() %}
    {%- set target_relation = adapter.get_relation(database=database_name, schema=schema_name, identifier='data_monitoring_metrics') -%}
Review comment (Collaborator): Could this affect the existing tests that write to data_monitoring_metrics? I think it should be fine, but just verifying.
Reply (Member Author): Those metrics aren't used by the tests for now because they don't have metric properties.
    {% if not target_relation %}
        {% do exceptions.raise_compiler_error("Couldn't find Elementary's models. Please run `dbt run -s elementary`.") %}
    {% endif %}

    {{ elementary.file_log("Inserting metrics into {}.".format(target_relation)) }}
    {% do elementary.insert_rows(target_relation, metrics, should_commit=true) %}
    {{ return('') }}
{% endmacro %}
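`insert_metrics` looks up the `data_monitoring_metrics` relation, fails loudly if Elementary's models were never built, and otherwise bulk-inserts the cached rows and commits. A sketch of the same guard-then-insert flow against SQLite, assuming a `sqlite_master` lookup in place of `adapter.get_relation`, `executemany` in place of `elementary.insert_rows`, `RuntimeError` in place of dbt's compiler error, and a reduced five-column table (all substitutions for illustration):

```python
import sqlite3

# Assumed, reduced column set matching the keys the macro's metric dicts carry.
COLUMNS = ["id", "full_table_name", "column_name", "metric_name", "metric_value"]


def insert_metrics(conn, metrics, table="data_monitoring_metrics"):
    # Stand-in for adapter.get_relation(): does the target table exist?
    exists = conn.execute(
        "select 1 from sqlite_master where type = 'table' and name = ?", (table,)
    ).fetchone()
    if not exists:
        # Stand-in for exceptions.raise_compiler_error().
        raise RuntimeError("Couldn't find Elementary's models. Please run `dbt run -s elementary`.")
    placeholders = ", ".join("?" for _ in COLUMNS)
    conn.executemany(
        f"insert into {table} ({', '.join(COLUMNS)}) values ({placeholders})",
        [tuple(metric[column] for column in COLUMNS) for metric in metrics],
    )
    conn.commit()  # mirrors should_commit=true


conn = sqlite3.connect(":memory:")
conn.execute(
    "create table data_monitoring_metrics "
    "(id text, full_table_name text, column_name text, metric_name text, metric_value real)"
)
insert_metrics(
    conn,
    [{"id": "inv-123.metrics_table", "full_table_name": "metrics_table",
      "column_name": None, "metric_name": "row_count", "metric_value": 4}],
)
print(conn.execute("select metric_name, metric_value from data_monitoring_metrics").fetchall())
# [('row_count', 4.0)]
```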