-
Notifications
You must be signed in to change notification settings - Fork 89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ele 1453 implement metrics collection #447
Conversation
👋 @elongl |
@@ -0,0 +1,29 @@ | |||
{% macro query_table_metrics() %} | |||
{% set query %} | |||
select count(*) as row_count |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The freshness metrics (model run timestamp) can be calculated from this metric.
Should we add an empty freshness
or build
metric for clarity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of thoughts about this:
- Yes, I definitely think for clarity it's good to have an explicit row :)
- When we move the test to rely on the metrics we'll need to actually compute the freshness metric (as diff between timestamps) - but maybe that's OK. For recommendation anyway that's enough.
@@ -0,0 +1,29 @@ | |||
{% macro query_table_metrics() %} | |||
{% set query %} | |||
select count(*) as row_count |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of thoughts about this:
- Yes, I definitely think for clarity it's good to have an explicit row :)
- When we move the test to rely on the metrics we'll need to actually compute the freshness metric (as diff between timestamps) - but maybe that's OK. For recommendation anyway that's enough.
@@ -0,0 +1,29 @@ | |||
{% macro query_table_metrics() %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
May be nice if there will be a mapping of metric -> SQL so this macro won't become bloated (I think this is a part of what makes the existing macros for collecting metrics hard to read)
But I'm fine with deferring until there are more metrics here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also:
- Should this file perhaps be somewhere else rather than under "materializations"?
- Related to the previous comment, but do we actually want to reuse
table_monitoring_query
? Not sure if we do, but it is going to create some duplication otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer the duplication for now, table_monitoring_query
is extremely complex in my opinion.
This is very simple and straight-forward, I'd rather keep it that way in spite of duplication.
Regarding the first point, maybe I don't fully understand what you meant, but I don't find:
{% macro get_row_count_metric_expr() %}
count(*) as row_count
{% endmacro %}
{% macro query_table_metrics() %}
{% set query %}
select
{{ elementary.get_row_count_metric_expr() }}
from {{ this }}
{% endset %}
...
More readable than:
{% macro query_table_metrics() %}
{% set query %}
select
count(*) as row_count
from {{ this }}
{% endset %}
...
I like how all the metrics are centralized in one clear query.
Though, if the expressions would be more complicated than count(*)
than I can understand.
{% macro insert_metrics() %} | ||
{% set metrics = elementary.get_cache("tables").get("metrics") %} | ||
{% set database_name, schema_name = elementary.get_package_database_and_schema() %} | ||
{%- set target_relation = adapter.get_relation(database=database_name, schema=schema_name, identifier='data_monitoring_metrics') -%} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can it somehow affect existing tests that we're writing to data_monitoring_metrics
?
I think it should be fine but just verifying
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those metrics aren't being used in the tests for now because they don't have metric properties.
No description provided.