
Ele 1453 implement metrics collection #447

Merged
merged 30 commits into master from ele-1453-implement-metrics-collection
Aug 6, 2023

Conversation

elongl
Member

@elongl elongl commented Aug 1, 2023

No description provided.

@linear

linear bot commented Aug 1, 2023

ELE-1453 Implement metrics collection

Definition of done:

@github-actions
Contributor

github-actions bot commented Aug 1, 2023

👋 @elongl
Thank you for raising your pull request.
Please make sure to add tests and document all user-facing changes.
You can do this by editing the docs files in the elementary repository.

@elongl elongl marked this pull request as draft August 2, 2023 12:09
@@ -0,0 +1,29 @@
{% macro query_table_metrics() %}
{% set query %}
select count(*) as row_count
Member Author

@elongl elongl Aug 2, 2023


The freshness metrics (model run timestamp) can be calculated from this metric.
Should we add an empty freshness or build metric for clarity?

Collaborator


A couple of thoughts about this:

  1. Yes, I definitely think it's good to have an explicit row for clarity :)
  2. When we move the test to rely on the metrics, we'll need to actually compute the freshness metric (as a diff between timestamps) - but maybe that's OK. For the recommendation, anyway, that's enough.
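The "diff between timestamps" computation mentioned above could look roughly like this (a sketch only: the table and column names are illustrative assumptions, and datediff syntax varies by warehouse):

```sql
-- Illustrative sketch: freshness as the gap between "now" and the last
-- recorded run timestamp. Names here are assumptions, not the PR's schema.
select
  datediff('second', max(generated_at), current_timestamp) as freshness_seconds
from data_monitoring_metrics
where metric_name = 'row_count'
```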


macros/edr/tests/on_run_end/insert_metrics.sql (outdated; resolved)
macros/edr/materializations/model/metrics.sql (resolved)
@@ -0,0 +1,29 @@
{% macro query_table_metrics() %}
Collaborator


It may be nice to have a mapping of metric -> SQL so this macro won't become bloated (I think this is part of what makes the existing metric-collection macros hard to read).
But I'm fine with deferring until there are more metrics here.
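A minimal sketch of the metric -> SQL mapping idea (the macro name `get_table_metric_exprs` is hypothetical, not part of this PR):

```jinja
{% macro get_table_metric_exprs() %}
  {# Hypothetical registry: metric name -> SQL expression #}
  {% do return({"row_count": "count(*)"}) %}
{% endmacro %}

{% macro query_table_metrics() %}
  {% set select_items = [] %}
  {% for metric_name, expr in elementary.get_table_metric_exprs().items() %}
    {% do select_items.append(expr ~ " as " ~ metric_name) %}
  {% endfor %}
  {% set query %}
    select {{ select_items | join(", ") }}
    from {{ this }}
  {% endset %}
  ...
{% endmacro %}
```

New metrics would then only touch the registry, keeping the query-building macro unchanged.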

Collaborator


Also:

  • Should this file perhaps be somewhere else rather than under "materializations"?
  • Related to the previous comment, but do we actually want to reuse table_monitoring_query? Not sure if we do, but it is going to create some duplication otherwise.

Member Author


I prefer the duplication for now; table_monitoring_query is extremely complex in my opinion.
This version is simple and straightforward, and I'd rather keep it that way in spite of the duplication.

Regarding the first point, maybe I don't fully understand what you meant, but I don't find:

{% macro get_row_count_metric_expr() %}
count(*) as row_count
{% endmacro %}


{% macro query_table_metrics() %}
  {% set query %}
    select
    {{ elementary.get_row_count_metric_expr() }}
    from {{ this }}
  {% endset %}
...

More readable than:

{% macro query_table_metrics() %}
  {% set query %}
    select
      count(*) as row_count
    from {{ this }}
  {% endset %}
...

I like how all the metrics are centralized in one clear query.
Though if the expressions were more complicated than count(*), then I could understand.

{% macro insert_metrics() %}
{% set metrics = elementary.get_cache("tables").get("metrics") %}
{% set database_name, schema_name = elementary.get_package_database_and_schema() %}
{%- set target_relation = adapter.get_relation(database=database_name, schema=schema_name, identifier='data_monitoring_metrics') -%}
Collaborator


Could writing to data_monitoring_metrics somehow affect existing tests?
I think it should be fine, but just verifying.

Member Author


Those metrics aren't being used in the tests for now because they don't have metric properties.
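For context, the kind of row the on-run-end insert would write into data_monitoring_metrics might look like this (purely illustrative: the actual table schema and values are not shown in this PR):

```sql
-- Illustrative only: database, column names, and values are assumptions.
insert into analytics.elementary.data_monitoring_metrics
  (full_table_name, metric_name, metric_value, updated_at)
values
  ('analytics.prod.orders', 'row_count', 1024, current_timestamp);
```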

@elongl elongl marked this pull request as ready for review August 3, 2023 14:19
@elongl elongl merged commit a6a8401 into master Aug 6, 2023
@elongl elongl deleted the ele-1453-implement-metrics-collection branch August 6, 2023 07:08