Data Quality Daily Row Check Mart Model #416

Merged
merged 6 commits into from
Oct 10, 2024

kengodleskidot
Contributor

The data model created in this PR can be used to build data quality visualizations and reports. It aggregates the daily detector counts from four data models whose counts should all match. Below is an example in which two of the models match but the subsequent models do not, along with a screenshot from PeMS for comparison:
image

image

I expect the Good and Bad counts to update after the full data refresh. This should help with #413, #397, and #398.

…ata models that should match. This data model can be used for a data quality visualization using Kibana or other tools.
@@ -0,0 +1,67 @@
{{ config(
materialized="table"
Contributor

This is minor, but this config could be left out, since models in the marts directory are already configured (in dbt_project.yml) to be materialized as tables.

@kengodleskidot
Contributor Author

@mmmiah pointed out that we should compare detector counts only for station types ML and HV, since the imputed and performance data models cover only those station types. The mart data model has been updated to filter to these station types, and the row counts now match. Great catch, @mmmiah!

image

We will still include the good/bad/total detector count from the detector_status model for QC usage.
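
For anyone who wants to reproduce the check outside dbt, here is a minimal sketch of the comparison described above. It is not the SQL from this PR. It runs against an in-memory SQLite database, and the table names in `MODELS`, the columns `sample_date`, `detector_id`, and `station_type`, and the helper functions are all hypothetical stand-ins. Counting distinct detectors per day is also an assumption about how the mart aggregates. The sketch covers only the count comparison, not the good/bad status counts.

```python
import sqlite3

# Hypothetical stand-ins for the four upstream models (names are assumptions).
MODELS = ("detector_status", "five_minute_agg", "imputed", "performance")
# Station types to compare, per the review comment above.
STATION_TYPES = ("ML", "HV")


def demo_connection():
    """Build a tiny in-memory database with made-up rows for illustration."""
    conn = sqlite3.connect(":memory:")
    rows = [
        ("2024-10-01", "D1", "ML"), ("2024-10-01", "D2", "HV"),
        ("2024-10-02", "D1", "ML"), ("2024-10-02", "D2", "HV"),
    ]
    for model in MODELS:
        conn.execute(
            f"create table {model} "
            "(sample_date text, detector_id text, station_type text)"
        )
        conn.executemany(f"insert into {model} values (?, ?, ?)", rows)
    # A detector of another station type; the filter should ignore it.
    conn.execute("insert into detector_status values ('2024-10-01', 'D3', 'OR')")
    # Simulate a detector missing from one model on the second day.
    conn.execute(
        "delete from performance "
        "where sample_date = '2024-10-02' and detector_id = 'D2'"
    )
    return conn


def daily_detector_counts(conn):
    """Return {sample_date: {model: distinct ML/HV detector count}}."""
    types = ", ".join(f"'{t}'" for t in STATION_TYPES)
    query = " union all ".join(
        f"select '{m}' as model, sample_date, count(distinct detector_id) as n "
        f"from {m} where station_type in ({types}) group by sample_date"
        for m in MODELS
    )
    counts = {}
    for model, day, n in conn.execute(query):
        counts.setdefault(day, {})[model] = n
    return counts


def mismatched_days(counts):
    """Days where the models disagree, or where a model has no rows at all."""
    return sorted(
        day for day, per_model in counts.items()
        if len(per_model) < len(MODELS) or len(set(per_model.values())) > 1
    )


conn = demo_connection()
print(mismatched_days(daily_detector_counts(conn)))
```

Filtering each model to the same station types before counting is what makes the comparison fair: without the filter, `detector_status` in this demo would report an extra detector on the first day even though nothing is actually missing downstream.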

@kengodleskidot kengodleskidot marked this pull request as ready for review October 7, 2024 23:32
Contributor

@mmmiah mmmiah Oct 8, 2024

@kengodleskidot, why is every GOOD_STATUS_COUNT zero? Does that mean all detectors are down?
image

Contributor

I am guessing that it will change once it goes through the nightly job in production. Is that the case here?

Contributor

@mmmiah mmmiah left a comment

Please merge! I will use this data from analytical_prd for QC visualization.

@kengodleskidot kengodleskidot merged commit 98fd5c1 into main Oct 10, 2024
3 checks passed