Orient to Caltrans Step 1: Add tests and docs for data grain #340

jkarpen · 2024-08-07T18:46:15Z

Almost all of the tables in the dbt project have an intended data grain. In most cases they are time series for devices, and the grain is a combination of a device ID and a timestamp at a particular aggregation level. However, this data grain is not well documented or tested! Please:

Review the data models in the dbt project and identify the intended grain of each (please talk to Ken if you need help).
Add documentation where appropriate to better indicate the intended grain of the models.
Add uniqueness and not-null tests to enforce the grain, so long as they are not too costly.
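
As a rough illustration of the second and third items, a properties file entry might look something like the sketch below. The model name `example_detector_agg_hourly` and the columns `station_id`, `lane`, and `sample_hour` are hypothetical placeholders, not names taken from the project. The sketch also assumes the `dbt_utils` package is installed, since `unique_combination_of_columns` comes from that package rather than from dbt core.

```yaml
version: 2

models:
  # Hypothetical model name, used only for illustration.
  - name: example_detector_agg_hourly
    # State the grain in the description so it shows up in the generated docs.
    description: >
      Hourly aggregates at the detector level.
      Grain: one row per station_id, lane, and sample_hour.
    data_tests:
      # Enforces the composite grain (requires the dbt_utils package).
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns:
            - station_id
            - lane
            - sample_hour
    columns:
      # Each column that participates in the grain should never be null.
      - name: station_id
        data_tests:
          - not_null
      - name: lane
        data_tests:
          - not_null
      - name: sample_hour
        data_tests:
          - not_null
```

For a model whose grain is a single column (for example a detector ID), the built-in `unique` test on that column would likely be enough, and the package-level test could be dropped.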

Caveats:

A station is made up of multiple detectors, one in each lane. In some cases we use a station+lane combination to identify a unique detector; in other cases we use a detector ID. We may want to standardize on the latter, but in the meantime, know that a table with station and lane is at the detector level, and a table with station only is at the station level.
Uniqueness tests on larger tables may be expensive. Do some performance tests and use your best judgment on whether they are appropriate for a given dbt model.
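
One possible way to keep a uniqueness test affordable on a large table is to restrict it to a recent window with the `where` test config and to downgrade failures to warnings while evaluating cost. This is only a sketch: the model and column names are hypothetical, the seven-day window is an arbitrary choice, and the `dateadd` expression assumes a Snowflake-style SQL dialect.

```yaml
version: 2

models:
  # Hypothetical model name, used only for illustration.
  - name: example_detector_agg_hourly
    data_tests:
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns:
            - station_id
            - lane
            - sample_hour
          config:
            # Only scan the most recent week instead of the full history.
            # Assumes Snowflake-style date functions.
            where: "sample_hour >= dateadd(day, -7, current_date)"
            # Report failures as warnings while the cost is being assessed.
            severity: warn
```

The trade-off is that duplicates older than the window go undetected, so a windowed test complements, rather than replaces, an occasional full-table check.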

summer-mothwood · 2024-12-06T18:04:47Z

Part of this project will be to switch to the new data_tests syntax across the YAML files for all Caltrans models that currently have data tests: https://docs.getdbt.com/docs/build/data-tests

From dbt v1.8, "tests" are now called "data tests" to disambiguate from unit tests. The YAML key tests: is still supported as an alias for data_tests:. Refer to New data_tests: syntax for more information.
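
In practice, the change described above amounts to renaming the key in each properties file. A minimal before-and-after sketch, with a hypothetical model name `example_station_meta` and column `station_id`:

```yaml
version: 2

# Before (older key, still accepted as an alias):
#
# models:
#   - name: example_station_meta
#     columns:
#       - name: station_id
#         tests:
#           - unique
#           - not_null

# After (dbt v1.8 and later):
models:
  - name: example_station_meta
    columns:
      - name: station_id
        data_tests:
          - unique
          - not_null
```

Only the key name changes; the test names and their arguments stay as they were.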

jkarpen added this to the Onboarding Tasks milestone Aug 7, 2024

jkarpen assigned summer-mothwood Sep 30, 2024

jkarpen added the unplanned label Oct 8, 2024

summer-mothwood mentioned this issue Nov 8, 2024

Standardize unique_key for incremental models cagov/caldata-mdsa-caltrans-pems#432

Closed

ian-r-rose mentioned this issue Nov 15, 2024

Resolve duplicate geolocation of the station. cagov/caldata-mdsa-caltrans-pems#479

Closed

summer-mothwood removed the unplanned label Dec 4, 2024
