Better unique_id construction for specific instances of generic tests #3254

jtcohen6 · 2021-04-12T20:09:07Z

Describe the feature

(Yes, I know I need to come up with a better phrase than "specific instances of generic tests")

Users will know that they cannot:

Have two bespoke tests named the same
Have two generic tests named the same
Have a bespoke test and a generic test named the same (!)

But they should not be responsible for worrying about namespace clashes between specific instances of generic tests!

Over in #2308 (which I'm closing in favor of this new issue), we had a great example of this. A not_null test on base.extension_id and a not_null test on base_extension.id both get the same:

unique_id: test.my_package.not_null_base_extension_id
fqn: test.schema_test.not_null_base_extension_id

Only one of these tests can be selected, executed, seen in manifest.json, etc.
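To make the clash concrete, here is a minimal sketch of underscore-joined naming. The function `naive_test_name` is a hypothetical stand-in for illustration, not dbt's actual naming code; it only shows why joining on `_` is ambiguous when the model and column names themselves contain `_`.

```python
def naive_test_name(test_name: str, model: str, column_name: str) -> str:
    # Hypothetical stand-in for underscore-joined naming (not dbt's real code).
    return "_".join([test_name, model, column_name])


# Two different (model, column) pairs...
name_a = naive_test_name("not_null", "base", "extension_id")
name_b = naive_test_name("not_null", "base_extension", "id")

# ...collapse to one name, so only one test node survives.
print(name_a)
print(name_b)
print(name_a == name_b)
```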

Q & A

Do they need to be globally unique? No, I think we'd rather the unique_id is consistent from run to run. It should be created from the components of the test definition / instantiation.
Do they need to be human-readable? I don't think this is so important. We already hash the file names of schema tests with too many arguments. In a world where generic tests produce really good descriptions (#3249) and failure messages, the node name / unique_id feels less important. But we'd better do a good job of printing those descriptions out to stdout / artifacts!

Potential solutions

Components:

test name (e.g. not_null, unique)
model
column_name (if available)
Other non-config arguments, e.g. values for accepted_values, to + field for relationships
- Why not configs? I don't think changing the severity of a test should change the unique_id of that test node. That has the effect of making it seem, to dbt, that you've removed and added a genuinely different test. (I do think that's the effect of changing the values in accepted_values, however.) If this distinction gives us too much trouble, however, we should throw it away and figure out something simpler.
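A rough sketch of the config / non-config split described above. Everything here is an assumption for illustration: `CONFIG_KEYS` contains only `severity` because that is the one config named in this issue, and `identity_components` is a hypothetical helper, not an existing dbt function.

```python
import json
from typing import Any, Dict

# Assumption: the keys treated as "config". Only severity is named in this issue.
CONFIG_KEYS = {"severity"}


def identity_components(test_name: str, model: str, kwargs: Dict[str, Any]) -> str:
    # Drop config keys so they cannot influence the test's identity.
    non_config = {k: v for k, v in kwargs.items() if k not in CONFIG_KEYS}
    # sort_keys makes the serialized form stable from run to run.
    return json.dumps(
        {"test": test_name, "model": model, "args": non_config},
        sort_keys=True,
    )


warn = identity_components(
    "accepted_values",
    "model.my_package.base",
    {"column_name": "status", "values": ["a", "b"], "severity": "warn"},
)
error = identity_components(
    "accepted_values",
    "model.my_package.base",
    {"column_name": "status", "values": ["a", "b"], "severity": "error"},
)
more_values = identity_components(
    "accepted_values",
    "model.my_package.base",
    {"column_name": "status", "values": ["a", "b", "c"], "severity": "warn"},
)

print(warn == error)        # severity change: same identity
print(warn == more_values)  # values change: different identity
```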

Options:

Concatenate using a character uncommon in SQL databases (not _), such as . or -. Unfortunately, those characters are also unwieldy in Python. How wild would it be to use URL encoding for this?
Hash together the concatenation of the test components
Some combination: not_null__base__extension_id__0a9c45d3f7560da45492b85afa1f41b5, i.e. just enough to be readable plus md5(concat('not_null','-','model=model.my_package.base','-','column_name=extension_id'))
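The "some combination" option could look roughly like the sketch below. `hybrid_test_name` is a hypothetical helper, and the choice to take the readable model name from the last dotted segment of the model's unique_id is an assumption. The hashed payload follows the concat expression in the example above, though this sketch makes no claim to reproduce the exact digest shown there.

```python
import hashlib


def hybrid_test_name(test_name: str, model_unique_id: str, column_name: str) -> str:
    # Readable part: assume the model's short name is the last dotted segment.
    model_name = model_unique_id.split(".")[-1]
    # Hashed part: the labeled components joined with "-", as in the example above.
    payload = "-".join(
        [test_name, f"model={model_unique_id}", f"column_name={column_name}"]
    )
    digest = hashlib.md5(payload.encode("utf-8")).hexdigest()
    return "__".join([test_name, model_name, column_name, digest])


name_1 = hybrid_test_name("not_null", "model.my_package.base", "extension_id")
name_2 = hybrid_test_name("not_null", "model.my_package.base_extension", "id")

print(name_1)
print(name_2)
```

The two tests from #2308 now get different names, and each name is stable from run to run because it depends only on the test's components.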


jtcohen6 added the enhancement and dbt tests labels Apr 12, 2021

This was referenced Apr 12, 2021

Namespace conflict for tests: Two tests gets the same name but only one is executed #2308

Closed

Tests are configurable from dbt_project.yml #3253

Closed

jtcohen6 added this to the Margaret Mead milestone Apr 13, 2021

jtcohen6 mentioned this issue Apr 15, 2021

Add alias to test #3266

Closed

iknox-fa self-assigned this May 3, 2021

iknox-fa mentioned this issue May 10, 2021

Feature/schema tests are more unique #3335

Merged

4 tasks

iknox-fa closed this as completed in #3335 May 13, 2021
