Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Better unique_id construction for specific instances of generic tests #3254

Closed
jtcohen6 opened this issue Apr 12, 2021 · 0 comments · Fixed by #3335
Closed

Better unique_id construction for specific instances of generic tests #3254

jtcohen6 opened this issue Apr 12, 2021 · 0 comments · Fixed by #3335
Assignees
Labels
dbt tests Issues related to built-in dbt testing functionality enhancement New feature or request
Milestone

Comments

@jtcohen6
Copy link
Contributor

Describe the feature

(Yes, I know I need to come up with a better phrase than "specific instances of generic tests")

Users will know that they cannot:

  • Have two bespoke tests named the same
  • Have two generic tests named the same
  • Have a bespoke test and a generic test named the same (!)

But they should not be responsible for worrying about namespace clashes between specific instances of generic tests!

Over in #2308 (which I'm closing in favor of this new issue), we had a great example of this. A not_null test on base.extension_id and base_extension.id both get the same:

  • unique_id: test.my_package.not_null_base_extension_id
  • fqn: test.schema_test.not_null_base_extension_id

Only one of these tests can be selected, executed, seen in manifest.json, etc.

Q & A

Do they need to be globally unique? No, I think we'd rather the unique_id is consistent from run to run. It should be created from the components of the test definition / instantiation.
Do they need to be human-readable? I don't think this is so important. We already hash the file names of schema tests with too many arguments. In a world where generic tests produce really good descriptions (#3249) and failure messages, the node name / unique_id feels less important: But we better do a good job of printing those descriptions out to stdout / artifacts!

Potential solutions

Components:

  • test name (e.g. not_null, unique)
  • model
  • column_name (if available)
  • Other non-config arguments., e.g. values for accepted_values, to + field for relationships
    • Why not configs? I don't think changing the severity of a test should change the unique_id of that test node. That has the effect of making it seem, to dbt, that you've removed and added a genuinely different test. (I do think that's the effect of changing the values in accepted_values, however.) If this distinction gives us too much trouble, however, we should throw it away and figure out something simpler.

Options:

  • Concatenate using a character uncommon in SQL databases (not _), such as . or -. Unfortunately, those characters are also unwieldy in python. How wild would it be to use URL encoding for this?
  • Hash together the concatenation of the test components
  • Some combination: not_null__base__extension_id__0a9c45d3f7560da45492b85afa1f41b5, i.e. just enough to be readable plus md5(concat('not_null','-','model=model.my_package.base','-','column_name=extension_id'))
@jtcohen6 jtcohen6 added enhancement New feature or request dbt tests Issues related to built-in dbt testing functionality labels Apr 12, 2021
@jtcohen6 jtcohen6 added this to the Margaret Mead milestone Apr 13, 2021
@iknox-fa iknox-fa self-assigned this May 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dbt tests Issues related to built-in dbt testing functionality enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants