You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
(Yes, I know I need to come up with a better phrase than "specific instances of generic tests")
Users will know that they cannot:
Have two bespoke tests named the same
Have two generic tests named the same
Have a bespoke test and a generic test named the same (!)
But they should not be responsible for worrying about namespace clashes between specific instances of generic tests!
Over in #2308 (which I'm closing in favor of this new issue), we had a great example of this. A not_null test on base.extension_id and base_extension.id both get the same:
Only one of these tests can be selected, executed, seen in manifest.json, etc.
Q & A
Do they need to be globally unique? No, I think we'd rather the unique_id is consistent from run to run. It should be created from the components of the test definition / instantiation. Do they need to be human-readable? I don't think this is so important. We already hash the file names of schema tests with too many arguments. In a world where generic tests produce really good descriptions (#3249) and failure messages, the node name / unique_id feels less important: But we better do a good job of printing those descriptions out to stdout / artifacts!
Potential solutions
Components:
test name (e.g. not_null, unique)
model
column_name (if available)
Other non-config arguments., e.g. values for accepted_values, to + field for relationships
Why not configs? I don't think changing the severity of a test should change the unique_id of that test node. That has the effect of making it seem, to dbt, that you've removed and added a genuinely different test. (I do think that's the effect of changing the values in accepted_values, however.) If this distinction gives us too much trouble, however, we should throw it away and figure out something simpler.
Options:
Concatenate using a character uncommon in SQL databases (not_), such as . or -. Unfortunately, those characters are also unwieldy in python. How wild would it be to use URL encoding for this?
Hash together the concatenation of the test components
Some combination: not_null__base__extension_id__0a9c45d3f7560da45492b85afa1f41b5, i.e. just enough to be readable plus md5(concat('not_null','-','model=model.my_package.base','-','column_name=extension_id'))
The text was updated successfully, but these errors were encountered:
Describe the feature
(Yes, I know I need to come up with a better phrase than "specific instances of generic tests")
Users will know that they cannot:
But they should not be responsible for worrying about namespace clashes between specific instances of generic tests!
Over in #2308 (which I'm closing in favor of this new issue), we had a great example of this. A
not_null
test onbase.extension_id
andbase_extension.id
both get the same:test.my_package.not_null_base_extension_id
test.schema_test.not_null_base_extension_id
Only one of these tests can be selected, executed, seen in
manifest.json
, etc.Q & A
Do they need to be globally unique? No, I think we'd rather the
unique_id
is consistent from run to run. It should be created from the components of the test definition / instantiation.Do they need to be human-readable? I don't think this is so important. We already hash the file names of schema tests with too many arguments. In a world where generic tests produce really good descriptions (#3249) and failure messages, the node name / unique_id feels less important: But we better do a good job of printing those descriptions out to stdout / artifacts!
Potential solutions
Components:
not_null
,unique
)model
column_name
(if available)values
foraccepted_values
,to
+field
forrelationships
unique_id
of that test node. That has the effect of making it seem, to dbt, that you've removed and added a genuinely different test. (I do think that's the effect of changing thevalues
inaccepted_values
, however.) If this distinction gives us too much trouble, however, we should throw it away and figure out something simpler.Options:
_
), such as.
or-
. Unfortunately, those characters are also unwieldy in python. How wild would it be to use URL encoding for this?not_null__base__extension_id__0a9c45d3f7560da45492b85afa1f41b5
, i.e. just enough to be readable plusmd5(concat('not_null','-','model=model.my_package.base','-','column_name=extension_id'))
The text was updated successfully, but these errors were encountered: