Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dbt 0.21.0-b2: test aliases don't play well with anchored yaml #3815

Closed
joellabes opened this issue Aug 26, 2021 · 4 comments
Closed

dbt 0.21.0-b2: test aliases don't play well with anchored yaml #3815

joellabes opened this issue Aug 26, 2021 · 4 comments
Labels
dbt tests Issues related to built-in dbt testing functionality stale Issues that have gone stale

Comments

@joellabes
Copy link
Contributor

joellabes commented Aug 26, 2021

At #3626 (comment), @jtcohen6 suggested:

As it happens, thanks to the work in #3616, v0.21 will actually include support for the "long-term option" I described above: defining a custom alias under each specific test, which would enable you to avoid collisions with --store-failures. What do you think about that as a workable resolution here?

I gave it a go, and realised that hardcoded aliases collide with anchored YAML such as this:

  - name: hubspot_revenue__revenue_lines
    columns:
      - name: total_unweighted
        tests: &revenue_range_tests
          - not_null
          - accepted_range:
              alias: hubspot_revenue_test__total_unweighted_not_below_0
              where: "is_resub_adjustment is false"
              min_value: 0
              tags: 
                - hubspot_bad_revenue
          - accepted_range:
              alias: hubspot_revenue_test__total_unweighted_resub_adjustments_not_above_0
              where: "is_resub_adjustment is true"
              max_value: 0
              tags: 
                - hubspot_bad_revenue
      - name: resub_unweighted
        tests: *revenue_range_tests
      - name: resub_weighted
        tests: *revenue_range_tests
      - name: sale_unweighted
        tests: *revenue_range_tests
      - name: sale_weighted
        tests: *revenue_range_tests

Which gives the unsurprising error:

Compilation Error
  dbt found two resources with the database representation ""reporting"."dev_jlabes_dbt_test__audit"."hubspot_revenue_test__total_unweighted_not_below_0"".
  dbt cannot create two resources with identical database representations. To fix this,
  change the configuration of one of these resources:
  - test.educationperfect.accepted_range_hubspot_revenue__revenue_lines_total_unweighted__0.7a4a0581e3 (models\marts\revenue\hubspot_revenue\hubspot_revenue.yml)
  - test.educationperfect.accepted_range_hubspot_revenue__revenue_lines_resub_unweighted__0.9ac72c83e9 (models\marts\revenue\hubspot_revenue\hubspot_revenue.yml)

Knowing as I do that anchors are pretty inflexible, I don't think I have much choice other than to swap the templated code out for a bunch more copy-pasted stuff, right?

@jtcohen6
Copy link
Contributor

jtcohen6 commented Aug 26, 2021

Oy! Thanks for opening @joellabes. I wouldn't consider this a true bug / blocke. At the same time, I totally appreciate the desire to keep using YAML anchors and have dynamic, sensible, non-colliding aliases.

The really cool answer probably looks like supporting model and column_name in the test config rendering context. Very similar to our ideas for #3249! Then, you could do something like:

- name: hubspot_revenue__revenue_lines
    columns:
      - name: total_unweighted
        tests: &revenue_range_tests
          - not_null
          - accepted_range:
              alias: "{{ model }}__{{ column_name }}_not_below_0"
              where: "is_resub_adjustment is false"
              min_value: 0
              tags: 
                - hubspot_bad_revenue
          - accepted_range:
              alias: "{{ model }}__{{ column_name }}_resub_adjustments_not_above_0"
              where: "is_resub_adjustment is true"
              max_value: 0
              tags: 
                - hubspot_bad_revenue
      - name: resub_unweighted
        tests: *revenue_range_tests
      - name: resub_weighted
        tests: *revenue_range_tests
      - name: sale_unweighted
        tests: *revenue_range_tests
      - name: sale_weighted
        tests: *revenue_range_tests

@gshank had the idea of creating a special rendering context to support this. I think it's a pretty good idea, and I'd like to revisit it around the time that we add support for generic test descriptions, too.

In the meantime, I'm trying to think if there are potential workarounds that allow you to keep your anchors:

  1. In the generic test definition, set the default alias config, since model + column_name are available in this context, if you can come up with a general rule you're happy with. Then, remove the alias config from the specific tests.
{% test test_accepted_range(model, column_name, min_value=none, max_value=none, inclusive=true) %}

  {% set min_qualifier = 'not_below_' + min_value if min_value else '' %}
  {% set max_qualifier = 'not_above_' + max_value if max_value else '' %}
  {% set my_alias_rule = model + '__' + column_name + '_' + min_qualifier + max_qualifier %}

{{ config(alias = my_alias_rule) }}

... rest of test definition ...

{% endtest %}

This almost works... except that, in an attempt to be helpful, we're checking the length of each specific test's alias, and overriding it with the shorter hash if it's too long, before the generic test definition's alias has a chance to take effect:

https://github.com/dbt-labs/dbt/blob/650b34ae246b4c60cde395fcb123418c7807f9aa/core/dbt/parser/schema_test_builders.py#L253-L255

Let's think about whether there's a different place we can do this, so that an alias set in the generic test definition has a fighting chance.

  1. Alternatively, do something a bit hacky in the generate_alias_name macro to define a rule for tests that prepends your custom alias with the column name. (Unfortunately, grabbing the model name is much harder than it should be, because of how generic tests are templated, and how we ret-conned the where config.)
{% macro generate_alias_name(custom_alias_name=none, node=none) -%}

    {%- if custom_alias_name is none -%}

        {{ node.name }}

    {%- elif node.resource_type == 'test' -%}
    {# for tests only, prepend the custom alias with column_name #}
    
       {% set column_name = node.test_metadata.kwargs.column_name %}
       {{ column_name }}__{{ custom_alias_name | trim }}
    
    {%- else -%}

        {{ custom_alias_name | trim }}

    {%- endif -%}

{%- endmacro %}

This one actually works today, it just feels pretty bad.

@jtcohen6 jtcohen6 added dbt tests Issues related to built-in dbt testing functionality and removed triage labels Aug 26, 2021
@joellabes
Copy link
Contributor Author

The really cool answer probably looks like supporting model and column_name in the test config rendering context. Very similar to our ideas for #3249!

Yes this sounds great! Happy to settle for non-anchor life and wait for the right-er approach, I'm pretty conservative on rolling out hacky workarounds 💤 Appreciate your help coming up with options though!

@github-actions
Copy link
Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues that have gone stale label May 30, 2022
@github-actions
Copy link
Contributor

github-actions bot commented Jun 7, 2022

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest; add a comment to notify the maintainers.

@github-actions github-actions bot closed this as completed Jun 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dbt tests Issues related to built-in dbt testing functionality stale Issues that have gone stale
Projects
None yet
Development

No branches or pull requests

2 participants