Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Postgres 15+ unique index NULLS NOT DISTINCT #9472

Open
3 tasks done
FrankTub opened this issue Jan 29, 2024 · 4 comments
Open
3 tasks done

[Feature] Postgres 15+ unique index NULLS NOT DISTINCT #9472

FrankTub opened this issue Jan 29, 2024 · 4 comments
Labels
enhancement New feature or request help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors

Comments

@FrankTub
Copy link

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

On some of the models we have we have created an unique index. By default the creation of an unique index (on multiple columns) on Postgres was done with ignoring NULL values as unique values. See this for more info.

Since Postgres 15 it is an option to create an index with the option NULLS NOT DISTINCT. This has benefits for query plans in Postgres since with the current setup of an index in Postgres it does not know for sure if a combination of columns is truly unique. In the Postgres docs you will also find:

NULLS NOT DISTINCT
Specifies whether for a unique index, null values should be considered distinct (not equal). The default is that they are > distinct, so that a unique index could contain multiple null values in a column.

Current situation

So currently I have a model with something like:

{{ config(
    indexes=[
      {'columns': ['date_actual', 'customer_no'], 'unique': True},
    ]
) }}

This generate the following SQL:

    create unique index if not exists
  "0dbafe9be87c73a825dcef744a055fd8"
  on "staging"."transformations"."daily_customer_balances" 
  (date_actual, customer_no);

Requested new situation

After my proposal I would like to have the option to set a flag at creation of the index, like something below:

{{ config(
    indexes=[
      {'columns': ['date_actual', 'customer_no'], 'unique': True, 'nulls_not_distinct': True},
    ]
) }}

And this should generate the following SQL:

    create unique index if not exists
  "0dbafe9be87c73a825dcef744a055fd8"
  on "staging"."transformations"."daily_customer_balances" 
  (date_actual, customer_no) nulls not distinct;

It would be perfectly acceptable to leave the default option of nulls_not_distinct to False.

Describe alternatives you've considered

There are two ways to do this anyways, but I think the proposed solution is more neat for my use case.

  1. Create a post-hook on your dbt model and create the index yourself writing plain SQL.
  2. Enforce a dbt model contract

Post-hook

This defies the purpose of the config index block in my opinion.

Model contract

This enforces a user to force all sorts of other constraints the data should look like that is not relevant to my use case.

The whole purpose of this request is to enhance downstream query plans and thereby improving performance of your transformation pipeline.

Who will this benefit?

Only users that use the dbt-postgres adapter that use Postgres 15+.

Are you interested in contributing this feature?

Yes, but really not sure where to start or how to do it.

Anything else?

No response

@FrankTub FrankTub added enhancement New feature or request triage labels Jan 29, 2024
@dbeatty10
Copy link
Contributor

Thanks for raising this @FrankTub !

This wouldn't be a high priority for us to implement, so I'm going to label it as "help wanted".

Your proposed syntax looks good to me 👍:

{{ config(
    indexes=[
      {'columns': ['date_actual', 'customer_no'], 'unique': True, 'nulls_not_distinct': True},
    ]
) }}

Implementation hints

Some of the relevant code is here.

To hard-code your project to always use nulls not distinct for unique indexes, you could add something like this to your project (untested!):

{% macro postgres__get_create_index_sql(relation, index_dict) -%}
  {%- set index_config = adapter.parse_index(index_dict) -%}
  {%- set comma_separated_columns = ", ".join(index_config.columns) -%}
  {%- set index_name = index_config.render(relation) -%}

  create {% if index_config.unique -%}
    unique
  {%- endif %} index if not exists
  "{{ index_name }}"
  on {{ relation }} {% if index_config.type -%}
    using {{ index_config.type }}
  {%- endif %}
  ({{ comma_separated_columns }}){% if index_config.unique -%} nulls not distinct{%- endif %};
{%- endmacro %}

Alternatively, to make it configurable, you could try changing {% if index_config.unique -%} nulls not distinct{%- endif %} to be {% if index_config.nulls_not_distinct -%} nulls not distinct{%- endif %} instead. I'm not sure how strict the config validation is here (via parse_index), so that may not work as-is and need further code changes.

Side note: While not directly related to your feature proposal, #6997 also involves SQL's three-valued logic.

@dbeatty10 dbeatty10 added help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors and removed triage labels Jan 29, 2024
@FrankTub
Copy link
Author

Thanks for such a quick response @dbeatty10 ! Totally agree that this is not a high prio case. Will definitely test out to make it work in our project.

Do you see any value in adding it as I proposed? If so I will open up a PR, otherwise just keep it in my own project.

@dbeatty10
Copy link
Contributor

Do you see any value in adding it as I proposed? If so I will open up a PR

Your proposed syntax looks good to me, and you can open a PR 👍

We might not be able to review it right away due to priority level though.

If you can get it to work with a simple macro override (and share the details), at least it will give other people code to use in the meantime.

@dbeatty10
Copy link
Contributor

Something to note: our integration tests don't currently use Postgres 15. So #8273 would need to be implemented in order to actually add it to the tests.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request help_wanted Trickier changes, with a clear starting point, good for previous/experienced contributors
Projects
None yet
Development

No branches or pull requests

2 participants