Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] postgres get_columns_in_relation does not include MVs #9419

Open
2 tasks done
dave-connors-3 opened this issue Jan 22, 2024 · 3 comments · May be fixed by #9433
Open
2 tasks done

[Bug] postgres get_columns_in_relation does not include MVs #9419

dave-connors-3 opened this issue Jan 22, 2024 · 3 comments · May be fixed by #9433
Labels
bug Something isn't working materialized_views Team:Adapters Issues designated for the adapter area of the code

Comments

@dave-connors-3
Copy link
Contributor

dave-connors-3 commented Jan 22, 2024

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

calling adapter.get_columns_in_relation on a materialized view returns 0 columns.

Expected Behavior

calling adapter.get_columns_in_relation on a materialized view returns all the columns in the materialized view.

Steps To Reproduce

Create a Materialized view

{{ config(
    materialized="materialized_view"
) }}

(SELECT
    1 as int_col,
    'one' as string_col,
    FALSE as bool_col)
UNION
(SELECT
    2 as int_col,
    'two' as string_col,
    TRUE as bool_col)
UNION
(SELECT
    3 as int_col,
    'three' as string_col,
    FALSE as bool_col)

Build the Model
dbt build -s mv_gene_yaml_test

Run the macro
{{ codegen.generate_model_yaml(model_names=['mv_gene_yaml_test']) }}
or
dbt run-operation codegen.generate_model_yaml --args '{"model_names": ["mv_gene_yaml_test"]}'

(this is copied from a dbt-codegen issue. This macro uses adapter.get_columns_in_relation under the hood. You could update the last step to call the macro directly.

per @drewblinn in that issue:

The macro queries information_schema.columns which does not store data about materialized views. Column information for MVs can be found in pg_catalog.pg_attribute

Relevant log output

No response

Environment

- OS:
- Python:
- dbt: 1.7.4

Which database adapter are you using with dbt?

postgres

Additional Context

No response

@dbeatty10
Copy link
Contributor

Thanks for opening this @dave-connors-3 🦞

TLDR

After doing a little bit of digging, I think this is actually a bug (or short-sighted implementation) within the information_schema.columns view in postgres.

Details

dbt-postgres uses that information_schema.columns view here.

It's a simple query to look at the exact definition that postgres is using for information_schema.columns:

select definition 
from pg_catalog.pg_views 
where schemaname = 'information_schema' 
  and viewname = 'columns'

It's a gnarly definition, but if you examine the where clause you'll see something like this:

c.relkind = ANY (ARRAY[
                       'r'::"char",
                       'v'::"char",
                       'f'::"char",
                       'p'::"char"])

Comparing it with here, do you notice anything missing?

It's got these covered:

o[r]dinary table, [v]iew, [f]oreign table, [p]artitioned table

But it's missing this one:

[m]aterialized view

So all that big ol' definition needs is a little 'm'::"char" added in just the right spot and information_schema.columns can be all gucci again.

Workaround

We'll want to be able to handle this somehow since the maintainers of postgres might not be interested in making this change and it will take a while to roll out even if they are.

To make get_columns_in_relation work with MVs, all I needed to do is add this macro to my project:

macros/get_columns_in_relation.sql

{% macro postgres__get_columns_in_relation(relation) -%}
  {% call statement('get_columns_in_relation', fetch_result=True) %}

      with information_schema__columns as (

SELECT (current_database())::information_schema.sql_identifier AS table_catalog,
       (nc.nspname)::information_schema.sql_identifier AS table_schema,
       (c.relname)::information_schema.sql_identifier AS TABLE_NAME,
       (a.attname)::information_schema.sql_identifier AS COLUMN_NAME,
       (a.attnum)::information_schema.cardinal_number AS ordinal_position,
       (CASE
            WHEN (a.attgenerated = ''::"char") THEN pg_get_expr(ad.adbin, ad.adrelid)
            ELSE NULL::text
        END)::information_schema.character_data AS column_default,
       (CASE
            WHEN (a.attnotnull
                  OR ((t.typtype = 'd'::"char")
                      AND t.typnotnull)) THEN 'NO'::text
            ELSE 'YES'::text
        END)::information_schema.yes_or_no AS is_nullable,
       (CASE
            WHEN (t.typtype = 'd'::"char") THEN CASE
                                                    WHEN ((bt.typelem <> (0)::oid)
                                                          AND (bt.typlen = '-1'::integer)) THEN 'ARRAY'::text
                                                    WHEN (nbt.nspname = 'pg_catalog'::name) THEN format_type(t.typbasetype, NULL::integer)
                                                    ELSE 'USER-DEFINED'::text
                                                END
            ELSE CASE
                     WHEN ((t.typelem <> (0)::oid)
                           AND (t.typlen = '-1'::integer)) THEN 'ARRAY'::text
                     WHEN (nt.nspname = 'pg_catalog'::name) THEN format_type(a.atttypid, NULL::integer)
                     ELSE 'USER-DEFINED'::text
                 END
        END)::information_schema.character_data AS data_type,
       (information_schema._pg_char_max_length(information_schema._pg_truetypid(a.*, t.*), information_schema._pg_truetypmod(a.*, t.*)))::information_schema.cardinal_number AS character_maximum_length,
       (information_schema._pg_char_octet_length(information_schema._pg_truetypid(a.*, t.*), information_schema._pg_truetypmod(a.*, t.*)))::information_schema.cardinal_number AS character_octet_length,
       (information_schema._pg_numeric_precision(information_schema._pg_truetypid(a.*, t.*), information_schema._pg_truetypmod(a.*, t.*)))::information_schema.cardinal_number AS numeric_precision,
       (information_schema._pg_numeric_precision_radix(information_schema._pg_truetypid(a.*, t.*), information_schema._pg_truetypmod(a.*, t.*)))::information_schema.cardinal_number AS numeric_precision_radix,
       (information_schema._pg_numeric_scale(information_schema._pg_truetypid(a.*, t.*), information_schema._pg_truetypmod(a.*, t.*)))::information_schema.cardinal_number AS numeric_scale,
       (information_schema._pg_datetime_precision(information_schema._pg_truetypid(a.*, t.*), information_schema._pg_truetypmod(a.*, t.*)))::information_schema.cardinal_number AS datetime_precision,
       (information_schema._pg_interval_type(information_schema._pg_truetypid(a.*, t.*), information_schema._pg_truetypmod(a.*, t.*)))::information_schema.character_data AS interval_type,
       (NULL::integer)::information_schema.cardinal_number AS interval_precision,
       (NULL::name)::information_schema.sql_identifier AS
CHARACTER_SET_CATALOG,
       (NULL::name)::information_schema.sql_identifier AS
CHARACTER_SET_SCHEMA,
       (NULL::name)::information_schema.sql_identifier AS
CHARACTER_SET_NAME,
       (CASE
            WHEN (nco.nspname IS NOT NULL) THEN current_database()
            ELSE NULL::name
        END)::information_schema.sql_identifier AS COLLATION_CATALOG,
       (nco.nspname)::information_schema.sql_identifier AS COLLATION_SCHEMA,
       (co.collname)::information_schema.sql_identifier AS COLLATION_NAME,
       (CASE
            WHEN (t.typtype = 'd'::"char") THEN current_database()
            ELSE NULL::name
        END)::information_schema.sql_identifier AS domain_catalog,
       (CASE
            WHEN (t.typtype = 'd'::"char") THEN nt.nspname
            ELSE NULL::name
        END)::information_schema.sql_identifier AS domain_schema,
       (CASE
            WHEN (t.typtype = 'd'::"char") THEN t.typname
            ELSE NULL::name
        END)::information_schema.sql_identifier AS domain_name,
       (current_database())::information_schema.sql_identifier AS udt_catalog,
       (COALESCE(nbt.nspname, nt.nspname))::information_schema.sql_identifier AS udt_schema,
       (COALESCE(bt.typname, t.typname))::information_schema.sql_identifier AS udt_name,
       (NULL::name)::information_schema.sql_identifier AS scope_catalog,
       (NULL::name)::information_schema.sql_identifier AS scope_schema,
       (NULL::name)::information_schema.sql_identifier AS scope_name,
       (NULL::integer)::information_schema.cardinal_number AS maximum_cardinality,
       (a.attnum)::information_schema.sql_identifier AS dtd_identifier,
       ('NO'::CHARACTER varying)::information_schema.yes_or_no AS is_self_referencing,
       (CASE
            WHEN (a.attidentity = ANY (ARRAY['a'::"char",
                                             'd'::"char"])) THEN 'YES'::text
            ELSE 'NO'::text
        END)::information_schema.yes_or_no AS is_identity,
       (CASE a.attidentity
            WHEN 'a'::"char" THEN 'ALWAYS'::text
            WHEN 'd'::"char" THEN 'BY DEFAULT'::text
            ELSE NULL::text
        END)::information_schema.character_data AS identity_generation,
       (seq.seqstart)::information_schema.character_data AS identity_start,
       (seq.seqincrement)::information_schema.character_data AS identity_increment,
       (seq.seqmax)::information_schema.character_data AS identity_maximum,
       (seq.seqmin)::information_schema.character_data AS identity_minimum,
       (CASE
            WHEN seq.seqcycle THEN 'YES'::text
            ELSE 'NO'::text
        END)::information_schema.yes_or_no AS identity_cycle,
       (CASE
            WHEN (a.attgenerated <> ''::"char") THEN 'ALWAYS'::text
            ELSE 'NEVER'::text
        END)::information_schema.character_data AS is_generated,
       (CASE
            WHEN (a.attgenerated <> ''::"char") THEN pg_get_expr(ad.adbin, ad.adrelid)
            ELSE NULL::text
        END)::information_schema.character_data AS generation_expression,
       (CASE
            WHEN ((c.relkind = ANY (ARRAY['r'::"char",
                                          'p'::"char"]))
                  OR ((c.relkind = ANY (ARRAY['v'::"char",
                                              'f'::"char"]))
                      AND pg_column_is_updatable((c.oid)::regclass, a.attnum, FALSE))) THEN 'YES'::text
            ELSE 'NO'::text
        END)::information_schema.yes_or_no AS is_updatable

FROM ((((((pg_attribute a
           LEFT JOIN pg_attrdef ad ON (((a.attrelid = ad.adrelid)
                                        AND (a.attnum = ad.adnum))))
          JOIN (pg_class c
                JOIN pg_namespace nc ON ((c.relnamespace = nc.oid))) ON ((a.attrelid = c.oid)))
         JOIN (pg_type t
               JOIN pg_namespace nt ON ((t.typnamespace = nt.oid))) ON ((a.atttypid = t.oid)))
        LEFT JOIN (pg_type bt
                   JOIN pg_namespace nbt ON ((bt.typnamespace = nbt.oid))) ON (((t.typtype = 'd'::"char")
                                                                                AND (t.typbasetype = bt.oid))))
       LEFT JOIN (pg_collation co
                  JOIN pg_namespace nco ON ((co.collnamespace = nco.oid))) ON (((a.attcollation = co.oid)
                                                                                AND ((nco.nspname <> 'pg_catalog'::name)
                                                                                     OR (co.collname <> 'default'::name)))))
      LEFT JOIN (pg_depend dep
                 JOIN pg_sequence seq ON (((dep.classid = ('pg_class'::regclass)::oid)
                                           AND (dep.objid = seq.seqrelid)
                                           AND (dep.deptype = 'i'::"char")))) ON (((dep.refclassid = ('pg_class'::regclass)::oid)
                                                                                   AND (dep.refobjid = c.oid)
                                                                                   AND (dep.refobjsubid = a.attnum))))
WHERE ((NOT pg_is_other_temp_schema(nc.oid))
       AND (a.attnum > 0)
       AND (NOT a.attisdropped)
       AND (c.relkind = ANY (ARRAY['r'::"char",
                                   'v'::"char",
                                   'f'::"char",
                                   'm'::"char",
                                   'p'::"char"]))
       AND (pg_has_role(c.relowner, 'USAGE'::text)
            OR has_column_privilege(c.oid, a.attnum, 'SELECT, INSERT, UPDATE, REFERENCES'::text)))

      )

      select
          column_name,
          data_type,
          character_maximum_length,
          numeric_precision,
          numeric_scale

      from information_schema__columns
      where table_name = '{{ relation.identifier }}'
        {% if relation.schema %}
        and table_schema = '{{ relation.schema }}'
        {% endif %}
      order by ordinal_position

  {% endcall %}
  {% set table = load_result('get_columns_in_relation').table %}
  {{ return(sql_convert_columns_in_relation(table)) }}
{% endmacro %}

@dbeatty10 dbeatty10 removed the triage label Jan 23, 2024
@dbeatty10 dbeatty10 removed their assignment Jan 23, 2024
@dbeatty10 dbeatty10 added Team:Adapters Issues designated for the adapter area of the code materialized_views labels Jan 23, 2024
@dbeatty10
Copy link
Contributor

Reprex

models/my_mv.sql

{{ config(materialized="materialized_view") }}

select
    1 as int_col,
    'one' as string_col,
    FALSE as bool_col

analyses/get_columns_in_relation.sql

{%- set relation = ref("my_mv") -%}
{%- set columns = adapter.get_columns_in_relation(relation) -%}

{% if columns | length == 0 -%}
  {{ "No columns found in " ~ relation }}
{% endif %}

{% for column in columns -%}
  {{ "Column: " ~ column }}
{% endfor %}

Then run this command:

dbt compile --select analyses/get_columns_in_relation.sql

The output _should be:

Column: <Column int_col (integer)>
Column: <Column string_col (text)>
Column: <Column bool_col (boolean)>

But without the workaround above, you'll get this instead:

No columns found in "postgres"."your_schema_here"."my_mv"

@mateusz
Copy link

mateusz commented Nov 26, 2024

Breaks Elementary data tests such as column_anomalies. I've filed it here elementary-data/elementary#1684 (they tracked it down to this exact issue).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working materialized_views Team:Adapters Issues designated for the adapter area of the code
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants