-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add support for datashared objects and external tables for redshift l… #399
Conversation
…ist_relation_without_caching
@dbt-labs/core-adapters -- I already made an internal Slack thread, but, to reiterate, I'd love some help getting @jiezhen-chen started writing test cases for these changes. Is there an exemplar of something similar in another adapter repo? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here's some comments just about the SQL. separately we talked about the need for the integrations tests which we'll tackle in the coming week or two
{% macro redshift__list_relations_without_caching(schema_relation) %} | ||
{{ return(postgres__list_relations_without_caching(schema_relation)) }} | ||
{% call statement('list_relations_without_caching', fetch_result=True) -%} | ||
select | ||
database_name as database, | ||
table_name as name, | ||
schema_name as schema, | ||
'table' as type | ||
from SVV_REDSHIFT_TABLES | ||
where schema_name ilike '{{ schema_relation.schema }}' | ||
and database_name ilike '{{ schema_relation.database }}' | ||
and table_type = 'TABLE' | ||
union all | ||
select | ||
database_name as database, | ||
table_name as name, | ||
schema_name as schema, | ||
'view' as type | ||
from SVV_REDSHIFT_TABLES | ||
where schema_name ilike '{{ schema_relation.schema }}' | ||
and database_name ilike '{{ schema_relation.database }}' | ||
and table_type = 'VIEW' | ||
union all | ||
select | ||
'{{ schema_relation.database }}' as database, | ||
tablename as name, | ||
schemaname as schema, | ||
'external_tables' as type | ||
from SVV_EXTERNAL_TABLES | ||
where schemaname ilike '{{ schema_relation.schema }}' | ||
{% endcall %} | ||
{{ return(load_result('list_relations_without_caching').table) }} | ||
{% endmacro %} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jiezhen-chen is it possible to have SVV_EXTERNAL_TABLES be an opt-in behavior rather than the default? I ask because based on #179 (comment) it looks as if SVV_REDSHIFT_TABLES
is ~50X slower than pg_tables
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what do you mean by having SVV_EXTERNAL_TABLES as an opt-in behavior? like adding a parameter for users to choose whether to select external_tables?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did some investigation myself and timed the run time of these queries:
This query had the run time of 00 m 02 s
select
table_catalog as database,
table_name as name,
table_schema as schema,
table_type as type
from SVV_TABLES
where table_catalog ilike (mydatabasename)
and table_schema ilike (myschemaname)
The query below had the run time of 00 m 01 s
select
database_name as database,
table_name as name,
schema_name as schema
from SVV_redshift_tables
where database_name ilike (mydatabasename)
and schema_name ilike (myschemaname)
Not sure what causes the discrepancy between the runtimes of these queries and the runtime difference mentioned in #179 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jiezhen-chen what about when you use pg_tables
instead of SVV_TABLES
or SVV_REDSHIFT_TABLES
?
data_share as ( | ||
select | ||
a.ordinal_position as columnnum, | ||
a.schema_name as schemaname, | ||
a.column_name as columnname, | ||
case | ||
when a.data_type ilike 'character varying%' then | ||
'character varying' | ||
when a.data_type ilike 'numeric%' then 'numeric' | ||
else a.data_type | ||
end as col_type, | ||
case | ||
when a.data_type like 'character%' | ||
then nullif(REGEXP_SUBSTR(a.character_maximum_length, '[0-9]+'), '')::int | ||
else null | ||
end as character_maximum_length, | ||
case | ||
when a.data_type like 'numeric%' or a.data_type ilike 'integer%' | ||
then nullif( | ||
SPLIT_PART(REGEXP_SUBSTR(a.numeric_precision, '[0-9,]+'), ',', 1), | ||
'')::int | ||
end as numeric_precision, | ||
case | ||
when a.data_type like 'numeric%' or a.data_type ilike 'integer%' | ||
then nullif( | ||
SPLIT_PART(REGEXP_SUBSTR(a.numeric_scale, '[0-9,]+'), ',', 2), | ||
'')::int | ||
else null | ||
end as numeric_scale | ||
from svv_all_columns a | ||
inner join ( | ||
select object_name | ||
from svv_datashare_objects | ||
where share_type = 'INBOUND' | ||
and object_type in ('table', 'view', 'materialized view', 'late binding view') | ||
and object_name = '{{ relation.schema }}' || '.' || '{{ relation.identifier }}' | ||
) b on a.schema_name || '.' || a.table_name = b.object_name | ||
inner join ( | ||
select consumer_database from SVV_DATASHARES | ||
where share_type = 'INBOUND' | ||
) c on c.consumer_database = a.database_name | ||
), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
at first glance this looks very similar to the SELECT query for external_views
and was going to suggest having the boilerplate pulled out into it's own macro, so that the difference between external_views
and datashare
is more clearly evident.
however looking at the diff now, I'm seeing a lot more changes... maybe even too many 🙃 . i'll open some more comments
…ft into fix_get_relation
…datashared objects of a connected database
i agree that:
my concern is what I mentioned in #399 (comment)
This change will effect all users of dbt-redshift. My guess is that:
This is why I'm suggesting an opt-in config that a user could put into their vars:
list_external_relations: true in this way, users who want this feature can opt-in, but for others it will not matter. |
Change
redshift__list_relation_without_caching
andredshift__get_columns_in_relation
to solve the following issues:adapter.get_relation
for relations in other databases [RA3] #179redshift__list_relation_without_caching
now queries fromsvv
tables and returns cross-database objects, datashared objects, and external tables.redshift__get_columns_in_relation
now includesdatashared
objects.Description
Macros changed:
redshift__list_relation_without_caching
to query from svv tables instead of leveraging postgres macro. This enables cross-database querying to retrieve relations of an unconnected database. It also retrieves relations of datashared objects and external tables.redshift__get_columns_in_relation
to include columns in a datashared relationChecklist
changie new
to create a changelog entry