Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add support for datashared objects and external tables for redshift l… #399

Closed
wants to merge 10 commits into from

Conversation

jiezhen-chen
Copy link
Contributor

@jiezhen-chen jiezhen-chen commented Apr 7, 2023

Change redshift__list_relation_without_caching and redshift__get_columns_in_relation to solve the following issues:

redshift__list_relation_without_caching now queries from svv tables and returns cross-database objects, datashared objects, and external tables.

redshift__get_columns_in_relation now includes datashared objects.

Description

Macros changed:

  • Changed redshift__list_relation_without_caching to query from svv tables instead of leveraging postgres macro. This enables cross-database querying to retrieve relations of an unconnected database. It also retrieves relations of datashared objects and external tables.
  • Changed redshift__get_columns_in_relation to include columns in a datashared relation

Checklist

@jiezhen-chen jiezhen-chen requested a review from a team as a code owner April 7, 2023 17:31
@jiezhen-chen jiezhen-chen requested a review from VersusFacit April 7, 2023 17:31
@cla-bot cla-bot bot added the cla:yes label Apr 7, 2023
@dataders dataders added ok_to_test ready_for_review Externally contributed PR has functional approval, ready for code review from Core engineering ok to test labels Apr 12, 2023
@dataders
Copy link
Contributor

@dbt-labs/core-adapters -- I already made an internal Slack thread, but, to reiterate, I'd love some help getting @jiezhen-chen started writing test cases for these changes. Is there an exemplar of something similar in another adapter repo?

Copy link
Contributor

@dataders dataders left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

here's some comments just about the SQL. separately we talked about the need for the integrations tests which we'll tackle in the coming week or two

Comment on lines 272 to 303
{% macro redshift__list_relations_without_caching(schema_relation) %}
{{ return(postgres__list_relations_without_caching(schema_relation)) }}
{% call statement('list_relations_without_caching', fetch_result=True) -%}
select
database_name as database,
table_name as name,
schema_name as schema,
'table' as type
from SVV_REDSHIFT_TABLES
where schema_name ilike '{{ schema_relation.schema }}'
and database_name ilike '{{ schema_relation.database }}'
and table_type = 'TABLE'
union all
select
database_name as database,
table_name as name,
schema_name as schema,
'view' as type
from SVV_REDSHIFT_TABLES
where schema_name ilike '{{ schema_relation.schema }}'
and database_name ilike '{{ schema_relation.database }}'
and table_type = 'VIEW'
union all
select
'{{ schema_relation.database }}' as database,
tablename as name,
schemaname as schema,
'external_tables' as type
from SVV_EXTERNAL_TABLES
where schemaname ilike '{{ schema_relation.schema }}'
{% endcall %}
{{ return(load_result('list_relations_without_caching').table) }}
{% endmacro %}
Copy link
Contributor

@dataders dataders Apr 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jiezhen-chen is it possible to have SVV_EXTERNAL_TABLES be an opt-in behavior rather than the default? I ask because based on #179 (comment) it looks as if SVV_REDSHIFT_TABLES is ~50X slower than pg_tables.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you mean by having SVV_EXTERNAL_TABLES as an opt-in behavior? like adding a parameter for users to choose whether to select external_tables?

Copy link
Contributor Author

@jiezhen-chen jiezhen-chen Apr 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did some investigation myself and timed the run time of these queries:

This query had the run time of 00 m 02 s

select
table_catalog as database,
table_name as name,
table_schema as schema,
table_type as type
from SVV_TABLES
where table_catalog ilike (mydatabasename)
and table_schema ilike (myschemaname)

The query below had the run time of 00 m 01 s

select
database_name as database,
table_name as name,
schema_name as schema
from SVV_redshift_tables
where database_name ilike (mydatabasename)
and schema_name ilike (myschemaname)

Not sure what causes the discrepancy between the runtimes of these queries and the runtime difference mentioned in #179 (comment)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jiezhen-chen what about when you use pg_tables instead of SVV_TABLES or SVV_REDSHIFT_TABLES?

dbt/include/redshift/macros/adapters.sql Outdated Show resolved Hide resolved
Comment on lines 201 to 242
data_share as (
select
a.ordinal_position as columnnum,
a.schema_name as schemaname,
a.column_name as columnname,
case
when a.data_type ilike 'character varying%' then
'character varying'
when a.data_type ilike 'numeric%' then 'numeric'
else a.data_type
end as col_type,
case
when a.data_type like 'character%'
then nullif(REGEXP_SUBSTR(a.character_maximum_length, '[0-9]+'), '')::int
else null
end as character_maximum_length,
case
when a.data_type like 'numeric%' or a.data_type ilike 'integer%'
then nullif(
SPLIT_PART(REGEXP_SUBSTR(a.numeric_precision, '[0-9,]+'), ',', 1),
'')::int
end as numeric_precision,
case
when a.data_type like 'numeric%' or a.data_type ilike 'integer%'
then nullif(
SPLIT_PART(REGEXP_SUBSTR(a.numeric_scale, '[0-9,]+'), ',', 2),
'')::int
else null
end as numeric_scale
from svv_all_columns a
inner join (
select object_name
from svv_datashare_objects
where share_type = 'INBOUND'
and object_type in ('table', 'view', 'materialized view', 'late binding view')
and object_name = '{{ relation.schema }}' || '.' || '{{ relation.identifier }}'
) b on a.schema_name || '.' || a.table_name = b.object_name
inner join (
select consumer_database from SVV_DATASHARES
where share_type = 'INBOUND'
) c on c.consumer_database = a.database_name
),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at first glance this looks very similar to the SELECT query for external_views and was going to suggest having the boilerplate pulled out into it's own macro, so that the difference between external_views and datashare is more clearly evident.

however looking at the diff now, I'm seeing a lot more changes... maybe even too many 🙃 . i'll open some more comments

image

dbt/include/redshift/macros/adapters.sql Outdated Show resolved Hide resolved
@dataders
Copy link
Contributor

i agree that:

  1. this change will now get all the external tables that weren't previously returned, and
  2. this meets many users' requests.

my concern is what I mentioned in #399 (comment)

is it possible to have the use of SVV_EXTERNAL_TABLES be an opt-in behavior rather than the default? I ask because based on #179 (comment) it looks as if SVV_REDSHIFT_TABLES is ~50X slower than pg_tables.

This change will effect all users of dbt-redshift. My guess is that:

  1. there is a non-insignificant amount of users who do not use external tables at all, and
  2. for that set of users, this change will result in slower performance with no upside for them.

This is why I'm suggesting an opt-in config that a user could put into their dbt_project.yml. Something like

vars:
  list_external_relations: true

in this way, users who want this feature can opt-in, but for others it will not matter.

@jiezhen-chen jiezhen-chen marked this pull request as draft April 24, 2023 16:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla:yes ok to test ready_for_review Externally contributed PR has functional approval, ready for code review from Core engineering
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants