-
Notifications
You must be signed in to change notification settings - Fork 59
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CT-1122] [CT-1116] [Bug] adapter.get_relation
for relations in other databases [RA3]
#179
Comments
adapter.get_relation
ignores database
argumentadapter.get_relation
ignores database
argument
Okay, I get some more info. I have added
and then I got:
But now, what's interesting - the During the first visit, I got:
which is correct. But - during the second visit (so the one which started just after
so the cause is hidden somewhere here. |
Looking at docs that behaviour actually makes sense, because it's related to the difference between the parse phase and the execution phase: |
@jaklan Thanks for the thorough writeup, and for detailing your follow-up investigation! An important piece of this puzzle: This is a cross-database query on Redshift. I'm assuming that you're using RA3 nodes. I have a sense of what's going on here. Could you check the logs that are run during the second visit (execution time)? I would expect dbt to be running a query along these lines, since you're asking it to perform a lookup of select
'integrated' as database,
tablename as name,
schemaname as schema,
'table' as type
from pg_tables
where schemaname ilike 'some_schema'
union all
select
'integrated' as database,
viewname as name,
schemaname as schema,
'view' as type
from pg_views
where schemaname ilike 'some_schema' See the issue? It's trying to access information about
Proposed resolutionIn cases where we're trying to access metadata on relations outside the current database, we need I don't have a good sense of whether accessing that view is faster or slower than the I'm going to mark this one a Logistical note: We don't currently have RA3 nodes running in CI, since they're still much more expensive at the lowest end than 2nd-gen node types, but we should take another look at finally setting some up to test this sort of functionality. |
adapter.get_relation
ignores database
argumentadapter.get_relation
for relations in other databases [RA3]
adapter.get_relation
for relations in other databases [RA3]adapter.get_relation
for relations in other databases [RA3]
Hello @jtcohen6, thanks for the research and the detailed explanation! It definitely seems to be the direct reason and yes, we use Such usage of Regarding the implementation dilemmas - is there at least chance you decide on some recommended approach internally, so it would be more clear from the community perspective what MR would be accepted? |
A warning is definitely better than the status quo — though I think the actual resolution to this bug may be quick here as well. My instinct here is that
So I think the right move will be to reimplement {% macro redshift__list_relations_without_caching(schema_relation) %}
{% if schema_relation.database == target.database %}
{{ return(postgres__list_relations_without_caching(schema_relation) }}
{% else %}
{% call statement('list_relations_without_caching', fetch_result=True) -%}
select
table_catalog as database,
table_name as name,
table_schema as schema,
table_type as type
from svv_tables
where table_catalog ilike '{{ schema_relation.database }}'
and table_schema ilike '{{ schema_relation.schema }}'
{% endcall %}
{{ return(load_result('list_relations_without_caching').table) }}
{% endmacro %} You can actually give that a try by copy-pasting that macro code into your own project first — dbt will prefer your version over its own built-in one — to see if that resolves the bug. |
Hi @jtcohen6, I've returned to the issue recently as this a key blocker for us to adopt a local development workflow, which is based on creating proper proxy views in There's also another solution for local development which was presented to us today by So, I have tested the custom
I will try to debug that and find some solution in the upcoming days, but I would really appreciate prioritising that issue - especially when it directly affects approaches recommended by |
Hi @jaklan, I believe that the error is because of this
The function is calling itself recursively forever if the first With a bit of copy/pasting from the original macro it could look like:
|
Sorry! I meant to name the macro in my example above |
Back to you guys - I was able to make it work by using {% macro redshift__list_relations_without_caching(schema_relation) %}
{% if schema_relation.database == target.database %}
{{ return(postgres__list_relations_without_caching(schema_relation)) }}
{% else %}
{% call statement('list_relations_without_caching', fetch_result=True) -%}
select
database_name as database,
table_name as name,
schema_name as schema,
table_type as type
from svv_redshift_tables
where database_name ilike '{{ schema_relation.database }}'
and schema_name ilike '{{ schema_relation.schema }}'
{% endcall %}
{{ return(load_result('list_relations_without_caching').table) }}
{% endif %}
{% endmacro %}
There's also |
Could you share a ballpark figure of how long querying |
takes
takes For the context:
gives
returns
gives
returns |
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days. |
Bump to un-stale |
re-opening bc this was not fixed by #428 only addressed the |
Is this a new bug in dbt-core?
Current Behavior
adapter.get_relation
seems to ignoredatabase
argument and use the target database anyway.Expected Behavior
The right database is used.
Steps To Reproduce
I have a model, defined as below, which checks if it already exists in the same schema, but in another table. If it exists - it prints a log message and reads data from there:
I have 2 databases:
dev
andintegrated
.In
dev
I have that table already created, inintegrated
- not.The database specified in the active target is
dev
.Now, when running
dbt run -m "<model_identifier>"
, I would expect it to do nothing indev
database. But, instead of that I get:so
adapter.get_relation
claims to be able to find the table in theintegrated
database - although it doesn't exist there. And because it doesn't exist - the run fails when it tries to actually read the data from there.PS that's only a dummy example to visualise the issue, please don't focus on the logic itself
Relevant log output
No response
Environment
Which database adapter are you using with dbt?
redshift
Additional Context
No response
The text was updated successfully, but these errors were encountered: