dbt-databricks 1.6.X overeager introspection (`describe extended ...`) of all tables in schema
#442
Comments
Fwiw, I tested
@susodapop I thought we fixed this... :/
Thanks for reporting @jeremyyeo
@jeremyyeo, shot in the dark: does the error reproduce if you specify a catalog in the profile? My interpretation of #231 is that the fast caching only happens if this is the case, but I have yet to see the conditional code path that
It appears to be a hive-metastore specific thing. In my personal testing, I don't hit this issue if I specify some other catalog, but when specifying hive-metastore, it reproduces.
Describe the bug
`dbt-databricks==1.6.2` appears to be aggressively introspecting all tables in a schema prior to running any models. This causes a huge bottleneck and a large performance degradation vs prior versions (i.e. `dbt-databricks~=1.5.0`).
Steps To Reproduce
Use a schema with hundreds of random tables (below, `hive_metastore.dbt_jyeo` has 300+ random Delta tables):
Set up the dbt project:
`foo` with 1.6.latest:
Expected behavior
We should not be running `describe extended` on everything in the schema... only the relevant models (like `foo`).
Screenshots and log output
See above.
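The expected behavior described above can be sketched in a few lines of Python: only relations that back the selected models get a `describe extended` statement, no matter how many other tables share the schema. The helper name `describe_statements`, its arguments, and the `random_table_N` names are hypothetical and exist only for illustration; this is not dbt-databricks's actual caching code.

```python
def describe_statements(schema, tables_in_schema, selected_models):
    """Build `describe extended` statements only for relations backing selected models.

    Hypothetical helper for illustration; not part of dbt or dbt-databricks.
    """
    # Compare names case-insensitively so `Foo` and `foo` match.
    wanted = {name.lower() for name in selected_models}
    return [
        f"describe extended {schema}.{table}"
        for table in tables_in_schema
        if table.lower() in wanted
    ]


# 300 unrelated tables (made-up names) plus the one model actually being run.
tables = [f"random_table_{i}" for i in range(300)] + ["foo"]
statements = describe_statements("hive_metastore.dbt_jyeo", tables, ["foo"])
print(statements)
```

With this filtering, the number of introspection statements depends on the number of selected models rather than on the size of the schema.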
System information
The output of `dbt --version`:
The operating system you're using:
The output of `python --version`:
`pip freeze` on 1.6:
Additional context
If we downgrade to dbt-databricks 1.5.latest and rerun:
^ ~37 seconds on 1.5.latest vs ~215 seconds on 1.6.latest.
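To put those two timings in perspective, here is a rough back-of-the-envelope calculation in Python. It rests on assumptions that are not stated in the report: the "300+" tables are rounded to 300, the whole time difference is attributed to the extra introspection, and the cost is taken to grow linearly with the number of tables.

```python
slow_seconds = 215  # approximate run time reported on 1.6.latest
fast_seconds = 37   # approximate run time reported on 1.5.latest
table_count = 300   # assumption: "300+" tables rounded down to 300

# How many times slower the 1.6 run was.
slowdown = slow_seconds / fast_seconds

# Assumption: all of the extra time is spent introspecting tables.
extra_per_table = (slow_seconds - fast_seconds) / table_count


def estimated_overhead_seconds(n_tables):
    """Extrapolate the extra time linearly (an assumption, not a measurement)."""
    return extra_per_table * n_tables


print(f"slowdown: {slowdown:.1f}x")
print(f"extra time per table: {extra_per_table:.2f} s")
print(f"estimated overhead for 5000 tables: {estimated_overhead_seconds(5000) / 60:.0f} min")
```

Under these assumptions the 1.6 run is roughly 5.8 times slower, at about 0.6 seconds of overhead per table in the schema.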
The implications here are actually significant: some folks write all their dbt Cloud Slim CI tables to the same schema, so with many open PRs the schema fills up more and more, and having to run `describe extended` over potentially many thousands of objects before a model is even run is a huge performance impact.
`pip freeze` on 1.5: