AWS Databricks - 1.6.0 causing dbt commands to describe tables and schemas unrelated to the operation #403

Closed
jjardis opened this issue Aug 2, 2023 · 7 comments · Fixed by #404
jjardis commented Aug 2, 2023

Describe the bug

Our jobs used to scan only the tables related to a job. For example, dbt run --select my_model would issue DESCRIBE statements only for tables within the schema that the model belonged to.

As of 1.6.0, dbt describes tables in schemas the run doesn't manage, which both adds execution time and causes errors when the user doesn't have access to the storage location of that data.

Steps To Reproduce

  1. Create a basic SQL endpoint with default permissions in AWS.
  2. Create a simple dbt project and run it in Databricks, preferably with two different schemas that contain tables. Call these SCHEMA_A and SCHEMA_B.
  3. Run a DEEP COPY into a table in SCHEMA_B that is stored in an external storage location that Databricks doesn't have access to by default.
  4. Run the models for SCHEMA_A. The run fails.

Expected behavior

dbt should run DESCRIBE only on tables that have been selected to be run by the job, not on all tables or on tables unrelated to the project.
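
To make the expectation concrete, here is a minimal Python sketch of the scoping I have in mind. It is not dbt-databricks code: the Relation class, the relations_to_describe function, and the sample table names are all hypothetical, and it models the schema-level scoping we saw before 1.6.0 (only relations in the selected models' schemas get described).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """Hypothetical stand-in for a table known to the catalog."""
    schema: str
    name: str


def relations_to_describe(catalog, selected_models):
    """Keep only relations whose schema holds at least one selected model.

    Assumption: schema names are compared case-insensitively.
    """
    wanted_schemas = {m.schema.lower() for m in selected_models}
    return [r for r in catalog if r.schema.lower() in wanted_schemas]


# Hypothetical catalog: SCHEMA_B holds the table on inaccessible storage.
catalog = [
    Relation("SCHEMA_A", "my_model"),
    Relation("SCHEMA_A", "other_table"),
    Relation("SCHEMA_B", "external_clone"),
]
selected = [Relation("SCHEMA_A", "my_model")]

for rel in relations_to_describe(catalog, selected):
    print(f"DESCRIBE TABLE {rel.schema}.{rel.name}")
```

With this scoping, nothing in SCHEMA_B is ever described, so the inaccessible storage location is never touched when only SCHEMA_A models are selected.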

System information

I'm aware this isn't the same command, but I can't run dbt --version directly, so here is the dbt deps output instead.
The output of dbt --version:

+ dbt deps
14:08:06  Running with dbt=1.6.0
14:08:06  Installing dbt-labs/dbt_utils
14:08:07  Installed from version 0.9.2
14:08:07  Updated version available: 1.1.1
14:08:07  
14:08:07  Updates available for packages: ['dbt-labs/dbt_utils']                 
Update your versions in packages.yml, then run dbt deps

The operating system you're using:
Databricks jobs: 11.2 runtime version

The output of python --version:
3.9.5

Additional context

I believe this is related to #326. It is the only relevant change I could find in the dbt-core / dbt-databricks 1.6.0 release.

@jjardis jjardis added the bug Something isn't working label Aug 2, 2023
@susodapop susodapop self-assigned this Aug 2, 2023
@susodapop

Thanks for opening this issue and for the reproduction steps. Investigating now.

@susodapop

Yep, #326 is the culprit. Fix incoming.

@susodapop

Okay, the quick fix is to revert #326 and cut release 1.6.1 with it reverted. Then, to solve the original issue that #326 tries to solve, we'll restore the behaviour behind a config so that AWS Glue Catalog users can turn it on as needed.

@susodapop

The revert is complete and release 1.6.1 is live on PyPI and on GitHub: https://github.com/databricks/dbt-databricks/releases/tag/v1.6.1

pip install dbt-databricks==1.6.1

@susodapop

Please reopen the issue if the error persists.

jjardis commented Aug 9, 2023

@susodapop The issue has persisted. We resolved it by rolling back to 1.5.4, the last version reported as working in our systems.

@jeremyyeo

Just filed #442 before finding this issue. It's probably the same thing, so see that issue for a reproduction.
