dbt logs not getting generated for dbt commands in Databricks (dbt-databricks) #423

Closed
abhishek-7781 opened this issue Aug 25, 2023 · 12 comments

@abhishek-7781
abhishek-7781 commented Aug 25, 2023

Describe the bug

I'm running a dbt project (which reads data from ADLS and writes back to ADLS) using a Databricks Workflow. When I configure log-path in dbt_project.yml to write dbt logs to a specific DBFS location, dbt generates logs only for the dbt deps command and not for dbt debug, dbt run, dbt test, etc.
However, it works fine when run from a local Python virtual environment (through VS Code).

Steps To Reproduce

  1. Setup a dbt project for Databricks
  2. profiles.yml
project_name:
    outputs:
        databricks_job:
              type: databricks
              method: http
              threads: 4
              schema: "{{ env_var('target_schema') }}"
              host: "{{ env_var('hostname') }}"
              http_path: "{{ env_var('http_path') }}"
              token: "{{ env_var('token') }}"
  3. dbt_project.yml
name: 'project_name'
version: '1.0.0'
config-version: 2
profile: 'profile_name'
model-paths: ["models"]
analysis-paths: ["analysis"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros","dbt_packages/shared_repo_name/macros"]
snapshot-paths: ["snapshots"]

target-path: /dbfs/dbt-databricks/target/
log-path: /dbfs/dbt-databricks/logs/
  4. Run the following commands
dbt deps
dbt debug
dbt run --select model_name
dbt test --select model_name

Expected behavior

dbt should generate logs for all of the above commands in the ./logs/dbt.log file.

System information

- Environment/Platform: Databricks
- Python: 3.9.5
- dbt-core: 1.3.5
- dbt-databricks: 1.3.2
- databricks-sql-connector: 2.1.0

Additional context

System information of the local machine where this works fine:

- Python: 3.10.9
- dbt-core: 1.3.5/1.6.0
- dbt-databricks: 1.3.2/1.6.1
- databricks-sql-connector: 2.7.0
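
For comparing the two environments, the version lists above can be collected programmatically rather than by hand. The following is a minimal sketch using the standard library's importlib.metadata; the helper name installed_versions is hypothetical, and the package list simply mirrors the ones named in this report.

```python
import platform
from importlib.metadata import PackageNotFoundError, version

# Packages listed under "System information" in this report.
PACKAGES = ["dbt-core", "dbt-databricks", "databricks-sql-connector"]


def installed_versions(packages=PACKAGES):
    """Return {package name: installed version, or 'not installed'}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = "not installed"
    return found


if __name__ == "__main__":
    print(f"Python: {platform.python_version()}")
    for name, ver in installed_versions().items():
        print(f"{name}: {ver}")
```

Running the same script on the Databricks cluster and on the local machine gives two listings that can be compared line by line.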

Error screenshots

(four screenshots of the error; images not preserved)

abhishek-7781 added the bug label Aug 25, 2023
@susodapop

The error message indicates this is due to a non-POSIX-compliant Windows feature, likely an artefact of ADLS. When you run it locally, are you also reading from and writing to the same ADLS locations? If so, I would expect the error to reproduce locally as well.

Either way, if this is to be resolved it needs to be handled by the dbt-core maintainers, rather than in this adapter. There have been significant changes to the event handling in dbt-core since version 1.3 (the latest release is 1.6). Your issue could resolve itself by upgrading to a newer version of dbt.

Also keep in mind that the version you're using today will reach EOL in two months so it could make sense to upgrade anyway.

@abhishek-7781
Author

abhishek-7781 commented Aug 26, 2023

@susodapop Yes, I'm writing to the same ADLS location from my local system as well.

I don't disagree, but as I noted in another issue, dbt-databricks 1.6 doesn't work yet on Databricks (which runs Python 3.9.5), and that's why I'm stuck with 1.3.

#410

@abhishek-7781
Author

abhishek-7781 commented Aug 28, 2023

@susodapop Hey, I think I know the root cause of this issue.

Root Cause Analysis
When dbt runs from a Databricks Workflow, several dbt commands execute one after another. The first command (usually dbt deps) creates a dbt.log file and writes its logs to it. The problem occurs when dbt fails to close dbt.log (or dbt.log.legacy) properly: the subsequent dbt commands then cannot write to the log file, hence the OSError. This only occurs when the log-path parameter in dbt_project.yml points to a DBFS location. When running from a local machine, dbt generates and stores the log file locally and not on DBFS, so the issue does not appear there.
When no custom log-path is provided, dbt generates all the artefacts, including the logs, in a temporary directory, which doesn't trigger this issue.
Similar Issue: [https://github.com/dbt-labs/dbt-core/issues/4629]
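
One way to test this hypothesis independently of dbt is to check whether a file in the target directory can be reopened and appended to at all. The sketch below is only a probe under that assumption: it does not reproduce dbt's logging handlers, and the function name try_append_twice and the probe file name are hypothetical.

```python
import os
import tempfile


def try_append_twice(log_dir, filename="append_probe.log"):
    """Open the same file in append mode twice, as two consecutive commands would.

    Returns (True, None) if both writes succeed, otherwise (False, the OSError).
    """
    path = os.path.join(log_dir, filename)
    try:
        for line in ("first command\n", "second command\n"):
            # Each pass mimics one process opening, appending to, and closing the log.
            with open(path, "a", encoding="utf-8") as handle:
                handle.write(line)
        return True, None
    except OSError as exc:
        return False, exc
    finally:
        # Best-effort cleanup of the probe file.
        try:
            os.remove(path)
        except OSError:
            pass


if __name__ == "__main__":
    # Local baseline: probe a temporary directory on local disk.
    with tempfile.TemporaryDirectory() as tmp:
        print(try_append_twice(tmp))
```

To probe the directory from this report, the same function could be called with /dbfs/dbt-databricks/logs/ from a notebook or job on the cluster. If the second append raises an OSError there but not locally, the filesystem rather than dbt's file handling would be the more likely cause.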

This is where the log file name is generated in dbt-core.
(screenshot of the dbt-core source; image not preserved)

Solution:

  1. dbt must close the log file properly after every command so that subsequent commands can write their logs to the same file.

Possible Workarounds:

  1. Write a macro that generates a separate log file for each dbt command, with the command name as a suffix (e.g. dbt_test.log, dbt_deps.log, dbt_run.log) [NOT AVAILABLE]

  2. Fetch the dbt command name (and parameters) through a variable or environment variable and use it to set the full log file path, including the file name, in dbt_project.yml [NOT AVAILABLE]
    log-path: /dbfs/dbt-databricks/logs/dbt_{{ var("variable") }}.log

  3. Pass an additional variable with each dbt command and use it in the log-path in dbt_project.yml, so that each command stores its log file in its own directory [CAN BE DONE]
    log-path: /dbfs/dbt-databricks/logs/{{ var("variable") }}/

  4. Pass the log-path parameter with each dbt command, using a different location for each [CAN BE DONE]
    dbt --log-path '/dbfs/dbt/logs_debug' debug
    dbt --log-path '/dbfs/dbt/logs_run' run
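
Workaround 4 can be scripted so that each command gets its own log directory without typing the path by hand. This is a rough sketch, assuming --log-path is accepted before the subcommand as in the two commands above; the helper names with_log_path and run_all and the LOG_ROOT constant are hypothetical, and by default the script only prints the commands instead of running them.

```python
import shlex
import subprocess

# Base directory taken from the workaround examples; adjust as needed.
LOG_ROOT = "/dbfs/dbt/"


def with_log_path(dbt_args, log_root=LOG_ROOT):
    """Build a dbt command line whose --log-path is named after the subcommand.

    dbt_args is the argument list without the leading "dbt",
    e.g. ["run", "--select", "model_name"].
    """
    subcommand = dbt_args[0]
    log_path = f"{log_root.rstrip('/')}/logs_{subcommand}"
    # The flag is placed before the subcommand, as in the commands above.
    return ["dbt", "--log-path", log_path, *dbt_args]


def run_all(commands, dry_run=True):
    """Print each command; execute it as well when dry_run is False."""
    for args in commands:
        cmd = with_log_path(args)
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all([
        ["deps"],
        ["debug"],
        ["run", "--select", "model_name"],
        ["test", "--select", "model_name"],
    ])
```

Because every subcommand writes to a different directory, no command has to reopen a log file created by an earlier one.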

@abhishek-7781
Author

@susodapop Please let me know if this needs to be raised in dbt-core instead of here.

abhishek-7781 changed the title from "dbt logs not getting generated for dbt debug, dbt run and dbt test commands in Databricks (dbt-databricks)" to "dbt logs not getting generated for dbt run and dbt test commands in Databricks (dbt-databricks)" Sep 1, 2023
abhishek-7781 changed the title from "dbt logs not getting generated for dbt run and dbt test commands in Databricks (dbt-databricks)" to "dbt logs not getting generated for dbt commands in Databricks (dbt-databricks)" Sep 4, 2023
@benc-db
Collaborator

benc-db commented Sep 5, 2023

Does this still happen on 1.6.x?

@abhishek-7781
Author

@benc-db Yes, it's happening on both 1.3.x and 1.6.x

@abhishek-7781
Author

@benc-db The following package combinations work on Databricks, and this issue occurs with both:

  1. databricks-sql-connector==2.9.3, dbt-databricks==1.6.1
  2. databricks-sql-connector==2.1.0, dbt-databricks==1.3.2

@benc-db
Collaborator

benc-db commented Sep 7, 2023

Thanks for the report 👍

@abhishek-7781
Author

@benc-db Sorry, not trying to push you on this. But any luck with this bug?
Also, can you please let me know which module in the dbt-databricks package creates and writes the log files?

@benc-db
Collaborator

benc-db commented Sep 8, 2023

dbt-databricks doesn't house the logic for writing the log files, dbt-core does. I haven't had an opportunity to try reproducing yet.

@abhishek-7781
Author

> dbt-databricks doesn't house the logic for writing the log files, dbt-core does. I haven't had an opportunity to try reproducing yet.

@benc-db Yes, I thought so. In that case, let me raise it in dbt-core instead and close this one.

@abhishek-7781
Author

Closing this issue since this comes under the scope of dbt-core. A new issue #8608 has been raised there.
