Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-2935] [Regression] DBT stdout behavior regression in a 1.5 minor version, breaks some compatibility in SQLFluff #8317

Closed
2 tasks done
ttusing opened this issue Aug 3, 2023 · 4 comments
Labels
bug Something isn't working regression

Comments

@ttusing
Copy link
Contributor

ttusing commented Aug 3, 2023

Is this a regression in a recent version of dbt-core?

  • I believe this is a regression in dbt-core functionality
  • I have searched the existing issues, and I could not find an existing issue for this regression

Current Behavior

This issue is somewhat related to observations here in an old sqlfluff-github-actions issue: sqlfluff/sqlfluff-github-actions#15 (comment) . This is probably of primary concern to users of Github actions running SQLfluff on their DBT models.

I think this could also be considered a bug in SQLfluff.

(and perhaps this feature request: #7994 )

When running SQLFluff with the dbt templater, SQLFluff invokes DBT. The common command (described in most of these workflows in some form https://github.com/sqlfluff/sqlfluff-github-actions/tree/main) being run in Github actions is :
sqlfluff lint --format github-annotation --annotation-level failure --nofail ${{ steps.get_files_to_lint.outputs.lintees }} > annotations.json

The problem is that, if DBT is giving some log output to stdout, this gets injected into the output file and corrupts the JSON output.

This did not happen in dbt 1.5.0 as long as you had generated a manifest to prevent partial parsing in a previous workflow step. One of the DBT minor versions(1.5.2?) and/or a dbt-snowflake minor version added the adapter version to the output. I.e. this:
21:59:43 Registered adapter: snowflake=1.5.2
ends up in the JSON output.

Therefore, using current minor versions of DBT in 1.5 and/or adapters breaks this workflow.

I'm not quite savvy enough with logging to say exactly how or why this is happening or strictly what reasonable expected behavior is. It seems like one of these could be true/helpful in preventing this compatibility problem:

  1. This kind of consistent stdout set of expectations/behaviors by DBT is not explicitly supported.
  2. Perhaps there is a simple solution as described in the issue using grep that a Github workflow could use to clean up the logs to keep that practice working with newer DBT versions
  3. This actually is primarily a bug in SQLFluff and the fix is to not write DBT log output
  4. Would SQLFluff somehow passing -- quiet somewhere fix this?
  5. Maybe the DBT logger could be improved as described in the issue linked above to use a logger named something like dbt instead of default_stdout (not sure exactly what this means)

I may also open a ticket in SQLFluff and/or SQLFluff Github actions...

Expected/Previous Behavior

DBT does not add log output to SQLFluff when invoked for JSON output.

Steps To Reproduce

  1. Run a standard SQLFluff github workflow using DBT 1.5.4
  2. Observe log output

Relevant log output

From Github actions:

Run cat annotations.json

21:17:32  Registered adapter: snowflake=1.6.0
21:17:32  Unable to do partial parsing because saved manifest not found. Starting full parse.
[{"file": "myfile.sql", "line": 1, "start_column": 1, "end_column": 1, "title": "SQLFluff", "message": "LT05: Line is too long (110 > 100).", "annotation_level": "failure"}, {"file": "myfile.sql", "line": 2, "start_column": 1, "end_column": 1, "title": "SQLFluff", "message": "LT05: Line is too long (103 > 100).", "annotation_level": "failure"}, {"file": "myfile.sql", "line": 32, "start_column": 1, "end_column": 1, "title": "SQLFluff", "message": "LT02: Expected indent of 8 spaces.", "annotation_level": "failure"}]

^^ this is the JSON output with bad DBT logs at the top. Adding a dbt ls step removed the full parse part, adapter part does not seem able to be worked around, except maybe with grep.

Environment

- OS: Github Actions
- Python: 3.9.12
- dbt (working version): 1.5.0
- dbt (regression version): 1.5.4 (or earlier)

Which database adapter are you using with dbt?

snowflake

Additional Context

No response

@ttusing ttusing added bug Something isn't working regression triage labels Aug 3, 2023
@github-actions github-actions bot changed the title [Regression] DBT stdout behavior regression in a 1.5 minor version, breaks some compatibility in SQLFluff [CT-2935] [Regression] DBT stdout behavior regression in a 1.5 minor version, breaks some compatibility in SQLFluff Aug 3, 2023
@dbeatty10
Copy link
Contributor

This is great info -- thanks for the effort of putting it all together @ttusing 🤩 !

The outcome of sqlfluff/sqlfluff-github-actions#15 (comment) was being able to suppress non-error logs in output via --quiet.

I'm wondering if sqlfluff isn't using --quiet somewhere. Could you open up a bug report there and reference this issue? Then we can sort out if it's something that the sqlfluff folks can solve directly or if there's something they need us to unlock for them.

I like #7994 😎 , but we'll need to learn more from the sqlfluff folks what they need here.

@ttusing
Copy link
Contributor Author

ttusing commented Aug 5, 2023

@dbeatty10 I added an issue to sqlfluff.

When is --quiet expected to work? I am looking for documentation - is it only in run-operation?

Searching around for potentially related issues, also found this
#6749

@dbeatty10
Copy link
Contributor

When is --quiet expected to work?

--quiet / -q should work for all dbt subcommands: run-operation, list / ls, seed, test, run, build, etc.

Personally, I use it most often with list when I want JSON output which I often post-process with jq to pretty-print like this:

dbt --quiet list --output json | jq .

@dbeatty10 I added an issue to sqlfluff.

Thanks for opening sqlfluff/sqlfluff#5054 @ttusing! I've subscribed to that issue so I'll be notified of replies. Since this looks like an issue that sqlfluff should be able to handle via --quiet 🤞, I'm going to close this in favor of the issue you opened. We can always re-open this one if needed.

@dbeatty10 dbeatty10 removed the triage label Aug 6, 2023
@dbeatty10 dbeatty10 closed this as not planned Won't fix, can't repro, duplicate, stale Aug 6, 2023
@dbeatty10
Copy link
Contributor

For anyone watching this issue, sqlfluff merged a PR intended to address this: sqlfluff/sqlfluff#5070

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working regression
Projects
None yet
Development

No branches or pull requests

2 participants