[Regression] Partial parsing is disabled by "secret" env vars in profile connection info #10571
I tried modifying / adding to the test, and added some debug output of the connection info.

I modified the test class.

Actually, the test fails before getting to the partial parsing logic, since now it's not able to connect to Postgres. Edit: Fixed this by setting the search path as a secret env var (instead of the username), i.e.:
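A hypothetical sketch of what that fixture might look like (the profile name and env var name are placeholders; `env_var` with a default value is standard dbt Jinja):

```yaml
# Hypothetical test fixture: the search path is routed through a "secret"
# env var, so changing it should not invalidate the partial parse cache.
my_profile:
  target: dev
  outputs:
    dev:
      type: postgres
      # ... other connection parameters ...
      search_path: "{{ env_var('DBT_ENV_SECRET_SEARCH_PATH', 'public') }}"
```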
Any ideas or pointers on how to go about fixing this issue? Should it fetch the connection info values from somewhere else that handles masking of the secret env vars? (I assume they have to be in cleartext in the connection info itself if that is used for the DB connections?) Any thoughts @dbeatty10?
Thanks for reaching out @fredriv! While we recognize that the behavior changed for you, we did make this change intentionally [1, 2], and we won't be reinstating the workaround to avoid full reparses by using "secret" env vars. We've opened #10578 and #10579 as follow-ups.

Is this something that just affects your local development setups? Or does it affect your production runs as well? It would be helpful to hear more detail on why you chose this configuration.

See below for a couple alternatives you could try out instead -- let me know what you think!

Alternative approach 1a

You can achieve a similar effect by using the `DBT_TARGET_PATH` environment variable. It allows you to specify where to store separate copies of the partial parsing file. In your case, you'll want to use separate target directories for each different warehouse value. Here's how you can use it:

```sh
export DBT_TARGET_PATH=artifacts-dev-small
DBT_ENV_SECRET_SNOWFLAKE_WAREHOUSE=small dbt parse
DBT_ENV_SECRET_SNOWFLAKE_WAREHOUSE=small dbt ls -s my_model
```

Then you can switch to use a different partial parsing file like this:

```sh
export DBT_TARGET_PATH=artifacts-dev-large
DBT_ENV_SECRET_SNOWFLAKE_WAREHOUSE=large dbt parse
DBT_ENV_SECRET_SNOWFLAKE_WAREHOUSE=large dbt ls -s my_model
```

Alternative approach 1b

You may find it more convenient to switch from the env var approach to named output targets instead. Set up separate output targets per warehouse size:
```yaml
YOUR_PROFILE_NAME:
  target: dev
  outputs:
    dev:
      warehouse: large
      # ...
    large:
      warehouse: large
      # ...
    small:
      warehouse: small
      # ...
```

Then use commands like this to use the desired target (along with the desired path for the partial parsing file):

```sh
export DBT_TARGET_PATH=artifacts-dev-large
dbt parse --target large
dbt ls -s my_model --target large
```
Thanks for the detailed answer @dbeatty10! This currently affects both our production environment and CI/CD jobs (and to some degree dev environments).

We have a setup with a single dbt monorepo with about 2,300 models. Model ownership is distributed between ~10 data teams in a "data mesh" like fashion, and these teams own and run a total of ~80 dbt cron jobs of various sizes and complexity. We use a lot of differently sized Snowflake warehouses for these dbt jobs to optimize cost vs. performance, all the way from X-Small to 3X-Large.

To avoid full re-parses for each dbt job run, we have a GitHub Action which runs in production and generates the dbt manifest and partial parsing file, which the cron jobs then reuse. The secret env var for setting the Snowflake warehouse allowed us to do this across these various cron jobs while avoiding a full project parse, which would add another ~5 minutes to the job run. We also do something similar in our CI/CD jobs and for running ad hoc dbt jobs.

Setting up something similar using separate targets (approach 1b) would then mean specifying all combinations of environments (prod/qa/dev) and warehouse sizes as separate targets? This would also require generating, storing and keeping track of a lot more dbt artifacts (in the GitHub Action + dbt cron jobs mentioned above), in addition to a lot of duplicate Snowflake configuration in `profiles.yml`.
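A rough sketch of this flow; the storage bucket, paths, warehouse name, and selector are hypothetical, not the actual configuration:

```sh
# In the production GitHub Action: parse once, then publish the artifacts
# (bucket and paths are hypothetical).
dbt parse --target prod
aws s3 cp target/partial_parse.msgpack s3://dbt-artifacts/prod/partial_parse.msgpack
aws s3 cp target/manifest.json s3://dbt-artifacts/prod/manifest.json

# In each cron job: fetch the shared partial parse file, then run with the
# desired warehouse set via the "secret" env var, skipping the ~5 minute
# full parse.
aws s3 cp s3://dbt-artifacts/prod/partial_parse.msgpack target/partial_parse.msgpack
DBT_ENV_SECRET_SNOWFLAKE_WAREHOUSE=x-large dbt run --target prod -s +my_models
```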
Are you using the … ? There's some more description of various options here.
(Re-)made a PR to fix the slow full parses: dbt-labs/dbt-common#189. So if that gets merged and released, we should no longer need the workarounds described above to avoid full parses 🙂
@fredriv Thanks for opening dbt-labs/dbt-common#189 to resolve #9037 🤩 I'm going to close this one as not planned since we intentionally made the related change. |
Is this a regression in a recent version of dbt-core?
Current Behavior
We're using a "secret" env var (`DBT_ENV_SECRET_...`) in our dbt profile to avoid a full parse when changing e.g. the Snowflake warehouse between dbt runs, a change which should not require a full parse.

This stopped working after dbt 1.6.12 due to the change in #9844. dbt now creates a profile hash of all the keys in the connection info, but does not mask values that come from "secret" env vars into `*****` (unlike e.g. `dbt debug`). Due to the size of our dbt project (2,000+ models), this adds around 4-5 minutes of parse time to our dbt runs.

We have tested and found this behavior with dbt 1.6.18, 1.7.18 and 1.8.4.
Expected/Previous Behavior
The issue (#9795) related to the change above mentions the workaround to avoid full reparses by using "secret" env vars, and states in the expected behavior that:

> Users should retain the ability to "opt out" of additional fields by setting them to "secret" env vars, the values of which dbt does not track.

However, this has not been the case since dbt 1.6.12. We expect the previous functionality to work: being able to avoid full reparses by using "secret" env vars in the profile connection info.
Steps To Reproduce
With dbt 1.6.12 or a later release, create a `profiles.yml` with a "secret" env var in one of the connection parameters, e.g.:
Run a dbt command that requires parsing, such as `dbt ls -s my_model`. Re-run the command, but set the environment variable to a different value, e.g.:
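For example, assuming the hypothetical warehouse env var from the profile sketch above:

```sh
DBT_ENV_SECRET_SNOWFLAKE_WAREHOUSE=small dbt ls -s my_model
DBT_ENV_SECRET_SNOWFLAKE_WAREHOUSE=large dbt ls -s my_model
```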
Observe that this triggers a full parse and prints the message `Unable to do partial parsing because profile has changed`.

Relevant log output
No response
Environment
Which database adapter are you using with dbt?
snowflake
Additional Context
No response