[Feature] CLI Parameter for packages-install-path #9932

Closed
3 tasks done
stevenayers opened this issue Apr 13, 2024 · 3 comments

Labels
enhancement (New feature or request), wontfix (Not a bug or out of scope for dbt-core)

Comments

stevenayers commented Apr 13, 2024

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

Add a CLI parameter for the packages-install-path, similar to how target-path has one.

In the docs, under target-path, it says:

Just like other global configs, it is possible to override these values for your environment or invocation by using the CLI option (--target-path) or environment variables (DBT_TARGET_PATH).
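
As a rough illustration of the override order that sentence describes, here is a minimal Python sketch. It assumes the precedence is CLI option first, then the DBT_TARGET_PATH environment variable, then the value in dbt_project.yml, then the default `target` directory. The function name `resolve_target_path` and its parameters are hypothetical and are not part of dbt.

```python
import os
from typing import Mapping, Optional

# dbt's default output directory when nothing else is configured.
DEFAULT_TARGET_PATH = "target"


def resolve_target_path(
    cli_value: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
    project_value: Optional[str] = None,
) -> str:
    """Return the first value that is set.

    Assumed order: CLI option, DBT_TARGET_PATH env var, project file, default.
    This is an illustration only, not dbt's own resolution code.
    """
    env = os.environ if env is None else env
    for candidate in (cli_value, env.get("DBT_TARGET_PATH"), project_value):
        if candidate:
            return candidate
    return DEFAULT_TARGET_PATH
```

The feature request amounts to asking for the same first step (a CLI option) for packages-install-path.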

Describe alternatives you've considered

Using the env var DBT_PACKAGES_INSTALL_PATH.

The issue here is that some orchestration tools, such as Databricks DBT Workflows, make setting environment variables very difficult. By adding this CLI parameter, we maintain consistency across global configs.

Who will this benefit?

People using orchestration tools with awkward limitations.

Are you interested in contributing this feature?

Yes, the PR is #9933

@dbeatty10
Contributor

Thanks for opening this @stevenayers !

Can you share more about the specific use cases where a CLI flag combined with an environment variable is necessary or beneficial, versus simply including the packages-install-path configuration in dbt_project.yml?

@stevenayers
Author

Hi @dbeatty10, sure no problem! Let me break this down a bit.

Hardcoding packages-install-path

1. When Docker containers are used, a hardcoded path can cause difficulties. I won't go into too much detail because it's been documented quite well in issue #1710.

2. In many orchestration/workflow systems, each step runs in its own working directory rather than sharing the previous step's, and those directories are often dynamic. Take this pipeline as an example:

  graph LR;
      A[dbt debug]-->B[dbt run];
      B-->C[dbt test];
      C-->D[dbt docs generate];

Each working directory could look something like /tmp/job-id/step-id

  • dbt debug: /tmp/1ad0ceb/ee74a60082b34c3a3d0df8a0d5d5cbfd7ec5ed6a
  • dbt run: /tmp/1ad0ceb/607646b627e80fe5e45545589fc8c09482010978
  • dbt test: /tmp/1ad0ceb/7e164e3ab723c357cb638ad6c1e1beef19a7fec6
  • dbt docs generate: /tmp/1ad0ceb/cb56f4fdc16d5a79953af3003645a1af5a000926

With this, you don't want to reinstall your deps at every stage; you likely want to reuse them. This is where, as in issue #1710, you will want to use an environment variable like:

config-version: 2
packages-install-path: "{{ env_var('DBT_PACKAGES_INSTALL_PATH', 'dbt_packages') }}"
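
To make the fallback in that config concrete, here is a minimal Python sketch of a lookup with a default, modelled on how `env_var('DBT_PACKAGES_INSTALL_PATH', 'dbt_packages')` behaves in the project file. It is an illustration under that assumption, not dbt's implementation. The `env` parameter is a hypothetical addition so the function can be exercised without touching the real environment.

```python
import os
from typing import Mapping, Optional


def env_var(
    name: str,
    default: Optional[str] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return env[name] if set, otherwise the default.

    Raises KeyError when the variable is missing and no default is given.
    """
    env = os.environ if env is None else env
    if name in env:
        return env[name]
    if default is not None:
        return default
    raise KeyError(f"Env var required but not provided: {name!r}")
```

With the variable unset the project installs into `dbt_packages`; with it set, every step that exports the same value shares one install directory.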

You could set packages-install-path: "../dbt_packages", but that's making assumptions when you sometimes need to use shell script logic to figure out what that directory path needs to be.
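
The kind of path logic meant here could look roughly like the following Python sketch. It assumes the /tmp/job-id/step-id layout from the example above and puts the packages directory under the job directory, so every step of one job resolves to the same location. The function name `shared_packages_path` is hypothetical.

```python
from pathlib import PurePosixPath


def shared_packages_path(step_workdir: str, dirname: str = "dbt_packages") -> str:
    """Derive a packages directory shared by all steps of one job.

    Assumes step_workdir has the form /tmp/<job-id>/<step-id>, so its parent
    is the job directory that all steps have in common.
    """
    job_dir = PurePosixPath(step_workdir).parent
    return str(job_dir / dirname)
```

The result would then be exported as DBT_PACKAGES_INSTALL_PATH before each dbt command in the step.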

3. Say you have set packages-install-path to /tmp/my_custom_packages_path so it can be shared between steps. What if you're also running your CI/CD test pipeline in that environment?

Suppose packages.yml is changed in your feature branch, and the CI run updates the package contents in /tmp/my_custom_packages_path. If your live data pipeline is in the middle of running at that moment, its next step fails because the feature branch has removed packages the live pipeline was using.

This is where you'll want to do something like:

config-version: 2
packages-install-path: "{{ env_var('DBT_PACKAGES_INSTALL_PATH', 'dbt_packages') }}"

and in your pipeline you'll want to set DBT_PACKAGES_INSTALL_PATH to something like /tmp/${ENVIRONMENT}/dbt_packages.
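
A minimal Python sketch of that per-environment isolation, assuming an orchestrator that launches dbt as a subprocess. The helper names `packages_install_path` and `dbt_env` are hypothetical; only the DBT_PACKAGES_INSTALL_PATH variable and the /tmp/${ENVIRONMENT}/dbt_packages layout come from the text above.

```python
import os
from typing import Dict, Mapping, Optional


def packages_install_path(environment: str, root: str = "/tmp") -> str:
    """Build /tmp/<environment>/dbt_packages for one named environment."""
    if not environment or "/" in environment:
        raise ValueError(f"invalid environment name: {environment!r}")
    return f"{root}/{environment}/dbt_packages"


def dbt_env(
    environment: str, base_env: Optional[Mapping[str, str]] = None
) -> Dict[str, str]:
    """Copy the base environment and add the per-environment packages path."""
    env = dict(os.environ if base_env is None else base_env)
    env["DBT_PACKAGES_INSTALL_PATH"] = packages_install_path(environment)
    return env
```

A CI job would then call something like `subprocess.run(["dbt", "deps"], env=dbt_env("ci"), check=True)`, while the live pipeline uses `dbt_env("prod")`, so neither can overwrite the other's packages.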

Flag vs env var for packages-install-path

As I mentioned in the original issue, sometimes setting an environment variable can be a pain in some workflow systems. This also isn't very consistent or clean:
DBT_PACKAGES_INSTALL_PATH=/tmp/${ENVIRONMENT}/dbt_packages dbt run --target-path /tmp/${ENVIRONMENT}/target

You're setting config paths via two different methods.
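
The difference can be sketched from the orchestrator's side in Python. The first function builds the invocation as it has to be written today (one path via the environment, one via a flag). The second shows what the requested flag would allow. Note that `--packages-install-path` is the flag proposed in this issue and is hypothetical: it is not an existing dbt option. Both function names are also hypothetical.

```python
from typing import Dict, List, Tuple


def current_invocation(environment: str) -> Tuple[Dict[str, str], List[str]]:
    """Today: packages path goes through an env var, target path through a flag."""
    base = f"/tmp/{environment}"
    extra_env = {"DBT_PACKAGES_INSTALL_PATH": f"{base}/dbt_packages"}
    argv = ["dbt", "run", "--target-path", f"{base}/target"]
    return extra_env, argv


def proposed_invocation(environment: str) -> Tuple[Dict[str, str], List[str]]:
    """Proposed: both paths as flags.

    HYPOTHETICAL: --packages-install-path is the flag requested in this issue
    and does not exist in dbt.
    """
    base = f"/tmp/{environment}"
    argv = [
        "dbt", "run",
        "--target-path", f"{base}/target",
        "--packages-install-path", f"{base}/dbt_packages",
    ]
    return {}, argv
```

With the proposed form, a tool that can only pass command-line arguments would not need any way to set environment variables.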

@dbeatty10
Contributor

Yesterday @jtcohen6 and I had a chance to discuss the proposed new CLI flag + environment variable.

Summary

The general case

Our approach is that settings are configured in different places depending on the use case:

  • configuration settings in dbt_project.yml file are reserved for things that don't change (very often) and are shared across users and invocations, whereas
  • CLI flags are used for things that may change very often (i.e. per invocation)

So generally, we don't let these be set in both places, and it would take a really compelling case for us to do so.

This specific case

In this case, it sounds like the main barrier is that setting environment variables is difficult within Databricks DBT Workflows. If this is the primary barrier, then we'd prefer not to add a new feature to dbt in order to work around it.

So we're closing this and the associated PR in #9933 as not planned.

But if anyone can provide additional examples of why we should consider supporting a new --packages-install-path CLI flag (and associated DBT_PACKAGES_INSTALL_PATH environment variable) outside of Databricks DBT Workflows, we'd be willing to take another look.

@dbeatty10 closed this as not planned on Jul 10, 2024
@dbeatty10 added the wontfix label (Not a bug or out of scope for dbt-core) and removed the triage label on Jul 10, 2024

2 participants