Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-3297] [Bug] --select matches models with the same name from all installed packages #8954

Closed
2 tasks done
jaklan opened this issue Oct 31, 2023 · 5 comments
Closed
2 tasks done
Labels
bug Something isn't working wontfix Not a bug or out of scope for dbt-core

Comments

@jaklan
Copy link

jaklan commented Oct 31, 2023

Is this a new bug in dbt-core?

  • I believe this is a new bug in dbt-core
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

When you have a model with the same name in both current project and packages installed as dependencies and you run e.g. dbt compile -s some_model - dbt will compile all models with such name (== models from all packages).

Access modifiers doesn't change the above behaviour - even if an installed package has restrict-access set to True, such model can be defined as any of private, protected, public and the effect would be exactly the same.

Expected Behavior

I would expect dbt to only select a model from the current project (namespace).

  • If you want to match a model with the same name from an installed package - you should specify it explicitly with -s "another_package.some_model".
  • If you want to match all models with given name from all packages - you should be able to specify it as -s "*.some_model".

Moreover, I believe it shouldn't be possible to select private and protected models from a package with restrict-access: True at all.

Steps To Reproduce

  • create package_a with restrict-access: True
  • create model model_a in package_a
    • you can specify any access type: private, protected, public - effect would be the same
  • create package_b
  • specify package_a as a dependency in package_b
  • create model_a in package_b
  • run dbt compile -s model_a inside package_b
  • dbt compiles model_a from both packages

Relevant log output

No response

Environment

- OS: macOS
- Python: 3.10
- dbt: both 1.6 & 1.7

Which database adapter are you using with dbt?

redshift

Additional Context

No response

@jaklan jaklan added bug Something isn't working triage labels Oct 31, 2023
@github-actions github-actions bot changed the title [Bug] "--select" matches models with the same name from all installed packages [CT-3297] [Bug] "--select" matches models with the same name from all installed packages Oct 31, 2023
@jaklan
Copy link
Author

jaklan commented Oct 31, 2023

cc: @jtcohen6 @MichelleArk, as you were working on restrict-access: True

@dbeatty10 dbeatty10 self-assigned this Nov 1, 2023
@jaklan jaklan changed the title [CT-3297] [Bug] "--select" matches models with the same name from all installed packages [CT-3297] [Bug] --select matches models with the same name from all installed packages Nov 1, 2023
@dbeatty10
Copy link
Contributor

Thanks for raising this @jaklan !

You listed (4) main expectations:

  1. I would expect dbt to only select a model from the current project (namespace).
  2. If you want to match a model with the same name from an installed package - you should specify it explicitly with -s "another_package.some_model".
  3. If you want to match all models with given name from all packages - you should be able to specify it as -s "*.some_model".
  4. I believe it shouldn't be possible to select private and protected models from a package with restrict-access: True at all.

I think that we currently fulfill the middle two expectations and it's the first and last ones in which you were surprised by the current behavior.

Met expectations

2.

2. If you want to match a model with the same name from an installed package - you should specify it explicitly with -s "another_package.some_model".

Indeed, this can be done currently in a variety of ways. Here's a sampling:

dbt list -s package_a.my_model
dbt list -s "package_a.my_model"
dbt list -s 'package_a.my_model'
dbt list -s my_model,package:package_a
dbt list -s package:package_a,my_model

3.

3. If you want to match all models with given name from all packages - you should be able to specify it as -s "*.some_model".

This can also be done:

dbt list -s "*.my_model"
dbt list -s '*.my_model'

Unmet expectations

1.

1. I would expect dbt to only select a model from the current project (namespace).

As you pointed out, this will list all the matching nodes:

dbt list -s my_model
$ dbt list -s my_model
package_a.my_model
package_b.my_model

That's just the the way selection works; by default it will match across all the installed packages unless you specifically restrict it.

#6891 would give a convention that would restrict selections to the current package.

Either alternatively (or in conjunction), we could choose to create a global config that would allow a project-wide setting to restrict selections to the current package only. Something like --select-packages / DBT_SELECT_PACKAGES (with ["all" | "self"] as the two valid values).

4.

4. I believe it shouldn't be possible to select private and protected models from a package with restrict-access: True at all.

restrict-access only comes into play when a model is referenced via ref.

A this point in time, we don't have plans to expand the scope to affect selection as well.

The difference between access levels and ability to select a node:

  • access modifiers determine if one model can reference another model. So it's being checked at a very local level (within the context of a single model)
  • selection is pattern matching for nodes within a graph (specifically the DAG). So it's being checked a very high level (within the context of all the known node / resources)

Summary

I'm going to close this as "wont_fix" because it looks like dbt is currently behaving how we intend and expect.

However, I'll leave it up to you if you want to pursue opening one or more feature requests if there's net-new functionality you are hoping to see (like #6891 or a new global config environment variable like DBT_SELECT_PACKAGES, etc.).

@dbeatty10 dbeatty10 closed this as not planned Won't fix, can't repro, duplicate, stale Nov 1, 2023
@dbeatty10 dbeatty10 added wontfix Not a bug or out of scope for dbt-core and removed triage labels Nov 1, 2023
@dbeatty10 dbeatty10 removed their assignment Nov 1, 2023
@jaklan
Copy link
Author

jaklan commented Nov 2, 2023

@dbeatty10 thank you again for the detailed response!

  • ad 2. and 3. expectation - yeah, I know it's currently possible. My suggestion was simply about making it the only way to select models from installed packages, so no implicit selection can happen 😉

  • ad 1. expectation - thanks for pointing out #6891, I will add some comments there (although it seems to be a bit stale)

  • ad 4. expectation - I get your point, but tbh I find it pretty confusing - when I make my model private or protected and set restrict-access as True, I would expect it can be neither referenced, selected nor executed in downstream packages - so it's simply "hidden" from them. You mentioned that:

    A this point in time, we don't have plans to expand the scope to affect selection as well.

    but maybe we can at least create a feature request where we could discuss it further?

@dbeatty10
Copy link
Contributor

a 4. expectation

I would expect it can be neither referenced, selected nor executed in downstream packages - so it's simply "hidden" from them

One way to do this would be a new config within dbt_project.yml like restrict-visibility or restriction-selection. I haven't thought through the implications and trade-offs of separating restrict-visibility from restrict-access, but offering it as one idea anyways.

It could behave like this:

When restrict-visibility: True:

  • Protected or private models from outside the package will not be included within selections.
  • Only models from outside the package that have access: public can be included within selections.

When restrict-visibility: False (default):

  • Models in the package with access: protected or access: private can be included within selections, as if they were defined in the same project

Feature requests for further discussion

but maybe we can at least create a feature request where we could discuss it further?

💡 Absolutely! Appreciate all your perspectives on this @jaklan. Feel free to create feature requests any time. Even when any idea has already been discussed previously, we're comfortable revisiting past decisions and either affirming them or updating our thinking.

At this time, I'm not personally planning to submit a feature request for the restrict-visibility idea from above, but you or anyone else should feel free to.

Great ideas within #6891 (comment). We'll probably end up converting that comment into one or more stand-alone feature requests for further discussion.

@jaklan
Copy link
Author

jaklan commented Nov 2, 2023

@dbeatty10 imho that's a very good proposal! I would definitely create a feature request based on that idea this week, I will link it here when ready.

Ad #6891 - sure, please let me know if you need any help with that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working wontfix Not a bug or out of scope for dbt-core
Projects
None yet
Development

No branches or pull requests

2 participants