-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CT-278] [Feature] Allow meta property to be accessible at compile time #4776
Comments
I think the problem isn't so much that meta isn't available at compile time, it's that the config from properties.yml isn't available when the ref is processed. A dbt project is parsed in the order: 1) macros, 2) models, 3) schema files. When the model is "parsed", we extract the ref, source, and config calls and use them to set information in the nodes. It doesn't even help to put the meta config in the model, because that data is extracted at the same time as the ref. The model that is ref'd must resolve to a string at parse time, or we cannot construct the DAG properly. We do have discussions on a regular basis about whether there are advantages to tweaking the order that things are processed, or about moving some flavors of setting to a different place. But these are all pretty complicated and delicate things to change since they are so central to the way that dbt works. Let us know if you have other ideas about how to make this easier to work with. |
Apologies, accidentally closed. Thanks for your response. Back in the old days we could scope variables specifically to models, and these variables were provided in the This was good but it quickly became hard to manage as the We should not revert to that, but I'm wondering if model-scoped variables and being able to provide these in # properties.yml
version: 2
models:
- name: my_table
variables:
param_1: COLUMN_1
param_2: COLUMN_2
param_3: COLUMN_3
param_4: COLUMN_4
source_model_name: my_stage -- my_table.sql
{{ config(materialized='table') }}
SELECT 1 AS "{{ var('param_1') }}"
, 2 AS "{{ var('param_1') }}"
, 3 AS "{{ var('param_1') }}"
, 4 AS "{{ var('param_1') }}"
FROM {{ ref(var('source_model_name')) }} Perhaps then, adding variable processing to the same level as macros or before or after, but before models, would be a minimal solution to this and not have undesirable impact on the fundamental operation of dbt as a whole? Really eager to find a solution to this, as it will have huge implications, as described in my initial proposal. EDIT: I've thought about this a little more and the downside I can see with this is that it's perhaps not intuitive, as these parameters are fundamentally for configuring the models, rather than variables per-say. |
@DVAlexHiggs does replacing e.g.:
work for you as of now? Otherwise can you provide a minimal working example of a project, and I could check if your issue is the same I ran across. Edit: This is actually adviced against here. The workaround would be to use the execute flag as adviced, but that requires you to add |
Hi @scajanus. Yes advice against this is the reason I had not explored that option. I can confirm it does not work reliably, due to the reasons mentioned in the docs you linked (incomplete graph). There must be a way we can get this feature implemented in a dbt-esque way |
Putting the meta in the model's config block also doesn't seem to work as desired: Datavault-UK/automate-dv#106 (comment) @gshank Odd that using a |
@DVAlexHiggs my understanding is that none of the config is treated as final until the full graph is parsed, i.e. there is no (pre)parser that would check if a part of the config is already safe to use -- as in the model case. The easiest way forward would seem to be allowing defining globals (or template/model locals) that cannot be changed by the graph/template parsing. Jinja has The Global Namespace and Template Globals Otherwise you would need to define the order in which each of the tags are expanded and do multiple passes of parsing, which seems nasty; as you would need to also check if config is mutated and determine if that is allowed/what should happen then? If I've understood the discussion here correctly, that is :) Either your config is fixed before parsing the graph (as it used to be?), or you allow the graph to change the config. If you allow the graph to change the config, you cannot use it to alter the way the graph is parsed (and expect consistent results). So if you want to use some variables during the parsing, they need to be immutable by the parsing, as the current variables are. We seem to be looking for a way to have variables/config that is model-specific but immutable. I went through some of the related issues/comments from before (#2377 (comment), #2401, #1790) but none of those change the fact that variables are package global and config can be changed by the graph. They just allow new locations for setting them. Subsequently the feature requested here should not apply for config, subset of config ('meta') or variables, but be its own entity. |
I've found a semi-working solution for this, though it's a bit messy due to the need for Primarily, the need for # properties.yml
version: 2
models:
- name: my_meta_table
meta:
param_1: COLUMN_1
param_2: COLUMN_2
param_3: COLUMN_3
param_4: COLUMN_4
source_model_name: v_stg_orders -- my_meta_table.sql
{{ config(materialized='table') }}
{% do log('meta: ' ~ config.get('meta'), true) %}
{% do log('param_1: ' ~ config.get('meta')['param_1'], true) %}
{% do log('param_2: ' ~ config.get('meta')['param_2'], true) %}
{% do log('param_3: ' ~ config.get('meta')['param_3'], true) %}
{% do log('param_4: ' ~ config.get('meta')['param_4'], true) %}
{% do log('source_model_name: ' ~ config.get('meta')['source_model_name'], true) %}
{% set param_1 = config.get('meta')['param_1'] -%}
{% set param_2 = config.get('meta')['param_2'] -%}
{% set param_3 = config.get('meta')['param_3'] -%}
{% set param_4 = config.get('meta')['param_4'] -%}
{% set source_model_name = config.get('meta')['source_model_name'] -%}
-- depends_on: {{ ref('v_stg_orders') }}
{% if execute %}
SELECT 1 AS "{{ param_1 }}"
, 2 AS "{{ param_2 }}"
, 3 AS "{{ param_3 }}"
, 4 AS "{{ param_4 }}"
FROM {{ ref(source_model_name) }}
{% endif %} Logs: 08:56:46 Running with dbt=1.0.6
08:56:46 meta:
08:56:46 param_1:
08:56:46 param_2:
08:56:46 param_3:
08:56:46 param_4:
08:56:46 source_model_name:
08:56:46 Found 32 models, 0 tests, 0 snapshots, 0 analyses, 497 macros, 0 operations, 0 seed files, 8 sources, 0 exposures, 0 metrics
08:56:46
08:56:48 Concurrency: 4 threads (target='dev')
08:56:48
08:56:48 1 of 1 START table model ALEX_HIGGS.my_meta_table............................... [RUN]
08:56:48 meta: {'param_1': 'COLUMN_1', 'param_2': 'COLUMN_2', 'param_3': 'COLUMN_3', 'param_4': 'COLUMN_4', 'source_model_name': 'v_stg_orders'}
08:56:48 param_1: COLUMN_1
08:56:48 param_2: COLUMN_2
08:56:48 param_3: COLUMN_3
08:56:48 param_4: COLUMN_4
08:56:48 source_model_name: v_stg_orders
08:56:52 1 of 1 OK created table model ALEX_HIGGS.my_meta_table.......................... [SUCCESS 1 in 3.66s]
08:56:52
08:56:52 Finished running 1 table model in 5.63s.
08:56:52
08:56:52 Completed successfully
08:56:52
08:56:52 Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1 Compiled SQL: -- depends_on: SANDBOX.ALEX_HIGGS.v_stg_orders
SELECT 1 AS "COLUMN_1"
, 2 AS "COLUMN_2"
, 3 AS "COLUMN_3"
, 4 AS "COLUMN_4"
FROM SANDBOX.ALEX_HIGGS.v_stg_orders |
I can only second this feature request. We are looking for a solution which would allow us to reuse as much as possible from existing marts' models for new marts that are schematically identical but which only differ by some filter or source model names. Thus, we seen to inject the mart's name into the models and into the ref() functions therein. However, extracting the schema name of the current model is harder than I had expected:
I am mentioning this use case and the not-working solutions at this point, as I feel that the proposed solutions above fail for the same reason. On a side note, I also want to mention that folder-level variables might serve some of the abovementioned use cases, as long as they would be available at compile time: #4938 (comment) |
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days. |
This is too useful to be closed due to staleness... |
Yes this should remain open |
I'm confused as to why schema files are parsed after the model. Furthermore, I can print the variable, just not use it. Which also feels odd. |
This would be a powerful addition to dbt, and be highly valuable. ⬆ |
Voting for this as well |
Voting for this |
voting for this |
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days. |
Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers. |
I do not believe this should be closed. |
Upvote this one! |
Is there an existing feature request for this?
Describe the Feature
It would be great if the
config.meta
property was populated at compile time. This would allow users to use themeta
dictionary to generate SQL for their dbt models at scale using the meta dictionary as parameters. For highly parameterised SQL via macros, such as in dbtvault, the following approaches are currently possible:fromyaml
) exampledbt_project.yml
All three of these approaches work, but aren't particularly scalable or easy to maintain.
Some downsides of the above approaches:
fromyaml
is sensitive to formatting and is harder to debug because it's a YAML string rather than a YAML filedbt_project.yml
file can quickly grow and become hard or even impossible to manage.I requested a similar feature here and I believe allowing
config
andmeta
to be declared in property (.yml
) files was one of the answers to this idea originally. Whilst this is a fantastic feature and has ensured dbt has come on leaps and bounds with metadata management, I believe it's not quite there yet.I recently tried using
meta
to replace the previously described approaches (fromyaml, params in model, global variables) and unfortunately came across the issue that we couldn't access the data at compile time. The snippets below demonstrate this:Expected/Desired outcome
Current outcome (dbt 1.0.3)
I realise the original intent was probably that metadata would be static data to have alongside models, similar to tags and documentation, however there is a definite use case for it to drive SQL generation, as well.
Thank you!
Describe alternatives you've considered
fromyaml
) exampledbt_project.yml
As descried above.
Who will this benefit?
Anything else?
The text was updated successfully, but these errors were encountered: