-
Notifications
You must be signed in to change notification settings - Fork 161
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Authorized Views Bug When Using Schema Property on Relationship #87
Comments
@BenLiyanage Thanks for the detailed write-up, and sorry for the delay getting back to you! ExplanationYou're doing everything right here. The problem has to do with how dbt resolves and stores configuration values. Crucially, this happens at parse time. Configs aren't recalculated at execution time, with the special exception of pre/post hooks that wrapped in extra curly braces. At parse time, dbt hasn't yet resolved the configs of every other model, including This is quite confusing behavior that trips up lots of folks in dbt (dbt-labs/dbt-core#2921, dbt-labs/dbt-core#2793, etc). The ultimate resolution might look like:
WorkaroundIn the meantime, in order to keep the table location and the authorized view's access grant in sync, your best bet is to use a
The var / env var is fully resolved and available at parse time, so dbt will calculate and store the correct value in both places. Okay, what would it look like to actually solve this problem? I think your suggestions are spot on. More visibility
Agreed — let's add some more logging to the logger.debug(f"On {entity}: Attempting to grant {access_entry}")
if access_entry in access_entries:
logger.debug(f"Access entry {access_entry} "
f"already exists in dataset")
return
access_entries.append(AccessEntry(role, entity_type, entity))
dataset.access_entries = access_entries
client.update_dataset(dataset, ['access_entries'])
logger.debug(f"On {entity}: Succeeding in granting {access_entry}") Better overall approach
You're really onto something! This feels quite ergonomic, and it honestly wouldn't be so far-fetched to implement. Let's say you were able to define a config like: {{ config(materialized = 'view', authorized_view = true) }} Then, we could add logic to the view materialization: {% if config.get('authorized_view') %}
{# Create a list of all ref/source relations this model depends on #}
{% set source_array = [] %}
{% for ref_table in model.refs %}
{{ source_array.append(ref(*ref_table)) }}
{% endfor %}
{% for src_table in model.sources %}
{{ source_array.append(source(*src_table)) }}
{% endfor %}
{# For each ref/source, grant access to its dataset. This could be cleverer to avoid duplicative grants #}
{% for src_tbl in source_array %}
{% set grant_target_dict = {'project': src_tbl.project, 'dataset': src_tbl.dataset} %}
{% do log(grant_target_dict, info = true) %}
{% do adapter.grant_access_to(this, 'view', None, grant_target_dict) %}
{% endfor %}
{% endif %} With that logic in place, I manage just run the view, and it grants access to every In that world, users probably wouldn't need to pass the While this is a tricky feature, the code is nicely self-contained, so I'm going to mark this a |
Hey @jtcohen6 --can you provide an example of the project version of this setup? Re: a better implementation -- We are an enterprise customer. Ideally this is put onto a backlog since we are paying a hefty sum for this product. We pay for this so we don't have to build our own internal version, and can focus on building our own product differentiator. Let me know if there are any dots that need to be connected around that. I believe @matt-winkler was one of our onboarding reps. |
Another callout around authorized views: in our home grown system we also did what you guys are doing--authorizing views individually via an append strategy. This is incredibly slow compared to updating a datasets list once at the end with the exhaustive list of views to authorize. |
Another callout--I was really hoping to make a macro that just authorizes a bunch of refs. If we have to hard code the locations of all the refs that seems like it would make it hard to do, or atleast require a lot of variables. |
@BenLiyanage Appreciate the callouts. My goal here is to balance between refining an improvement to the general implementation that would make sense for most users (I think you suggested a good one), and giving you the information and tools to unlock the behavior you're after, given your specific needs. If you're willing to include a custom macro in your project for this, there are folks on the dbt Labs team who can certainly help with the solution-ing work to make it happen. I believe it would be totally doable, by combining a few ingredients:
I can follow up a bit later with some example code to sketch out what such an approach could look like in your project |
This is decidedly spaghetti code, but hopefully it's a useful demonstration of what's possible when writing custom macros that tap into existing functionality in the
-- macros/grant_operation.sql
{% macro grant_access_to(entities, entity_type, role, grant_target_dict) -%}
{# This macro takes a list of "entities" (authorized views) and grants them access to a dataset (specified by grant_target_dict) #}
{# Currently, the dbt-biquery (Python) method `grant_access_to` appends entities one-by-one #}
{# It sounds like we could speed this up by making it possible to append many entities at once #}
{% for entity in entities %}
{% set entity_name = entity.database + "." + entity.schema + "." + entity.alias %}
{% do log("... to " + entity_name, info = true) %}
{% do adapter.grant_access_to(entity, entity_type, role, grant_target_dict) %}
{% endfor %}
{% endmacro %}
{% macro parse_auth_view_upstream_datasets_from_results(results) %}
{# This macro takes the `Results` object, available in the `on-run-end` context, identifies models #}
{# configured with `authorized_view: true`, and parses the database location of their upstream model dependencies #}
{# via the `graph.nodes` context variable #}
{# It returns a dictionary containing one entry per dataset to grant access on, and each of those entries #}
{# contains a list of authorized views that need access to it #}
{% if execute %}
{% set datasets_to_grant_access_on = {} %}
{% for result in results %}
{% set node = result.node %}
{% if node.config.get('authorized_view') %}
{% for upstream_id in node.depends_on.nodes %}
{% for upstream_node in graph.nodes.values() %}
{% if upstream_node['unique_id'] == upstream_id %}
{% set dataset_fqn = upstream_node.database + '.' + upstream_node.schema %}
{% if dataset_fqn in datasets_to_grant_access_on.keys() %}
{% do datasets_to_grant_access_on[dataset_fqn]['needs_access'].append(node) %}
{% else %}
{% do datasets_to_grant_access_on.update({dataset_fqn: {
'project': upstream_node.database,
'dataset': upstream_node.schema,
'needs_access': [node]
}}) %}
{% endif %}
{% endif %}
{% endfor %}
{% endfor %}
{% endif %}
{% endfor %}
{{ return(datasets_to_grant_access_on) }}
{% endif %}
{% endmacro %}
{% macro grant_access_for_all_authorized_views(results) %}
{# This macro is the entrypoint to the operation. It receives results, #}
{# parses them via the macro defined above, and finally loops over each dataset entry #}
{# to grant access #}
{% set datasets_to_grant_access_on = parse_auth_view_upstream_datasets_from_results(results) %}
{% for dataset_grant in datasets_to_grant_access_on %}
{% set grant_target_dict = {
'project': datasets_to_grant_access_on[dataset_grant]['project'],
'dataset': datasets_to_grant_access_on[dataset_grant]['dataset']
} %}
{% do log("Granting access on dataset " + dataset_grant, info = true) %}
{% do grant_access_to(
entities = datasets_to_grant_access_on[dataset_grant]['needs_access'],
entity_type = 'view',
role = None,
grant_target_dict = grant_target_dict
) %}
{% endfor %}
{% endmacro %} As an example, I define two table models in different schemas: -- models/my_table.sql
{{ config(materialized = 'table', schema = 'other_schema') }}
select 1 as id -- models/my_other_table.sql
{{ config(materialized = 'table', schema = 'other_other_schema') }}
select 1 as id I define two authorized views in my main schema (note that {{ config(materialized = 'view', authorized_view = true) }}
select * from {{ ref('my_table') }}
union all
select * from {{ ref('my_other_table') }} {{ config(materialized = 'view', authorized_view = true) }}
select * from {{ ref('my_table') }} Finally, I add the entry-point macro as a project # dbt_project.yml
on-run-end:
- "{{ grant_access_for_all_authorized_views(results) }}" And:
|
This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days. |
I'd still like to figure out a better path forward for this, since it is awkward in dbt today. I sense there's a pretty clear trade-off between:
|
Hello @jtcohen6 and @BenLiyanage , Thank you for the above illustration. I'm new to DBT and trying to implement authorised views in Bigquery and its I tried to implement the same example which showed this PR but i'm getting the error while running it. Could you please check and advise me. Can you please share the configuration to run authorised permissions at every model level instead of on-run-end? step-1 : Created a macro - grant_operation.sql {% for entity in entities %} {% endmacro %} {% macro parse_auth_view_upstream_datasets_from_results(results) %} {% if execute %} {% set datasets_to_grant_access_on = {} %} {% for result in results %}
{% endfor %} {{ return(datasets_to_grant_access_on) }} {% endif %} {% endmacro %} {% macro grant_access_for_all_authorized_views(results) %} {% set datasets_to_grant_access_on = parse_auth_view_upstream_datasets_from_results(results) %} {% for dataset_grant in datasets_to_grant_access_on %}
{% endfor %} {% endmacro %}` step-2 : Created a model with table config- my_table.sql
Error is
|
I opened a new issue proposing a way we could proceed here: #267 Going to close this issue in favor of centralizing the conversation over there. |
@jtcohen6 thanks for suggesting the macro approach, it's working for us with slight changes to the script. We ran into errors if We've had success with moving |
Describe the bug
BigQuery Authorized views cannot be dynamically granted.
Steps To Reproduce
This config should grant access to the underlying tables:
The access grant successfully works with
ref('pharmacy_claims_recon_yearmonth').database
.The access grant DOES NOT successfully work with
ref('pharmacy_claims_recon_yearmonth').schema
. Dataset will work with a variable declared with the{% set variable='value' %}
command, but no variation on the schema property appears to work, including explicitly casting it to a string likeref('pharmacy_claims_recon_yearmonth').schema|string
.Expected behavior
Access grant works correctly off of the schema value of a relationship.
Screenshots and log output
If applicable, add screenshots or log output to help explain your problem.
System information
The output of
dbt --version
:In build configs we use:
0.20.2
Additional context
Here are a couple other observations from troubleshooting this:
Access entry <AccessEntry: role=None, view={'projectId': 'test-dbt-vida', 'datasetId': 'dbt_bliyanage_clinops', 'tableId': 'v_pharmacy_claims_recon_yearmonth'}> already exists in dataset
, even if there is no authorization.In an ideal world, we would just be able to specify
grant_access
, and dbt would be able to resolve all ref's in a particular query, and grant access to the appropriate underlying datasets without them having to be explicitly named by a person.The text was updated successfully, but these errors were encountered: