
Authorized Views Bug When Using Schema Property on Relationship #87

Closed
BenLiyanage opened this issue Dec 10, 2021 · 11 comments
Labels
bug (Something isn't working), good_first_issue (Good for newcomers)

Comments

@BenLiyanage

Describe the bug

BigQuery Authorized views cannot be dynamically granted.

Steps To Reproduce

This config should grant access to the underlying tables:

{{ config(
  grant_access_to=[
    {'project': ref('pharmacy_claims_recon_yearmonth').database, 'dataset': ref('pharmacy_claims_recon_yearmonth').schema}
  ]
) }}

select * 
from {{ ref('pharmacy_claims_recon_yearmonth') }}

The access grant successfully works with ref('pharmacy_claims_recon_yearmonth').database.

The access grant DOES NOT work with ref('pharmacy_claims_recon_yearmonth').schema. The dataset will work with a variable declared via the {% set variable='value' %} command, but no variation on the schema property appears to work, including explicitly casting it to a string like ref('pharmacy_claims_recon_yearmonth').schema|string.

Expected behavior

Access grant works correctly off of the schema value of a relationship.

Screenshots and log output


2021-12-10T23:49:04.768410Z: Began running node model.vida.v_pharmacy_claims_recon_yearmonth
2021-12-10T23:49:04.768702Z: 23:49:04 | 1 of 1 START view model dbt_bliyanage_clinops.v_pharmacy_claims_recon_yearmonth [RUN]
2021-12-10T23:49:04.768933Z: Acquiring new bigquery connection "model.vida.v_pharmacy_claims_recon_yearmonth".
2021-12-10T23:49:04.769013Z: Compiling model.vida.v_pharmacy_claims_recon_yearmonth
2021-12-10T23:49:04.775412Z: Writing injected SQL for node "model.vida.v_pharmacy_claims_recon_yearmonth"
2021-12-10T23:49:04.790750Z: finished collecting timing info
2021-12-10T23:49:04.813317Z: Writing runtime SQL for node "model.vida.v_pharmacy_claims_recon_yearmonth"
2021-12-10T23:49:04.826940Z: Opening a new connection, currently in state closed
2021-12-10T23:49:04.827102Z: On model.vida.v_pharmacy_claims_recon_yearmonth: /* {"app": "dbt", "dbt_version": "0.20.2", "profile_name": "user", "target_name": "default", "node_id": "model.vida.v_pharmacy_claims_recon_yearmonth"} */


  create or replace view `test-dbt-vida`.`dbt_bliyanage_clinops`.`v_pharmacy_claims_recon_yearmonth`
  OPTIONS()
  as 






select * 
from `test-dbt-vida`.`dbt_bliyanage_claims`.`pharmacy_claims_recon_yearmonth`;


2021-12-10T23:49:05.944456Z: Access entry <AccessEntry: role=None, view={'projectId': 'test-dbt-vida', 'datasetId': 'dbt_bliyanage_clinops', 'tableId': 'v_pharmacy_claims_recon_yearmonth'}> already exists in dataset
2021-12-10T23:49:05.944813Z: finished collecting timing info
2021-12-10T23:49:05.945195Z: Sending event: {'category': 'dbt', 'action': 'run_model', 'label': '27b7581e-1241-47f3-ad36-bfb5e7099b84', 'context': [<snowplow_tracker.self_describing_json.SelfDescribingJson object at 0x7fc1c9436e80>]}
2021-12-10T23:49:05.945507Z: 23:49:05 | 1 of 1 OK created view model dbt_bliyanage_clinops.v_pharmacy_claims_recon_yearmonth [OK in 1.18s]
2021-12-10T23:49:05.945589Z: Finished running node model.vida.v_pharmacy_claims_recon_yearmonth

System information

The output of dbt --version:

(screenshot of dbt --version output)

In build configs we use: 0.20.2

Additional context

Here are a couple other observations from troubleshooting this:

  • When this fails to authorize a view (for example, because of a bad value), the error is swallowed and never surfaced, via either a log or an error message. This makes it very hard to tell whether it is working.
  • It would be nice if there was a log message confirming the authorization grant when it was successful as well.
  • It appears when there is an error, the logs report Access entry <AccessEntry: role=None, view={'projectId': 'test-dbt-vida', 'datasetId': 'dbt_bliyanage_clinops', 'tableId': 'v_pharmacy_claims_recon_yearmonth'}> already exists in dataset, even if there is no authorization.

In an ideal world, we would just be able to specify grant_access, and dbt would be able to resolve all ref's in a particular query, and grant access to the appropriate underlying datasets without them having to be explicitly named by a person.

@BenLiyanage BenLiyanage added bug Something isn't working triage labels Dec 10, 2021
@jtcohen6 jtcohen6 removed the triage label Dec 16, 2021

jtcohen6 commented Dec 16, 2021

@BenLiyanage Thanks for the detailed write-up, and sorry for the delay getting back to you!

Explanation

You're doing everything right here. The problem has to do with how dbt resolves and stores configuration values. Crucially, this happens at parse time. Configs aren't recalculated at execution time, with the special exception of pre/post hooks that are wrapped in extra curly braces.

At parse time, dbt hasn't yet resolved the configs of every other model, including ref('pharmacy_claims_recon_yearmonth'), so it just uses some placeholder values for that ref() instead—specifically, it uses the default database + schema of the current model. Those placeholder values are saved in the grant_access_to config, so they're what's used when it comes time later on to actually grant access.

This is quite confusing behavior that trips up lots of folks in dbt (dbt-labs/dbt-core#2921, dbt-labs/dbt-core#2793, etc). The ultimate resolution might look like:

  • Supporting "nested curlies" for more configs, telling dbt to re-render them at execution time (ugh)
  • Re-rendering all configs at execution time, at the risk of the config: selection method yielding inconsistent results
  • Explicitly disallowing use of ref() within the config() block, because it almost always does the wrong thing

Workaround

In the meantime, in order to keep the table location and the authorized view's access grant in sync, your best bet is to use a var() (defined in dbt_project.yml) or env_var() for both:

  • Configuring the database + schema for pharmacy_claims_recon_yearmonth
  • Configuring grant_access_to for the authorized view

The var / env var is fully resolved and available at parse time, so dbt will calculate and store the correct value in both places.
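For illustration, a minimal sketch of that workaround. The var name and file names here are hypothetical, and this assumes the custom schema value is used verbatim (with dbt's default generate_schema_name, a custom schema config is prepended with the target schema, so you may need to account for that):

```sql
-- dbt_project.yml (hypothetical var):
-- vars:
--   claims_dataset: dbt_bliyanage_claims

-- models/pharmacy_claims_recon_yearmonth.sql
{{ config(schema = var('claims_dataset')) }}
select ...

-- models/v_pharmacy_claims_recon_yearmonth.sql
{{ config(
  grant_access_to = [
    {'project': target.project, 'dataset': var('claims_dataset')}
  ]
) }}
select * from {{ ref('pharmacy_claims_recon_yearmonth') }}
```

Because var('claims_dataset') resolves to the same string at parse time in both places, the stored grant_access_to config matches the table's actual location.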


Okay, what would it look like to actually solve this problem? I think your suggestions are spot on.

More visibility

It would be nice if there was a log message confirming the authorization grant when it was successful as well.

Agreed — let's add some more logging to the grant_access_to method, so that it logs before every grant attempt, and after successful grants. It may add a bit of noise to the debug-level logs, but I think it's worth it for functionality this subtle:

        logger.debug(f"On {entity}: Attempting to grant {access_entry}")

        if access_entry in access_entries:
            logger.debug(f"Access entry {access_entry} "
                         f"already exists in dataset")
            return

        access_entries.append(AccessEntry(role, entity_type, entity))
        dataset.access_entries = access_entries
        client.update_dataset(dataset, ['access_entries'])
        logger.debug(f"On {entity}: Succeeded in granting {access_entry}")

Better overall approach

In an ideal world, we would just be able to specify grant_access, and dbt would be able to resolve all ref's in a particular query, and grant access to the appropriate underlying datasets without them having to be explicitly named by a person.

You're really onto something! This feels quite ergonomic, and it honestly wouldn't be so far-fetched to implement.

Let's say you were able to define a config like:

{{ config(materialized = 'view', authorized_view = true) }}

Then, we could add logic to the view materialization:

    {% if config.get('authorized_view') %}
      {# Create a list of all ref/source relations this model depends on #}
      {% set source_array = [] %}
      {% for ref_table in model.refs %}
        {{ source_array.append(ref(*ref_table)) }}
      {% endfor %}
      {% for src_table in model.sources %}
        {{ source_array.append(source(*src_table)) }}
      {% endfor %}

      {# For each ref/source, grant access to its dataset. This could be cleverer to avoid duplicative grants #}
      {% for src_tbl in source_array %}
        {% set grant_target_dict = {'project': src_tbl.project, 'dataset': src_tbl.dataset} %}
        {% do log(grant_target_dict, info = true) %}
        {% do adapter.grant_access_to(this, 'view', None, grant_target_dict) %}
      {% endfor %}
    {% endif %}

With that logic in place, I can just run the view, and it grants access to every project.dataset pair I need!

In that world, users probably wouldn't need to pass the grant_access_to config explicitly—but you could always go to "full-control" mode if need be.

While this is a tricky feature, the code is nicely self-contained, so I'm going to mark this as a good first issue. Is this something you'd have any interest in working on / contributing to? :)

@jtcohen6 jtcohen6 added the good_first_issue Good for newcomers label Dec 16, 2021
@BenLiyanage (Author)

Hey @jtcohen6 -- can you provide an example of the project-level version of this setup?

Re: a better implementation -- We are an enterprise customer. Ideally this is put onto a backlog since we are paying a hefty sum for this product. We pay for this so we don't have to build our own internal version, and can focus on building our own product differentiator. Let me know if there are any dots that need to be connected around that. I believe @matt-winkler was one of our onboarding reps.

@BenLiyanage (Author)

Another callout around authorized views: in our home-grown system we also did what you're doing -- authorizing views individually via an append strategy. This is incredibly slow compared to updating a dataset's access list once, at the end, with the exhaustive list of views to authorize.
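To make the batching idea concrete, here's a minimal pure-Python sketch of the single-update approach. The tuple entries and the batch_authorize helper are stand-ins I've made up for illustration, not the real google-cloud-bigquery client API:

```python
# Sketch of batched view authorization: collect every missing access
# entry first, then issue ONE dataset update, instead of one update
# per view. ("view", name) stands in for a real AccessEntry object.

def batch_authorize(existing_entries, view_names):
    """Return (updated_entry_list, views_that_actually_needed_adding)."""
    entries = list(existing_entries)
    added = []
    for name in view_names:
        entry = ("view", name)
        if entry not in entries:  # skip views that are already authorized
            entries.append(entry)
            added.append(name)
    # A real implementation would now update the dataset's access entries
    # exactly once, and only if `added` is non-empty.
    return entries, added

entries, added = batch_authorize(
    [("view", "proj.ds.v_existing")],
    ["proj.ds.v_existing", "proj.ds.v_new_1", "proj.ds.v_new_2"],
)
print(added)  # only the views that were not already authorized
```

The point is that the expensive dataset update happens once per dataset, not once per view.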

@BenLiyanage (Author)

Another callout -- I was really hoping to make a macro that just authorizes a bunch of refs. If we have to hard-code the locations of all the refs, that seems like it would be hard to do, or at least require a lot of variables.


jtcohen6 commented Dec 17, 2021

@BenLiyanage Appreciate the callouts. My goal here is to balance between refining an improvement to the general implementation that would make sense for most users (I think you suggested a good one), and giving you the information and tools to unlock the behavior you're after, given your specific needs.

If you're willing to include a custom macro in your project for this, there are folks on the dbt Labs team who can certainly help with the solution-ing work to make it happen. I believe it would be totally doable, by combining a few ingredients:

  • the grant_access_to macro / context method, which is available for custom use, above and beyond the current implementation in the BigQuery view materialization. Given your comment, it sounds like a concrete improvement would be allowing grant_access_to to accept multiple entities (= authorized views), and then (given a common dataset) granting access to all of them at once
  • the Results object, available in the on-run-end hook (i.e. run once at the end of dbt run), which has access to full configuration and information about the models that just ran — including which ones are configured to be authorized views, and their direct upstream dependencies

I can follow up a bit later with some example code to sketch out what such an approach could look like in your project.


jtcohen6 commented Dec 17, 2021

This is decidedly spaghetti code, but hopefully it's a useful demonstration of what's possible when writing custom macros that tap into existing functionality in the dbt-bigquery plugin. I see two clear avenues for improvement:

  • More user-friendly default behavior, which grants access to one auth view's upstream dependencies (as described above)
  • Exposing the capability, in the grant_access_to adapter method, to grant access to multiple auth views at once, for users who want to batch grants for speedier results at scale
-- macros/grant_operation.sql

{% macro grant_access_to(entities, entity_type, role, grant_target_dict) -%}
  {# This macro takes a list of "entities" (authorized views) and grants them access to a dataset (specified by grant_target_dict) #}
  {# Currently, the dbt-bigquery (Python) method `grant_access_to` appends entities one-by-one #}
  {# It sounds like we could speed this up by making it possible to append many entities at once #}
  
  {% for entity in entities %}
    {% set entity_name = entity.database + "." + entity.schema + "." + entity.alias %}
    {% do log("... to " + entity_name, info = true) %}
    {% do adapter.grant_access_to(entity, entity_type, role, grant_target_dict) %}
  {% endfor %}
    
{% endmacro %}

{% macro parse_auth_view_upstream_datasets_from_results(results) %}
  {# This macro takes the `Results` object, available in the `on-run-end` context, identifies models #}
  {# configured with `authorized_view: true`, and parses the database location of their upstream model dependencies #}
  {# via the `graph.nodes` context variable #}
  {# It returns a dictionary containing one entry per dataset to grant access on, and each of those entries #}
  {# contains a list of authorized views that need access to it #}

  {% if execute %}

  {% set datasets_to_grant_access_on = {} %}

  {% for result in results %}
  
    {% set node = result.node %}
  
    {% if node.config.get('authorized_view') %}
    
      {% for upstream_id in node.depends_on.nodes %}
        {% for upstream_node in graph.nodes.values() %}
          {% if upstream_node['unique_id'] == upstream_id %}
            {% set dataset_fqn = upstream_node.database + '.' + upstream_node.schema %}
            {% if dataset_fqn in datasets_to_grant_access_on.keys() %}
              {% do datasets_to_grant_access_on[dataset_fqn]['needs_access'].append(node) %}
            {% else %}
              {% do datasets_to_grant_access_on.update({dataset_fqn: {
                'project': upstream_node.database,
                'dataset': upstream_node.schema,
                'needs_access': [node]
              }}) %}
            {% endif %}
          {% endif %}
        {% endfor %}
      {% endfor %}
      
    {% endif %}
    
  {% endfor %}
  
  {{ return(datasets_to_grant_access_on) }}
  
  {% endif %}
  
{% endmacro %}

{% macro grant_access_for_all_authorized_views(results) %}
  {# This macro is the entrypoint to the operation. It receives results, #}
  {# parses them via the macro defined above, and finally loops over each dataset entry #}
  {# to grant access #}
  
  {% set datasets_to_grant_access_on = parse_auth_view_upstream_datasets_from_results(results) %}
  
  {% for dataset_grant in datasets_to_grant_access_on %}
  
    {% set grant_target_dict = {
      'project': datasets_to_grant_access_on[dataset_grant]['project'],
      'dataset': datasets_to_grant_access_on[dataset_grant]['dataset']
    } %}
    
    {% do log("Granting access on dataset " + dataset_grant, info = true) %}
  
    {% do grant_access_to(
      entities = datasets_to_grant_access_on[dataset_grant]['needs_access'],
      entity_type = 'view',
      role = None,
      grant_target_dict = grant_target_dict
    ) %}
  
  {% endfor %}

{% endmacro %}
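To clarify what the parsing macro computes, here's the same grouping step sketched in plain Python. The input structure is a simplified stand-in for dbt's Results and graph objects, made up for illustration:

```python
# Mirror of parse_auth_view_upstream_datasets_from_results: group each
# upstream (project, dataset) pair, collecting the authorized views
# that need access to it.

def group_grants(auth_views):
    """auth_views: list of (view_name, [(project, dataset), ...])."""
    grants = {}
    for view_name, upstreams in auth_views:
        for project, dataset in upstreams:
            fqn = project + "." + dataset
            entry = grants.setdefault(
                fqn, {"project": project, "dataset": dataset, "needs_access": []}
            )
            entry["needs_access"].append(view_name)
    return grants

grants = group_grants([
    ("my_authorized_view", [("proj", "other_schema"), ("proj", "other_other_schema")]),
    ("my_other_auth_view", [("proj", "other_schema")]),
])
# grants now has one entry per upstream dataset, each listing the views
# that must be authorized on it.
```

This is also where the duplicate-grant avoidance happens: each dataset appears once as a key, no matter how many views depend on it.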

As an example, I define two table models in different schemas:

-- models/my_table.sql
{{ config(materialized = 'table', schema = 'other_schema') }}
select 1 as id
-- models/my_other_table.sql
{{ config(materialized = 'table', schema = 'other_other_schema') }}
select 1 as id

I define two authorized views in my main schema (note that authorized_view is a config I've made up, not defined in dbt-core or dbt-bigquery — it's just a sentinel for my result-parsing macro):

{{ config(materialized = 'view', authorized_view = true) }}
select * from {{ ref('my_table') }}
union all
select * from {{ ref('my_other_table') }}
{{ config(materialized = 'view', authorized_view = true) }}
select * from {{ ref('my_table') }}

Finally, I add the entry-point macro as a project on-run-end hook:

# dbt_project.yml
on-run-end:
  - "{{ grant_access_for_all_authorized_views(results) }}"

And:

$ dbt run
20:56:25  Running with dbt=1.0.0
20:56:25  Found 4 models, 0 tests, 0 snapshots, 0 analyses, 191 macros, 1 operation, 0 seed files, 0 sources, 0 exposures, 0 metrics
20:56:25
20:56:28  Concurrency: 5 threads (target='dev')
20:56:28
20:56:28  1 of 4 START table model dbt_jcohen_other_other_schema.my_other_table........... [RUN]
20:56:28  2 of 4 START table model dbt_jcohen_other_schema.my_table....................... [RUN]
20:56:31  2 of 4 OK created table model dbt_jcohen_other_schema.my_table.................. [CREATE TABLE (1.0 rows, 0 processed) in 2.77s]
20:56:31  1 of 4 OK created table model dbt_jcohen_other_other_schema.my_other_table...... [CREATE TABLE (1.0 rows, 0 processed) in 2.78s]
20:56:31  3 of 4 START view model dbt_jcohen.my_other_auth_view........................... [RUN]
20:56:31  4 of 4 START view model dbt_jcohen.my_authorized_view........................... [RUN]
20:56:32  4 of 4 OK created view model dbt_jcohen.my_authorized_view...................... [OK in 0.99s]
20:56:32  3 of 4 OK created view model dbt_jcohen.my_other_auth_view...................... [OK in 1.06s]
20:56:32
20:56:32  Running 1 on-run-end hook
20:56:32  Granting access on dataset dbt-dev-168022.dbt_jcohen_other_schema
20:56:32  ... to dbt-dev-168022.dbt_jcohen.my_authorized_view
20:56:33  ... to dbt-dev-168022.dbt_jcohen.my_other_auth_view
20:56:34  Granting access on dataset dbt-dev-168022.dbt_jcohen_other_other_schema
20:56:34  ... to dbt-dev-168022.dbt_jcohen.my_authorized_view
20:56:35  1 of 1 START hook: testy.on-run-end.0........................................... [RUN]
20:56:35  1 of 1 OK hook: testy.on-run-end.0.............................................. [OK in 0.00s]
20:56:35
20:56:35
20:56:35  Finished running 2 table models, 2 view models, 1 hook in 9.77s.
20:56:35
20:56:35  Completed successfully
20:56:35
20:56:35  Done. PASS=4 WARN=0 ERROR=0 SKIP=0 TOTAL=4

@github-actions

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please remove the stale label or comment on the issue, or it will be closed in 7 days.

@github-actions github-actions bot added the Stale label Jun 16, 2022
@jtcohen6

I'd still like to figure out a better path forward for this, since it is awkward in dbt today. I sense there's a pretty clear trade-off between:

  • performance hit while dbt is running, to authorize each view immediately after creation
  • latency between view creation + authorization, if the update_dataset calls are batched into an end-of-run hook

@jtcohen6 jtcohen6 removed the Stale label Jun 16, 2022

vijay-tumati commented Jun 23, 2022


Hello @jtcohen6 and @BenLiyanage ,

Thank you for the above illustration.

I'm new to dbt and trying to implement authorized views in BigQuery.

I tried to implement the same example shown above, but I'm getting an error while running it. Could you please check and advise me?

Can you please also share the configuration to run the authorization grants at every model level, instead of in on-run-end?

step-1 : Created a macro - grant_operation.sql, copying the three macros (grant_access_to, parse_auth_view_upstream_datasets_from_results, grant_access_for_all_authorized_views) from the comment above verbatim.

step-2 : Created a model with table config - my_table.sql

{{ config(materialized = 'table', schema = 'junifer_real_time', tags = ['auth_view_test']) }}
select 1 as id

step-3 : Created a model with authorized view config -

{{ config(materialized = 'view', authorized_view = true, tags = ['auth_view_test']) }}
select * from {{ ref('my_table') }}

step-4 : Added the on-run-end config to the dbt_project.yml file:

on-run-end:
  - "{{ grant_access_for_all_authorized_views(results) }}"

The error is:

(base) ➜ dbt git:(dataplat-509-dbt-dw) ✗ bulb_data_tools dbt run --models tag:auth_view_test
master: Pulling from bulb-data/data-warehouse-dbt
08:32:01 Running with dbt=1.0.3
08:32:35 Encountered an error:
Compilation Error in operation data_warehouse-on-run-end-0 (./dbt_project.yml)
'str object' has no attribute '\n'

in macro grant_access_for_all_authorized_views (macros/grant_operation.sql)
called by operation data_warehouse-on-run-end-0 (./dbt_project.yml)
Traceback (most recent call last):
File "/Users/vijaytumati/opt/miniconda3/bin/bulb_data_tools", line 33, in
sys.exit(load_entry_point('bulb-data-tools', 'console_scripts', 'bulb_data_tools')())
File "/Users/vijaytumati/bulb-data-tools/bulb_data_tools/cli/cli.py", line 296, in _main
fire.Fire(cli)
File "/Users/vijaytumati/opt/miniconda3/lib/python3.9/site-packages/fire/core.py", line 141, in Fire
component_trace = _Fire(component, args, parsed_flag_args, context, name)
File "/Users/vijaytumati/opt/miniconda3/lib/python3.9/site-packages/fire/core.py", line 466, in _Fire
component, remaining_args = _CallAndUpdateTrace(
File "/Users/vijaytumati/opt/miniconda3/lib/python3.9/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
component = fn(*varargs, **kwargs)
File "/Users/vijaytumati/bulb-data-tools/bulb_data_tools/cli/dbt/dbt.py", line 160, in run
_run_docker_command(_generate_docker_command("run", command))
File "/Users/vijaytumati/bulb-data-tools/bulb_data_tools/cli/dbt/dbt.py", line 56, in _run_docker_command
subprocess.run(terminal_command, shell=True, check=True)
File "/Users/vijaytumati/opt/miniconda3/lib/python3.9/subprocess.py", line 528, in run
raise CalledProcessError(retcode, process.args,


jtcohen6 commented Aug 11, 2022

I opened a new issue proposing a way we could proceed here: #267

Going to close this issue in favor of centralizing the conversation over there.

@jtcohen6 jtcohen6 closed this as not planned Aug 11, 2022
@qoire

qoire commented Oct 6, 2022

@jtcohen6 thanks for suggesting the macro approach -- it's working for us with slight changes to the script. We ran into errors when the parse_auth_view_upstream_datasets_from_results macro returns nothing/undefined.

We've had success with moving {% set datasets_to_grant_access_on = {} %} and the associated return out of the {% if execute %} block, so the macro always returns at least an empty dictionary.
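For anyone hitting the same error, a sketch of that adjustment (the parsing loop itself is unchanged from the macro earlier in the thread):

```
{% macro parse_auth_view_upstream_datasets_from_results(results) %}
  {# Initialize and return OUTSIDE the execute guard, so the macro #}
  {# always returns a dictionary (possibly empty) instead of None #}
  {% set datasets_to_grant_access_on = {} %}
  {% if execute %}
    {# ... same parsing loop as above ... #}
  {% endif %}
  {{ return(datasets_to_grant_access_on) }}
{% endmacro %}
```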
