fix audits, add project as a parameter #42

Open · wants to merge 8 commits into base: master
1 change: 1 addition & 0 deletions .gitignore
@@ -0,0 +1 @@
+.idea
10 changes: 6 additions & 4 deletions README.md
@@ -8,10 +8,11 @@ To install the package add the package path to the `packages.yml` file in your d

In order to use the model audit post-hook the following variables have to be set in your `dbt_project.yml` file.

-| Variable | Description |
-| --------------------- | -------------------------- |
-| `dbt_ml:audit_schema` | Schema of the audit table. |
-| `dbt_ml:audit_table` | Name of the audit table. |
+| Variable | Description |
+| ---------------------- | --------------------------------- |
+| `dbt_ml:audit_database`| Name of the GCP Project to use. |
+| `dbt_ml:audit_schema` | Schema of the audit table. |
+| `dbt_ml:audit_table` | Name of the audit table. |

You will also need to specify the post-hook in your `dbt_project.yml` file<sup>[1]</sup> as `{{ dbt_ml.model_audit() }}`. Optionally, you can use the `dbt_ml.create_model_audit_table()` macro to create the audit table automatically if it does not exist - for example in an on-run-start hook.

@@ -20,6 +21,7 @@ Example config for `dbt_project.yml` below:
vars:
  "dbt_ml:audit_schema": "audit"
  "dbt_ml:audit_table": "ml_models"
+  "dbt_ml:audit_database": "database"
on-run-start:
  - '{% do adapter.create_schema(api.Relation.create(target.project, "audit")) %}'
  - "{{ dbt_ml.create_model_audit_table() }}"
6 changes: 3 additions & 3 deletions macros/hooks/model_audit.sql
@@ -5,7 +5,7 @@
'schema': 'string',
'created_at': type_timestamp(),
'training_info': 'array<struct<training_run int64, iteration int64, loss float64, eval_loss float64, learning_rate float64, duration_ms int64, cluster_info array<struct<centroid_id int64, cluster_radius float64, cluster_size int64>>>>',
-'feature_info': 'array<struct<input string, min float64, max float64, mean float64, median float64, stddev float64, category_count int64, null_count int64>>',
+'feature_info': 'array<struct<input string, min float64, max float64, mean float64, median float64, stddev float64, category_count int64, null_count int64, dimension int64>>',
'weights': 'array<struct<processed_input string, weight float64, category_weights array<struct<category string, weight float64>>>>',
'evaluate': 'array<struct<precision float64, recall float64, accuracy float64, f1_score float64, log_loss float64, roc_auc float64>>',
}) %}
@@ -100,7 +100,7 @@ tensorflow: {}

{% set info_types = ['training_info', 'feature_info', 'weights', 'evaluate'] %}

-insert `{{ target.database }}.{{ var('dbt_ml:audit_schema') }}.{{ var('dbt_ml:audit_table') }}`
+insert `{{ var('dbt_ml:audit_database') }}.{{ var('dbt_ml:audit_schema') }}.{{ var('dbt_ml:audit_table') }}`
(model, schema, created_at, {{ info_types | join(', ') }})

select
@@ -125,7 +125,7 @@ tensorflow: {}
{% macro create_model_audit_table() %}
{%- set audit_table =
api.Relation.create(
-database=target.database,
+database=var('dbt_ml:audit_database'),
schema=var('dbt_ml:audit_schema'),
identifier=var('dbt_ml:audit_table'),
type='table'
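
Both hunks above switch the audit table's database from `target.database` to `var('dbt_ml:audit_database')`. Since dbt's `var()` raises a compilation error when a variable is undefined and no default is supplied, the new variable becomes required for projects that use the audit hook. The following is a minimal Python sketch of that resolution, not the package's code: `audit_table_fqn` is a hypothetical helper, and the project vars are assumed to be a plain dict.

```python
# Hypothetical helper: mirrors how the three vars are joined into the
# backtick-quoted identifier used by the `insert` statement in the diff.
REQUIRED_VARS = (
    "dbt_ml:audit_database",
    "dbt_ml:audit_schema",
    "dbt_ml:audit_table",
)


def audit_table_fqn(project_vars):
    """Return `database.schema.table` built from the dbt_ml audit vars."""
    missing = [name for name in REQUIRED_VARS if name not in project_vars]
    if missing:
        # Stands in for the compilation error dbt raises for an undefined var.
        raise KeyError("missing required vars: " + ", ".join(missing))
    database, schema, table = (project_vars[name] for name in REQUIRED_VARS)
    return f"`{database}.{schema}.{table}`"


# Values taken from the README example in this pull request.
example_vars = {
    "dbt_ml:audit_schema": "audit",
    "dbt_ml:audit_table": "ml_models",
    "dbt_ml:audit_database": "database",
}
print(audit_table_fqn(example_vars))
```

A project that defines only the two older variables would fail in this sketch, which is the upgrade consideration the README table change documents.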
9 changes: 1 addition & 8 deletions macros/materializations/model.sql
@@ -19,18 +19,11 @@
{% endmacro %}

{% macro model_options(ml_config, labels) %}
-{%- if labels -%}
-{%- set label_list = [] -%}
-{%- for label, value in labels.items() -%}
-{%- do label_list.append((label, value)) -%}
-{%- endfor -%}
-{%- do ml_config.update({'labels': label_list}) -%}
-{%- endif -%}

{% set options -%}
options(
{%- for opt_key, opt_val in ml_config.items() -%}
-{%- if opt_val is sequence and not (opt_val | first) is number and (opt_val | first).startswith('hparam_') -%}
+{%- if opt_val is sequence and (opt_val | first) is string and (opt_val | first).startswith('hparam_') -%}
{{ opt_key }}={{ opt_val[0] }}({{ opt_val[1:] | join(', ') }})
{%- else -%}
{{ opt_key }}={{ (opt_val | tojson) if opt_val is string else opt_val }}
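
The changed condition only takes the `hparam_` branch when the first element of a sequence is a string. The old check merely excluded numbers, so any other non-string first element (a tuple, for instance) would still reach `.startswith`. Below is a rough Python sketch of the per-option branching, assuming Python's `isinstance` checks approximate Jinja's `sequence` and `string` tests; `render_option` is a hypothetical name and is not part of the package.

```python
import json


def render_option(opt_key, opt_val):
    """Render one `key=value` entry the way the new branch logic reads."""
    # Approximation of Jinja's `is sequence`: lists, tuples and strings.
    is_sequence = isinstance(opt_val, (list, tuple, str))
    first = opt_val[0] if is_sequence and len(opt_val) > 0 else None

    # New guard: the first element must be a string before calling startswith.
    if is_sequence and isinstance(first, str) and first.startswith("hparam_"):
        args = ", ".join(str(v) for v in opt_val[1:])
        return f"{opt_key}={first}({args})"

    # Plain strings are JSON-quoted; everything else is rendered as-is.
    rendered = json.dumps(opt_val) if isinstance(opt_val, str) else opt_val
    return f"{opt_key}={rendered}"


print(render_option("learn_rate", ["hparam_range", 0.01, 0.1]))
print(render_option("model_type", "linear_reg"))
print(render_option("max_iterations", 10))
```

In this sketch a value such as `[("team", "ml")]` falls through to the plain rendering branch instead of attempting a string method on a tuple.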