Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add include_data_types argument to generate_model_yaml macro #122

Merged
merged 26 commits into from
Sep 25, 2023
Merged
Show file tree
Hide file tree
Changes from 8 commits
Commits
Show all changes
26 commits
Select commit Hold shift + click to select a range
0c3c737
Add column data types
Apr 7, 2023
e2f5b09
Update tests to include data_type field
Apr 7, 2023
6acccf5
Update macros to produce lowercase strings
Apr 8, 2023
ad019b2
Update generate_source_yaml to lower data_type field
Apr 8, 2023
0de3eb7
Add include_data_types arg to generate_model_yaml
Apr 8, 2023
12f21b4
Update default include_data_types in generate_source
Apr 8, 2023
4ff820a
Update README and changelog
Apr 8, 2023
43e7b7d
source -> model
Apr 8, 2023
a76c139
Use dbt.formt_column or fall back to vendored version
Apr 28, 2023
f638309
Fix typo
Apr 28, 2023
1cebade
Add missing spaces
Apr 28, 2023
39b88ab
Remove args from text_type_value and integer_type_value
Apr 28, 2023
1e55d67
Final newline
Apr 28, 2023
c1f5344
We need these parentheses
Apr 28, 2023
354865f
snowflake wants varchar
Apr 28, 2023
8320981
elif
Apr 28, 2023
e61af4c
Merge branch 'main' into add_types_to_generate_yaml
dbeatty10 Aug 30, 2023
5e08a1a
Vendor the `format_column` macro from dbt-core into dbt-codegen
dbeatty10 Sep 25, 2023
b2f463c
Completely rely on vendored version of the `format_column` macro
dbeatty10 Sep 25, 2023
09ba319
Default `include_data_types=True` for the `generate_source` macro
dbeatty10 Sep 25, 2023
db521b2
Alert to change in behaviors for `generate_source`
dbeatty10 Sep 25, 2023
d5aaa2b
Fix example by using a boolean instead of a string that will not work
dbeatty10 Sep 25, 2023
8b4f258
Example how to configure `include_data_types` to align with previous …
dbeatty10 Sep 25, 2023
2471363
Standardize formatting of default values
dbeatty10 Sep 25, 2023
c3abd44
Fix format_column is undefined error
dbeatty10 Sep 25, 2023
6f6aea6
Enable custom overrides of data type formatting via multiple dispatch
dbeatty10 Sep 25, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@
This macro generates a series of terminal commands (appended w) bash script which creates a new file in your dbt project based off the results of the [generate_base_model](macros/generate_base_model.sql) macro. Therefore, instead of outputting in the terminal, it will create the file for you.
- Add `include_data_types` flag to `generate_source` macro ([#76](https://github.com/dbt-labs/dbt-codegen/pull/76))
- Add `get_models` macro in helper macros. This macro retrieves a list of models with specified prefix at the specified directory. It is designed to make creating yamls for multiple models easier.
- Add `include_data_types` flag to `generate_model_yaml` macro ([#122](https://github.com/dbt-labs/dbt-codegen/pull/122))

## Fixes
- Fix handling of nested `STRUCT` fields in BigQuery ([#98](https://github.com/dbt-labs/dbt-codegen/issues/98), [#105](https://github.com/dbt-labs/dbt-codegen/pull/105))
Expand All @@ -27,6 +28,7 @@ This macro generates a series of terminal commands (appended w) bash script whic
## Contributors:
- [@fivetran-joemarkiewicz](https://github.com/fivetran-joemarkiewicz) (#83)
- [@GSokol](https://github.com/GSokol) (#76)
- [@linbug](https://github.com/linbug) (#120)

# dbt-codegen v0.9.0

Expand Down
10 changes: 7 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -50,7 +50,7 @@ source data is in.
column names to your source definition.
* `include_descriptions` (optional, default=False): Whether you want to add
description placeholders to your source definition.
* `include_data_types` (optional, default=False): Whether you want to add data
* `include_data_types` (optional, default=True): Whether you want to add data
linbug marked this conversation as resolved.
Show resolved Hide resolved
types to your source columns definitions.
* `table_pattern` (optional, default='%'): A table prefix / postfix that you
want to subselect from all available tables within a given schema.
Expand All @@ -77,10 +77,10 @@ or
$ dbt run-operation generate_source --args '{"schema_name": "jaffle_shop", "database_name": "raw", "table_names":["table_1", "table_2"]}'
```

Including data types:
or if you want to include column names and data types:

```
$ dbt run-operation generate_source --args '{"schema_name": "jaffle_shop", "generate_columns": "true", "include_data_types": "true"}'
$ dbt run-operation generate_source --args '{"schema_name": "jaffle_shop", "generate_columns": "true"}'
```

2. The YAML for the source will be logged to the command line
Expand Down Expand Up @@ -197,6 +197,7 @@ schema.yml file.
### Arguments:
* `model_names` (required): The model(s) you wish to generate YAML for.
* `upstream_descriptions` (optional, default=False): Whether you want to include descriptions for identical column names from upstream models.
* `include_data_types` (optional, default=True): Whether you want to add data types to your model column definitions.

### Usage:
1. Create a model.
Expand Down Expand Up @@ -230,10 +231,13 @@ version: 2

models:
- name: customers
description: ""
columns:
- name: customer_id
data_type: integer
description: ""
- name: customer_name
data_type: text
description: ""
```

Expand Down
6 changes: 3 additions & 3 deletions integration_tests/macros/integer_type_value.sql
Original file line number Diff line number Diff line change
@@ -1,9 +1,9 @@
{%- macro integer_type_value() -%}
{%- if target.type == "snowflake" -%}
NUMBER(38,0)
number(38,0)
{%- elif target.type == "bigquery" -%}
INT64
int64
{%- else -%}
INTEGER
integer
{%- endif -%}
{%- endmacro -%}
8 changes: 4 additions & 4 deletions integration_tests/macros/text_type_value.sql
Original file line number Diff line number Diff line change
@@ -1,11 +1,11 @@
{%- macro text_type_value(text_length) -%}
{%- if target.type == "redshift" -%}
CHARACTER VARYING({{ text_length }})
character varying({{ text_length }})
{%- elif target.type == "snowflake" -%}
CHARACTER VARYING(16777216)
character varying(16777216)
{%- elif target.type == "bigquery" -%}
STRING
string
{%- else -%}
TEXT
text
{%- endif -%}
{%- endmacro -%}
3 changes: 2 additions & 1 deletion integration_tests/tests/test_generate_model_struct_yaml.sql
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,8 @@
) %}

{% set actual_source_yaml = codegen.generate_model_yaml(
model_names=['model_struct']
model_names=['model_struct'],
include_data_types=False
)
%}

Expand Down
2 changes: 2 additions & 0 deletions integration_tests/tests/test_generate_model_yaml.sql
Original file line number Diff line number Diff line change
Expand Up @@ -11,9 +11,11 @@ models:
description: ""
columns:
- name: col_a
data_type: {{ integer_type_value() }}
description: ""

- name: col_b
data_type: {{ text_type_value(1) }}
description: ""

{% endset %}
Expand Down
Original file line number Diff line number Diff line change
@@ -1,5 +1,6 @@
{% set actual_model_yaml = codegen.generate_model_yaml(
model_names=['data__a_relation','data__b_relation']
model_names=['data__a_relation','data__b_relation'],
include_data_types=False
)
%}

Expand Down
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
{% set actual_model_yaml = codegen.generate_model_yaml(
model_names=['child_model'],
upstream_descriptions=True
upstream_descriptions=True,
include_data_types=False
)
%}

Expand Down
3 changes: 2 additions & 1 deletion integration_tests/tests/test_generate_source_some_tables.sql
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,8 @@
database_name=target.database,
table_names=['data__a_relation'],
generate_columns=True,
include_descriptions=True
include_descriptions=True,
include_data_types=False
) %}


Expand Down
11 changes: 7 additions & 4 deletions macros/generate_model_yaml.sql
Original file line number Diff line number Diff line change
@@ -1,23 +1,26 @@
{% macro generate_column_yaml(column, model_yaml, column_desc_dict, parent_column_name="") %}
{% macro generate_column_yaml(column, model_yaml, column_desc_dict, include_data_types, parent_column_name="") %}
{% if parent_column_name %}
{% set column_name = parent_column_name ~ "." ~ column.name %}
{% else %}
{% set column_name = column.name %}
{% endif %}

{% do model_yaml.append(' - name: ' ~ column_name | lower ) %}
{% if include_data_types %}
{% do model_yaml.append(' data_type: ' ~ column.data_type | lower) %}
Copy link
Contributor

@jtcohen6 jtcohen6 Apr 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending on the adapter / data platform, we may want this to be column.dtype instead of column.data_type — I believe that will enable this to return text or varchar instead of character varying(1234).

We could use the dbt.format_column macro, which is also what's used when doing the column type assertion for contracted models.

{% set formatted = adapter.dispatch('format_column', 'dbt')(column) %}
{% do model_yaml.append('        data_type: ' ~ formatted['data_type'] | lower) %}

If we do make this change, the format_column macro is new in v1.5, so this would require a version bump to codegen and to its require-dbt-version

Copy link
Contributor

@dbeatty10 dbeatty10 Apr 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jtcohen6 good point about column.dtype vs. column.data_type!

Using dbt.format_column macro

I like your idea of using the logic within the dbt.format_column macro. I think we should aim to preserve the wide range of require-dbt-version: [">=1.0.0", "<2.0.0"] though.

There's multiple approaches we could take to use this logic into codegen without forcing an upgrade to 1.5.0. The quick'n'dirty option is to "vendor" it by just copy-pasting it into codegen. Another option is to inspect the dbt version, and fall back to a vendored version of the macro if it is less than 1.5.0.

Difference between dtype and data_type

My understanding of the difference between the two is dtype gives the database-specific data type with no size, scale, or precision (like varchar or decimal) whereas data_type is intended to give the database-specific data type with those included (like varchar(80) or decimal(18, 2)).

Although dbt model contracts can operate with either input, it will effectively ignore any size/precision/scale that is supplied.

So best would be to exclude size/precision/scale to avoid any false impressions that it will be verified as part of the dbt model contract.

💡 We should fix column.data_type so its value is always valid. It feels very doable to make it work across adapters, and relevant issues opened here, here, and here.

💡 Once column.data_type is fixed, we should consider expanding dbt.format_column() to include it as well.

Copy link
Contributor Author

@linbug linbug Apr 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok so I added this which I feel should work, but for some reason my tests fail locally because formatted is empty. My dbt version is 1.5.0-b5 which is maybe pre-release and doesn't have dbt.format_columns yet? If you have suggestions for getting around this please let me know, otherwise I can use the vendored version for everything.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dbeatty10 @jtcohen6 any thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dave-connors-3 @benmosher we may want to get this across the line now that generate model is live in the IDE. would be a good improvement to the functionality.

{% endif %}
{% do model_yaml.append(' description: "' ~ column_desc_dict.get(column.name | lower,'') ~ '"') %}
{% do model_yaml.append('') %}

{% if column.fields|length > 0 %}
{% for child_column in column.fields %}
{% set model_yaml = codegen.generate_column_yaml(child_column, model_yaml, column_desc_dict, parent_column_name=column_name) %}
{% set model_yaml = codegen.generate_column_yaml(child_column, model_yaml, column_desc_dict, include_data_types, parent_column_name=column_name) %}
{% endfor %}
{% endif %}
{% do return(model_yaml) %}
{% endmacro %}

{% macro generate_model_yaml(model_names=[], upstream_descriptions=False) %}
{% macro generate_model_yaml(model_names=[], upstream_descriptions=False, include_data_types=True) %}

{% set model_yaml=[] %}

Expand All @@ -38,7 +41,7 @@
{% set column_desc_dict = codegen.build_dict_column_descriptions(model) if upstream_descriptions else {} %}

{% for column in columns %}
{% set model_yaml = codegen.generate_column_yaml(column, model_yaml, column_desc_dict) %}
{% set model_yaml = codegen.generate_column_yaml(column, model_yaml, column_desc_dict, include_data_types) %}
{% endfor %}
{% endfor %}
{% endif %}
Expand Down
4 changes: 2 additions & 2 deletions macros/generate_source.sql
Original file line number Diff line number Diff line change
Expand Up @@ -15,7 +15,7 @@


---
{% macro generate_source(schema_name, database_name=target.database, generate_columns=False, include_descriptions=False, include_data_types=False, table_pattern='%', exclude='', name=schema_name, table_names=None) %}
{% macro generate_source(schema_name, database_name=target.database, generate_columns=False, include_descriptions=False, include_data_types=True, table_pattern='%', exclude='', name=schema_name, table_names=None) %}

{% set sources_yaml=[] %}
{% do sources_yaml.append('version: 2') %}
Expand Down Expand Up @@ -62,7 +62,7 @@
{% for column in columns %}
{% do sources_yaml.append(' - name: ' ~ column.name | lower ) %}
{% if include_data_types %}
{% do sources_yaml.append(' data_type: ' ~ (column.data_type | upper ) ) %}
{% do sources_yaml.append(' data_type: ' ~ (column.data_type | lower ) ) %}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, we may want to use .dtype instead of .data_type, depending on the data platform — and the format_column macro (new in v1.5) could be our friend here.

{% endif %}
{% if include_descriptions %}
{% do sources_yaml.append(' description: ""' ) %}
Expand Down