Include `data_type` in the output of `generate_model_yaml` #120

dbeatty10 · 2023-03-22T16:28:51Z

Describe the feature

When generating output, codegen should have an option to include data_type like the following:

    columns:
      - name: account_id
        data_type: string

Furthermore, this option should probably default to true. It possibly shouldn't even be optional, but just always be included in the generated output.

See here for more context: https://github.com/dbt-labs/internal-analytics/pull/1418#discussion_r1144991800

Describe alternatives you've considered

An alternative is to copy-paste data_type by hand into the YAML. These would be copied from an information_schema.columns query or show columns, etc.

Additional context

Side note to spin out into a separate issue: reconciling greenfield YAML from codegen with pre-existing YAML is annoying / time-consuming. Can we write some kind of guidance on easier ways to reconcile two YAML files?

Who will this benefit?

With the launch of data contracts in the dbt-core 1.5.0 release, it's handy to automatically generate the relevant data_types for pre-existing models.

Are you interested in contributing this feature?

Happy to review the PR if someone else wants to take this on.


linbug · 2023-04-07T15:27:12Z

@dbeatty10 would something like this be suitable? My concern now is that all tests that invoke this macro will need to include logic for all target types, since they use different data type names (e.g. text in postgres becomes string in bigquery). That might be cumbersome. Do you have any feedback or suggestions?

dbeatty10 · 2023-04-07T17:36:15Z

That PR is looking good @linbug !

Absolutely agreed that the testing part looks cumbersome. #76 ran into something similar. We discussed and tried a couple options before settling on an approach.

There are a few options for your PR:

Adopt a similar testing approach as #76
Try some other testing approach
Skip tests altogether

Want to take a look at these options and let me know what you think?

You won't hear me say this too often, but I might be comfortable with skipping tests in this case if it has the fewest downsides of the options.

dbeatty10 · 2023-04-07T17:58:17Z

Always-on vs. optional

In the original issue description, it said:

this option should probably default to true. It possibly shouldn't even be optional, but just always be included in the generated output.

It is easy enough to just add a new include_data_types parameter though (defaulting to true) -- I think we should do it!

Thought process

For users, this would preserve optionality while still putting forward default behavior we're assuming would be best for the greatest number of users.

generate_source already has an include_data_types (optional, default=False) parameter, so it would align those names (even though the default would differ). Maybe we should change the default for generate_source at the same time?

linbug · 2023-04-07T19:54:32Z

Thanks @dbeatty10 for the quick and detailed responses!

For the testing, I had a look through the previous PR that you linked. I think that adopting the same approach (using the text_type_value and integer_type_value macros) is the simplest and easiest-to-read solution for now, and it would align with what has already been shipped for generate_source. This could always be refactored in the future if we find a better approach. I don't think that skipping tests is an option without removing the existing tests, although perhaps we could adopt something similar to the tests for generate_source, where we add an include_data_types parameter and set it to false in all tests but one (perhaps generate_model_yaml). There isn't additional value in testing this functionality in more than one test anyway.

How does that sound?
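
For readers following along, the testing idea above can be sketched roughly as follows. This is a Python illustration only, not the Jinja macros from the PR: the helper names mirror text_type_value and integer_type_value from the thread, but the lookup tables, the fallback values, and the expected_columns_yaml function are assumptions made up for this sketch. Only the postgres text / bigquery string pair comes from the discussion itself.

```python
# Illustrative per-adapter type names. Only "text" (postgres) and "string"
# (bigquery) are mentioned in this thread; the integer names are assumptions.
TEXT_TYPES = {"postgres": "text", "bigquery": "string"}
INTEGER_TYPES = {"postgres": "integer", "bigquery": "int64"}


def text_type_value(target_type: str) -> str:
    """Return the text type name for a target, falling back to 'text'."""
    return TEXT_TYPES.get(target_type, "text")


def integer_type_value(target_type: str) -> str:
    """Return the integer type name for a target, falling back to 'integer'."""
    return INTEGER_TYPES.get(target_type, "integer")


def expected_columns_yaml(target_type: str) -> str:
    """Build the expected YAML once, with only the type names varying by target."""
    return "\n".join(
        [
            "columns:",
            "  - name: account_id",
            f"    data_type: {integer_type_value(target_type)}",
            "  - name: account_name",
            f"    data_type: {text_type_value(target_type)}",
        ]
    )


if __name__ == "__main__":
    for target in ("postgres", "bigquery"):
        print(expected_columns_yaml(target))
        print()
```

The point of the design is that each test keeps a single expected output, and the adapter-specific differences are confined to two small helpers.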

I'm aligned on adding a include_data_types parameter and defaulting to true. I do think that it makes sense to have both generate_source and generate_model_yaml take the same default, and personally I'm in favour of this being true (as long as this doesn't break anything for existing users)! I'd be happy to update that in this or a separate PR.

linbug · 2023-04-07T20:53:26Z

Something else I just noticed is that in generate_source, data_type is uppercase whereas here I'm lowering. We should be consistent about this too. Any preference? According to the dbt style guide, we should default to lowercase.

dbeatty10 · 2023-04-08T17:53:27Z

@linbug Your proposals for the testing sound good -- let's do it as you proposed.

For the common include_data_types parameter within generate_source and generate_model_yaml, let's do the following:

default both to true
lowercase both to align with the dbt style guide
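
To make the agreed behavior concrete, here is a minimal Python sketch of what "default to true, lowercase the type" means for the generated output. It is not the codegen macro itself: render_model_yaml, its signature, and the exact YAML layout are hypothetical and exist only for illustration. The include_data_types name and the account_id / string example come from the thread.

```python
from typing import Iterable, Tuple


def render_model_yaml(
    model_name: str,
    columns: Iterable[Tuple[str, str]],
    include_data_types: bool = True,  # default agreed in this thread
) -> str:
    """Render a minimal model YAML stanza from (column_name, data_type) pairs."""
    lines = [
        "models:",
        f"  - name: {model_name}",
        "    columns:",
    ]
    for column_name, data_type in columns:
        lines.append(f"      - name: {column_name}")
        if include_data_types:
            # Lowercase the type name to follow the dbt style guide.
            lines.append(f"        data_type: {data_type.lower()}")
    return "\n".join(lines)


if __name__ == "__main__":
    # The warehouse may report the type in uppercase; the output lowercases it.
    print(render_model_yaml("accounts", [("account_id", "STRING")]))
    print()
    # Opting out drops the data_type lines entirely.
    print(
        render_model_yaml(
            "accounts", [("account_id", "STRING")], include_data_types=False
        )
    )
```

Keeping the flag as a parameter with a true default preserves the opt-out for anyone who prefers the previous output.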

It's fine if the changes related to generate_source are made within this PR or within a separate PR -- up to you.

Since there will be changes to how generate_source behaves by default, let's make sure to update the README and also call this out in the changelog.

The current version of dbt_codegen is 0.9.0, and we'll make sure to bump the next version to 0.10.0 (or maybe even 1.0.0!) so these changes don't break anything for folks that have the 0.9.x series as an upper bound.

linbug · 2023-04-14T17:05:54Z

@dbeatty10 I've made those changes now, and signed the CLA. The PR is ready for review.

dbeatty10 added the enhancement and good first issue labels Mar 22, 2023

linbug mentioned this issue Apr 7, 2023

Add include_data_types argument to generate_model_yaml macro #122

Merged

9 tasks

dbeatty10 closed this as completed in #122 Sep 25, 2023
