Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable parsing of feature json columns #34

Merged
merged 9 commits into from
Apr 10, 2023
Merged

Conversation

bcodell
Copy link
Collaborator

@bcodell bcodell commented Apr 9, 2023

This PR:

  • adds a function called parse_column which returns table_alias.column and extracts the column from the activity stream's feature_json column if the column name is not identified as a standard column from the Activity Schema, then applies that function to all columns selected from all primary and appended columns in the dataset macro
  • updates an existing test (first_ever_1) to extract json features from the primary and appended activity
  • updates example__activity_stream and relevant output__<dataset> csv files so that feature_json values aren't nested in an array and so that feature_json keys use snake case formatting
  • updates documentation for the included_columns argument in the Readme

@bcodell
Copy link
Collaborator Author

bcodell commented Apr 9, 2023

@tnightengale looking for the following feedback:

  • Was not being able to specify a feature_json key as an included column a bug or did this functionality intentionally not exist? It seems like you intended to implement it since you had the json_unpack_key helper function already in place (asking for semantic versioning purposes)
  • Does the abstraction I put in place make sense, or are there changes you'd like to see?
  • Does this assumption (baked into the code make sense: that if a specified included column isn't identified as one of the standard columns (or project-specific aliases) from the Activity Schema spec, it is assumed to be contained in the feature_json?
  • Are the semantic changes I made to example__activity_stream (removing feature_json from array, snake-casing json keys) reasonable?

@bcodell bcodell requested a review from tnightengale April 9, 2023 15:18
Copy link
Owner

@tnightengale tnightengale left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice job! Makes sense to me!


{% set columns = dbt_activity_schema.columns() %}
{%- if column not in columns.values() -%}
{%- set parsed_column = dbt_activity_schema.json_unpack_key(table_alias ~ '.' ~ columns.feature_json, column) -%}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a nice convenience feature, and we may be able to extend it in the future if we formally define types of packed columns.

@tnightengale
Copy link
Owner

@tnightengale looking for the following feedback:

  • Was not being able to specify a feature_json key as an included column a bug or did this functionality intentionally not exist? It seems like you intended to implement it since you had the json_unpack_key helper function already in place (asking for semantic versioning purposes)
  • Does the abstraction I put in place make sense, or are there changes you'd like to see?
  • Does this assumption (baked into the code make sense: that if a specified included column isn't identified as one of the standard columns (or project-specific aliases) from the Activity Schema spec, it is assumed to be contained in the feature_json?
  • Are the semantic changes I made to example__activity_stream (removing feature_json from array, snake-casing json keys) reasonable?
  1. It was not an intended feature, since folks could just add another CTE below and unpack the columns there. But this is a fine convenience feature.
  2. The abstraction being the parse_column function? Yes I think it makes sense.
  3. The assumption about an included column not in the default columns, being in feature json makes sense.
  4. Lgtm!

@bcodell bcodell changed the title [🐛] Enable parsing of feature json columns Enable parsing of feature json columns Apr 10, 2023
@bcodell
Copy link
Collaborator Author

bcodell commented Apr 10, 2023

Thanks @tnightengale! I'm going to merge this then open a PR to bump the version to 0.4.0. This feature should provide a workaround to #26 and #33 until those bugs are explicitly resolved.

@bcodell bcodell merged commit 76fdfe4 into main Apr 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants