Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversion Metrics #916

Merged
merged 14 commits into from
Dec 16, 2023
Merged

Conversion Metrics #916

merged 14 commits into from
Dec 16, 2023

Conversation

WilliamDee
Copy link
Contributor

@WilliamDee WilliamDee commented Dec 4, 2023

Issue

Context in #252

Changes

Added the ability to define Conversion Metrics

Config skeleton:

metric:
  ...
  type: conversion # required
  type_params:
   conversion_type_params:
      entity: _entity_ # required
      calculation: _calculation_type_ # optional. default: conversion_rate. options: conversions(buys) or conversion_rate (buys/visits) + more to come
      base_measure: _measure_ # required
      conversion_measure: _measure_ # required
      window: _time_window_ # optional. default: inf. window to join the two events on. Follows similar format as time windows elsewhere (i.e. 7 days)
      constant_properties: # optional. List of constant properties default: None
        - base_property: _string_ # required. A reference to a dimension/entity of the semantic model linked to the base_measure
          conversion_property: _string_ # same as base above, but to the semantic model of the conversion_measure

Added AddGenerateUuidNode

For conversion metrics, we need the ability to uniquely identify rows in order to deduplicate events properly. This node simply adds a randomly generated uuid column to enable that.

Added JoinConversionEventsNode

Created a node to handle retrieving all successful conversions. This node takes the conversion and base data set and joins them against an entity and a valid time range to get successful conversions. It then deduplicates opportunities via the window function first_value to take the closest opportunity event to the corresponding conversion event. Then it returns a data set with each row representing a successful conversion. Because an opportunity event can lead to multiple conversions, there would be duplication in the final dataset from this node.

Example of this node,

VISIT DATA SOURCE
| visit_ds            | user     | referrer   
|:--------------------|:---------|:---------------
| 2020-01-01 00:00:00 | u0003141 | homepage_1 
| 2020-01-01 00:00:00 | u0003452 | fb_ad_2        
| 2020-01-01 00:00:00 | u0004114 | fb_ad_1           
| 2020-01-02 00:00:00 | u0003452 | fb_ad_1         
| 2020-01-04 00:00:00 | u0004114 | fb_ad_5
| 2020-01-07 00:00:00 | u0004114 | fb_ad_2     


BUY DATA SOURCE
| buy_ds              | user     | referrer   
|:--------------------|:---------|:---------------     
| 2020-01-02 00:00:00 | u0004114 | fb_ad_1   
| 2020-01-04 00:00:00 | u0003452 | fb_ad_1             
| 2020-01-07 00:00:00 | u0004114 | fb_ad_2 
| 2020-01-07 00:00:00 | u0003141 | homepage_1
| 2020-01-25 00:00:00 | u0004114 | google_ad_1


Result from 'JoinConversionEventsNode' on a 7 day window
| visit_ds            | buy_ds              | user     | referrer   
|:--------------------|:--------------------|:---------|:---------------
| 2020-01-01 00:00:00 | 2020-01-07 00:00:00 | u0003141 | homepage_1       
| 2020-01-01 00:00:00 | 2020-01-02 00:00:00 | u0004114 | fb_ad_1        
| 2020-01-02 00:00:00 | 2020-01-04 00:00:00 | u0003452 | fb_ad_1      
| 2020-01-07 00:00:00 | 2020-01-07 00:00:00 | u0004114 | fb_ad_2 

Conversion Metrics

After getting successful conversions from JoinConversionEventsNode, we join that count with the base opportunity count which gives us the total number of opportunities and total number of conversions that can be used to calculate the conversion_rate or conversion_count.

SELECT
  <group_by_cols>
  , conversions / opportunities AS conversion_rate
  , conversions AS conversion_count
FROM (
  -- Combine Aggregated Outputs
  SELECT
    <group_by_cols>
    , opportunities AS opportunities
    , conversions AS conversions
  FROM (
    SELECT
      <group_by_cols>
      , SUM(opps.opportunities) AS opportunities
    FROM opps
  ) 
  FULL OUTER JOIN (
    SELECT 
      <group_by_cols>
      , SUM(conversions) AS conversions
    FROM conv_node -- (JoinConversionNode)
  )
  ON
    <conv_node.group_by_cols> = <opps.group_by_cols>
  GROUP BY
    <group_by_cols>
)

@cla-bot cla-bot bot added the cla:yes label Dec 4, 2023
Copy link

github-actions bot commented Dec 4, 2023

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

1 similar comment
Copy link

github-actions bot commented Dec 4, 2023

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

@WilliamDee WilliamDee force-pushed the will/conversion-metrics-renewed branch 5 times, most recently from 3026a15 to 8f9977a Compare December 8, 2023 23:45
@WilliamDee WilliamDee marked this pull request as ready for review December 8, 2023 23:46
@WilliamDee WilliamDee force-pushed the will/conversion-metrics-renewed branch 7 times, most recently from 48d660f to 42e1848 Compare December 11, 2023 20:51
@WilliamDee
Copy link
Contributor Author

Have remaining snapshots from other engines to update, but I'll regenerate those once we get an initial review

@WilliamDee WilliamDee linked an issue Dec 11, 2023 that may be closed by this pull request
@WilliamDee WilliamDee force-pushed the will/conversion-metrics-renewed branch 5 times, most recently from b74d600 to a38834f Compare December 14, 2023 23:06
Copy link
Contributor

@courtneyholcomb courtneyholcomb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost done reviewing (just have dataflow to SQL left) - leaving my initial comments here and will finish up tonight or tomorrow morning!

metricflow/model/semantics/semantic_model_lookup.py Outdated Show resolved Hide resolved
metricflow/specs/specs.py Show resolved Hide resolved
metricflow/dataflow/builder/dataflow_plan_builder.py Outdated Show resolved Hide resolved
metricflow/dataflow/builder/dataflow_plan_builder.py Outdated Show resolved Hide resolved
under the condition of being within the time range and other conditions (such as constant properties).
This builds the join description to satisfy all those conditions.
"""
window_condition = SqlQueryPlanJoinBuilder._make_time_range_window_join_condition(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this return what we expect if there is no window for the metric ? I'm not sure we have a test case for that yet 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I basically just consolidated _make_time_range_window_join_condition from the cumulative join logic since they were the exact same with some differences in the beginning. It has some logic around None windows. So the logic is essentially this

a.ds <= b.ds AND a.ds > b.ds - <window>
If no window is present we join across all time -> "a.ds <= b.ds"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right I think the part that confused me was - why do we use a.ds <= b.ds for an all time join? 🤔 (I know that's not new logic here though)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not all time join, we use a join with a INF backwards looking window. So instead of looking for any base events that converted within an X day window, we just say any base event that converted. Thus, we have a.ds <= b.ds which would render out to base.ds <= conversion.ds

@WilliamDee WilliamDee force-pushed the will/conversion-metrics-renewed branch 3 times, most recently from ce85cb5 to 50ea3fe Compare December 15, 2023 05:07
@WilliamDee WilliamDee added the Run Tests With Other SQL Engines Runs the test suite against the SQL engines in our target environment label Dec 16, 2023
@WilliamDee WilliamDee temporarily deployed to DW_INTEGRATION_TESTS December 16, 2023 07:31 — with GitHub Actions Inactive
@WilliamDee WilliamDee temporarily deployed to DW_INTEGRATION_TESTS December 16, 2023 07:31 — with GitHub Actions Inactive
@WilliamDee WilliamDee force-pushed the will/conversion-metrics-renewed branch from 2be3d4a to 8eec5a0 Compare December 16, 2023 08:01
@WilliamDee WilliamDee added Run Tests With Other SQL Engines Runs the test suite against the SQL engines in our target environment and removed Run Tests With Other SQL Engines Runs the test suite against the SQL engines in our target environment labels Dec 16, 2023
@WilliamDee WilliamDee temporarily deployed to DW_INTEGRATION_TESTS December 16, 2023 08:02 — with GitHub Actions Inactive
@WilliamDee WilliamDee temporarily deployed to DW_INTEGRATION_TESTS December 16, 2023 08:02 — with GitHub Actions Inactive
@WilliamDee WilliamDee temporarily deployed to DW_INTEGRATION_TESTS December 16, 2023 08:02 — with GitHub Actions Inactive
@WilliamDee WilliamDee removed the Run Tests With Other SQL Engines Runs the test suite against the SQL engines in our target environment label Dec 16, 2023
@WilliamDee WilliamDee force-pushed the will/conversion-metrics-renewed branch from 8eec5a0 to d14eed6 Compare December 16, 2023 08:32
@WilliamDee WilliamDee added the Run Tests With Other SQL Engines Runs the test suite against the SQL engines in our target environment label Dec 16, 2023
@WilliamDee WilliamDee temporarily deployed to DW_INTEGRATION_TESTS December 16, 2023 08:32 — with GitHub Actions Inactive
@WilliamDee WilliamDee temporarily deployed to DW_INTEGRATION_TESTS December 16, 2023 08:32 — with GitHub Actions Inactive
@WilliamDee WilliamDee temporarily deployed to DW_INTEGRATION_TESTS December 16, 2023 08:32 — with GitHub Actions Inactive
@WilliamDee WilliamDee temporarily deployed to DW_INTEGRATION_TESTS December 16, 2023 08:33 — with GitHub Actions Inactive
@github-actions github-actions bot removed the Run Tests With Other SQL Engines Runs the test suite against the SQL engines in our target environment label Dec 16, 2023
@WilliamDee WilliamDee merged commit b63619a into main Dec 16, 2023
22 checks passed
@WilliamDee WilliamDee deleted the will/conversion-metrics-renewed branch December 16, 2023 08:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[SL-1290] Conversion Metrics
2 participants