A collaborative-filtering-based web asset recommendation system. Using implicit user clickstream data, it recommends the next best asset for a visitor to consume in order to maximize user engagement on a website.
The goal of a digital, top-of-the-funnel B2B marketing team is to maximize the number of contact form submissions generated by unique visitors to the company website, which in turn helps the business maximize sales leads. Each contact form serves as a gate between the visitor and a single digital asset. Each digital asset is a piece of content a visitor may find informative, such as a PDF of a business case study or a product demo, and can be promoted on its own webpage alongside the contact form. A visitor's form submission therefore signals some degree of future interest in purchasing a company product. By serving more relevant asset webpages to these visitors, we can create a more personalized website experience, translating into increased visitor engagement on the website and a higher likelihood of a future sales lead.
Using Adobe Analytics, a website analytics platform, we can track each visitor's cookie, which assets that visitor requests via form submission, and how many forms that visitor submits. We can use this data to build a system that learns visitors' preferences from their asset consumption history and recommends the asset webpage that best aligns with their interests. In this project, we do this with a technique called collaborative filtering and refer to the resulting system as the recommender.
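The notebook's exact model is not reproduced here. As a minimal sketch of one simple form of collaborative filtering, the code below scores each asset a visitor has not yet requested by its cosine similarity to the assets they have requested (item-based filtering on implicit data). The function names, cookies, and asset identifiers are hypothetical; matrix factorization methods such as alternating least squares are a common alternative for implicit feedback.

```python
from collections import defaultdict
from math import sqrt


def build_item_vectors(ratings):
    """Invert {visitor: {asset: score}} into {asset: {visitor: score}}."""
    items = defaultdict(dict)
    for visitor, assets in ratings.items():
        for asset, score in assets.items():
            items[asset][visitor] = score
    return dict(items)


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(score * v[key] for key, score in u.items() if key in v)
    norm_u = sqrt(sum(s * s for s in u.values()))
    norm_v = sqrt(sum(s * s for s in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend(ratings, visitor, n=10):
    """Rank assets the visitor has not consumed by similarity-weighted score."""
    items = build_item_vectors(ratings)
    seen = ratings.get(visitor, {})
    scores = {}
    for candidate, vector in items.items():
        if candidate in seen:
            continue  # only recommend assets the visitor has not requested yet
        scores[candidate] = sum(
            weight * cosine(items[asset], vector) for asset, weight in seen.items()
        )
    # Highest score first; ties broken alphabetically so results are deterministic.
    return sorted(scores, key=lambda a: (-scores[a], a))[:n]


ratings = {
    "cookie_1": {"asset_a": 1, "asset_b": 1},
    "cookie_2": {"asset_a": 1, "asset_b": 1, "asset_c": 1},
    "cookie_3": {"asset_c": 1, "asset_d": 1},
}
print(recommend(ratings, "cookie_1", n=2))  # ['asset_c', 'asset_d']
```

Here asset_c ranks first because cookie_2, who shares cookie_1's history, also requested it; asset_d shares no visitors with cookie_1's assets and scores zero.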
All form submission data used in the project is authentic and spans the activity of the top 1,000 form-submitting visitors on the company's .com website over a two-month period. The gated_asset_type field can be used to construct relative weights for each type of asset. These weights can bias the recommender toward a particular asset type to achieve a desired behaviour. Each page URL is hashed in this example; hashing does not affect the recommendations. This data can be used as is to create a working recommender, or adapted as needed for other content systems.
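A minimal sketch of how gated_asset_type weights could bias the implicit ratings, assuming each form submission adds the weight of its asset type to a (cookie, page_url) score. The type names and weight values in TYPE_WEIGHTS are hypothetical placeholders, not values from the project.

```python
from collections import defaultdict

# Hypothetical weights per gated_asset_type value; the real types and weights
# are a business decision and are not taken from the project data.
TYPE_WEIGHTS = {"case_study": 1.0, "product_demo": 2.0}


def weighted_ratings(rows, type_weights, default_weight=1.0):
    """Aggregate (cookie, gated_asset_type, page_url) rows into
    {cookie: {page_url: implicit rating}}, adding one weight per submission."""
    ratings = defaultdict(lambda: defaultdict(float))
    for cookie, asset_type, page_url in rows:
        ratings[cookie][page_url] += type_weights.get(asset_type, default_weight)
    return {cookie: dict(assets) for cookie, assets in ratings.items()}


rows = [
    ("cookie_1", "case_study", "url_hash_1"),
    ("cookie_1", "product_demo", "url_hash_2"),
    ("cookie_1", "product_demo", "url_hash_2"),  # repeat submission adds weight again
    ("cookie_2", "webinar", "url_hash_1"),  # unlisted type falls back to default_weight
]
print(weighted_ratings(rows, TYPE_WEIGHTS))
# {'cookie_1': {'url_hash_1': 1.0, 'url_hash_2': 4.0}, 'cookie_2': {'url_hash_1': 1.0}}
```

Because repeat submissions add weight again, a visitor who returns for the same asset expresses a stronger implicit preference; counting each asset once per visitor is an equally reasonable choice.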
These instructions will get you a copy of a working beta of the project and highlight a few relevant sections of the provided notebook, which you can run on your local machine for development and testing.
Create and activate a Python virtual environment
virtualenv venv
source venv/Scripts/activate
(on macOS or Linux: source venv/bin/activate)
Install the required packages
pip install -r requirements.txt
Start the Jupyter notebook
jupyter notebook
Load the notebook titled:
asset_model.ipynb
Inside the notebook, first load the necessary data:
cookie: A categorical variable representing a unique website visitor
gated_asset_type: A categorical variable representing the type of asset consumed by the website visitor
page_url: A categorical variable representing the URL of the webpage promoting the consumed asset
Each row in the above table corresponds to an instance of a website visitor submitting a contact form and obtaining an asset promoted on a webpage. This data was obtained through Adobe Analytics and represents only a small fraction of traffic over a short time window.
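A minimal loading sketch, assuming the Adobe Analytics export is a CSV file whose header row uses exactly the three column names above; the notebook itself may load the data differently (for example with pandas). The sample rows and the file name placeholder are hypothetical.

```python
import csv
import io


def load_rows(file_obj):
    """Read (cookie, gated_asset_type, page_url) tuples from a CSV with a header row."""
    reader = csv.DictReader(file_obj)
    return [(row["cookie"], row["gated_asset_type"], row["page_url"]) for row in reader]


# In-memory stand-in for an exported file; with a real export you would pass
# open("<export>.csv", newline="") instead (file name hypothetical).
sample = io.StringIO(
    "cookie,gated_asset_type,page_url\n"
    "cookie_1,case_study,url_hash_1\n"
    "cookie_2,product_demo,url_hash_2\n"
)
rows = load_rows(sample)
print(rows)  # [('cookie_1', 'case_study', 'url_hash_1'), ('cookie_2', 'product_demo', 'url_hash_2')]
```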
This table represents the company's mapping of internal business units to asset webpages on the .com website and was extracted from a pre-existing standalone spreadsheet independent of the project time window. By joining this data with the above asset consumption data, we can aggregate results of any analysis on the recommender by business unit. This will prove useful when generating quarterly reports for stakeholders across each business unit after model deployment.
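A sketch of the join, assuming the mapping is available as a dictionary from page_url to business unit name (the unit names here are hypothetical). It behaves like a left join: submissions on pages missing from the spreadsheet are kept under an "unmapped" label rather than silently dropped. In pandas, the same step would be a left merge on the page URL column.

```python
from collections import Counter


def submissions_by_business_unit(rows, url_to_unit, unmapped="unmapped"):
    """Left-join (cookie, gated_asset_type, page_url) rows to a page_url -> business
    unit mapping and count form submissions per unit."""
    counts = Counter()
    for _cookie, _asset_type, page_url in rows:
        counts[url_to_unit.get(page_url, unmapped)] += 1
    return counts


url_to_unit = {"url_hash_1": "Unit A", "url_hash_2": "Unit B"}  # hypothetical mapping
rows = [
    ("cookie_1", "case_study", "url_hash_1"),
    ("cookie_2", "case_study", "url_hash_1"),
    ("cookie_2", "product_demo", "url_hash_9"),  # page missing from the spreadsheet
]
print(submissions_by_business_unit(rows, url_to_unit))  # Counter({'Unit A': 2, 'unmapped': 1})
```

Keeping the unmapped bucket matters here because the spreadsheet was compiled independently of the project time window, so some promoted pages may be missing from it.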
cookie_id: A categorical variable representing a unique website visitor
interactions: A quantitative variable counting a visitor's unique form submissions (repeat submissions for the same asset are counted once)
form_submissions: A quantitative variable counting all of a visitor's form submissions, including repeat submissions for the same asset
This table aggregates the number of interactions and form submissions per unique visitor. This will prove useful in analyzing recommender performance.
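A minimal sketch of this aggregation, assuming interactions counts the distinct page_url values a visitor submitted a form for and form_submissions counts every submission row. The sample cookies and URL hashes are hypothetical.

```python
from collections import defaultdict


def visitor_activity(rows):
    """Return {cookie_id: (interactions, form_submissions)} from
    (cookie, gated_asset_type, page_url) rows."""
    distinct_assets = defaultdict(set)
    submissions = defaultdict(int)
    for cookie, _asset_type, page_url in rows:
        distinct_assets[cookie].add(page_url)  # repeats of one asset count once
        submissions[cookie] += 1  # every submission counts
    return {c: (len(distinct_assets[c]), submissions[c]) for c in submissions}


rows = [
    ("cookie_1", "case_study", "url_hash_1"),
    ("cookie_1", "case_study", "url_hash_1"),
    ("cookie_1", "product_demo", "url_hash_2"),
    ("cookie_2", "case_study", "url_hash_3"),
]
print(visitor_activity(rows))  # {'cookie_1': (2, 3), 'cookie_2': (1, 1)}
```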
This plot shows the average asset similarity score for each business unit and serves as an example of an analysis that can be run to understand the recommender's performance within each business unit. A high average asset similarity score for a business unit may indicate better recommender performance in that unit.
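The similarity measure behind the plot is not spelled out here. As an assumption, this sketch defines a business unit's score as the mean pairwise cosine similarity between the item vectors (asset to {visitor: score}) of the assets in that unit; the function names, unit names, and sample vectors are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(score * v[key] for key, score in u.items() if key in v)
    norm_u = sqrt(sum(s * s for s in u.values()))
    norm_v = sqrt(sum(s * s for s in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def average_similarity_by_unit(item_vectors, asset_to_unit):
    """Mean pairwise cosine similarity among the assets of each business unit.

    Units with fewer than two mapped assets are skipped because they have no pairs.
    """
    assets_by_unit = defaultdict(list)
    for asset in item_vectors:
        if asset in asset_to_unit:
            assets_by_unit[asset_to_unit[asset]].append(asset)
    averages = {}
    for unit, assets in assets_by_unit.items():
        pairs = list(combinations(sorted(assets), 2))
        if pairs:
            total = sum(cosine(item_vectors[a], item_vectors[b]) for a, b in pairs)
            averages[unit] = total / len(pairs)
    return averages


item_vectors = {
    "asset_a": {"cookie_1": 1, "cookie_2": 1},
    "asset_b": {"cookie_1": 1, "cookie_2": 1},
    "asset_c": {"cookie_3": 1},
    "asset_d": {"cookie_4": 1},
    "asset_e": {"cookie_5": 1},
}
asset_to_unit = {
    "asset_a": "Unit A", "asset_b": "Unit A",
    "asset_c": "Unit B", "asset_d": "Unit B",
    "asset_e": "Unit C",
}
print(average_similarity_by_unit(item_vectors, asset_to_unit))
```

Unit A scores close to 1 because its two assets were requested by the same visitors, Unit B scores 0 because its assets share no visitors, and Unit C is omitted because it has a single asset.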
This plot shows the distribution of hit rates across all visitor recommendations and is useful for assessing the recommender's performance statistically. When model parameters are updated or tuned, this plot helps determine whether a change actually improved performance. Hit rate is defined as the fraction of a visitor's top 10 recommended asset webpages whose business unit matches the business unit of at least one asset webpage on which that visitor submitted a contact form. Thus, the higher the hit rate, the better the recommender captures visitors' preferences.
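A minimal sketch of the hit rate for one visitor under this definition, assuming a hypothetical asset_to_unit dictionary that maps each asset webpage to its business unit. When fewer than k recommendations are returned, this version divides by the number actually returned; dividing by k instead is another defensible convention.

```python
def hit_rate(recommended, consumed, asset_to_unit, k=10):
    """Fraction of the top-k recommended assets whose business unit matches the
    business unit of at least one asset the visitor submitted a form for."""
    top_k = recommended[:k]
    if not top_k:
        return 0.0
    consumed_units = {asset_to_unit[a] for a in consumed if a in asset_to_unit}
    hits = sum(1 for a in top_k if asset_to_unit.get(a) in consumed_units)
    return hits / len(top_k)


asset_to_unit = {"url_1": "Unit A", "url_2": "Unit A", "url_3": "Unit B", "url_4": "Unit C"}
# One of three recommendations (url_2) shares Unit A with the consumed url_1.
print(hit_rate(["url_2", "url_3", "url_4"], consumed=["url_1"], asset_to_unit=asset_to_unit))
```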
This plot shows the relationship between visitors' form submission patterns and recommender performance as measured by hit rate. The assumption behind collaborative filtering is that more user ratings data leads to better content recommendations. Because form submissions are used to weight visitor ratings for assets, an increase in form submissions per visitor should correspond to an increase in hit rate. Improvements to the model will be reflected in this plot as well. The goal is a positive, roughly linear relationship between form submissions, interactions, and hit rate: as form submissions and interactions increase, hit rate should increase as well.
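One way to check this relationship numerically is to correlate each visitor's form_submissions (or interactions) with their hit rate. This sketch computes the Pearson correlation coefficient by hand; the input values are made-up illustrative numbers, not project results. A coefficient near 1 would be consistent with the hoped-for linear relationship.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / (spread_x * spread_y)


# Hypothetical per-visitor values: form_submissions and the hit rate of their recommendations.
form_submissions = [1, 2, 3, 5, 8]
hit_rates = [0.1, 0.2, 0.2, 0.4, 0.6]
print(round(pearson(form_submissions, hit_rates), 3))
```

On Python 3.10 and later, statistics.correlation returns the same coefficient. If the relationship turns out to be monotonic but not linear, a rank-based measure such as Spearman's correlation is a better fit.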
- Inspired by the work of Susan Li (https://towardsdatascience.com/building-a-collaborative-filtering-recommender-system-with-clickstream-data-dffc86c8c65)