A collaborative filtering based web asset recommendation system

RickyCordero/recommender-system

recommender-system

A collaborative-filtering-based web asset recommendation system. Using implicit user clickstream data, it recommends the next best asset for a visitor to consume in order to maximize user engagement on a website.

Motivation

The goal of a digital, top-of-the-funnel B2B marketing team is to maximize the number of contact form submissions from unique visitors on the company website, which in turn helps the business maximize sales leads. Each contact form serves as a gate between the visitor and a single digital asset. A digital asset is a piece of content a visitor may find informative, such as a PDF of a business case study or a product demo, and it can be promoted on a webpage of its own alongside the contact form. A visitor's submission of a contact form therefore signals some degree of interest in purchasing a company product in the future. By serving more relevant asset webpages to these visitors, we can create a more personalized website experience, which translates into increased visitor engagement and a greater likelihood of a future sales lead.

Using Adobe Analytics, a website analytics platform, we can track which assets each visitor requests via form submission, how many forms each visitor submits, and each visitor's cookie. We can use this data to build a system that learns visitors' preferences from their asset consumption history and recommends the asset webpage that best aligns with their interests. In this project, we achieve this with a technique called collaborative filtering and refer to the resulting system as the recommender.
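To make the idea concrete, here is a minimal sketch of one common flavor of collaborative filtering (item-based, with cosine similarity). It is an illustration under assumptions, not the notebook's implementation: the function names `item_similarities` and `recommend`, the toy visitors, and the asset labels are all hypothetical, and the project may use a different variant.

```python
import math
from collections import defaultdict

def item_similarities(ratings):
    """Cosine similarity between assets, given ratings[visitor][asset]."""
    by_asset = defaultdict(dict)
    for visitor, assets in ratings.items():
        for asset, value in assets.items():
            by_asset[asset][visitor] = value
    norms = {a: math.sqrt(sum(v * v for v in vec.values()))
             for a, vec in by_asset.items()}
    sims = defaultdict(dict)
    for a, vec_a in by_asset.items():
        for b, vec_b in by_asset.items():
            if a == b:
                continue
            # Only visitors who consumed both assets contribute to the dot product.
            dot = sum(val * vec_b[vis] for vis, val in vec_a.items() if vis in vec_b)
            if dot > 0:
                sims[a][b] = dot / (norms[a] * norms[b])
    return sims

def recommend(visitor, ratings, sims, top_n=10):
    """Rank assets the visitor has not consumed by summed weighted similarity."""
    seen = ratings[visitor]
    scores = defaultdict(float)
    for asset, value in seen.items():
        for other, sim in sims.get(asset, {}).items():
            if other not in seen:
                scores[other] += sim * value
    # Sort by descending score, breaking ties by asset name for determinism.
    return sorted(scores, key=lambda a: (-scores[a], a))[:top_n]

# Hypothetical toy data: three visitors, four assets.
ratings = {
    "v1": {"a": 1.0, "b": 1.0},
    "v2": {"a": 1.0, "b": 1.0, "c": 1.0},
    "v3": {"c": 1.0, "d": 1.0},
}
sims = item_similarities(ratings)
print(recommend("v1", ratings, sims))
```

In this toy data, visitor v1 shares assets a and b with v2, so the only unseen asset with any similarity to v1's history is c.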

Data

All form submission data used in the project is authentic and covers the activity of the top 1,000 form-submitting visitors over a 2-month period on the company's .com website. The gated_asset_type field can be used to assign a relative weight to each type of asset. These weights are useful for biasing the recommender toward a particular class of asset. Each page URL is hashed in this example; the hashing does not affect the recommendations. The data can be used as is to create a working recommender, or it can be changed as needed for different content systems.
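As a rough illustration of how gated_asset_type weights could bias the recommender, the sketch below sums weighted form submissions into an implicit rating per visitor and page. The weight values, asset type names, and the `build_ratings` helper are hypothetical; the weights actually used in the project are not stated here.

```python
from collections import defaultdict

# Hypothetical relative weights per asset type (not taken from the project).
ASSET_TYPE_WEIGHTS = {"case_study": 1.0, "demo": 2.0}

def build_ratings(rows, weights, default_weight=1.0):
    """Sum weighted form submissions into ratings[cookie][page_url]."""
    ratings = defaultdict(lambda: defaultdict(float))
    for cookie, asset_type, page_url in rows:
        # Unknown asset types fall back to the default weight.
        ratings[cookie][page_url] += weights.get(asset_type, default_weight)
    return {cookie: dict(pages) for cookie, pages in ratings.items()}

# Hypothetical rows in (cookie, gated_asset_type, page_url) order.
rows = [
    ("c1", "case_study", "url_a"),
    ("c1", "demo", "url_b"),
    ("c1", "demo", "url_b"),
    ("c2", "case_study", "url_a"),
]
ratings = build_ratings(rows, ASSET_TYPE_WEIGHTS)
print(ratings)
```

Raising the weight of one asset type makes submissions for that type count more heavily in a visitor's ratings, which is one way to push recommendations toward that class.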

Getting Started

These instructions will get you a working beta copy of the project and highlight a few relevant sections of the provided notebook, which you can run on your local machine for testing and development.

Installing

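Create and activate a Python virtual environment (the activation path below is the Windows layout; on Linux or macOS, use venv/bin/activate instead)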

virtualenv venv
source venv/Scripts/activate

Install the required packages

pip install -r requirements.txt

Running Notebook

Start the Jupyter notebook

jupyter notebook

Load the notebook titled:

asset_model.ipynb

Utilizing Notebook

Load the data

Inside the notebook, first load the necessary data:

[Image: form submission table with columns cookie, gated_asset_type, and page_url]

  • cookie: A categorical variable representing a unique website visitor
  • gated_asset_type: A categorical variable representing the type of asset consumed by the website visitor
  • page_url: A categorical variable representing the URL of the webpage promoting the consumed asset

Each row in the above table corresponds to an instance of a website visitor submitting a contact form and obtaining some asset promoted on some webpage. This data was obtained through Adobe Analytics and is representative of only a small fraction of traffic in a small time window.
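A minimal sketch of reading data in this shape, assuming it has been exported to CSV with the three columns listed above. The sample rows, the hashed URL values, and the `load_submissions` helper are made up for illustration; the notebook's own loading code may differ.

```python
import csv
import io

# Hypothetical sample standing in for the exported form submission data.
SAMPLE_CSV = """cookie,gated_asset_type,page_url
c1,case_study,3f2a
c1,demo,9b1c
c2,case_study,3f2a
"""

def load_submissions(handle):
    """Read form submission rows as a list of dicts keyed by column name."""
    return list(csv.DictReader(handle))

# In practice, `handle` would be an open file; a string buffer is used here.
submissions = load_submissions(io.StringIO(SAMPLE_CSV))
print(len(submissions), submissions[0])
```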

[Image: table mapping business units to asset webpages]

This table represents the company's mapping of internal business units to asset webpages on the .com website and was extracted from a pre-existing standalone spreadsheet independent of the project time window. By joining this data with the above asset consumption data, we can aggregate results of any analysis on the recommender by business unit. This will prove useful when generating quarterly reports for stakeholders across each business unit after model deployment.
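The join described above can be sketched as a simple lookup from page URL to business unit. The `business_unit` column name, the unit names, and the `attach_business_unit` helper are assumptions for illustration only.

```python
def attach_business_unit(submissions, unit_by_url, missing="unknown"):
    """Return copies of the rows with a business_unit field added (a left join)."""
    return [
        {**row, "business_unit": unit_by_url.get(row["page_url"], missing)}
        for row in submissions
    ]

# Hypothetical mapping and rows; real unit names come from the spreadsheet.
unit_by_url = {"3f2a": "cloud", "9b1c": "security"}
submissions = [
    {"cookie": "c1", "page_url": "3f2a"},
    {"cookie": "c1", "page_url": "9b1c"},
    {"cookie": "c2", "page_url": "77aa"},  # not present in the mapping
]
joined = attach_business_unit(submissions, unit_by_url)
print(joined)
```

Keeping unmapped pages with a placeholder value, rather than dropping them, makes gaps in the spreadsheet visible when results are aggregated by business unit.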

Using the data

[Image: table of interactions and form submissions per visitor]

  • cookie_id: A categorical variable representing a unique website visitor
  • interactions: A quantitative variable counting the distinct assets for which a website visitor submitted a form (repeat submissions for the same asset count once)
  • form_submissions: A quantitative variable counting all form submissions by a website visitor, including repeat submissions for the same asset

This table aggregates the number of interactions and form submissions per unique visitor. This will prove useful in analyzing recommender performance.
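A sketch of this aggregation, assuming that interactions counts distinct asset pages per visitor while form_submissions counts every submission, duplicates included. The `summarize_visitors` helper and the sample rows are hypothetical.

```python
from collections import Counter, defaultdict

def summarize_visitors(submissions):
    """Aggregate (cookie, page_url) rows into per-visitor counts."""
    form_submissions = Counter()
    assets = defaultdict(set)
    for cookie, page_url in submissions:
        form_submissions[cookie] += 1   # every submission counts
        assets[cookie].add(page_url)    # each distinct asset page counts once
    return {
        cookie: {
            "interactions": len(assets[cookie]),
            "form_submissions": form_submissions[cookie],
        }
        for cookie in form_submissions
    }

# Hypothetical rows: c1 submits the form on page 3f2a twice.
submissions = [("c1", "3f2a"), ("c1", "3f2a"), ("c1", "9b1c"), ("c2", "3f2a")]
summary = summarize_visitors(submissions)
print(summary)
```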

Determining average asset similarity

[Image: plot of average asset similarity by business unit]

This plot shows the average asset similarity for each business unit and serves as an example of an analysis that can be run to understand how the recommender performs per business unit. A high average asset similarity score for a business unit may indicate better recommender performance in that business unit.
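How the score is computed is not spelled out here, so the following is only one plausible reading: the mean pairwise similarity between assets that belong to the same business unit. The `average_similarity_by_unit` helper, the similarity values, and the unit names are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean

def average_similarity_by_unit(similarity, unit_by_asset):
    """Mean pairwise similarity among the assets of each business unit.

    `similarity` maps frozenset({asset_a, asset_b}) to a score; pairs that
    are absent are treated as having similarity 0.0.
    """
    assets_by_unit = defaultdict(list)
    for asset, unit in unit_by_asset.items():
        assets_by_unit[unit].append(asset)
    averages = {}
    for unit, assets in assets_by_unit.items():
        pairs = list(combinations(sorted(assets), 2))
        if pairs:  # units with a single asset have no pairs and are skipped
            averages[unit] = mean(similarity.get(frozenset(p), 0.0) for p in pairs)
    return averages

# Hypothetical similarity scores and business unit assignments.
similarity = {
    frozenset({"a", "b"}): 0.8,
    frozenset({"a", "c"}): 0.4,
    frozenset({"b", "c"}): 0.6,
    frozenset({"d", "e"}): 0.1,
}
unit_by_asset = {"a": "cloud", "b": "cloud", "c": "cloud",
                 "d": "security", "e": "security", "f": "services"}
averages = average_similarity_by_unit(similarity, unit_by_asset)
print(averages)
```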

Analyzing model performance

[Image: distribution of hit rate across visitor recommendations]

This plot shows the distribution of hit rates achieved across all visitor recommendations and is useful for assessing the performance of the recommender statistically. When model parameters are updated or tuned, this plot helps determine whether the changes improved performance. Hit rate is defined as the fraction of a visitor's top 10 recommended asset webpages whose business unit matches the business unit of at least one asset webpage on which that visitor submitted a contact form. Thus, the higher the hit rate, the better the recommender captures visitors' preferences.
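The hit rate definition can be sketched for a single visitor as follows. The `hit_rate` helper, the URLs, and the unit names are hypothetical, and dividing by the number of recommendations actually returned (rather than always by 10) is an assumption of this sketch.

```python
def hit_rate(recommended_urls, consumed_urls, unit_by_url, top_n=10):
    """Fraction of the top-N recommendations whose business unit matches
    a business unit of any page the visitor submitted a form on."""
    consumed_units = {unit_by_url[u] for u in consumed_urls if u in unit_by_url}
    top = recommended_urls[:top_n]
    if not top:
        return 0.0
    # Recommendations without a known business unit never count as hits.
    hits = sum(1 for u in top if unit_by_url.get(u) in consumed_units)
    return hits / len(top)

# Hypothetical mapping: the visitor consumed one "cloud" page.
unit_by_url = {"u1": "cloud", "r1": "cloud", "r2": "security",
               "r3": "cloud", "r4": "services"}
rate = hit_rate(["r1", "r2", "r3", "r4"], ["u1"], unit_by_url)
print(rate)  # 2 of the 4 recommendations are "cloud" pages -> 0.5
```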

[Image: hit rate versus form submissions and interactions per visitor]

This plot shows the relationship between visitors' form submission patterns and recommender performance as measured by hit rate. The assumption in applying collaborative filtering techniques is that more user ratings data will lead to better content recommendations. Because form submissions are used to weight visitor ratings for assets, an increase in form submissions per visitor should be accompanied by an increase in hit rate. Improvements to the model will be reflected in this plot as well. The goal is to see a positive linear relationship between form submissions, interactions, and hit rate: when form submissions and interactions increase, hit rate should increase as well.
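One simple way to quantify the linear relationship described above is the Pearson correlation coefficient between per-visitor form submissions and hit rate. The sketch below uses made-up numbers; the `pearson` helper is hypothetical and is not taken from the notebook.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-visitor values, not results from the project.
form_submissions = [1, 2, 3, 4]
hit_rates = [0.1, 0.2, 0.3, 0.4]
r = pearson(form_submissions, hit_rates)
print(round(r, 3))
```

A coefficient close to 1 would be consistent with the stated goal, while a value near 0 would suggest that additional form submissions are not translating into better recommendations.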

Acknowledgments
