Latent factors refer to the underlying factors $d$ that help explain the observed data (book ratings $R$ in our case).
For a given user $i$, the rating of book $j$ is modeled as follows, where $k$ is the number of latent factors:
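In the usual matrix-factorization notation, this prediction is the dot product of a user vector and a book vector. The symbols $p_i$, $q_j$, and $\hat{r}_{ij}$ below are assumed for illustration and are not taken from the original:

$$\hat{r}_{ij} = p_i^\top q_j = \sum_{f=1}^{k} p_{if}\, q_{jf}$$

Here $p_i$ holds user $i$'s affinity for each of the $k$ factors, and $q_j$ holds how strongly book $j$ expresses each factor.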
And we want to minimize the following error/loss function:
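A common choice, sketched here under the assumption that user $i$ and book $j$ are represented by $k$-dimensional vectors $p_i$ and $q_j$ (symbols not from the original), is the squared error summed over the set $\mathcal{K}$ of observed ratings $r_{ij}$:

$$\min_{P,\,Q} \sum_{(i,j) \in \mathcal{K}} \left( r_{ij} - p_i^\top q_j \right)^2$$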
The catch is that our rating matrix $R$ is sparse. There is therefore an inherent bias: the model only learns from the books and users that have ratings. To prevent overfitting caused by this bias, we can introduce a regularization term weighted by $\lambda$:
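Assuming user vectors $p_i$, book vectors $q_j$, and the set $\mathcal{K}$ of observed ratings $r_{ij}$ (notation not from the original), one standard form of the regularized objective is:

$$\min_{P,\,Q} \sum_{(i,j) \in \mathcal{K}} \left[ \left( r_{ij} - p_i^\top q_j \right)^2 + \lambda \left( \lVert p_i \rVert^2 + \lVert q_j \rVert^2 \right) \right]$$

A larger $\lambda$ shrinks the factor vectors toward zero, trading some fit on the observed ratings for better generalization to unseen ones.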
Singular Value Decomposition (SVD) is a common method used for matrix factorization. Although it is used mainly for dense matrices (as Hug and Davis explained), it helps us understand the basic concepts of how matrices can be decomposed.
Here, SVD decomposes the dense rating matrix $A$ into two unitary matrices $U$ and $V$ and a diagonal matrix $\Sigma$:

$$A = U \Sigma V^T$$
$U$ represents how much each user likes each feature. $\Sigma$, the diagonal matrix, holds the weight of each feature. And $V$ represents how relevant each feature is to each movie.
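To make the roles of the three matrices concrete, here is a minimal NumPy sketch on a small, fully observed rating matrix. The matrix values and the choice of `k = 2` are made up for illustration. It uses `numpy.linalg.svd`, which only works because every entry is known.

```python
import numpy as np

# Hypothetical dense ratings: 5 users (rows) x 4 items (columns).
A = np.array([
    [5.0, 4.0, 1.0, 1.0],
    [4.0, 5.0, 1.0, 2.0],
    [1.0, 1.0, 5.0, 4.0],
    [2.0, 1.0, 4.0, 5.0],
    [5.0, 5.0, 2.0, 1.0],
])

# Thin SVD: U is (5, 4), s holds the 4 singular values, Vt is (4, 4).
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Multiplying the three factors back together recovers A.
A_rebuilt = U @ np.diag(s) @ Vt

# Keeping only the k largest singular values gives a rank-k approximation,
# i.e. a description of users and items in terms of k features.
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

print("singular values:", np.round(s, 3))
print("max error of rank-2 approximation:", np.abs(A - A_k).max())
```

The singular values come back sorted from largest to smallest, so truncating to the first `k` keeps the features that carry the most weight.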
As Hug mentioned, "SVD" in the context of recommendation systems is not a true SVD. It is only SVD-inspired.
Building on Hug's work and many others in the data science community, we also used SGD (stochastic gradient descent) to decompose the sparse user rating matrix in order to predict users' ratings and create recommendations.
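The idea can be sketched in a few lines of plain Python. This is a simplified illustration, not the code from our notebook and not Surprise's implementation. The function names (`factorize`, `rmse`), the hyperparameter values, and the toy ratings are all assumptions, and the sketch leaves out the bias terms that fuller implementations include. Each update only touches one observed rating, which is why the approach works on a sparse matrix.

```python
import math
import random


def factorize(ratings, n_users, n_items, k=2, lr=0.01, reg=0.02,
              n_epochs=500, seed=0):
    """Learn user factors P and item factors Q with SGD.

    ratings: list of (user_index, item_index, rating) for observed entries only.
    """
    rng = random.Random(seed)
    # Small random initial factors.
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    samples = list(ratings)
    for _ in range(n_epochs):
        rng.shuffle(samples)  # visit the observed ratings in random order
        for u, i, r in samples:
            pred = sum(pf * qf for pf, qf in zip(P[u], Q[i]))
            err = r - pred
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # Gradient step on the regularized squared error.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def rmse(ratings, P, Q):
    """Root mean squared error over the given (user, item, rating) triples."""
    total = 0.0
    for u, i, r in ratings:
        pred = sum(pf * qf for pf, qf in zip(P[u], Q[i]))
        total += (r - pred) ** 2
    return math.sqrt(total / len(ratings))


# Toy data: 4 users, 4 books, 10 of the 16 possible ratings observed.
toy_ratings = [
    (0, 0, 5), (0, 1, 4), (0, 3, 1),
    (1, 0, 4), (1, 2, 1),
    (2, 1, 1), (2, 2, 5), (2, 3, 4),
    (3, 0, 1), (3, 3, 5),
]

P0, Q0 = factorize(toy_ratings, 4, 4, n_epochs=0)  # untrained factors
P, Q = factorize(toy_ratings, 4, 4)                # trained factors
print("RMSE before training:", rmse(toy_ratings, P0, Q0))
print("RMSE after training: ", rmse(toy_ratings, P, Q))
```

A predicted rating for an unobserved pair, such as user 0 and book 2, is then the dot product of `P[0]` and `Q[2]`. The RMSE here is measured on the training ratings themselves. A fair comparison would hold out some ratings for testing.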
Our Jupyter Notebook illustrates the approach we took and compares it with the results of Hug's surprise package (looking primarily at RMSE). Presentation slides are here
- How Netflix recommend movies? is a great intro video on YouTube by Luis Serrano, Head of Content in AI and Data Science at Udacity
- Understanding matrix factorization for recommendation by Nicolas Hug, the creator of the surprise library, on how his SVD implementation works. In part 3, he discusses at length how to do SVD on a sparse matrix.
- Recommendations by Ilan Man, Head of Data at TrialSpark
- Matrix Factorization: A Simple Tutorial and Implementation in Python by Albert Yeung, ML engineer at zwoop
- Movie Recommendation using Regularized Matrix factorization: a GitHub Repo
- The 4 Recommendation Engines That Can Predict Your Movie Tastes with example code on Github
- A Gentle Guide to Recommender Systems with Surprise with example code on Github
- Gradient Descent in Python by Towards Data Science with sample code for GD, SGD, and mini-batch SGD
- Kat Bailey also implemented Matrix Factorization in TensorFlow along with this awesome GitPitch
- A Gentle Introduction to Recommender Systems with Implicit Feedback by Jesse Steinweg-Woods, a Senior Data Scientist at Honey
- Collaborative Filtering for Implicit Feedback Datasets