latent_factor_analysis

How Latent Factor Analysis Works

Latent factors are the underlying factors that help explain the observed data (book ratings R in our case).
Image of Latent Factor Matrix

For a given user i, the predicted rating of book j, where k is the number of latent factors, p_i is user i's factor vector, and q_j is book j's factor vector:

$\hat{r}_{ij} = p_i^\top q_j = \sum_{f=1}^{k} p_{if} \, q_{jf}$

And we want to minimize the following error/loss function over the set K of observed ratings:

$\min_{p, q} \sum_{(i,j) \in K} \left( r_{ij} - p_i^\top q_j \right)^2$

The catch is that our rating matrix R is sparse, so there's an inherent bias: our model can only learn from the books and users that have ratings. To prevent overfitting due to this bias, we can introduce a regularization term weighted by lambda:

$\min_{p, q} \sum_{(i,j) \in K} \left( r_{ij} - p_i^\top q_j \right)^2 + \lambda \left( \lVert p_i \rVert^2 + \lVert q_j \rVert^2 \right)$
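As an illustration, this regularized loss can be computed directly with NumPy. The toy rating matrix, the factor matrices P and Q, and the lambda value below are made up for this sketch, not taken from our dataset:

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_books, k = 4, 5, 2
lam = 0.1  # regularization strength lambda; value is an assumption

# Toy sparse rating matrix: 0 marks a missing rating.
R = np.array([[5, 3, 0, 1, 0],
              [4, 0, 0, 1, 2],
              [1, 1, 0, 5, 0],
              [0, 0, 5, 4, 0]], dtype=float)
observed = R > 0

# Randomly initialized latent factor matrices (users x k, books x k).
P = rng.normal(scale=0.1, size=(n_users, k))
Q = rng.normal(scale=0.1, size=(n_books, k))

def regularized_loss(R, P, Q, observed, lam):
    """Squared error over observed ratings plus the L2 penalty."""
    err = (R - P @ Q.T)[observed]
    return np.sum(err ** 2) + lam * (np.sum(P ** 2) + np.sum(Q ** 2))

print(regularized_loss(R, P, Q, observed, lam))
```

Note that the error is summed only over observed entries; the zeros standing in for missing ratings never contribute to the loss.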

SVD

SVD, Singular Value Decomposition, is a common method for matrix factorization. Although it is mainly used for dense matrices (as Hug and Davis explained), it helps us understand the basic concepts of how matrices can be decomposed.

Here, SVD decomposes the dense rating matrix A into two orthogonal matrices U and V and a diagonal matrix sigma:

$A = U \Sigma V^\top$

U represents how much each user likes each feature; sigma, the diagonal matrix, holds the weight of each feature; and V represents how relevant each feature is to each book.
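A minimal sketch of this decomposition using NumPy's `np.linalg.svd` on a small dense matrix (the ratings here are invented for illustration):

```python
import numpy as np

# Toy dense rating matrix: 4 users x 3 books.
A = np.array([[5., 4., 1.],
              [4., 5., 1.],
              [1., 1., 5.],
              [2., 1., 4.]])

# Thin SVD: A = U @ diag(s) @ Vt, with singular values s in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print("singular values:", s)

# Multiplying the factors back together recovers A (up to float error).
A_hat = U @ np.diag(s) @ Vt
print(np.allclose(A, A_hat))  # True

# Keeping only the top-k singular values gives a low-rank approximation,
# which is the idea latent factor models borrow.
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
```

Truncating to the top-k singular values is what connects SVD to the k latent factors above: the rank-k product is the best rank-k approximation of A in the least-squares sense.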

SVD for Latent Factor Analysis

As Hug mentioned, SVD in the context of recommender systems is not real SVD; it is only SVD-inspired.

Our model for Latent Factor Analysis

Building on the work of Hug and many others in the data science community, we used SGD (stochastic gradient descent) to factorize the sparse user rating matrix in order to predict users' ratings and create recommendations.
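A minimal sketch of SGD matrix factorization on a few toy ratings. The learning rate, regularization strength, epoch count, and the ratings themselves are assumptions for illustration, not the values from our notebook:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy observed (user, book, rating) triples; real data comes from the dataset.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0),
           (1, 2, 1.0), (2, 1, 1.0), (2, 2, 5.0)]
n_users, n_books, k = 3, 3, 2
lr, lam, n_epochs = 0.05, 0.02, 500  # assumed hyperparameters

P = rng.normal(scale=0.1, size=(n_users, k))  # user factor matrix
Q = rng.normal(scale=0.1, size=(n_books, k))  # book factor matrix

for _ in range(n_epochs):
    for i, j, r in ratings:
        err = r - P[i] @ Q[j]          # prediction error on this rating
        p_old = P[i].copy()            # use pre-update value for both steps
        # Gradient step on the regularized squared error.
        P[i] += lr * (err * Q[j] - lam * P[i])
        Q[j] += lr * (err * p_old - lam * Q[j])

def predict(i, j):
    """Predicted rating of book j by user i."""
    return P[i] @ Q[j]

print(predict(0, 0))  # prediction for user 0, book 0 (observed rating 5.0)
```

Unlike plain SVD, this loop only ever touches observed ratings, which is what makes it suitable for a sparse matrix.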

Our Jupyter Notebook illustrates the approach we took and compares it to the results of Hug's Surprise package (looking primarily at RMSE). Presentation slides are here

Readings:

Advanced: