This project builds a recommender system using collaborative filtering, which can be divided into memory-based and model-based methods. In memory-based collaborative filtering, recommendations are based on the similarity between users and between movies; we compute this similarity with the cosine distance. For model-based collaborative filtering, the model used is matrix factorization. The idea behind such models is that a user's attitudes or preferences can be determined by a small number of hidden (latent) factors. These factors are also called embeddings, and they represent different characteristics of users and items.
The memory-based algorithm consists of the following steps:
- Prepare the data, including train_test_split.
- Transform the DataFrame into a user matrix and a movie matrix.
- Calculate the user-user and movie-movie similarity matrices using sklearn.metrics.pairwise.
- Predict the rating of each user for each movie.
- Sort the predicted ratings to provide the top 10 movies for each user.
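The memory-based steps above can be sketched roughly as follows. This is a minimal illustration rather than the code in recommender.py: the toy rating matrix and the helper names (`predict_user_based`, `predict_item_based`, `top_n`) are assumptions made for the example, and the prediction rule is a plain similarity-weighted average.

```python
import numpy as np
from sklearn.metrics.pairwise import pairwise_distances

# Toy user x movie rating matrix (0 = not rated); values are made up.
ratings = np.array([
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 1.0, 0.0],
    [1.0, 0.0, 5.0, 4.0],
    [0.0, 1.0, 4.0, 5.0],
])

def predict_user_based(ratings):
    """Predict ratings as a similarity-weighted average over users."""
    # Cosine similarity is 1 minus cosine distance.
    user_sim = 1.0 - pairwise_distances(ratings, metric="cosine")
    weights = np.abs(user_sim).sum(axis=1, keepdims=True)
    return user_sim.dot(ratings) / weights

def predict_item_based(ratings):
    """Predict ratings as a similarity-weighted average over movies."""
    item_sim = 1.0 - pairwise_distances(ratings.T, metric="cosine")
    weights = np.abs(item_sim).sum(axis=0, keepdims=True)
    return ratings.dot(item_sim) / weights

def top_n(predictions, ratings, user, n=10):
    """Return up to n movie indices the user has not rated, best first."""
    scores = predictions[user].copy()
    scores[ratings[user] > 0] = -np.inf  # skip movies already rated
    order = np.argsort(scores)[::-1][:n]
    return [int(i) for i in order if np.isfinite(scores[i])]

user_pred = predict_user_based(ratings)
item_pred = predict_item_based(ratings)
recommended = top_n(user_pred, ratings, user=0)
```

In this toy matrix user 0 has rated every movie except movie 2, so `recommended` contains only that index. On the real data the same functions would be applied to the matrix built from the training split.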
The model-based algorithm consists of the following steps:
- Prepare the data, including train_test_split.
- Train the model with bias terms using SGD.
- Calculate the weighting matrix and its biases.
- Predict the ratings from the weighting matrix and biases obtained from training.
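As a rough illustration of the model-based steps, the sketch below trains a biased matrix factorization with SGD on a handful of made-up ratings. The function names, the hyperparameters (`n_factors`, `lr`, `reg`, `n_epochs`), and the toy data are assumptions for the example, not values taken from this project.

```python
import numpy as np

def train_biased_mf(triples, n_users, n_items, n_factors=2,
                    lr=0.02, reg=0.02, n_epochs=300, seed=0):
    """Fit r_ui ~ mu + b_u + b_i + p_u . q_i with per-sample SGD."""
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.1, size=(n_users, n_factors))  # user embeddings
    Q = rng.normal(scale=0.1, size=(n_items, n_factors))  # item embeddings
    b_u = np.zeros(n_users)
    b_i = np.zeros(n_items)
    mu = float(np.mean([r for _, _, r in triples]))  # global mean rating
    for _ in range(n_epochs):
        for idx in rng.permutation(len(triples)):
            u, i, r = triples[idx]
            err = r - (mu + b_u[u] + b_i[i] + P[u] @ Q[i])
            b_u[u] += lr * (err - reg * b_u[u])
            b_i[i] += lr * (err - reg * b_i[i])
            p_old = P[u].copy()  # use the pre-update user vector for Q
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * p_old - reg * Q[i])
    return mu, b_u, b_i, P, Q

def predict(model, u, i):
    """Predicted rating of user u for item i."""
    mu, b_u, b_i, P, Q = model
    return float(mu + b_u[u] + b_i[i] + P[u] @ Q[i])

# Made-up (user, movie, rating) triples standing in for the training split.
triples = [
    (0, 0, 5.0), (0, 1, 4.0), (0, 3, 1.0),
    (1, 0, 4.0), (1, 1, 5.0), (1, 2, 1.0),
    (2, 0, 1.0), (2, 2, 5.0), (2, 3, 4.0),
    (3, 1, 1.0), (3, 2, 4.0), (3, 3, 5.0),
]
model = train_biased_mf(triples, n_users=4, n_items=4)
```

After training, `predict(model, u, i)` gives the estimated rating for any user and movie pair, including pairs that were never observed.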
The goals of this project are: (1) Implement memory-based (user-based and item-based) and model-based (matrix factorization) recommender systems. (2) Compare the RMSE of both algorithms and identify the better one. (3) Produce recommendations in a real-world setting.
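For goal (2), the RMSE comparison could be computed along these lines. This is a sketch under the assumption that held-out ratings are stored in a matrix where 0 means "not rated"; the helper names `rmse` and `rmse_on_observed` are hypothetical.

```python
import numpy as np

def rmse(predicted, actual):
    """Root mean squared error between two equally shaped arrays."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))

def rmse_on_observed(pred_matrix, test_matrix):
    """RMSE restricted to entries that are actually rated in the test matrix."""
    mask = test_matrix > 0
    return rmse(pred_matrix[mask], test_matrix[mask])

# Made-up predictions and held-out ratings for two users and two movies.
pred_matrix = np.array([[4.0, 2.0], [3.0, 5.0]])
test_matrix = np.array([[5.0, 0.0], [0.0, 3.0]])
score = rmse_on_observed(pred_matrix, test_matrix)
```

Evaluating both algorithms with the same function on the same test split keeps the comparison fair; the lower RMSE indicates the better fit.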
- Python 3.6
- For 100k:
cmd run
python3.6 recommender.py 100k model test
- For 1m:
cmd run
python3.6 recommender.py 1m model test
- For 100k:
cmd run
python3.6 recommender.py 100k run memory
- For 1m:
cmd run
python3.6 recommender.py 1m run memory
- For 100k:
cmd run
python3.6 recommender.py 100k recommand user_name
- For 1m:
cmd run
python3.6 recommender.py 1m recommand user_name
- For test: To test the accuracy and RMSE, there will be instructions for
The large dataset contains 1,000,209 anonymous ratings of approximately 3,900 movies made by 6,040 MovieLens users who joined MovieLens in 2000. The small MovieLens dataset contains 100,836 ratings and 3,683 tag applications across 9,742 movies. These data were created by 610 users between March 29, 1996 and September 24, 2018, and the dataset was generated on September 26, 2018.