Machine Learning and Data Mining, Movie Recommender System

miloooooz/ML-DM_recommender_system

Recommender system using collaborative filtering

Overview

This project builds a movie recommender system using collaborative filtering. The methods fall into two groups: memory-based collaborative filtering and model-based collaborative filtering. In memory-based collaborative filtering, recommendations are based on the similarity between users and between movies, which we measure with cosine distance. For model-based collaborative filtering, the model is matrix factorization. The idea behind such models is that a user's attitudes or preferences can be determined by a small number of hidden latent factors. These factors are also called embeddings, and they represent different characteristics of users and items.
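To make the similarity measure concrete, here is a minimal sketch of cosine similarity (and the matching cosine distance, 1 minus the similarity) between users' rating vectors. The function names and the toy ratings are illustrative assumptions, not code or data from this repository.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A vector with no ratings is treated as similar to nothing.
        return 0.0
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """Cosine distance, defined here as 1 - cosine similarity."""
    return 1.0 - cosine_similarity(a, b)


# Hypothetical ratings of four movies by three users (0 = not rated).
alice = [5, 3, 0, 1]
bob = [4, 0, 0, 1]
carol = [1, 1, 0, 5]

sim_alice_bob = cosine_similarity(alice, bob)
sim_alice_carol = cosine_similarity(alice, carol)
```

In this toy data Alice's ratings point in nearly the same direction as Bob's, so `sim_alice_bob` comes out larger than `sim_alice_carol`.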

The memory-based algorithm consists of the following steps:

  • Prepare the data, including train_test_split.
  • Transform the DataFrame into a matrix of users and a matrix of movies.
  • Calculate the user-user and movie-movie similarity matrices using Sklearn.pairwise.
  • Predict each user's rating for each movie.
  • Sort the predicted ratings to provide the top 10 movies for each user.
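The memory-based steps can be sketched roughly as follows for the user-based case. This is a simplified illustration and not the code in recommender.py: it computes cosine similarity directly with NumPy rather than through Sklearn.pairwise, skips the train/test split, uses a plain similarity-weighted average as the prediction rule, and all function names and the toy matrix are assumptions.

```python
import numpy as np

# Toy ratings matrix: rows are users, columns are movies, 0 = not rated.
ratings = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 4],
], dtype=float)


def cosine_similarity_matrix(m, eps=1e-9):
    """Pairwise cosine similarity between the rows of m."""
    norms = np.linalg.norm(m, axis=1, keepdims=True)
    normalized = m / (norms + eps)
    return normalized @ normalized.T


def predict_user_based(ratings, similarity):
    """Similarity-weighted average of the ratings given by other users."""
    rated = (ratings > 0).astype(float)
    weighted_sum = similarity @ ratings
    # Only users who actually rated a movie contribute to its weight total.
    weight_total = np.abs(similarity) @ rated
    return weighted_sum / np.maximum(weight_total, 1e-9)


def top_n(user_index, ratings, predictions, n=10):
    """Indices of the n best-scoring movies the user has not rated yet."""
    scores = predictions[user_index].copy()
    scores[ratings[user_index] > 0] = -np.inf  # skip movies already rated
    order = np.argsort(scores)[::-1][:n]
    return [int(i) for i in order if np.isfinite(scores[i])]


similarity = cosine_similarity_matrix(ratings)
predictions = predict_user_based(ratings, similarity)
recommended = top_n(1, ratings, predictions, n=10)
```

An item-based variant follows the same pattern with the matrix transposed, so that similarities are computed between movies instead of between users.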

The model-based algorithm consists of the following steps:

  • Prepare the data, including train_test_split.
  • Train the model with bias using SGD.
  • Calculate the weighting matrix and its biases.
  • Predict the ratings from the weighting matrix and biases obtained in training.
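As a rough illustration of the model-based steps, the sketch below trains a biased matrix factorization with SGD, predicting a rating as global mean + user bias + movie bias + dot product of the user and movie factor vectors. The hyperparameters, function names, and toy ratings are assumptions chosen for the example, not values taken from this project.

```python
import numpy as np


def train_mf(triples, n_users, n_items, n_factors=2,
             lr=0.01, reg=0.02, n_epochs=200, seed=0):
    """Biased matrix factorization trained with SGD on (user, item, rating) triples."""
    rng = np.random.RandomState(seed)
    P = rng.normal(0, 0.1, (n_users, n_factors))  # user factors
    Q = rng.normal(0, 0.1, (n_items, n_factors))  # movie factors
    b_u = np.zeros(n_users)                       # user biases
    b_i = np.zeros(n_items)                       # movie biases
    mu = float(np.mean([r for _, _, r in triples]))  # global mean rating
    for _ in range(n_epochs):
        for idx in rng.permutation(len(triples)):
            u, i, r = triples[idx]
            err = r - (mu + b_u[u] + b_i[i] + P[u] @ Q[i])
            b_u[u] += lr * (err - reg * b_u[u])
            b_i[i] += lr * (err - reg * b_i[i])
            p_old = P[u].copy()  # use the pre-update user factors for Q's step
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * p_old - reg * Q[i])
    return mu, b_u, b_i, P, Q


def predict(model, u, i):
    """Predicted rating of movie i by user u."""
    mu, b_u, b_i, P, Q = model
    return mu + b_u[u] + b_i[i] + P[u] @ Q[i]


def rmse(model, triples):
    """Root mean squared error of the model on the given triples."""
    errors = [r - predict(model, u, i) for u, i, r in triples]
    return float(np.sqrt(np.mean(np.square(errors))))


# Hypothetical (user, movie, rating) triples.
train = [(0, 0, 5), (0, 1, 3), (0, 3, 1), (1, 0, 4), (1, 3, 1), (2, 0, 1),
         (2, 1, 1), (2, 3, 5), (3, 1, 1), (3, 2, 5), (3, 3, 4)]

model = train_mf(train, n_users=4, n_items=4)
mu = model[0]
baseline_rmse = float(np.sqrt(np.mean([(r - mu) ** 2 for _, _, r in train])))
trained_rmse = rmse(model, train)
```

Comparing `trained_rmse` with `baseline_rmse` (always predicting the global mean) is a quick sanity check that training is reducing the error. A real evaluation would use the held-out test split instead of the training triples.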

Goals

The goals of this project are: (1) Implement memory-based (user-based, item-based) and model-based (matrix factorization) algorithms for a recommender system. (2) Compare the RMSE of both approaches to find the better one. (3) Make recommendations in a real-world setting.
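For reference, RMSE (root mean squared error) is the square root of the average squared difference between predicted and actual ratings, so lower is better. A minimal sketch follows. The ratings and the two sets of predictions are made-up numbers for illustration only, not results from this project.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length rating lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


# Made-up held-out ratings and predictions from two hypothetical models.
actual = [4.0, 3.0, 5.0, 1.0]
model_a = [3.5, 3.0, 4.0, 2.0]
model_b = [4.0, 2.0, 3.0, 3.0]

rmse_a = rmse(model_a, actual)
rmse_b = rmse(model_b, actual)
better = "model_a" if rmse_a < rmse_b else "model_b"
```

Because the errors are squared before averaging, a few large misses raise the RMSE more than many small ones.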

Environment Settings

  • Python 3.6

Instruction

To test the accuracy and RMSE of the model-based algorithm:

  • For 100k:

cmd run python3.6 recommender.py 100k model test

  • For 1m:

cmd run python3.6 recommender.py 1m model test

To test the accuracy and RMSE of the memory-based algorithm:

  • For 100k:

cmd run python3.6 recommender.py 100k run memory

  • For 1m:

cmd run python3.6 recommender.py 1m run memory

To make recommendations for a specific user (user_name) using the memory-based algorithm:

  • For 100k:

cmd run python3.6 recommender.py 100k recommand user_name

  • For 1m:

cmd run python3.6 recommender.py 1m recommand user_name

Output

  • For test: To test the accuracy and RMSE, there will be instructions for

Dataset

The large dataset we used contains 1,000,209 anonymous ratings of approximately 3,900 movies made by 6,040 MovieLens users who joined MovieLens in 2000. The small dataset from MovieLens contains 100,836 ratings and 3,683 tag applications across 9,742 movies. These data were created by 610 users between March 29, 1996 and September 24, 2018, and the dataset was generated on September 26, 2018.
