A content-based recommendation system, built in Python, that suggests books to users based on their descriptions. The system uses a Bag of Words (BoW) model and cosine similarity to find books similar to the one the user selects. The project also includes a Streamlit web application that provides an interactive interface for getting recommendations.
- Content-Based Filtering: Recommends books based on textual descriptions.
- Bag of Words Model: Represents each book description as a vector of word counts.
- Cosine Similarity: Measures the similarity between book descriptions.
- Streamlit Application: Provides an easy-to-use web interface.
- Collect a dataset containing books and their descriptions.
- Preprocess the text data by:
  - Tokenizing the text.
  - Removing stop words.
  - Stemming or lemmatizing the tokens.
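The preprocessing steps above can be sketched with the standard library alone. This is an illustration under assumptions, not the project's code. The `STOP_WORDS` set is a tiny hypothetical list. `naive_stem` is a crude suffix stripper standing in for a real stemmer or lemmatizer, such as NLTK's Porter stemmer or WordNet lemmatizer. The sample description is made up.

```python
import re

# Hypothetical, deliberately small stop-word list; a real project would more
# likely use a curated list (e.g. from NLTK or scikit-learn).
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "on", "is", "their", "his", "her"}

# Naive suffix stripping, NOT a real stemmer such as Porter: it only removes
# a few common English endings from words long enough to survive the cut.
SUFFIXES = ("ing", "ed", "es", "s")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def naive_stem(word: str) -> str:
    """Strip the first matching suffix if at least 3 letters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list[str]:
    """Tokenize, drop stop words, then stem each remaining token."""
    tokens = tokenize(text)
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [naive_stem(t) for t in tokens]


# Hypothetical description, used only to show the pipeline.
print(preprocess("The hobbit Bilbo joins dwarves on a quest to reclaim their treasure"))
# ['hobbit', 'bilbo', 'join', 'dwarv', 'quest', 'reclaim', 'treasure']
```

Stems such as `dwarv` do not need to be real words. They only need to map related word forms onto the same vocabulary term.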
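The Bag of Words (BoW) model represents the textual data. This involves building a vocabulary of all terms across the preprocessed descriptions, then converting each description into a vector of term frequencies: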
BoW Vector = [Frequency of Term 1, Frequency of Term 2, ..., Frequency of Term N]
Where N is the size of the vocabulary.
Example Representation:
"The Hobbit": [1, 2, 0, ..., 1]
"The Lord of the Rings": [0, 1, 3, ..., 2]
Each vector corresponds to the frequency of terms in the text.
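Here is a minimal sketch of building the vocabulary and the count vectors. It assumes the descriptions have already been preprocessed into token lists. `build_vocabulary` and `bow_vector` are hypothetical helper names, and the token lists are invented. In practice, scikit-learn's `CountVectorizer` (from the listed requirements) builds the same kind of count matrix. This stdlib version only makes the mechanics visible.

```python
from collections import Counter


def build_vocabulary(docs: list[list[str]]) -> list[str]:
    """Collect every distinct token across all documents, in sorted order."""
    return sorted({token for doc in docs for token in doc})


def bow_vector(doc: list[str], vocabulary: list[str]) -> list[int]:
    """Count how often each vocabulary term occurs in one tokenized document."""
    counts = Counter(doc)  # missing terms count as 0
    return [counts[term] for term in vocabulary]


# Hypothetical, already preprocessed token lists for two books.
docs = {
    "The Hobbit": ["hobbit", "quest", "dragon", "quest"],
    "The Lord of the Rings": ["hobbit", "ring", "quest", "ring"],
}
vocab = build_vocabulary(list(docs.values()))
vectors = {title: bow_vector(tokens, vocab) for title, tokens in docs.items()}

print(vocab)    # ['dragon', 'hobbit', 'quest', 'ring']
print(vectors)  # {'The Hobbit': [1, 1, 2, 0], 'The Lord of the Rings': [0, 1, 1, 2]}
```

Every vector has length N = `len(vocab)`, so any two descriptions can be compared position by position.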
To measure the similarity between book descriptions, we calculate cosine similarity:
Cosine Similarity = (A · B) / (||A|| ||B||)
Where:
- A and B are the vectors of two book descriptions, and A · B is their dot product.
- ||A|| and ||B|| are the magnitudes (Euclidean norms) of the vectors.
Example Calculation:
A = [1, 2, 1], B = [2, 1, 3]
Dot Product (A · B) = (1*2) + (2*1) + (1*3) = 7
Magnitude of A (||A||) = sqrt(1^2 + 2^2 + 1^2) = sqrt(6)
Magnitude of B (||B||) = sqrt(2^2 + 1^2 + 3^2) = sqrt(14)
Cosine Similarity = 7 / (sqrt(6) * sqrt(14)) = 7 / sqrt(84) ≈ 0.76
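The formula and the worked example can be checked with a few lines of Python. This is a sketch, not the project's implementation. Returning 0.0 when either vector is all zeros is an assumption made here to avoid dividing by zero. That case can arise, for example, when a description is left empty after stop-word removal.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return (A · B) / (||A|| ||B||); 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumed convention for empty descriptions
    return dot / (norm_a * norm_b)


A = [1, 2, 1]
B = [2, 1, 3]
print(round(cosine_similarity(A, B), 4))  # 0.7638
```

Because BoW counts are never negative, the score always falls between 0 (no shared terms) and 1 (identical term proportions).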
- Compute the cosine similarity between the user's selected book and all other books.
- Rank the books based on similarity scores.
- Display the top k recommendations to the user.
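The ranking steps just listed can be sketched as follows. This assumes each book already has a BoW vector over a shared vocabulary. `recommend` and the four placeholder books are hypothetical. The selected book is excluded so that it is never recommended back to the user.

```python
import math


def cosine_similarity(a: list[int], b: list[int]) -> float:
    """(A · B) / (||A|| ||B||), returning 0.0 for an all-zero vector."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 0.0 if norm_a == 0 or norm_b == 0 else dot / (norm_a * norm_b)


def recommend(selected: str, vectors: dict[str, list[int]], k: int = 3) -> list[tuple[str, float]]:
    """Score every other book against the selected one and return the top k."""
    if selected not in vectors:
        raise KeyError(f"unknown book: {selected!r}")
    query = vectors[selected]
    scores = [
        (title, cosine_similarity(query, vec))
        for title, vec in vectors.items()
        if title != selected  # never recommend the book the user picked
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)  # highest similarity first
    return scores[:k]


# Hypothetical BoW vectors over a shared four-term vocabulary.
vectors = {
    "Book A": [1, 1, 2, 0],
    "Book B": [0, 1, 1, 2],
    "Book C": [1, 1, 2, 1],
    "Book D": [0, 0, 0, 3],
}
for title, score in recommend("Book A", vectors, k=2):
    print(f"{title}: {score:.2f}")
# Book C: 0.93
# Book B: 0.50
```

For a large catalog, it may be cheaper to precompute all pairwise scores once than to rescore on every request. scikit-learn's `sklearn.metrics.pairwise.cosine_similarity` can produce such a matrix.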
- Displays the list of available books for selection.
- Allows users to choose a book to get recommendations.
- Shows the top k books similar to the selected book.
- Displays similarity scores alongside recommendations.
- Python 3.10 or above
- Required Python libraries:
pip install streamlit pandas scikit-learn numpy
- Clone the repository:
git clone https://github.com/Srujanrana07/book-recomender-system.git
cd book-recomender-system
- Run the Streamlit app:
streamlit run app.py
- User selects a book (e.g., "The Hobbit").
- System processes the description and computes cosine similarity.
- Recommended books based on similarity are displayed:
  - "The Lord of the Rings"
  - "Harry Potter and the Sorcerer's Stone"
  - "Percy Jackson & The Olympians"
Vector for "The Hobbit" = [1, 2, 0, ..., 1]
Where each element represents the frequency of a specific word.
A = [1, 2, 1], B = [2, 1, 3]
Cosine Similarity = ((1*2) + (2*1) + (1*3)) / (sqrt(1^2 + 2^2 + 1^2) * sqrt(2^2 + 1^2 + 3^2)) = 7 / sqrt(84) ≈ 0.76
- Incorporate user feedback for better personalization.
- Extend to hybrid recommendation systems.
- Use advanced NLP techniques like TF-IDF or word embeddings for better text representation.
This project is licensed under the MIT License.