KhushiiAgarwal/BreastCancerDetection

Breast Cancer Detection

Overview

Breast cancer is a leading form of cancer in women and often goes undetected until an advanced stage. This project builds a detection model using 32 features. It aims to detect breast cancer efficiently using various ML models and demonstrates basic machine learning concepts.

Features

  • Pandas & NumPy are used for basic mathematical operations on the dataset
  • Sklearn is used for pre-processing the data with One Hot Encoding, Variance Threshold & chi-squared feature selection
  • Cross Validation is performed alongside the train-test split
  • Sklearn is further used for training Linear & Logistic Regression, Decision Tree, Random Forest, Support Vector Machine & K-nearest neighbour models
  • Tensorflow is used for Artificial Neural Networks
  • Matplotlib & Seaborn are used to visualise the models and the Confusion Matrix of each algorithm
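
As a rough illustration of how the pre-processing, cross-validation and model steps above can fit together, here is a minimal sketch. It is not the notebook's actual code: it assumes scikit-learn's bundled copy of the Wisconsin data (`load_breast_cancer`) in place of the CSV, and the choices `k=10`, `cv=5`, `test_size=0.2` and the `MinMaxScaler` step are illustrative assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, VarianceThreshold, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Assumption: the bundled dataset stands in for the CSV used by the notebook.
X, y = load_breast_cancer(return_X_y=True)

# Hold out a test set; stratify keeps the class ratio similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

model = make_pipeline(
    VarianceThreshold(threshold=0.0),   # drop features that never vary
    MinMaxScaler(),                     # rescale to [0, 1]; chi2 needs non-negative inputs
    SelectKBest(chi2, k=10),            # keep the 10 highest-scoring features (illustrative k)
    LogisticRegression(max_iter=1000),
)

# 5-fold cross-validation on the training part only.
cv_scores = cross_val_score(model, X_train, y_train, cv=5)

# Fit on the full training part and score once on the held-out test set.
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)

print("Mean CV accuracy:", cv_scores.mean())
print("Test accuracy:", test_accuracy)
```

Wrapping the selectors and the classifier in one pipeline means feature selection is re-fitted inside each cross-validation fold, so the validation folds do not influence which features are kept.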

Tech Stack


  • Python
  • Jupyter

Setup & Installation

  • Clone the repository to your local machine
  • Use Google Colab or install Jupyter
  • Open the .ipynb file and make sure the path to the CSV is correct
  • Download the dataset from this repository, then upload it to Google Colab or set the appropriate path in Jupyter
  • If using Jupyter/Anaconda, download the dataset from: https://www.kaggle.com/datasets/uciml/breast-cancer-wisconsin-data
  • If using Jupyter/Anaconda, install all libraries from a terminal (Command Prompt on Windows):
    pip install numpy
    
    pip install pandas
    
    pip install matplotlib
    
    pip install seaborn
    
    pip install tensorflow
    
    pip install scikit-learn
    
  • Ensure the dataset is in the same directory as the notebook, or specify the correct path to it
  • On Google Colab, the libraries are downloaded and imported at runtime
  • Run the .ipynb file
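
Before running the notebook, the setup above can be sanity-checked with a short script like the one below. This is only a sketch using the standard library: the helper names `missing_packages` and `find_dataset` are hypothetical, and the file name `data.csv` and the `/content` folder are assumptions that should be adjusted to the actual CSV name and location.

```python
import importlib.util
from pathlib import Path

# Import names, not pip names: scikit-learn is imported as "sklearn".
REQUIRED_MODULES = ["numpy", "pandas", "matplotlib", "seaborn", "tensorflow", "sklearn"]


def missing_packages(module_names):
    """Return the modules from module_names that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


def find_dataset(filename, search_dirs):
    """Return the first existing path to filename inside search_dirs, or None."""
    for directory in search_dirs:
        candidate = Path(directory) / filename
        if candidate.is_file():
            return candidate
    return None


missing = missing_packages(REQUIRED_MODULES)
print("Missing libraries:", missing if missing else "none")

# Assumption: the CSV is called data.csv and sits next to the notebook
# (or in /content when the file was uploaded to Google Colab).
dataset_path = find_dataset("data.csv", [Path.cwd(), Path("/content")])
print("Dataset path:", dataset_path if dataset_path else "not found, set the path manually")
```

If a library is reported as missing, install it with pip using its package name, which for `sklearn` is `scikit-learn`.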

Accuracy

  • Linear Regression: 78.2%
  • Logistic Regression: 98.8%
  • K-nearest neighbour: 95.9%
  • Decision Tree: 95.34%
  • Random Forest: 95.1%
  • Artificial Neural Networks (ANN): 96.49%
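
For readers new to the metric, the sketch below shows how an accuracy figure like those above can be derived from the four cells of a binary confusion matrix. The labels are made-up toy values for illustration only, not results from this project, and the function names are hypothetical. Recall is included because a missed malignant case (a false negative) is usually the costlier error in detection tasks.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Share of all predictions that were correct."""
    return (tp + tn) / (tp + fp + fn + tn)


def recall(tp, fn):
    """Share of actual positives that the model found."""
    return tp / (tp + fn)


# Toy labels (1 = malignant, 0 = benign), purely illustrative.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print("TP, FP, FN, TN:", tp, fp, fn, tn)        # 3 1 1 3
print("Accuracy:", accuracy(tp, fp, fn, tn))    # 0.75
print("Recall:", recall(tp, fn))                # 0.75
```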

License

This project is licensed under License

If you find my repository helpful, please star⭐ it 🌟.