Breast cancer is a leading form of cancer in women and often goes undetected until an advanced stage, so this project builds a detection model using 32 features. It aims to detect breast cancer efficiently with various ML models and demonstrates basic machine learning concepts.
- The pandas and NumPy libraries are used for basic data handling and mathematical operations on the dataset
- scikit-learn (sklearn) is used to pre-process the data with one-hot encoding, variance thresholding and the chi-squared test
- Cross-validation is performed during the train-test split
- scikit-learn is further used to train Linear & Logistic Regression, Decision Tree, Random Forest, Support Vector Machine & K-nearest neighbour models
- TensorFlow is used for Artificial Neural Networks
- Matplotlib & Seaborn are used to visualise the models and the confusion matrix of every algorithm used
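
The steps listed above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: it assumes scikit-learn's bundled copy of the Wisconsin data (`load_breast_cancer`) in place of the CSV, and the 80/20 split, `k=10` and `MinMaxScaler` step are illustrative choices.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, VarianceThreshold, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Assumption: the bundled dataset stands in for the Kaggle CSV.
X, y = load_breast_cancer(return_X_y=True)

# Hold out 20% of the samples for testing (illustrative split size).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

pipeline = make_pipeline(
    VarianceThreshold(threshold=0.0),  # drop constant features
    MinMaxScaler(),                    # chi2 needs non-negative inputs
    SelectKBest(chi2, k=10),           # keep the 10 highest-scoring features
    LogisticRegression(max_iter=1000),
)
pipeline.fit(X_train, y_train)

y_pred = pipeline.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred)

print(f"Accuracy: {accuracy:.3f}")
print(cm)
```

The resulting matrix can be plotted with `seaborn.heatmap(cm, annot=True)` if a visual is wanted.
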
- Clone the repository to your local machine.
- Use Google Colab or install Jupyter
- Run the .ipynb file and make sure the path to the CSV is correct
- Download the dataset from this repository and upload it to Google Colaboratory, or set the appropriate path in Jupyter
- If using Jupyter/Anaconda download dataset from: https://www.kaggle.com/datasets/uciml/breast-cancer-wisconsin-data
- If using Jupyter/Anaconda, install all libraries from the terminal (Command Prompt, i.e. cmd, on Windows)
pip install numpy
pip install pandas
pip install matplotlib
pip install seaborn
pip install tensorflow
pip install scikit-learn
- Ensure dataset is present in same directory and specify the correct path
- For Google Colab, libraries are downloaded and imported during runtime
- Run the .ipynb file
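
As a rough sketch of the path step above, the CSV could be loaded like this. The helper name `load_dataset` is hypothetical, and the column names (`id`, `diagnosis` with `M`/`B` labels, plus a trailing empty column) are assumptions based on the Kaggle file rather than on the notebook.

```python
import pandas as pd


def load_dataset(csv_path):
    """Load the CSV and split it into features X and a 0/1 target y."""
    df = pd.read_csv(csv_path)
    # Assumed layout: an "id" column and possibly a fully empty trailing column.
    df = df.drop(columns=["id"], errors="ignore").dropna(axis=1, how="all")
    # Assumed labels: "M" (malignant) -> 1, "B" (benign) -> 0.
    y = df["diagnosis"].map({"M": 1, "B": 0})
    X = df.drop(columns=["diagnosis"])
    return X, y
```

A call such as `X, y = load_dataset("data.csv")` would then work when the file sits next to the notebook; `data.csv` is a placeholder for whatever the file is actually named.
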
- Linear Regression: 78.2%
- Logistic Regression: 98.8%
- K-nearest neighbour: 95.9%
- Decision Tree: 95.34%
- Random Forest: 95.1%
- Artificial Neural Networks (ANN): 96.49%
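
Accuracy figures of this kind could be obtained along the following lines. This is only a sketch under stated assumptions: it uses scikit-learn's bundled copy of the dataset, default hyperparameters and 5-fold cross-validation, so it is not expected to reproduce the exact numbers listed above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: the bundled dataset stands in for the Kaggle CSV.
X, y = load_breast_cancer(return_X_y=True)

# Scale-sensitive models are wrapped in a pipeline with StandardScaler.
models = {
    "Logistic Regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "K-nearest neighbour": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Support Vector Machine": make_pipeline(StandardScaler(), SVC()),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# Mean accuracy over 5 cross-validation folds for each model.
scores = {}
for name, model in models.items():
    scores[name] = cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    print(f"{name}: {scores[name]:.1%}")
```
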
This project is licensed under License