This project aims to classify skin cancer using the HAM10000 dataset, leveraging Convolutional Neural Networks (CNN) and Support Vector Machines (SVM). It addresses the challenge of imbalanced data and explores various techniques to enhance model performance across 7 different categories of skin lesions.
The HAM10000 dataset consists of roughly 10,000 training images (10,015 in the official release) and 1,000 test images, categorized into 7 types of skin lesions.
The data pre-processing steps include:
- Image Resizing: Resizing all images to a uniform size for consistent input to the models.
- Normalization: Normalizing pixel values for better model convergence.
- Data Augmentation (for CNN): Using image data generators to augment the dataset, creating variations of the images to address class imbalance.
- Dimensionality Reduction (for SVM): Applying PCA for reducing the number of features while retaining essential information.
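The normalization and PCA steps above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the helper names `normalize_images` and `flatten_and_reduce`, the 32x32 image size, and the 20 components are assumptions chosen for the example, and random arrays stand in for images that have already been resized.

```python
import numpy as np
from sklearn.decomposition import PCA


def normalize_images(images):
    """Scale uint8 pixel values from [0, 255] to floats in [0, 1]."""
    return images.astype(np.float32) / 255.0


def flatten_and_reduce(images, n_components):
    """Flatten each image to a vector, then project onto the top principal components."""
    flat = images.reshape(len(images), -1)
    pca = PCA(n_components=n_components, random_state=0)
    return pca.fit_transform(flat), pca


# Synthetic stand-in for resized RGB images (hypothetical size: 32x32).
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(50, 32, 32, 3), dtype=np.uint8)

scaled = normalize_images(fake_images)
features, pca = flatten_and_reduce(scaled, n_components=20)
print(features.shape)  # (50, 20)
```

In practice the PCA should be fitted on the training split only and then applied to the test split with `pca.transform`, so that no test information leaks into the features.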
- CNN with Normal Data: Trained on the original dataset without augmentation.
- CNN with Image Data Generator: Trained on data augmented with an image data generator.
- SVM with PCA: Dimensionality reduction using PCA.
- SVM with GridSearch and SMOTE: Tuning hyperparameters with GridSearch and addressing class imbalance with SMOTE.
- SVM with Reduced Image Size and Image Data Generator: Using reduced image size and data augmentation.
- Python 3.x, TensorFlow, scikit-learn, imbalanced-learn, OpenCV, NumPy, Pandas, Matplotlib, seaborn
pip install tensorflow scikit-learn imbalanced-learn opencv-python numpy pandas matplotlib seaborn
- Clone the repository and navigate to the project directory.
- Download and place the HAM10000 dataset in the data folder.
- Run CNN and SVM models using the provided scripts.
- For both models:
Skin_Classification.ipynb
Contributions to this project are welcome. To contribute:
- Fork the repository.
- Create a new branch for your feature or fix.
- Implement your changes.
- Submit a pull request.
For any queries, suggestions, or contributions, please contact [[email protected]].