Welcome to my Support Vector Machine project!
In this project, we use Support Vector Machines (SVMs) to classify iris flowers into three species based on their sepal and petal measurements. The dataset is the famous Iris flower data set, introduced by Sir Ronald Fisher in 1936.
The Iris flower data set consists of 150 samples, 50 from each of three species of Iris flowers:
- Iris setosa
- Iris versicolor
- Iris virginica
For each sample, the following four features were measured:
- Sepal length (cm)
- Sepal width (cm)
- Petal length (cm)
- Petal width (cm)
First, we import the necessary libraries required for data manipulation, visualization, and building the SVM model.
We load the Iris dataset using scikit-learn's built-in datasets module.
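As a minimal sketch of this step, the snippet below loads the data with `load_iris` from `sklearn.datasets` and prints its basic shape and labels. The variable names (`iris`, `X`, `y`) are illustrative and may differ from those used in the notebook.

```python
from sklearn.datasets import load_iris

# Load the bundled Iris dataset as a Bunch object
iris = load_iris()

# X holds the four measurements per flower, y holds the species label (0, 1, 2)
X, y = iris.data, iris.target

print("Samples, features:", X.shape)
print("Feature names:", iris.feature_names)
print("Species:", list(iris.target_names))
```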
We perform exploratory data analysis (EDA) to understand the dataset better. This involves visualizing the distribution of features and exploring relationships between them.
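The notebook's plots are not reproduced here. As a rough numeric companion to them, the following sketch (an assumption about what a useful first look might be, not the notebook's actual code) prints per-species feature means and the correlation matrix between features using only NumPy.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

# Mean of each feature within each species: shows which features separate the classes
for label, name in enumerate(iris.target_names):
    class_means = X[y == label].mean(axis=0)
    print(f"{name:<12}", np.round(class_means, 2))

# Pearson correlation between the four features (columns are the variables)
corr = np.corrcoef(X, rowvar=False)
print(np.round(corr, 2))
```

Strongly correlated features, such as petal length and petal width, carry overlapping information, which is worth keeping in mind when reading the pair plots.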
We preprocess the data by splitting it into training and testing sets. We also scale the features to ensure they have a similar range.
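One way to do this is sketched below. The 80/20 split, the stratification, `random_state=42`, and the choice of `StandardScaler` are all assumptions made for illustration; the notebook may use different settings or a different scaler.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Hold out 20% of the samples for testing, keeping class proportions equal (assumed settings)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Fit the scaler on the training data only, then apply the same transform to the test data
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print("Train shape:", X_train_scaled.shape)
print("Test shape:", X_test_scaled.shape)
```

Fitting the scaler on the training set alone keeps information about the test set from leaking into the model.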
We build and train an SVM model using scikit-learn. The model classifies iris flowers into their respective species based on the four measured features.
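A minimal training sketch follows. It assumes an RBF kernel with scikit-learn's default `C` and `gamma`, and wraps scaling and the classifier in a pipeline so the example is self-contained; none of these choices are confirmed by the notebook.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scale the features, then fit a support vector classifier (assumed hyperparameters)
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
model.fit(X_train, y_train)

# Predict species labels for the held-out samples
predictions = model.predict(X_test)
print(predictions)
```

`SVC` handles the three-class problem internally by combining binary classifiers, so no extra multiclass setup is needed.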
We evaluate the performance of the SVM model using the accuracy score and a confusion matrix. This helps us understand how well the model is performing and whether adjustments are needed.
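The evaluation could look roughly like the sketch below, which retrains a small model so that it runs on its own. The split settings and the default `SVC` hyperparameters are assumptions, so the numbers it prints will not necessarily match the notebook's.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = make_pipeline(StandardScaler(), SVC())
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Fraction of test samples classified correctly
accuracy = accuracy_score(y_test, y_pred)

# Rows are true species, columns are predicted species
cm = confusion_matrix(y_test, y_pred)

print(f"Accuracy: {accuracy:.3f}")
print(cm)
```

Off-diagonal entries in the confusion matrix show which species are being confused with each other, which the single accuracy number cannot reveal.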
This project demonstrates the application of Support Vector Machines for multiclass classification using the Iris flower dataset. It covers data loading, preprocessing, model building, and evaluation, providing a practical introduction to SVMs in machine learning.
- Clone this repository to your local machine.
- Open the Jupyter Notebook file `iris_svm_classifier.ipynb`.
- Follow the steps in the notebook to load the dataset, preprocess the data, build and train the SVM model, and evaluate its performance.
This project is licensed under the MIT License. See the LICENSE file for details.
- Sir Ronald Fisher's Iris flower data set, on which this project is based.
- Scikit-learn and other open-source libraries for their contributions to machine learning and data analysis.