noturlee/Iris-DataAnalyis


Iris Flower Classification Model

Table of Contents

  1. Overview
  2. Models Used
  3. Data Preprocessing
  4. Models Training and Evaluation
  5. Data Visualization
  6. Findings
  7. Output
  8. Conclusion

Overview

This project aims to classify Iris flowers into three species—setosa, versicolor, and virginica—based on their sepal and petal measurements using machine learning techniques. The dataset comprises 150 samples evenly distributed among these species, making it a standard benchmark for introductory classification tasks.

Models Used

Two primary models were employed:

  • Logistic Regression: A linear model suitable for binary and multi-class classification tasks.
  • Random Forest Classifier: An ensemble learning method effective for handling complex classification problems.
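The README does not name a library, though the parameter names reported later (`n_estimators`, `max_depth`, `min_samples_split`, `min_samples_leaf`) suggest scikit-learn; that is an assumption. As a minimal sketch of how each model turns four measurements into a species prediction: a multinomial logistic regression computes one linear score per class and applies softmax (recent scikit-learn versions use this multinomial form by default for multi-class problems), while scikit-learn's random forest averages the class-probability vectors of its trees instead of counting hard votes. The weights, biases, and per-tree probabilities below are hypothetical values chosen for illustration, not fitted parameters.

```python
import math

def linear_scores(x, weights, biases):
    """One linear score per class: dot(w_k, x) + b_k."""
    return [sum(w * xi for w, xi in zip(w_k, x)) + b_k
            for w_k, b_k in zip(weights, biases)]

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def forest_proba(tree_probas):
    """Average per-tree class-probability vectors (soft voting)."""
    n_trees = len(tree_probas)
    return [sum(column) / n_trees for column in zip(*tree_probas)]

def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])

SPECIES = ["setosa", "versicolor", "virginica"]
x = [5.1, 3.5, 1.4, 0.2]  # sepal length, sepal width, petal length, petal width (cm)

# Hypothetical logistic-regression parameters that only look at petal length.
weights = [[0, 0, -1, 0], [0, 0, 0, 0], [0, 0, 1, 0]]
biases = [2.5, 0.0, -4.5]
lr_probs = softmax(linear_scores(x, weights, biases))

# Hypothetical class probabilities from a three-tree forest.
rf_probs = forest_proba([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.5, 0.5, 0.0]])

print(SPECIES[argmax(lr_probs)], SPECIES[argmax(rf_probs)])  # setosa setosa
```

Averaging probabilities rather than counting votes lets a tree that is unsure (the third one here) contribute proportionally instead of casting a full vote.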

Data Preprocessing

Data Loading

The Iris dataset was loaded from a CSV file containing 150 records and 5 columns: four numeric features (sepal length, sepal width, petal length, petal width) and the species label.
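A minimal sketch of the loading step using only Python's standard library. It assumes a header row with the hypothetical column names `sepal_length`, `sepal_width`, `petal_length`, `petal_width` and `species`; the project's actual file name and headers are not given here. The three sample rows are the first setosa, versicolor and virginica records of the classic Iris data.

```python
import csv
import io

# Hypothetical column names; adjust them to match the actual CSV header.
FEATURES = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
LABEL = "species"

def load_iris_csv(stream):
    """Read CSV rows into a feature matrix X (floats) and a label list y."""
    X, y = [], []
    for row in csv.DictReader(stream):
        X.append([float(row[name]) for name in FEATURES])
        y.append(row[LABEL])
    return X, y

sample = io.StringIO(
    "sepal_length,sepal_width,petal_length,petal_width,species\n"
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "7.0,3.2,4.7,1.4,Iris-versicolor\n"
    "6.3,3.3,6.0,2.5,Iris-virginica\n"
)
X, y = load_iris_csv(sample)
```

With a real file, pass an open handle instead, for example `with open(path, newline="") as f: X, y = load_iris_csv(f)`, where `path` is a placeholder.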

Exploratory Data Analysis (EDA)

  • Summary Statistics: Provided insights into the distribution and variation of sepal and petal measurements.
  • Pair Plot: Visualized relationships between features across different species.
  • Correlation Heatmap: Showed feature correlations, aiding in feature selection.
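The summary statistics can be sketched per species with the standard `statistics` module. This mirrors the count, mean, standard deviation, min and max rows of a pandas `describe()` table, which uses the sample standard deviation; whether the project used pandas is an assumption. The usage lines run on a four-row subset of Iris records.

```python
import statistics

def summarize(values):
    """Count, mean, sample standard deviation, min and max of one column."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # divides by n - 1, like pandas
        "min": min(values),
        "max": max(values),
    }

def summarize_by_species(X, y, column):
    """Group one feature column by species label and summarize each group."""
    groups = {}
    for row, label in zip(X, y):
        groups.setdefault(label, []).append(row[column])
    return {label: summarize(values) for label, values in groups.items()}

# Four Iris records; column 2 is petal length (cm).
X = [[5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2],
     [7.0, 3.2, 4.7, 1.4], [6.4, 3.2, 4.5, 1.5]]
y = ["Iris-setosa", "Iris-setosa", "Iris-versicolor", "Iris-versicolor"]
stats = summarize_by_species(X, y, column=2)
```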

Models Training and Evaluation

Model Training

  • Splitting Data: The dataset was split into training (80%, 120 samples) and testing (20%, 30 samples) sets.
  • Logistic Regression: Trained a linear model for classification.
  • Random Forest Classifier: Trained an ensemble model to handle complex relationships.
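The README does not say whether the split was stratified; the reported test supports of 10, 9 and 11 (rather than 10 per class) suggest a plain random split. A minimal sketch of such a split over row indices, where the seed value is a hypothetical choice. scikit-learn's `train_test_split(X, y, test_size=0.2, random_state=...)` behaves similarly and also rounds the test size up, though its shuffling will not reproduce these exact indices.

```python
import random

def train_test_split_indices(n_samples, test_percent=20, seed=42):
    """Shuffle row indices, then hold out test_percent% of them (rounded up)."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_test = -(-n_samples * test_percent // 100)  # integer ceiling division
    return indices[n_test:], indices[:n_test]  # (train, test)

train_idx, test_idx = train_test_split_indices(150)
print(len(train_idx), len(test_idx))  # 120 30
```

Selecting rows through the same index lists, for example `X_test = [X[i] for i in test_idx]`, keeps features and labels aligned.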

Model Evaluation

  • Best Parameters: The optimal parameters found for the Random Forest Classifier were {'max_depth': 20, 'min_samples_leaf': 2, 'min_samples_split': 2, 'n_estimators': 200}. These parameters were selected based on cross-validation to maximize accuracy.

  • Best Random Forest Accuracy: The model achieved 100% accuracy on the test dataset, correctly classifying all 30 flowers in the test set.
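The README states that the Random Forest parameters were chosen by cross-validation but does not list the search grid or the number of folds. The sketch below enumerates a hypothetical grid that merely contains the reported values, scores each combination by mean k-fold accuracy through a caller-supplied `fit_and_score` function (a placeholder for training and scoring a forest), and keeps the first combination among ties. scikit-learn's `GridSearchCV` also iterates parameters in sorted-key order and reports the first of tied top candidates; on an easy dataset such as Iris many combinations can tie, so the reported "best" parameters may depend on grid order. For classifiers, `GridSearchCV` with an integer `cv` uses stratified folds; the contiguous folds here only keep the sketch short.

```python
import itertools

def k_fold_indices(n_samples, k):
    """Yield (train, validation) index lists for k contiguous, unshuffled folds."""
    # The first n_samples % k folds get one extra sample, as in scikit-learn's KFold.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, val
        start += size

def grid_search(param_grid, n_samples, k, fit_and_score):
    """Return (best_params, best_mean_score) over every grid combination."""
    names = sorted(param_grid)  # iterate keys in sorted order
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = [fit_and_score(params, train, val)
                  for train, val in k_fold_indices(n_samples, k)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:  # strict '>' keeps the first of tied candidates
            best_params, best_score = params, mean_score
    return best_params, best_score

# Hypothetical grid: it only needs to contain the values reported in this README.
PARAM_GRID = {
    "n_estimators": [50, 100, 200],
    "max_depth": [None, 10, 20],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
}
```

With a scorer that returns the same accuracy for every candidate, `grid_search(PARAM_GRID, 120, 5, lambda params, train, val: 1.0)` returns the first combination in sorted-key order: `max_depth=None`, `min_samples_leaf=1`, `min_samples_split=2`, `n_estimators=50`.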

Interpretation of Classification Report

The classification report provides a detailed breakdown of how well the model performed for each species:

  • Precision: Of the samples predicted as a given species, the fraction that actually belong to it.
  • Recall: Of the samples that actually belong to a given species, the fraction the model predicted correctly.
  • F1-score: Harmonic mean of precision and recall, combining both into a single metric.
  • Support: Number of test samples in each class.
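These four metrics can be computed directly from true and predicted labels. A minimal from-scratch sketch that returns the per-class values shown in a classification report (scikit-learn's `classification_report` prints the same per-class rows plus averages); an undefined ratio is reported as 0.0, which matches scikit-learn's default handling. The labels in the usage lines are a made-up four-sample example, not project output.

```python
def per_class_metrics(y_true, y_pred):
    """Precision, recall, F1-score and support for every class label."""
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[label] = {"precision": precision, "recall": recall,
                         "f1-score": f1, "support": tp + fn}
    return report

y_true = ["setosa", "setosa", "versicolor", "virginica"]
y_pred = ["setosa", "versicolor", "versicolor", "virginica"]
report = per_class_metrics(y_true, y_pred)
# setosa: precision 1.0, recall 0.5, F1 about 0.667, support 2
```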

For example:

  • Iris-setosa: The model correctly classified all 10 samples of Iris-setosa, achieving perfect precision, recall, and F1-score.
  • Iris-versicolor: Similarly, all 9 samples of Iris-versicolor were correctly classified.
  • Iris-virginica: All 11 samples of Iris-virginica were also classified correctly.

The overall accuracy of 100% indicates that the model assigned every flower in the 30-sample test set to its correct species.
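Accuracy is the diagonal of the confusion matrix divided by the number of test samples. In this minimal sketch the labels are constructed only to match the per-class supports listed in the classification report (10, 9 and 11) with every prediction correct, which yields a purely diagonal matrix and an accuracy of 30 / 30 = 1.0.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count predictions: rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

def accuracy(matrix):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

LABELS = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
# Illustrative labels matching the reported supports, all predicted correctly.
y_true = ["Iris-setosa"] * 10 + ["Iris-versicolor"] * 9 + ["Iris-virginica"] * 11
y_pred = list(y_true)

cm = confusion_matrix(y_true, y_pred, LABELS)
print(cm)            # [[10, 0, 0], [0, 9, 0], [0, 0, 11]]
print(accuracy(cm))  # 1.0
```

Any misclassification would appear as a nonzero off-diagonal cell, showing which pair of species the model confuses.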

Data Visualization

  • Pair Plot: Visualizes relationships between sepal length, sepal width, petal length, and petal width across different species.
  • Correlation Heatmap: Shows the correlation coefficients between these features, helping to spot redundant (highly correlated) features during feature selection.
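A correlation heatmap typically displays pairwise Pearson coefficients, the default of pandas `DataFrame.corr()`; whether the project used pandas and seaborn is an assumption. A minimal sketch of the matrix behind such a heatmap, with hypothetical three-column data in the usage lines:

```python
import math

def pearson(a, b):
    """Pearson correlation of two equal-length columns (undefined if one is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def correlation_matrix(X):
    """Pairwise correlations between the columns of a row-major matrix X."""
    columns = list(zip(*X))
    return [[pearson(c1, c2) for c2 in columns] for c1 in columns]

# Hypothetical data: column 1 is 2 * column 0; column 2 falls as column 0 rises.
X = [[1.0, 2.0, 3.0], [2.0, 4.0, 2.0], [3.0, 6.0, 1.0]]
corr = correlation_matrix(X)
```

Values near +1 or -1 flag feature pairs that carry largely the same information, which is what makes the heatmap useful for feature selection.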

[Figure: four screenshots of the data visualization output, captured 2024-06-15]

Findings

Data Exploration:

  • Summary statistics provided insights into the distribution and variation of sepal and petal measurements.
  • Pair plots visually represented the clustering of different species based on their measurements.
  • The correlation heatmap highlighted strong relationships between certain features, most notably between petal length and petal width.

Model Performance:

  • Both Logistic Regression and the Random Forest Classifier achieved 100% accuracy on the test dataset.
  • Precision, recall, and F1-score metrics confirmed the models' ability to effectively distinguish between Iris species.

Output

The output from the models includes:

  • Best Parameters for Random Forest Classifier: {'max_depth': 10, 'min_samples_leaf': 2, 'min_samples_split': 10, 'n_estimators': 50}
  • Best Random Forest Accuracy: 1.0
  • Classification Reports for Logistic Regression and Random Forest Classifier, showing precision, recall, and F1-score metrics for each Iris species.

[Figure: two screenshots of the model output, captured 2024-06-15]

Conclusion

This project demonstrated how machine learning models can classify Iris flowers from their morphological measurements. Both selected models, Logistic Regression and Random Forest Classifier, reached 100% accuracy on the held-out test set, showing that they are effective for this kind of classification task. By combining data preprocessing, visualization, and thorough evaluation, the project provides a practical framework for introductory classification tasks.
