Iris-Dataset

Basic Analysis of the Iris Data set Using Python

The data set

The data set contains 150 observations of iris flowers. There are four columns of measurements of the flowers in centimeters. The fifth column is the species of the flower observed. All observed flowers belong to one of three species.
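As a quick check of the shape described above, here is a minimal sketch that loads the data. How the project itself reads its data is not shown here, so the sketch assumes the copy of the Iris data bundled with scikit-learn, and the name `df` is only illustrative.

```python
from sklearn.datasets import load_iris

# Assumption: use scikit-learn's bundled copy of the Iris data,
# not necessarily the file this project reads.
iris = load_iris(as_frame=True)
df = iris.frame  # four measurement columns plus an integer "target" column

print(df.shape)              # (rows, columns)
print(df.columns.tolist())   # measurement names and the target column
print(iris.target_names)     # the species names
```

In this copy the species are stored as integer codes in the `target` column, and `iris.target_names` maps each code to a species name.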

Process

I worked on this project to train myself in data science. Along the way I used modern libraries and gained a better understanding of how this kind of data structure works. This is my very first work in this domain.

Summary Statistics Table

This procedure is used to summarize continuous data. Large volumes of such data may be easily summarized in statistical tables of means, counts, standard deviations, etc. Categorical group variables may be used to calculate summaries for individual groups. The tables are similar in structure to those produced by cross tabulation.
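A summary table of this kind might be produced with pandas as sketched below. This is an illustration under assumptions, not necessarily the code used in this project: the data comes from scikit-learn's bundled copy, and the column name `species` is introduced here for readability.

```python
from sklearn.datasets import load_iris

# Assumption: scikit-learn's bundled Iris data stands in for the project's data.
iris = load_iris(as_frame=True)
df = iris.frame.rename(columns={"target": "species"})
# Replace the integer codes with species names (the grouping variable).
df["species"] = df["species"].map(dict(enumerate(iris.target_names)))

# Overall summary of the continuous columns: count, mean, std, min, quartiles, max.
overall = df.describe()

# Per-group summary: the categorical column defines the groups.
by_species = df.groupby("species").agg(["mean", "std", "count"])

print(overall)
print(by_species)
```

`describe()` skips the non-numeric `species` column by default, so the overall table covers only the four measurements.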

Boxplots

The image above is a boxplot. A boxplot is a standardized way of displaying the distribution of data based on a five-number summary (“minimum”, first quartile (Q1), median, third quartile (Q3), and “maximum”). It can tell you about your outliers and what their values are. It can also tell you whether your data is symmetrical, how tightly it is grouped, and whether and how it is skewed.
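The five-number summary can be computed by hand, which also shows why “minimum” and “maximum” are in quotes: under the common 1.5 × IQR convention, the whiskers stop at the most extreme values that are not outliers. The sketch below uses only the standard library, and the sample values are hypothetical, not taken from the project data.

```python
import statistics

# Hypothetical measurements in cm (illustrative only).
values = sorted([4.3, 4.9, 5.0, 5.1, 5.4, 5.8, 6.1, 6.4, 6.9, 9.5])

# Quartiles with linear interpolation between data points.
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Points beyond 1.5 * IQR from the box are treated as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in values if v < lower_fence or v > upper_fence]

# The whiskers end at the extreme values that are still inside the fences.
inliers = [v for v in values if lower_fence <= v <= upper_fence]
whisker_low, whisker_high = min(inliers), max(inliers)

print(whisker_low, q1, median, q3, whisker_high)
print(outliers)
```

A median that sits off-center inside the box, or whiskers of very different lengths, points to skew in the data.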

Pairplots and Seaborn

When you generalize joint plots to datasets of larger dimensions, you end up with pair plots. This is very useful for exploring correlations between multidimensional data, when you'd like to plot all pairs of values against each other.

The main idea of Seaborn is that it provides high-level commands to create a variety of plot types useful for statistical data exploration, and even some statistical model fitting.

Machine Learning using scikit-learn

I used the k-nearest neighbors (KNN) method to build a classification model on the dataset. The model predicts the correct species in 98% of cases.
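A KNN classifier of this kind can be sketched with scikit-learn as below. The number of neighbors, the train/test split, and the random seed are assumptions made for illustration; the settings used in this project are not stated, so the accuracy printed here need not match the figure above.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: scikit-learn's bundled Iris data stands in for the project's data.
X, y = load_iris(return_X_y=True)

# Hold out part of the data so accuracy is measured on unseen flowers.
# test_size and random_state are illustrative choices.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# n_neighbors=5 is scikit-learn's default, not necessarily the project's value.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.2%}")
```

Accuracy on a data set this small changes with the split, so a single figure is best read as an estimate rather than a guarantee.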

References

https://ncss-wpengine.netdna-ssl.com/wp-content/themes/ncss/pdf/Procedures/NCSS/Descriptive_Statistics-Summary_Tables.pdf

https://towardsdatascience.com/understanding-boxplots-5e2df7bcbd51

https://campus.datacamp.com/courses/intermediate-python-for-data-science/

https://jakevdp.github.io/PythonDataScienceHandbook/04.14-visualization-with-seaborn.html

