andreiminca/Iris-Dataset

Iris-Dataset

Basic Analysis of the Iris Data set Using Python

The data set

The data set contains 150 observations of iris flowers. There are four columns of measurements of the flowers in centimeters. The fifth column is the species of the flower observed. All observed flowers belong to one of three species.
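A minimal sketch of loading the data and checking the shape described above. It assumes the copy of the data set bundled with scikit-learn, which may differ slightly from the file used in this project, and the column name `species` is chosen here for illustration.

```python
from sklearn.datasets import load_iris

# Assumption: use scikit-learn's bundled copy of the data set instead of a CSV file.
iris = load_iris(as_frame=True)

# Four measurement columns (in cm) plus a fifth column holding the species name.
df = iris.data.copy()
df["species"] = iris.target_names[iris.target.to_numpy()]

print(df.shape)                      # (rows, columns)
print(df["species"].value_counts())  # observations per species
print(df.head())
```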

Process

I made this project to teach myself data science. Working with modern Python libraries helped me better understand how this kind of data structure works. This is my very first project in this domain.

Summary Statistics Table

This procedure is used to summarize continuous data. Large volumes of such data may be easily summarized in statistical tables of means, counts, standard deviations, etc. Categorical group variables may be used to calculate summaries for individual groups. The tables are similar in structure to those produced by cross tabulation.
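As a rough illustration, a table of this kind can be built with pandas. This sketch assumes the scikit-learn copy of the data, and the choice of `count`, `mean`, and `std` is only an example. The species column acts as the categorical group variable.

```python
from sklearn.datasets import load_iris

# Assumption: scikit-learn's bundled copy of the data set.
iris = load_iris(as_frame=True)
df = iris.data.copy()
df["species"] = iris.target_names[iris.target.to_numpy()]

# Summary of every numeric column over the whole data set.
overall = df.describe()

# The same kind of summary, computed separately for each species.
by_species = df.groupby("species").agg(["count", "mean", "std"])

print(overall.round(2))
print(by_species.round(2))
```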

Boxplots

The image above is a boxplot. A boxplot is a standardized way of displaying the distribution of data based on a five-number summary: the “minimum”, first quartile (Q1), median, third quartile (Q3), and “maximum”. It shows which points are outliers and what their values are, whether the data is symmetrical, how tightly it is grouped, and whether and how it is skewed.
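The numbers a boxplot draws can be computed directly. The sketch below does this for sepal width, assuming the scikit-learn copy of the data and the common convention that whiskers reach at most 1.5 times the interquartile range beyond the box. Plotting libraries may use a different rule.

```python
from sklearn.datasets import load_iris

# Assumption: scikit-learn's bundled copy of the data set.
iris = load_iris(as_frame=True)
widths = iris.data["sepal width (cm)"]

# The box: first quartile, median, third quartile.
q1, median, q3 = widths.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# Assumed convention: points beyond 1.5 * IQR from the box count as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# The whiskers end at the most extreme points that are not outliers.
whisker_low = widths[widths >= lower_fence].min()
whisker_high = widths[widths <= upper_fence].max()
outliers = widths[(widths < lower_fence) | (widths > upper_fence)]

print(f"whiskers: {whisker_low} to {whisker_high}")
print(f"Q1={q1}, median={median}, Q3={q3}")
print("outliers:", sorted(outliers))
```

To draw the plot itself, pandas offers `df.boxplot(column=..., by=...)`, which needs matplotlib installed.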

Pairplots and Seaborn

When you generalize joint plots to datasets of larger dimensions, you end up with pair plots. This is very useful for exploring correlations between multidimensional data, when you'd like to plot all pairs of values against each other.

The main idea of Seaborn is that it provides high-level commands to create a variety of plot types useful for statistical data exploration, and even some statistical model fitting.

Machine Learning using scikit-learn

I used the k-nearest neighbors (KNN) method from scikit-learn to train a classification model on the dataset. The model predicts the correct species in 98% of cases.
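A minimal sketch of a KNN classifier in scikit-learn. The value of `n_neighbors`, the 70/30 split, and the random seed are assumptions, since the settings used in this project are not stated here. The accuracy it prints depends on those choices and will not necessarily match the figure above.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Assumed split: hold out 30% of the flowers, keeping species proportions equal.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

# Assumed setting: classify each flower by a vote among its 5 nearest neighbors.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)

# Accuracy on flowers the model did not see during fitting.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Held-out accuracy: {accuracy:.2%}")
```

Scoring on a held-out set rather than on the training data gives a more honest estimate of how the model handles new flowers.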

References

https://ncss-wpengine.netdna-ssl.com/wp-content/themes/ncss/pdf/Procedures/NCSS/Descriptive_Statistics-Summary_Tables.pdf

https://towardsdatascience.com/understanding-boxplots-5e2df7bcbd51

https://campus.datacamp.com/courses/intermediate-python-for-data-science/

https://jakevdp.github.io/PythonDataScienceHandbook/04.14-visualization-with-seaborn.html
