Basic Analysis of the Iris Dataset Using Python
The dataset contains 150 observations of iris flowers. Four columns hold measurements of each flower in centimeters (sepal length, sepal width, petal length, and petal width), and the fifth column is the species of the observed flower. Every flower belongs to one of three species: Iris setosa, Iris versicolor, or Iris virginica.
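A minimal way to load the data and check this structure, assuming scikit-learn's bundled copy of the dataset (`sklearn.datasets.load_iris`) rather than whatever file the project actually read; the helper name `load_iris_frame` is hypothetical.

```python
import pandas as pd
from sklearn.datasets import load_iris


def load_iris_frame() -> pd.DataFrame:
    """Return the Iris data as a DataFrame: four measurement columns (cm) plus species."""
    iris = load_iris(as_frame=True)  # bundled with scikit-learn, no download needed
    df = iris.frame.drop(columns="target")
    # Turn the integer target codes (0, 1, 2) into the species names.
    df["species"] = pd.Categorical.from_codes(iris.target, iris.target_names)
    return df


df = load_iris_frame()
print(df.shape)                      # (150, 5): 150 flowers, 4 measurements + species
print(df["species"].value_counts())  # number of observations per species
```

scikit-learn names the measurement columns `sepal length (cm)`, `sepal width (cm)`, `petal length (cm)` and `petal width (cm)`, so a CSV copy of the dataset may use different column names.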
I made this project to train myself in data science. While working on it, I used modern libraries and gained a better understanding of how this kind of data structure works. It is my very first work in this domain.
Descriptive statistics are used to summarize continuous data: large volumes of such data can easily be condensed into statistical tables of means, counts, standard deviations, and so on. A categorical grouping variable (here, the species) can be used to calculate the same summaries for each group separately. The resulting tables are similar in structure to those produced by cross tabulation.
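With pandas, both kinds of table take one call each. This is a sketch assuming the data comes from scikit-learn's bundled copy and that the summaries wanted are count, mean and standard deviation; the project's own code is not shown.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Assumption: scikit-learn's bundled copy of the Iris data.
iris = load_iris(as_frame=True)
df = iris.frame.drop(columns="target")
df["species"] = pd.Categorical.from_codes(iris.target, iris.target_names)

# Overall summary of the continuous columns: count, mean, std, min, quartiles, max.
overall = df.describe()
print(overall.round(2))

# The same kind of summary computed separately for each species.
by_species = df.groupby("species", observed=True).agg(["count", "mean", "std"])
print(by_species.round(2))
```

`describe()` skips the categorical `species` column automatically, while `groupby(...).agg([...])` produces one row per species and a (measurement, statistic) pair of column levels, which is the cross-tabulation-like layout described above.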
The image above is a boxplot. A boxplot is a standardized way of displaying the distribution of data based on a five-number summary: “minimum”, first quartile (Q1), median, third quartile (Q3), and “maximum”. It shows which points are outliers and what their values are. It can also tell you whether your data is symmetrical, how tightly it is grouped, and whether and how it is skewed.
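The numbers behind a boxplot are easy to compute directly. This minimal NumPy sketch assumes the 1.5 × IQR whisker rule that matplotlib and seaborn boxplots use by default: the whiskers end at the most extreme data points inside the fences Q1 − 1.5 × IQR and Q3 + 1.5 × IQR, and anything beyond them is drawn as an outlier. The function name `boxplot_summary` is hypothetical.

```python
import numpy as np


def boxplot_summary(values):
    """Five-number summary and outliers under the 1.5 * IQR whisker rule."""
    x = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(x, [25, 50, 75])  # linear interpolation (NumPy default)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = x[(x >= lower_fence) & (x <= upper_fence)]
    return {
        "whisker_low": inside.min(),   # the boxplot's "minimum"
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_high": inside.max(),  # the boxplot's "maximum"
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        # Points beyond the fences are plotted individually as outliers.
        "outliers": np.sort(x[(x < lower_fence) | (x > upper_fence)]),
    }


print(boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```

For the toy list above, only 100 lies beyond the upper fence, so the upper whisker stops at 9. To apply it to the Iris data held in a pandas DataFrame `df` with scikit-learn's column names, pass one measurement column, for example `boxplot_summary(df["sepal width (cm)"])`.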
When you generalize joint plots to datasets with more dimensions, you end up with pair plots: a grid in which every pair of variables is plotted against each other. This is very useful for exploring correlations in multidimensional data.
The main idea of Seaborn is that it provides high-level commands to create a variety of plot types useful for statistical data exploration, and even some statistical model fitting.
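To illustrate what "high-level" means here, the sketch below draws a per-species boxplot with one seaborn call and a scatter plot with a fitted least-squares regression line (plus a bootstrapped 95% confidence band) with another. The choice of columns is only an example, and the data is assumed to come from scikit-learn's bundled copy.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
from sklearn.datasets import load_iris

iris = load_iris(as_frame=True)
df = iris.frame.drop(columns="target")
df["species"] = pd.Categorical.from_codes(iris.target, iris.target_names)

fig, (ax_box, ax_reg) = plt.subplots(1, 2, figsize=(10, 4))

# One call: a boxplot of petal length for each species.
sns.boxplot(data=df, x="species", y="petal length (cm)", ax=ax_box)

# One call that also fits a model: a linear regression of petal width on petal length,
# drawn over the scatter plot with a confidence band.
sns.regplot(data=df, x="petal length (cm)", y="petal width (cm)", ax=ax_reg)

fig.tight_layout()
```

The same figures in plain matplotlib would need manual grouping, quartile computation and a separate regression fit; seaborn does those steps from the column names alone.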
I used the k-nearest neighbors (KNN) method to build a machine learning model on our dataset that predicts a flower's species from its measurements. The model classifies flowers correctly in 98% of cases.
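The project's KNN code is not shown, so the following is a minimal sketch assuming scikit-learn's `KNeighborsClassifier` with k = 5 and a stratified 70/30 train/test split. The split, k and random seed are illustrative choices, so the accuracy this prints need not match the 98% quoted above.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Hold out 30% of the flowers, keeping the species proportions equal in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Each test flower gets the majority species among its 5 nearest training flowers.
model = KNeighborsClassifier(n_neighbors=5)
model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)  # fraction of test flowers classified correctly
print(f"test accuracy: {test_accuracy:.2%}")

# A single split of 150 rows is noisy; 5-fold cross-validation averages over five splits.
cv_scores = cross_val_score(KNeighborsClassifier(n_neighbors=5), X, y, cv=5)
print(f"5-fold CV accuracy: {cv_scores.mean():.2%} (std {cv_scores.std():.2%})")
```

Reporting the cross-validated mean alongside the single-split score gives a fairer idea of how often the model will be right on new flowers.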
https://towardsdatascience.com/understanding-boxplots-5e2df7bcbd51
https://campus.datacamp.com/courses/intermediate-python-for-data-science/
https://jakevdp.github.io/PythonDataScienceHandbook/04.14-visualization-with-seaborn.html