mohamedyosef101/unemployment-rate-descriptive-stats


What is Descriptive Statistics

Descriptive statistics are numbers used to describe and summarize data. They capture the basic features of the data under consideration and give a simple overview of the dataset. The summary measures most commonly used to describe a dataset are measures of central tendency and measures of variability (dispersion).

Measures of central tendency include the mean, median and mode. Each one summarizes a dataset with a single value that describes the center of its distribution: the mean is the arithmetic average, the median is the middle value of the sorted data, and the mode is the most frequent value. They can be computed for an entire population or for a sample drawn from it.

Measures of variability (dispersion) include the variance, standard deviation, coefficient of variation, minimum and maximum values (range) and IQR (interquartile range). These measures describe how spread out the distribution of a dataset is. Skewness and kurtosis are often reported alongside them, but they describe the shape of the distribution rather than its spread.
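As a rough illustration of two of these measures, the sketch below computes the range (maximum minus minimum) and the coefficient of variation (standard deviation divided by the mean) with Python's standard statistics module. The sample values are made up, and using the sample standard deviation (dividing by n - 1) rather than the population one is an assumption; the coefficient of variation is only meaningful when the mean is positive.

```python
import statistics

# Made-up sample values, for illustration only
data = [12.0, 15.0, 11.0, 18.0, 14.0]

# Range: distance between the largest and smallest values
data_range = max(data) - min(data)

# Coefficient of variation: spread relative to the mean (unitless)
# Assumption: sample standard deviation (divides by n - 1)
cv = statistics.stdev(data) / statistics.mean(data)

print(f"min={min(data)}, max={max(data)}, range={data_range}")
print(f"coefficient of variation={cv:.3f}")
```

Because the coefficient of variation has no units, it can be used to compare the spread of datasets measured on different scales.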

A quick guide on how to do descriptive statistics

0. The describe function

When using the describe function (for example pandas' describe()), the output depends on whether the data are numerical or categorical. For numerical data, you'll get:

  • count: Number of non-NA/null observations
  • mean: The arithmetic average
  • std: The standard deviation
  • min: The smallest (minimum) value
  • 25%: The first quartile (25th percentile)
  • 50%: The median (50th percentile)
  • 75%: The third quartile (75th percentile)
  • max: The largest (maximum) value
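In pandas, this numerical summary is what describe() returns for numeric columns. To make each field easy to trace by hand, here is a stdlib-only sketch that reproduces it; describe_numeric is a hypothetical helper, not a library function. It assumes, as pandas does by default, that std is the sample standard deviation (n - 1) and that the quartiles use linear interpolation between data points, which is what statistics.quantiles(..., method="inclusive") does.

```python
import math
import statistics


def describe_numeric(values):
    """Hypothetical helper mimicking the numeric summary of pandas' describe()."""
    # Drop missing entries (None or NaN): describe() counts only non-NA values
    data = sorted(
        v for v in values
        if v is not None and not (isinstance(v, float) and math.isnan(v))
    )
    # Quartiles with linear interpolation between neighbouring data points
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "count": len(data),
        "mean": statistics.mean(data),
        "std": statistics.stdev(data),  # sample standard deviation (n - 1)
        "min": data[0],
        "25%": q1,
        "50%": q2,  # the median
        "75%": q3,
        "max": data[-1],
    }


# Made-up values with one missing entry
summary = describe_numeric([2, 4, 4, None, 4, 5, 5, 7, 9])
for key, value in summary.items():
    print(f"{key:>5}: {value}")
```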

But if you have categorical data, you'll see:

  • count: Number of non-NA/null observations
  • unique: Number of unique values
  • top: The most common value (the mode)
  • freq: The frequency of the most common value
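The categorical summary can be sketched the same way with collections.Counter. describe_categorical is again a hypothetical helper, and the job-sector labels are made up. How ties for the most common value are broken is an implementation detail: Counter.most_common keeps the value seen first, and pandas may pick a different one.

```python
from collections import Counter


def describe_categorical(values):
    """Hypothetical helper mimicking pandas' describe() for categorical data."""
    data = [v for v in values if v is not None]  # keep non-missing values only
    counts = Counter(data)
    top, freq = counts.most_common(1)[0]  # most frequent value and its count
    return {"count": len(data), "unique": len(counts), "top": top, "freq": freq}


# Made-up labels with one missing entry
sectors = ["tech", "retail", "tech", None, "health", "tech", "retail"]
print(describe_categorical(sectors))
```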

1. Measures of central tendency

Central tendency refers to a central value that describes a probability distribution; it is also called the center or location of the distribution. The most common measures of central tendency are the mean, median and mode.

  • The most common measure of central tendency is the mean.
  • For a skewed distribution, or when outliers are a concern, the median may be preferred because it is more robust to outliers than the mean.
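To see why the median is considered more robust, the sketch below compares the mean, median and mode of a small made-up sample with and without one extreme value, using Python's statistics module.

```python
import statistics

# Made-up samples: identical except for one extreme value (100)
without_outlier = [2, 3, 3, 4, 5]
with_outlier = [2, 3, 3, 4, 5, 100]

for name, data in [("without outlier", without_outlier), ("with outlier", with_outlier)]:
    print(
        name,
        "mean:", statistics.mean(data),
        "median:", statistics.median(data),
        "mode:", statistics.mode(data),
    )
```

Adding the single outlier moves the mean from 3.4 to 19.5, while the median only shifts from 3 to 3.5 and the mode stays at 3.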

[Figure: Measures of central tendency]

2. Measures of variability

Dispersion indicates how far from the center the data values lie. The most common measures of dispersion are the variance, standard deviation and interquartile range (IQR).

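  • Variance is the average squared deviation from the mean (dividing by n - 1 instead of n for a sample) and is the standard measure of spread.
  • The standard deviation is the square root of the variance, so it is expressed in the same units as the data.
  • The IQR is the difference between the third and first quartiles (Q3 - Q1), so it covers the middle 50% of the data.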
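A minimal sketch of these measures with Python's statistics module, on a made-up sample. Both the population variance (dividing by n) and the sample variance (dividing by n - 1) are shown, since which one is appropriate depends on whether the data cover the whole population. The quartiles use linear interpolation (method="inclusive"), which is one of several common conventions, so other tools may report a slightly different IQR.

```python
import statistics

# Made-up sample values, for illustration only
data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = statistics.pvariance(data)    # population variance: divides by n
sample_var = statistics.variance(data)  # sample variance: divides by n - 1
pop_std = statistics.pstdev(data)       # square root of the population variance

# Quartiles with linear interpolation; the IQR is Q3 - Q1
q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(f"population variance={pop_var}, sample variance={sample_var:.3f}")
print(f"population std={pop_std}, IQR={iqr}")
```

Here the population variance is 4, so the population standard deviation is 2, and the IQR is 5.5 - 4.0 = 1.5.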

[Figure: IQR]

3. Measures of Shape

Next, we look at measures of the shape of a distribution. Two statistics describe the shape of a dataset's distribution: skewness and kurtosis.

Skewness

  • Skewness measures the symmetry, or more precisely the lack of symmetry, of a distribution around its mean.
  • It is computed from the deviations of the data values from the mean.
  • A positive skew means a longer right tail; a negative skew means a longer left tail.
  • It is used to indicate the shape of the distribution of the data.
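One common way to compute skewness is the moment coefficient: the third central moment divided by the second central moment raised to the power 1.5. The sketch below implements that population (biased) form in plain Python; library functions such as pandas' Series.skew() apply a small-sample bias correction, so their values will differ somewhat. The sample data are made up.

```python
import statistics


def skewness(data):
    """Moment coefficient of skewness (population form): m3 / m2 ** 1.5."""
    n = len(data)
    mean = statistics.fmean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment (variance)
    m3 = sum((x - mean) ** 3 for x in data) / n  # third central moment
    # Note: raises ZeroDivisionError if all values are identical (m2 == 0)
    return m3 / m2 ** 1.5


right_tailed = [1, 2, 2, 3, 3, 3, 4, 10]  # one large value stretches the right tail
left_tailed = [-x for x in right_tailed]  # mirror image: long left tail
symmetric = [1, 2, 3, 4, 5]

for name, sample in [("right", right_tailed), ("left", left_tailed), ("symmetric", symmetric)]:
    print(name, round(skewness(sample), 3))
```

The right-tailed sample gives a positive value, its mirror image gives the same value with a negative sign, and the symmetric sample gives 0.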

[Figure: Skewness]

Reference ranges for skewness values

A common rule of thumb for interpreting skewness values:

  • If the skewness is between -0.5 and 0.5, the data are fairly symmetrical.
  • If the skewness is between -1 and -0.5 or between 0.5 and 1, the data are moderately skewed.
  • If the skewness is less than -1 or greater than 1, the data are highly skewed.
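The rule of thumb translates directly into a small helper. describe_skewness is a hypothetical name, and treating the boundary values (exactly ±0.5 or ±1) as belonging to the less skewed category is an assumption, since the rule does not say which side they fall on.

```python
def describe_skewness(skew):
    """Label a skewness value using the rule-of-thumb thresholds.

    Assumption: boundary values (exactly +/-0.5 or +/-1) go to the less skewed category.
    """
    if -0.5 <= skew <= 0.5:
        return "fairly symmetrical"
    if -1 <= skew <= 1:
        return "moderately skewed"
    return "highly skewed"


for value in (0.2, -0.7, 1.4):
    print(value, describe_skewness(value))
```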

Kurtosis

  • Kurtosis is traditionally described as the degree of peakedness of a distribution, although it mainly reflects how heavy the tails are.
  • Data sets with high kurtosis tend to have a distinct peak near the mean, decline rather rapidly and have heavy tails.
  • Data sets with low kurtosis tend to have a flat top near the mean rather than a sharp peak, and light tails.
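A moment-based sketch makes the comparison concrete. Here kurtosis is the fourth central moment divided by the squared second central moment (Pearson's definition, population form), so a normal distribution scores 3. The two made-up samples contrast a sharp peak with a few far-out values against evenly spread values with no peak.

```python
import statistics


def kurtosis(data):
    """Moment-based (Pearson) kurtosis, population form: m4 / m2 ** 2."""
    n = len(data)
    mean = statistics.fmean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    return m4 / m2 ** 2


# Made-up samples
peaked = [0] * 8 + [-1, 1] + [-6, 6]  # most values at the mean plus a few far-out values
flat = list(range(-5, 6))             # evenly spread values, no peak

print("peaked:", round(kurtosis(peaked), 3))
print("flat:", round(kurtosis(flat), 3))
```

The peaked, heavy-tailed sample comes out well above 3 (about 5.7), while the evenly spread sample comes out at 1.78, below the normal reference of 3.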

[Figure: Kurtosis]

Reference range for kurtosis

  • The reference standard is a normal distribution, which has a kurtosis of 3.
  • Often, excess kurtosis is presented instead of kurtosis, where excess kurtosis is simply kurtosis - 3.
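Converting to excess kurtosis is a single subtraction. The sketch below (population form, with a hypothetical helper name) applies it to evenly spaced values, which approximate a uniform distribution: a continuous uniform distribution has kurtosis 1.8, i.e. excess kurtosis -1.2. Note that pandas' Series.kurt() already reports excess kurtosis (Fisher's definition, where a normal distribution gives 0) and applies a bias correction, so its result should not be reduced by 3 again.

```python
import statistics


def excess_kurtosis(data):
    """Population-form excess kurtosis: m4 / m2 ** 2 - 3 (a normal distribution gives 0)."""
    n = len(data)
    mean = statistics.fmean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n  # second central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # fourth central moment
    return m4 / m2 ** 2 - 3


# Evenly spaced values 1..1000 approximate a uniform distribution
uniform_like = list(range(1, 1001))
print(round(excess_kurtosis(uniform_like), 3))
```

The printed value is very close to -1.2, the negative excess kurtosis expected for a flat distribution.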

Useful Resources


If you have any questions, feel free to send me a message on LinkedIn.

About

The unemployment rate has been going down in recent years despite the rise of AI and other advanced technologies.
