Statistics Tutorial (Part III)

Gautam Kumar
7 min read · Jun 21, 2023


What is Central Tendency?

In statistics, measures of central tendency provide information about the central or average value of a distribution.

Commonly used measures of central tendency are listed below:

Mean: The mean, often called the average, is calculated by adding all the values in a data set and dividing the sum by the total number of values. It is sensitive to extreme values and can be pulled away from the typical value by outliers.

The mean can be calculated in several ways. Three common ones are listed below:

  1. Arithmetic Mean
  2. Harmonic Mean
  3. Geometric Mean

Arithmetic Mean: The arithmetic mean, often simply called the mean, is a measure of central tendency that represents the average value of a set of numbers. It is calculated by summing all the values in the dataset and dividing the sum by the total number of values.

Mathematical Formula:

Arithmetic Mean = (Sum of all values) / (Number of values)

The arithmetic mean is widely used in various fields, such as statistics, mathematics, economics, and science. It provides a representative value that summarizes the data and is often used as a baseline for comparisons or further analyses.
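As a minimal sketch of the formula above, the snippet below computes the arithmetic mean by hand and shows how a single extreme value shifts it. The function name `arithmetic_mean` and the score values are hypothetical, chosen only for illustration.

```python
def arithmetic_mean(values):
    """Return the sum of the values divided by how many there are."""
    return sum(values) / len(values)


# Hypothetical exam scores
scores = [70, 80, 90, 100]
typical = arithmetic_mean(scores)  # 340 / 4

# One extreme value pulls the mean far away from the typical scores
scores_with_outlier = scores + [1000]
distorted = arithmetic_mean(scores_with_outlier)  # 1340 / 5

print(typical, distorted)
```

In practice, Python's built-in `statistics.mean` does the same job; the hand-written version is shown only to mirror the formula.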

Harmonic Mean: The harmonic mean is another measure of central tendency used to find the average of a set of positive numbers. It is the appropriate mean when the quantities being averaged are rates or ratios.

Mathematically, the harmonic mean of a set of numbers is calculated by taking the reciprocal of each number, calculating their arithmetic mean, and then taking the reciprocal of that result.

Mathematical Formula:

Harmonic Mean = (Number of values) / (Sum of the reciprocals of the values)

Harmonic mean is commonly used in situations involving rates, ratios, or proportions. It gives more weight to smaller values in the dataset, which makes it useful for averaging rates or ratios that depend on each other.
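A classic illustration, sketched here with made-up numbers, is average speed over a round trip: driving the same distance at 60 km/h one way and 40 km/h back. The helper `harmonic_mean_manual` is a hypothetical name that follows the formula above; `statistics.harmonic_mean` is the standard-library equivalent.

```python
from statistics import harmonic_mean


def harmonic_mean_manual(values):
    """Number of values divided by the sum of their reciprocals."""
    return len(values) / sum(1 / v for v in values)


# Hypothetical speeds (km/h) over two legs of equal distance
speeds = [60, 40]

average_speed = harmonic_mean_manual(speeds)
naive_average = sum(speeds) / len(speeds)  # arithmetic mean, for comparison

print(average_speed, harmonic_mean(speeds), naive_average)
```

The harmonic mean comes out at about 48 km/h, which matches total distance divided by total time (for example, 240 km in 5 hours), while the arithmetic mean of 50 km/h overstates the true average speed.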

Geometric Mean: The geometric mean is a measure of central tendency used to average a set of positive numbers, particularly quantities that multiply together, such as exponential growth rates. It gives the average rate of change or growth per period.

Mathematically, the geometric mean of a set of numbers is calculated by taking the nth root of the product of the numbers, where n is the total number of values.

Mathematical Formula:

Geometric Mean = (Product of values)^(1/n)

Or, written out in full:

Geometric Mean = (Value1 * Value2 * … * ValueN)^(1/n)

The geometric mean is often used when dealing with exponential growth rates, such as calculating average investment returns or average compound interest rates over a period of time. It is also useful for averaging ratios or rates that are interdependent, similar to the harmonic mean.
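The sketch below applies the geometric mean to a hypothetical investment that grows by 10%, then grows by 20%, then falls by 10%. The growth factors and the name `geometric_mean_manual` are assumptions for illustration; `math.prod` and `statistics.geometric_mean` require Python 3.8 or later.

```python
import math
from statistics import geometric_mean


def geometric_mean_manual(values):
    """n-th root of the product of n positive values."""
    return math.prod(values) ** (1 / len(values))


# Hypothetical yearly growth factors: +10%, +20%, -10%
growth_factors = [1.10, 1.20, 0.90]

average_factor = geometric_mean_manual(growth_factors)
compounded = math.prod(growth_factors)  # total growth over the three years

# Applying the average factor three times reproduces the total growth
print(average_factor, geometric_mean(growth_factors))
print(average_factor ** len(growth_factors), compounded)
```

The arithmetic mean of the same factors would be slightly larger and would overstate the compounded growth, which is why the geometric mean is the usual choice for rates that multiply.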

When to use AM (Arithmetic Mean), HM (Harmonic Mean), and GM (Geometric Mean)?

For Example:

What is the Weighted Mean?

Weighted mean is a type of mean that takes into account the different weights or importance assigned to each value in a dataset. In a weighted mean, each value is multiplied by its corresponding weight, and then the sum of these weighted values is divided by the sum of the weights.

Mathematical formula:

Weighted Mean = (Sum of (Value * Weight)) / (Sum of Weights)

The main difference between the weighted mean and the ordinary (unweighted) mean lies in how much each value counts. In the weighted mean, values with larger weights have a greater impact on the result. This places more emphasis on the values considered more significant or representative.

To illustrate the difference, consider calculating the average grade of a student across several assessments. If every assessment is equally important, you would use the ordinary mean. However, if the final exam contributes more to the overall grade, you would give the final exam grade a higher weight and use the weighted mean.

In short, the weighted mean uses weights to give more importance to certain values in a data set, while the ordinary mean treats all values equally. The choice between the two depends on the context and on the importance assigned to each value.
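The grade example can be sketched as follows. The grades, the weights (2 for homework, 3 for the midterm, 5 for the final exam), and the function name `weighted_mean` are all hypothetical.

```python
def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical grades: homework, midterm, final exam
grades = [80, 70, 90]
weights = [2, 3, 5]  # the final exam counts the most

weighted = weighted_mean(grades, weights)  # (160 + 210 + 450) / 10
unweighted = sum(grades) / len(grades)     # 240 / 3

print(weighted, unweighted)
```

Because the highest grade carries the largest weight, the weighted mean (82) is higher than the ordinary mean (80). With equal weights the two coincide.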

Median: The median is the middle value of a data set when the values are arranged in ascending or descending order. With an even number of values, it is the average of the two middle values. It is far less affected by extreme values than the mean, so it is often preferred when the data contain outliers.

Mode: The mode is the value that occurs most frequently in a data set. Unlike the mean and median, the mode can be applied to both numeric and categorical data.
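A short sketch using Python's standard `statistics` module shows both measures on small, made-up data sets (`multimode` requires Python 3.8 or later).

```python
from statistics import median, mode, multimode

values = [3, 1, 4, 1, 5]           # sorted: 1, 1, 3, 4, 5

middle = median(values)            # middle value of the sorted data
middle_even = median([1, 2, 3, 4]) # average of the two middle values
robust = median([1, 1, 3, 4, 500]) # an outlier does not move the median

most_common = mode(values)
# The mode also works on categorical data
favorite_color = mode(["red", "blue", "red", "green"])
# multimode returns every value tied for most frequent
ties = multimode([1, 1, 2, 2, 3])

print(middle, middle_even, robust, most_common, favorite_color, ties)
```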

What is Range?

In statistics, the range is a simple measure that describes the spread or dispersion of a dataset. It is the difference between the maximum and minimum values in a set of data.

Mathematical formula:

Range = Maximum value - Minimum value

Why calculate the Range?

The range gives a basic idea of how scattered the values in a dataset are. However, it is sensitive to outliers, because it depends only on the two most extreme values. It is therefore a quick and easy measure of variability, but it may not give a full picture of how the data are spread. To understand variability better, other measures such as the variance, standard deviation, or interquartile range are often used.
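As a small sketch with made-up numbers, the snippet below computes the range and shows how one outlier changes it. The function name `value_range` is hypothetical.

```python
def value_range(values):
    """Maximum value minus minimum value."""
    return max(values) - min(values)


data = [4, 8, 15, 16, 23, 42]      # hypothetical measurements
spread = value_range(data)          # 42 - 4

# A single extreme value dominates the range
data_with_outlier = data + [1000]
inflated = value_range(data_with_outlier)  # 1000 - 4

print(spread, inflated)
```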

Now the big question: why do we calculate the mean, median, and mode, and what can these statistics tell us about the data? Let's discuss.

Symmetrical Distribution (for example, the Normal Distribution)

In the context of central tendency, a symmetrical distribution is one where the values are spread evenly around a central point, giving a balanced, mirror-image shape. In a symmetrical distribution with a single peak, the mean, median, and mode are close to each other and represent a typical or central value.

Characteristics of a symmetrical distribution:

  1. Mean, median, and mode are approximately equal (for a distribution with a single peak).
  2. Data points are evenly distributed on both sides of the central point.
  3. The shape of the distribution is mirror-like: if you were to fold it in half at the center, the two halves would align.
  4. Examples of symmetrical distributions include the normal distribution (bell-shaped curve) and the uniform distribution (rectangular shape, which has no single mode).

When dealing with a symmetrical distribution, the arithmetic mean is often used as the measure of central tendency because it takes into account the values on both sides of the central point. The mean balances the values, and deviations above and below the central point tend to cancel each other out.
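To make this concrete, here is a minimal sketch with a small, hand-made symmetric data set (the values are hypothetical). All three measures land on the same central value, and the deviations from the mean sum to zero.

```python
from statistics import mean, median, mode

# Hypothetical data, symmetric around 3
symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]

center_mean = mean(symmetric)      # 27 / 9
center_median = median(symmetric)  # 5th of the 9 sorted values
center_mode = mode(symmetric)      # 3 occurs three times

# Deviations above and below the mean cancel each other out
total_deviation = sum(x - center_mean for x in symmetric)

print(center_mean, center_median, center_mode, total_deviation)
```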

Positive Skewness

Positive skew refers to a skewed or asymmetric distribution where the tail of the distribution extends towards higher values, while the majority of the data is concentrated towards lower values. This means that the distribution is “skewed” or “lopsided” towards the right side.

Characteristics of a positively skewed distribution:

  1. Mean is typically greater than the median.
  2. Mode is typically less than the median, so the usual ordering is mode < median < mean.
  3. Tail of the distribution is elongated towards the right, indicating the presence of outliers or extreme values.

Visually, a positively skewed distribution appears stretched out towards the right side, with the tail pointing in that direction.

Examples of data that may exhibit positive skew include income distribution, stock market returns, and exam scores in a highly competitive test. These distributions often have a long tail of high values due to a few extreme values or high achievers.

When dealing with positively skewed data, the median or mode may give a more accurate picture of the central tendency than the mean. The median is less affected by extreme values, as it represents the middle value when the data is sorted. The mode, on the other hand, represents the most frequently occurring value and can be used to identify the peak of the distribution.
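The sketch below uses a small, hypothetical set of incomes (in thousands) with a long right tail. It illustrates the typical ordering for positive skew: the mean is pulled toward the tail, while the median and mode stay near the bulk of the data.

```python
from statistics import mean, median, mode

# Hypothetical incomes (in thousands): most are low, a few are very high
incomes = [1, 2, 2, 2, 3, 3, 4, 10, 20]

income_mode = mode(incomes)      # 2 occurs most often
income_median = median(incomes)  # 5th of the 9 sorted values
income_mean = mean(incomes)      # 47 / 9, pulled up by 10 and 20

print(income_mode, income_median, income_mean)
```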

Negative Skewness

Negative skew refers to a skewed or asymmetric distribution where the tail of the distribution extends towards lower values, while the majority of the data is concentrated towards higher values. This means that the distribution is “skewed” or “lopsided” towards the left side.

Characteristics of a negatively skewed distribution:

  1. Mean is typically less than the median.
  2. Mode is typically greater than the median, so the usual ordering is mean < median < mode.
  3. Tail of the distribution is elongated towards the left, indicating the presence of outliers or extreme values.

Visually, a negatively skewed distribution appears stretched out towards the left side, with the tail pointing in that direction.

Examples of data that may exhibit negative skew include values that are capped at an upper limit, such as salaries in a company with a pay ceiling where most employees earn near the top of the scale, prices of stocks during a bull market, or test scores in an easier exam where most students score high.

When dealing with negatively skewed data, the median or mode may give a more accurate picture of the central tendency than the mean. The median is less affected by extreme values, as it represents the middle value when the data is sorted. The mode, on the other hand, represents the most frequently occurring value and can be used to identify the peak of the distribution.
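As a mirror image of the positive case, this sketch uses hypothetical scores on an easy test marked out of 20, where most students score high and a few score very low. The mean is dragged toward the left tail, giving the typical ordering for negative skew.

```python
from statistics import mean, median, mode

# Hypothetical scores out of 20: most are high, a few are very low
test_scores = [1, 11, 17, 18, 18, 19, 19, 19, 20]

score_mean = mean(test_scores)      # 142 / 9, pulled down by 1 and 11
score_median = median(test_scores)  # 5th of the 9 sorted values
score_mode = mode(test_scores)      # 19 occurs most often

print(score_mean, score_median, score_mode)
```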

You can find a practical implementation in Excel and R here.

