Statistics Tutorial (Part I)

Gautam Kumar
5 min read · Jun 1, 2023

What is Statistics?

Statistics is a branch of mathematics that deals with the collection, analysis, interpretation, presentation and organization of data. It involves the use of various methods to collect, summarize and draw meaningful conclusions from numerical and categorical information.

Statistics is widely used in various fields such as business, economics, finance, healthcare, engineering, and many others. It helps researchers and analysts to analyze data, identify patterns, test hypotheses, and make data-driven decisions.

Statistics can be broadly classified into two categories:

  1. Descriptive statistics
  2. Inferential statistics

Difference between Descriptive and Inferential Statistics

Descriptive statistics focuses on summarizing and describing data, while inferential statistics goes beyond the observed data to make inferences and draw conclusions about a larger population.
Descriptive statistics provides a snapshot of the data, while inferential statistics allows us to make educated guesses or predictions based on sample data.
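
To make the contrast concrete, here is a minimal sketch using Python's standard library. The exam scores are made-up numbers, and the 95% interval uses the normal critical value 1.96 as a simplifying assumption (a t critical value would suit such a small sample better). The first half only describes the observed data; the second half makes a statement about the unseen population.

```python
import math
import statistics

# Hypothetical sample: exam scores of 10 students (made-up numbers).
scores = [62, 71, 68, 75, 80, 66, 73, 69, 77, 70]

# Descriptive statistics: summarize the data we actually observed.
sample_mean = statistics.mean(scores)
sample_sd = statistics.stdev(scores)  # sample standard deviation (divides by n - 1)

# Inferential statistics: estimate a plausible range for the unknown population mean.
# Assumption: rough 95% interval based on the normal critical value 1.96.
standard_error = sample_sd / math.sqrt(len(scores))
ci_low = sample_mean - 1.96 * standard_error
ci_high = sample_mean + 1.96 * standard_error

print(f"mean={sample_mean:.2f}, sd={sample_sd:.2f}")
print(f"approx. 95% CI for the population mean: ({ci_low:.2f}, {ci_high:.2f})")
```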

What are population and sample?

In statistics, the terms “population” and “sample” refer to two different groups of data that are used for analysis.

Population refers to the complete set or group of individuals, objects, or events that we are interested in studying or making inferences about. It represents the entire target group or the larger context to which the study or analysis is intended to apply.
A population can be large or small, depending on the scope of the study. For example, it could be all students in a school or all registered voters in India.

However, it is often impractical or impossible to collect data from an entire population.

A sample is a smaller, representative subset selected from the larger population (selection methods are discussed in the next section) and used to make inferences or generalizations about that population. A sample should ideally have characteristics similar to those of the population so that findings from the sample can be generalized to the population.

Statistical analysis is usually performed on sample data, and the results are then used to draw conclusions or make predictions about the population.

The key difference between the population and sample is that the population includes all individuals or elements of interest, while the sample is a smaller group selected from the population.

Now let us look at sampling methods, that is, how we can select a sample from the entire population.

Probability sampling is a sampling technique used in statistics where each member of the population has a known and non-zero chance of being selected as part of the sample. In simple words, every individual or element in the population has a known (and in the simplest case equal) probability of being included in the sample.

Non-probability sampling is a sampling technique in statistics where the selection of individuals or elements from a population is not based on a known probability of selection. Non-probability sampling does not involve random selection and does not provide each member of the population with a known chance of being included in the sample. Instead, the selection of participants is based on subjective judgment or convenience.

Types of probability sampling methods:

Simple Random Sampling: In simple random sampling, each member of the population has an equal chance of being selected. This can be done using techniques such as random number generators or drawing names from a hat.
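
As a minimal sketch of the "random number generator" approach, the snippet below draws 10 members from a hypothetical population of 100 ID numbers. The population, the sample size, and the seed are illustrative assumptions; the seed is fixed only so the run is repeatable.

```python
import random

# Hypothetical population: ID numbers 1..100.
population = list(range(1, 101))

rng = random.Random(42)  # fixed seed only so the sketch is repeatable

# Draw 10 members without replacement; every member is equally likely to be chosen.
sample = rng.sample(population, k=10)
print(sample)
```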

Stratified Sampling: Stratified sampling involves dividing the population into homogeneous subgroups or strata and then randomly selecting samples from each stratum in proportion to their representation in the population. This technique ensures that each subgroup is adequately represented in the sample.
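
One possible way to sketch proportional stratified sampling in Python is shown below. The function name `stratified_sample`, the strata labels, and the group sizes are all hypothetical. Note that rounding each stratum's share can make the total differ slightly from the requested size when the proportions do not divide evenly.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(records, stratum_of, sample_size, seed=0):
    """Proportional stratified sample (illustrative helper, not a library API)."""
    rng = random.Random(seed)

    # Group the records by stratum.
    strata = defaultdict(list)
    for record in records:
        strata[stratum_of(record)].append(record)

    sample = []
    for members in strata.values():
        # Allocate draws in proportion to the stratum's share of the population.
        k = round(sample_size * len(members) / len(records))
        sample.extend(rng.sample(members, k))  # random draw inside the stratum
    return sample

# Hypothetical population: 60 undergraduates, 30 graduates, 10 staff.
population = (
    [("undergrad", i) for i in range(60)]
    + [("grad", i) for i in range(30)]
    + [("staff", i) for i in range(10)]
)

sample = stratified_sample(population, lambda record: record[0], sample_size=20)
counts = Counter(stratum for stratum, _ in sample)
print(counts)
```

With these numbers each stratum keeps its population share: 12 undergraduates, 6 graduates, and 2 staff in a sample of 20.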

Cluster Sampling: Cluster sampling involves dividing the population into clusters or groups and then randomly selecting entire clusters as the sample. This approach is often used when it is difficult or impractical to sample individuals directly.
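
A short sketch of the idea, assuming a hypothetical population of five schools (the clusters) with four students each. The randomness applies to whole clusters, and every member of a chosen cluster enters the sample.

```python
import random

# Hypothetical population: 5 schools (clusters) with 4 students each.
clusters = {
    f"school_{c}": [f"school_{c}_student_{s}" for s in range(4)]
    for c in "ABCDE"
}

rng = random.Random(7)  # fixed seed only so the sketch is repeatable

# Randomly choose 2 whole clusters, then keep every member of those clusters.
chosen = rng.sample(sorted(clusters), k=2)
sample = [student for name in chosen for student in clusters[name]]

print(chosen)
print(sample)
```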

Systematic Sampling: Systematic sampling involves selecting every kth member from the population after a random starting point is determined. For example, if the population size is N and the desired sample size is n, the sampling interval is k = N/n, so every (N/n)th member is selected.
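
The selection rule can be sketched as follows. The helper name `systematic_sample` and the population of 100 IDs are assumptions for illustration; the interval is computed with integer division, and the sketch assumes the sample size does not exceed the population size.

```python
import random

def systematic_sample(population, sample_size, seed=0):
    """Pick every k-th member after a random start (illustrative helper)."""
    rng = random.Random(seed)
    k = len(population) // sample_size  # sampling interval, roughly N / n
    start = rng.randrange(k)            # random starting point in the first interval
    return population[start::k][:sample_size]

# Hypothetical population: ID numbers 1..100, so the interval is 100 // 10 = 10.
population = list(range(1, 101))
sample = systematic_sample(population, sample_size=10)
print(sample)
```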

Types of non-probability sampling methods:

Convenience Sampling: Convenience sampling involves selecting individuals who are readily available or easily accessible. This method is convenient and quick, but it may introduce bias because the sample may not be representative of the population.

Purposive Sampling or Judgement Sampling: Purposive sampling involves selecting individuals who meet specific criteria or characteristics relevant to the research question. The sample is chosen based on the researcher’s judgment and expertise. While purposive sampling allows for selecting participants who possess the desired qualities, it may not represent the entire population.

Snowball Sampling: Snowball sampling involves selecting initial participants and then asking them to refer other potential participants. This method is often used when the population of interest is difficult to reach or identify. It is commonly used in studies where participants share a common characteristic or belong to a specific network.
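
The referral process can be sketched as a breadth-first walk over a referral network. The names, the `referrals` mapping, and the `snowball_sample` helper are all hypothetical; in a real study the referrals would come from participants rather than from a predefined dictionary.

```python
from collections import deque

# Hypothetical referral network: who each participant refers.
referrals = {
    "asha": ["bilal", "chen"],
    "bilal": ["divya"],
    "chen": ["divya", "esha"],
    "divya": [],
    "esha": ["farid"],
    "farid": [],
}

def snowball_sample(seeds, referrals, max_size):
    """Start from initial participants and follow referrals (illustrative helper)."""
    sample = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < max_size:
        person = queue.popleft()
        sample.append(person)
        for referred in referrals.get(person, []):
            if referred not in seen:  # do not recruit the same person twice
                seen.add(referred)
                queue.append(referred)
    return sample

sample = snowball_sample(["asha"], referrals, max_size=4)
print(sample)  # ['asha', 'bilal', 'chen', 'divya']
```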

Quota Sampling: Quota sampling involves selecting individuals based on predetermined quotas or targets for specific characteristics, such as age, gender, or occupation. The aim is to ensure that the sample reflects the proportions of these characteristics in the population. However, the selection within the quotas is often non-random.
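
A minimal sketch of quota filling is shown below, assuming hypothetical respondents who arrive in a fixed order and made-up age-group quotas. People are accepted in arrival order until each quota is full, which is exactly why the selection within a quota is non-random.

```python
def quota_sample(people, quotas, key):
    """Accept people in arrival order until each group's quota is met (illustrative helper)."""
    remaining = dict(quotas)
    sample = []
    for person in people:  # arrival order, not a random draw
        group = key(person)
        if remaining.get(group, 0) > 0:
            sample.append(person)
            remaining[group] -= 1
    return sample

# Hypothetical respondents as (name, age group), in the order they were approached.
arrivals = [
    ("p1", "18-30"), ("p2", "18-30"), ("p3", "31-50"), ("p4", "18-30"),
    ("p5", "51+"), ("p6", "31-50"), ("p7", "51+"), ("p8", "31-50"),
]
quotas = {"18-30": 2, "31-50": 2, "51+": 1}

sample = quota_sample(arrivals, quotas, key=lambda person: person[1])
print([name for name, _ in sample])  # ['p1', 'p2', 'p3', 'p5', 'p6']
```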

Non-probability sampling methods can be useful in certain situations, such as exploratory research or when generalizability to a larger population is not a primary concern.

