Statistics Tutorial (Part II)
What are variables and random variables?
Variable: A variable is a characteristic or attribute that can take on different values. It is a fundamental concept used to represent and analyze data in quantitative or qualitative terms. Variables are classified into different types based on their nature and measurement scale.
Random Variable: A random variable is a concept in probability theory that assigns a numerical value to each possible outcome of a random phenomenon. Formally, it is a function that maps each outcome of a random experiment to a real number. Random variables are used to model uncertainty and variability, and each value (or range of values) they can take has an associated probability.
A random variable can be of two types:
Discrete Random Variable: A discrete random variable can only take on a countable number of distinct values. The probability of each value is given by a probability mass function (PMF). Examples include the number of heads in a series of coin tosses or the number of cars that go through a tollbooth in a given hour.
Continuous Random Variable: A continuous random variable can take on any value within a specified range or interval. Its probabilities are described by a probability density function (PDF): the probability of falling within an interval is the area under the PDF over that interval, and the probability of any single exact value is zero. Examples include the height of people, the time it takes to complete a task, or the temperature at a specific location.
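To make the PMF/PDF distinction concrete, here is a minimal Python sketch. It assumes a fair coin for the discrete case and a uniform distribution for the continuous case; the function names `heads_pmf` and `uniform_interval_probability` are made up for this illustration.

```python
from math import comb


def heads_pmf(n_tosses, p_heads=0.5):
    """PMF of the number of heads in n_tosses independent coin tosses."""
    return {
        k: comb(n_tosses, k) * p_heads**k * (1 - p_heads) ** (n_tosses - k)
        for k in range(n_tosses + 1)
    }


def uniform_interval_probability(a, b, low, high):
    """P(a <= X <= b) for X uniform on [low, high].

    The density is constant at 1 / (high - low), so the probability is
    the length of the overlap between [a, b] and [low, high] times that density.
    """
    overlap = max(0.0, min(b, high) - max(a, low))
    return overlap / (high - low)


# Discrete: each countable value gets its own probability.
pmf = heads_pmf(3)
print(pmf)                # {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}
print(sum(pmf.values()))  # 1.0

# Continuous: probabilities belong to intervals, not single points.
print(uniform_interval_probability(2.0, 5.0, 0.0, 10.0))  # 0.3
print(uniform_interval_probability(4.0, 4.0, 0.0, 10.0))  # 0.0
```

The last line shows why a PDF is needed for continuous variables: a single exact value has probability zero, so only intervals carry probability.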
There are two forms of data:
- Qualitative data
- Quantitative data
What is qualitative data?
Qualitative data is information that describes qualities, traits, or attributes rather than numerical measurements. It is non-numerical, often descriptive, and frequently subjective in nature. Qualitative data is generally obtained through methods such as interviews, observations, open-ended surveys, or analysis of textual materials. It reveals the meanings, perceptions, experiences, and interpretations of individuals or groups. The main focus is on understanding the richness, context, and nuances of a phenomenon rather than quantifying or measuring it. This type of data is commonly used in the social sciences and humanities, market research, anthropology, psychology, and other fields where understanding human behavior, beliefs, and experiences is critical.
Qualitative data is of two types:
- Nominal Data
- Ordinal Data
Nominal data, sometimes loosely called categorical data, is qualitative data composed of categories or groups with no inherent order or ranking. The values or labels assigned to the categories are used only to distinguish one group from another, without implying any order or magnitude.
Key characteristics of nominal data are listed below:
Categories: Nominal data is organized into distinct and mutually exclusive categories or levels. Each category represents a different attribute or trait.
No order or precedence: Unlike ordinal data, nominal data has no inherent order or precedence. Categories are considered equal and cannot be ranked by value or quality.
Qualitative Labels: Nominal data is usually represented by qualitative labels such as names, words, or symbols. These labels are used to identify and classify the different categories.
Examples:
Marital status: Married, Single, Divorced
Eye color: Blue, Brown, Green, Hazel
Types of animals: Dog, Cat, Bird, Fish
Car brands: Toyota, Honda, Ford, BMW
Gender: Male, Female, Other
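Because nominal categories have no order, the operations that make sense on them are counting and finding the most common category (the mode). A small sketch in Python, using a made-up list of eye colors as sample data:

```python
from collections import Counter

# Hypothetical sample of a nominal variable (eye color).
eye_colors = ["Brown", "Blue", "Brown", "Green", "Hazel", "Brown", "Blue"]

# Counting occurrences per category is meaningful for nominal data.
counts = Counter(eye_colors)
print(counts["Brown"])  # 3
print(counts["Blue"])   # 2

# The mode is the only "average" that applies; a mean or median
# would require an order or a numeric scale that nominal data lacks.
mode_color, mode_count = counts.most_common(1)[0]
print(mode_color, mode_count)  # Brown 3
```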
Ordinal data is qualitative data that represents categories or groups with a natural order or ranking. The categories have a relative position or rank, indicating a qualitative distinction between them. Although the categories in ordinal data can be ordered, the difference between adjacent categories may not be consistent or measurable.
Key characteristics of ordinal data are:
Categories: Ordinal data consists of categories that have a specific order or ranking. The categories represent different levels or positions based on a particular attribute or characteristic.
Non-Uniform Differences: Unlike interval or ratio data, the differences between the categories in ordinal data may not be equal or precisely quantifiable. The distinction between categories is based on their relative position, rather than a specific measurement or value.
Qualitative Labels or Ranks: Ordinal data is typically represented using qualitative labels or ranks, such as words or numbers that reflect the order or ranking of the categories.
Examples:
Education level: High School Diploma, Bachelor’s Degree, Master’s Degree
Likert scale ratings: Strongly Disagree, Disagree, Neutral, Agree, Strongly Agree
Socioeconomic status: Low, Middle, High
Performance ratings: Poor, Fair, Good, Excellent
Levels of satisfaction: Very Dissatisfied, Dissatisfied, Neutral, Satisfied, Very Satisfied
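Since ordinal categories can be ranked but not measured, sorting and the median are meaningful while the mean is not. The sketch below illustrates this with the performance ratings from the examples; the sample responses and the names `RATING_ORDER` and `RANK` are assumptions made for illustration.

```python
from statistics import median_low

# The order of the categories is part of the data's definition.
RATING_ORDER = ["Poor", "Fair", "Good", "Excellent"]
RANK = {label: position for position, label in enumerate(RATING_ORDER)}

# Hypothetical sample of performance ratings.
ratings = ["Good", "Poor", "Excellent", "Fair", "Good", "Good", "Fair"]

# Sorting by rank is valid because the categories have a natural order.
sorted_ratings = sorted(ratings, key=RANK.__getitem__)
print(sorted_ratings)
# ['Poor', 'Fair', 'Fair', 'Good', 'Good', 'Good', 'Excellent']

# median_low always returns an actual data point, so it maps back to a label.
median_rating = RATING_ORDER[median_low(RANK[r] for r in ratings)]
print(median_rating)  # Good
```

The ranks 0 to 3 only encode position. The gap between "Poor" and "Fair" is not assumed to equal the gap between "Good" and "Excellent", which is why averaging the ranks would be misleading.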
What is quantitative data?
Quantitative data is information that is expressed numerically and can be measured or counted. It consists of numerical values obtained through systematic and structured methods, which makes it suitable for mathematical calculation, comparison, and generalization. Quantitative data can be collected through various methods such as surveys, experiments, sensors, or existing databases. It is commonly used in fields such as economics, finance, psychology, biology, physics, and the social sciences. Quantitative data supports various statistical analyses, including descriptive statistics, inferential statistics, regression analysis, hypothesis testing, and data modeling. By quantifying variables, relationships, and patterns, researchers can draw statistical inferences, make predictions, spot trends, and test hypotheses.
Quantitative data is of two types:
- Discrete Data
- Continuous Data
Discrete and continuous data were already discussed in the section on random variables above.
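Pictorial summary (figure not reproduced here): data divides into qualitative data (nominal, ordinal) and quantitative data (discrete, continuous).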
What is frequency?
Frequency: In statistics, frequency is the number of times a particular value or category occurs in a data set. It is the count of observations that fall into a specific category or have a specific value. Frequencies provide basic information about the distribution of data and the prevalence of different values or categories.
For example, consider a dataset of students’ scores on a test:
Score: 80, 75, 90, 85, 75, 85, 90, 80, 90
In this dataset, the frequency of the score 80 is 2 (it occurs twice), the frequency of 75 is 2, the frequency of 90 is 3, and the frequency of 85 is 2.
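These counts can be checked with a few lines of Python. This is just one convenient way to do it, using `collections.Counter` from the standard library:

```python
from collections import Counter

scores = [80, 75, 90, 85, 75, 85, 90, 80, 90]

# Counter maps each distinct score to the number of times it occurs.
frequency = Counter(scores)

print(frequency[80])  # 2
print(frequency[75])  # 2
print(frequency[90])  # 3
print(frequency[85])  # 2
```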
What is a frequency distribution?
Frequency distribution: A frequency distribution is a tabular or graphical representation that shows the frequencies of different values or categories in a data set. It organizes data into groups or intervals and summarizes how often each value or category occurs, which helps you understand the structure and distribution of the data. For the test scores above, the frequency distribution lists each distinct score (75, 80, 85, 90) alongside its frequency (2, 2, 2, 3), giving a compact representation of the dataset.
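As a rough sketch, the frequency distribution for the test scores can be built as a small table in Python. The helper name `frequency_distribution` and the extra relative-frequency column (frequency divided by the total number of observations) are additions for illustration.

```python
from collections import Counter


def frequency_distribution(values):
    """Return (value, frequency, relative frequency) rows sorted by value."""
    counts = Counter(values)
    total = len(values)
    return [(value, counts[value], counts[value] / total) for value in sorted(counts)]


scores = [80, 75, 90, 85, 75, 85, 90, 80, 90]
table = frequency_distribution(scores)

# Print the distribution as a simple aligned table.
print(f"{'Score':>5} {'Freq':>5} {'Rel. freq':>10}")
for value, freq, rel_freq in table:
    print(f"{value:>5} {freq:>5} {rel_freq:>10.3f}")
```

The frequencies add up to the number of observations (9 here) and the relative frequencies add up to 1, which is a quick sanity check for any frequency table.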
In summary, frequency refers to the count or tally of occurrences of a specific value or category, while a frequency distribution organizes the frequencies of different values or categories into a summary table or graphical representation. The frequency distribution provides a more comprehensive view of the data’s distributional properties.