Module III · Article I
Fundamentals of Quantitative Research and Statistics
Introduction to Quantitative Research
What is Quantitative Research?
Quantitative research is a systematic approach to the collection and analysis of numerical data designed to describe, explain, and predict phenomena. Quantitative methods are based on the measurement of variables and the use of statistical tools to identify patterns and test hypotheses.
Key Characteristics:
- Use of numerical data
- Statistical analysis
- Striving for objectivity and generalizability
- Testing of pre-formulated hypotheses (deductive approach)
- Large samples, drawn so that they represent the population being studied (size alone does not guarantee representativeness)
Types of Data
Understanding data types is critical, as the data type determines which statistical methods can be applied.
Nominal Data
Categories without order. Examples: gender (male/female), nationality, company type (private/public), industry.
- Permissible operations: counting frequencies, mode
- Not allowed: ranking, calculating the mean
Ordinal Data
Categories with a certain order, but without equal intervals. Examples: education level (secondary/bachelor's/master's/PhD), Likert scale (1 = strongly disagree ... 5 = strongly agree), job level.
- Permissible operations: median, quartiles, rank correlation
- Not allowed: claiming that the difference between 1 and 2 is the same as between 3 and 4
Interval Data
Numerical data with equal intervals, but no absolute zero. Examples: temperature in Celsius, dates, IQ.
- Permissible operations: mean, standard deviation, Pearson correlation
- Not allowed: saying that 40°C is "twice as hot" as 20°C
Ratio Data
Numerical data with an absolute zero. Examples: income, age, number of employees, revenue.
- Permissible operations: all statistical operations, including ratios between values
- Permitted: stating that an income of 100,000 rubles is twice as much as 50,000 rubles
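The rules above can be summarized as a small lookup. This is only an illustrative sketch: the names `LEVELS`, `NEW_AT_LEVEL`, and `permitted_statistics` are hypothetical, and the assumption that each level inherits every operation of the levels below it is a reading of the lists above rather than a formal standard.

```python
# Measurement levels from weakest to strongest, as ordered in the text.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

# Operations that first become meaningful at each level (taken from the
# "Permissible operations" lists above; the labels are illustrative).
NEW_AT_LEVEL = {
    "nominal": {"frequency count", "mode"},
    "ordinal": {"median", "quartiles", "rank correlation"},
    "interval": {"mean", "standard deviation", "Pearson correlation"},
    "ratio": {"ratio comparison"},
}


def permitted_statistics(level):
    """Return the set of statistics allowed at the given level.

    Assumption: a level inherits all operations of the lower levels.
    """
    if level not in LEVELS:
        raise ValueError(f"unknown measurement level: {level!r}")
    allowed = set()
    for name in LEVELS[: LEVELS.index(level) + 1]:
        allowed |= NEW_AT_LEVEL[name]
    return allowed


print(sorted(permitted_statistics("ordinal")))
```

For a Likert-scale variable, for instance, `permitted_statistics("ordinal")` includes the median but not the mean.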
Descriptive Statistics
Measures of Central Tendency
Arithmetic mean (Mean) — the sum of all values divided by their number.
- Formula: x̄ = Σx / n
- Sensitive to outliers
- Suitable for interval and ratio data
Median — the value that divides the ordered data set in half.
- Not sensitive to outliers
- Suitable for ordinal and higher data types
Mode — the most frequently occurring value.
- Can be applied to all types of data, including nominal
Example Calculation
Salary data for 7 employees (thousand rubles): 30, 35, 40, 42, 45, 50, 200
- Mean: (30+35+40+42+45+50+200) / 7 = 63.1 thousand rubles
- Median: 42 thousand rubles (the central value)
- Mode: none (all values are unique)
Note: the mean (63.1) is heavily inflated by the outlier (200). The median (42) better reflects the typical salary.
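The calculation above can be reproduced with Python's standard `statistics` module. This is a minimal sketch; the variable names are illustrative, and treating "every value is returned by `multimode`" as "no mode" is a convention chosen for this example.

```python
import statistics

# Salary data from the example, in thousand rubles.
salaries = [30, 35, 40, 42, 45, 50, 200]

mean_salary = statistics.mean(salaries)
median_salary = statistics.median(salaries)

# multimode returns every value tied for the highest frequency. When all
# values are unique it returns all of them, so there is no single mode.
modes = statistics.multimode(salaries)
has_single_mode = len(modes) == 1

print(f"mean   = {mean_salary:.1f}")
print(f"median = {median_salary}")
print(f"single mode? {has_single_mode}")
```

Replacing 200 with 55 in the list and rerunning shows how strongly a single outlier moves the mean while leaving the median unchanged.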
Measures of Dispersion (Variability)
Range — the difference between the maximum and minimum values.
- Range = Max − Min = 200 − 30 = 170
Variance — the average squared deviation from the mean.
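- Formula: s² = Σ(x − x̄)² / (n − 1)
- This is the sample variance: the sum of squared deviations is divided by n − 1 rather than n, because the data are a sample used to estimate the variance of the population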
Standard Deviation — the square root of the variance.
- Formula: s = √s²
- Interpretation: the larger the standard deviation, the greater the spread of the data around the mean
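As a sketch of how these formulas apply, the snippet below computes the range, sample variance, and standard deviation for the salary data from the earlier example, step by step, and then cross-checks the result against `statistics.variance`. The variable names are illustrative.

```python
import math
import statistics

# Salary data from the earlier example, in thousand rubles.
salaries = [30, 35, 40, 42, 45, 50, 200]

n = len(salaries)
mean_salary = sum(salaries) / n

# Range = Max - Min
data_range = max(salaries) - min(salaries)

# s^2 = sum((x - mean)^2) / (n - 1)
sum_squared_deviations = sum((x - mean_salary) ** 2 for x in salaries)
sample_variance = sum_squared_deviations / (n - 1)

# s = sqrt(s^2)
sample_sd = math.sqrt(sample_variance)

# The library function uses the same n - 1 denominator.
assert math.isclose(sample_variance, statistics.variance(salaries))

print(f"range    = {data_range}")
print(f"variance = {sample_variance:.2f}")
print(f"sd       = {sample_sd:.2f}")
```

The standard deviation comes out close to the mean itself, which is another sign that the single value of 200 dominates the spread.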
Normal Distribution
The normal (Gaussian) distribution is a bell-shaped curve, symmetrical about the mean. In a normal distribution:
- about 68% of data lie within ±1 standard deviation from the mean
- about 95% lie within ±2 standard deviations
- about 99.7% lie within ±3 standard deviations
Many statistical tests assume a normal distribution of data.
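The percentages for the normal distribution can be checked numerically with `statistics.NormalDist` (available in Python 3.8 and later). The helper name `share_within` is hypothetical and used only for this sketch.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within ±{k} sd: {share_within(k):.4f}")
```

The exact share within ±2 standard deviations is slightly above 95%, which is why the rule is stated as an approximation.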
Graphical Presentation of Data
Bar Chart — for nominal and ordinal data. Shows the frequency of each category.
Histogram — for continuous numerical data. Shows the distribution of values by intervals.
Pie Chart — for displaying category proportions in a whole. Use with caution: for a large number of categories the chart becomes unreadable.
Scatter Plot — for visualizing the relationship between two numerical variables.
Box Plot — displays the median, quartiles, range, and outliers. Very useful for comparing distributions.
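To make the box plot concrete, the sketch below computes the numbers such a plot is built from, using the salary data from the earlier example. Two assumptions are not in the text above: outliers are flagged with the common 1.5 × IQR rule, and quartiles use the default ("exclusive") method of `statistics.quantiles`; other software may compute quartiles slightly differently.

```python
import statistics

# Salary data from the earlier example, in thousand rubles.
salaries = [30, 35, 40, 42, 45, 50, 200]

# Quartiles: q2 is the median.
q1, q2, q3 = statistics.quantiles(salaries, n=4)
iqr = q3 - q1  # interquartile range, the height of the box

# Assumed convention: points beyond 1.5 * IQR from the box are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in salaries if x < lower_fence or x > upper_fence]

print(f"Q1 = {q1}, median = {q2}, Q3 = {q3}")
print(f"fences = ({lower_fence}, {upper_fence})")
print(f"outliers = {outliers}")
```

Under these assumptions the salary of 200 falls above the upper fence and would be drawn as a separate point on the plot.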
Practical Assignments
Assignment 1
Question: Determine the data type for each variable:
a) Number of employees in a company
b) Satisfaction level (1-5)
c) Company industry (IT, finance, manufacturing)
d) Company revenue in rubles
e) Office temperature
Solution:
a) Ratio: there is an absolute zero (0 employees), so it makes sense to say "twice as many"
b) Ordinal: there is order (5 > 4 > 3), but it cannot be claimed that the difference between 1 and 2 equals the difference between 4 and 5
c) Nominal: categories without internal order
d) Ratio: there is an absolute zero, so ratios between values can be calculated
e) Interval (when measured in Celsius): equal intervals, but 0°C does not mean "absence of temperature"
Assignment 2
Question: Calculate the mean, median, and mode for the following data set (number of errors in reports): 2, 3, 3, 5, 7, 3, 8, 12, 4
Solution:
- Order the data: 2, 3, 3, 3, 4, 5, 7, 8, 12
- Mean: (2+3+3+3+4+5+7+8+12) / 9 = 47/9 = 5.22
- Median: 9 values → the middle (5th): 4
- Mode: value 3 occurs 3 times (most frequently): 3
Interpretation: the mean (5.22) is pulled upward by the unusually high value (12). The median (4) and mode (3) better characterize the typical number of errors. Because the mean exceeds the median and the long tail lies on the side of large values, the distribution is skewed to the right (positively skewed).
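The solution to Assignment 2 can be verified with a short script. This is a sketch with illustrative variable names; comparing the mean with the median is only a rough heuristic for the direction of skew, not a formal test.

```python
from collections import Counter
import statistics

# Number of errors in reports, from Assignment 2.
errors = [2, 3, 3, 5, 7, 3, 8, 12, 4]

# Frequency table: the mode is the value with the highest count.
frequencies = Counter(errors)
mode_value, mode_count = frequencies.most_common(1)[0]

mean_errors = statistics.mean(errors)
median_errors = statistics.median(errors)

# Heuristic: a mean above the median suggests a right (positive) skew.
right_skew_hint = mean_errors > median_errors

print(f"ordered = {sorted(errors)}")
print(f"mean    = {mean_errors:.2f}")
print(f"median  = {median_errors}")
print(f"mode    = {mode_value} (occurs {mode_count} times)")
print(f"mean > median? {right_skew_hint}")
```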