Fundamentals of Quantitative Research and Statistics

What is Quantitative Research?

Quantitative research is a systematic approach to the collection and analysis of numerical data designed to describe, explain, and predict phenomena. Quantitative methods are based on the measurement of variables and the use of statistical tools to identify patterns and test hypotheses.

Key Characteristics:

Use of numerical data
Statistical analysis
Striving for objectivity and generalizability
Testing of pre-formulated hypotheses (deductive approach)
Large samples, ideally randomly selected, to support representativeness

Types of Data

Understanding data types is critical, as the data type determines which statistical methods can be applied.

Nominal Data

Categories without order. Examples: gender (male/female), nationality, company type (private/public), industry.

Permissible operations: counting frequencies, mode
Not allowed: ranking, calculating the mean

Ordinal Data

Categories with a certain order, but without equal intervals. Examples: education level (secondary/bachelor's/master's/PhD), Likert scale (1 = strongly disagree ... 5 = strongly agree), job level.

Permissible operations: median, quartiles, rank correlation
Not allowed: claiming that the difference between 1 and 2 is the same as between 3 and 4

Interval Data

Numerical data with equal intervals, but no absolute zero. Examples: temperature in Celsius, dates, IQ.

Permissible operations: mean, standard deviation, Pearson correlation
Not allowed: saying that 40°C is "twice as hot" as 20°C

Ratio Data

Numerical data with an absolute zero. Examples: income, age, number of employees, revenue.

Permissible operations: all statistical operations, including ratios between values
Permitted: stating that an income of 100,000 rubles is twice as much as 50,000 rubles
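
The rules above can be summarized as a small lookup: each statistic becomes meaningful from a certain measurement level upward. The following is a minimal sketch; the names Level, MIN_LEVEL, and is_permitted are hypothetical and chosen only for illustration.

```python
from enum import IntEnum


class Level(IntEnum):
    """Measurement levels, ordered from least to most informative."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Lowest level at which each statistic is meaningful,
# following the "permissible operations" lists in this section.
MIN_LEVEL = {
    "frequency": Level.NOMINAL,
    "mode": Level.NOMINAL,
    "median": Level.ORDINAL,
    "quartiles": Level.ORDINAL,
    "rank correlation": Level.ORDINAL,
    "mean": Level.INTERVAL,
    "standard deviation": Level.INTERVAL,
    "pearson correlation": Level.INTERVAL,
    "ratio": Level.RATIO,
}


def is_permitted(statistic: str, level: Level) -> bool:
    """Return True if the statistic is meaningful for data at this level."""
    return level >= MIN_LEVEL[statistic]


print(is_permitted("mean", Level.ORDINAL))   # False
print(is_permitted("median", Level.RATIO))   # True
```

Because the levels are nested, anything permitted for a lower level is also permitted for every higher one, which is why a single "minimum level" per statistic is enough.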

Descriptive Statistics

Measures of Central Tendency

Arithmetic mean (Mean) — the sum of all values divided by their number.

Formula: x̄ = Σx / n
Sensitive to outliers
Suitable for interval and ratio data

Median — the value that divides the ordered data set in half.

Not sensitive to outliers
Suitable for ordinal and higher data types

Mode — the most frequently occurring value.

Can be applied to all types of data, including nominal

Example Calculation

Salary data for 7 employees (thousand rubles): 30, 35, 40, 42, 45, 50, 200

Mean: (30+35+40+42+45+50+200) / 7 = 63.1 thousand rubles
Median: 42 thousand rubles (the central value)
Mode: none (all values are unique)

Note: the mean (63.1) is heavily inflated by the outlier (200). The median (42) better reflects the typical salary.
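
The calculation above can be reproduced with Python's standard statistics module. This is only an illustrative sketch; the variable names are assumptions, and treating "every value returned by multimode" as "no mode" is a convention adopted here to match the example.

```python
import statistics

salaries = [30, 35, 40, 42, 45, 50, 200]  # thousand rubles

mean_salary = statistics.mean(salaries)
median_salary = statistics.median(salaries)

# multimode returns every value tied for the highest frequency.
# If it returns all distinct values, no value stands out as a mode.
modes = statistics.multimode(salaries)
has_single_mode = len(modes) < len(set(salaries))

# Dropping the outlier shows how strongly it pulls the mean.
mean_without_outlier = statistics.mean(x for x in salaries if x != 200)

print(round(mean_salary, 1))           # 63.1
print(median_salary)                   # 42
print(has_single_mode)                 # False
print(round(mean_without_outlier, 1))  # 40.3
```

Without the value 200 the mean falls to about 40.3, close to the median of 42, which illustrates why the median is the better summary for this data set.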

Measures of Dispersion (Variability)

Range — the difference between the maximum and minimum values.

Range = Max − Min = 200 − 30 = 170

Variance — the average squared deviation from the mean.

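Formula: s² = Σ(x − x̄)² / (n − 1)
Dividing by n − 1 rather than n gives the sample variance; this corrects the bias that arises when the mean is estimated from the same sample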

Standard Deviation — the square root of the variance.

Formula: s = √s²
Interpretation: the larger the standard deviation, the greater the spread of the data around the mean
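
As a sketch of how these measures work out for the salary data used earlier, the snippet below computes the range, the sample variance, and the standard deviation. The variable names are illustrative; statistics.variance and statistics.stdev both use the n − 1 denominator shown in the formula.

```python
import statistics

salaries = [30, 35, 40, 42, 45, 50, 200]  # thousand rubles

value_range = max(salaries) - min(salaries)
variance = statistics.variance(salaries)  # sample variance, divides by n - 1
std_dev = statistics.stdev(salaries)      # square root of the sample variance

# The same variance written out directly from the formula.
mean = sum(salaries) / len(salaries)
manual_variance = sum((x - mean) ** 2 for x in salaries) / (len(salaries) - 1)

print(value_range)         # 170
print(round(variance, 1))  # 3684.1
print(round(std_dev, 1))   # 60.7
```

A standard deviation of about 60.7 against a mean of 63.1 signals a very wide spread, driven almost entirely by the single value 200.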

Normal Distribution

The normal (Gaussian) distribution is a bell-shaped curve, symmetrical about the mean. In a normal distribution:

About 68% of the data lie within ±1 standard deviation of the mean
About 95% lie within ±2 standard deviations
About 99.7% lie within ±3 standard deviations

Many parametric statistical tests (for example, the t-test and ANOVA) assume that the data are approximately normally distributed.
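
The percentages in the rule above are rounded values. A short sketch using NormalDist from the standard statistics module shows where they come from; the helper name share_within is an assumption made for this example.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k: float) -> float:
    """Probability that a normal value lies within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(share_within(k) * 100, 1))
# 1 68.3
# 2 95.4
# 3 99.7
```

The result does not depend on the particular mean and standard deviation, so the standard normal distribution (mean 0, standard deviation 1) is used for simplicity.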

Graphical Presentation of Data

Bar Chart — for nominal and ordinal data. Shows the frequency of each category.

Histogram — for continuous numerical data. Shows the distribution of values by intervals.

Pie Chart — for displaying category proportions in a whole. Use with caution: for a large number of categories the chart becomes unreadable.

Scatter Plot — for visualizing the relationship between two numerical variables.

Box Plot — displays the median, quartiles, range, and outliers. Very useful for comparing distributions.
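
To make concrete what a box plot displays, the sketch below computes its ingredients for the salary data from the earlier example. Flagging points more than 1.5 interquartile ranges beyond the quartiles is a common convention assumed here, not something the text specifies, and quartile values can differ slightly between calculation methods.

```python
import statistics

salaries = [30, 35, 40, 42, 45, 50, 200]  # thousand rubles

# Three cut points that split the ordered data into four parts
# (default "exclusive" method of statistics.quantiles).
q1, median, q3 = statistics.quantiles(salaries, n=4)

iqr = q3 - q1                  # interquartile range: the height of the box
lower_fence = q1 - 1.5 * iqr   # assumed 1.5 * IQR convention for outliers
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in salaries if x < lower_fence or x > upper_fence]

print(q1, median, q3)            # 35.0 42.0 50.0
print(lower_fence, upper_fence)  # 12.5 72.5
print(outliers)                  # [200]
```

On the plot, the box would span 35 to 50 with a line at 42, and the value 200 would be drawn as a separate point far above the upper whisker.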

Practical Assignments

Assignment 1

Question: Determine the data type for each variable:
a) Number of employees in a company
b) Satisfaction level (1-5)
c) Company industry (IT, finance, manufacturing)
d) Company revenue in rubles
e) Office temperature

Solution:
a) Ratio data: there is an absolute zero (0 employees), and it makes sense to say "twice as many"
b) Ordinal: there is an order (5 > 4 > 3), but it cannot be claimed that the difference between 1 and 2 equals the difference between 4 and 5
c) Nominal: categories without internal order
d) Ratio data: there is an absolute zero, and proportions can be calculated
e) Interval: equal intervals, but 0°C does not mean "absence of temperature"

Assignment 2

Question: Calculate the mean, median, and mode for the following data set (number of errors in reports): 2, 3, 3, 5, 7, 3, 8, 12, 4

Solution:

Order the data: 2, 3, 3, 3, 4, 5, 7, 8, 12
Mean: (2+3+3+3+4+5+7+8+12) / 9 = 47/9 = 5.22
Median: 9 values → the middle (5th): 4
Mode: value 3 occurs 3 times (most frequently): 3

Interpretation: the mean (5.22) is pulled upward by the outlier (12). The median (4) and mode (3) better characterize the typical number of errors. Because mean > median > mode, the distribution is skewed to the right (positively skewed).
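
The solution can be checked in a few lines of Python. As an additional illustration, the sketch also computes a moment-based skewness coefficient, which is one common definition among several and is an assumption of this example; a positive value supports the conclusion that the distribution is skewed to the right.

```python
import statistics

errors = [2, 3, 3, 5, 7, 3, 8, 12, 4]  # number of errors in reports

mean_errors = statistics.mean(errors)
median_errors = statistics.median(errors)
mode_errors = statistics.mode(errors)

# Moment-based skewness: third central moment divided by the
# second central moment raised to the power 1.5.
n = len(errors)
m2 = sum((x - mean_errors) ** 2 for x in errors) / n
m3 = sum((x - mean_errors) ** 3 for x in errors) / n
skewness = m3 / m2 ** 1.5

print(round(mean_errors, 2))  # 5.22
print(median_errors)          # 4
print(mode_errors)            # 3
print(skewness > 0)           # True
```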