Module III·Article III·~7 min read
Descriptive Statistics and Graphs in SPSS
Introduction to Quantitative Research
Measures of Central Tendency
Measures of central tendency indicate the "typical" or "central" value in a dataset. The appropriate measure depends on the measurement level of the data and the shape of its distribution.
Arithmetic Mean (Mean)
- Formula: x̄ = Σxᵢ / n
- When to use: for interval and ratio data with an approximately symmetric distribution and no extreme outliers
- Limitations: sensitive to outliers; a single extreme value can shift the mean substantially
- Example: the average salary of 10 employees is informative only if no salary is far above or below the rest
Median
- Definition: the value that divides an ordered series of data exactly in half
- When to use: in the presence of outliers, for skewed distributions, for ordinal data
- Advantage: resistant to extreme values
- Example: median income better characterizes the "typical" income of the population than the mean, since incomes are distributed with right skew
Mode
- Definition: the most frequently occurring value in a dataset
- When to use: for nominal data (the only applicable measure of central tendency), for multimodal distributions
- Features: there can be several modes (bimodal, multimodal distribution) or no mode at all
Comparison of Measures by Distribution Type
- Symmetric unimodal distribution: Mean ≈ Median ≈ Mode
- Right skew (positive skewness): typically Mean > Median > Mode
- Left skew (negative skewness): typically Mean < Median < Mode
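The ordering for a right-skewed variable can be checked with a minimal Python sketch. The salary figures below are hypothetical and chosen only so that one extreme value pulls the mean upward; they do not come from a real dataset.

```python
import statistics

# Hypothetical monthly salaries of 10 employees; the last value is an extreme one.
salaries = [2100, 2300, 2300, 2500, 2600, 2800, 3000, 3200, 3500, 12000]

mean = statistics.mean(salaries)        # pulled upward by the 12000 value
median = statistics.median(salaries)    # average of the 5th and 6th ordered values
modes = statistics.multimode(salaries)  # list of all most frequent values

print(f"mean={mean:.1f}, median={median:.1f}, modes={modes}")
# In a right-skewed sample the mean usually exceeds the median.
print("mean > median:", mean > median)
```

Here the mean (3630) lies well above the median (2700), which in turn lies above the mode (2300), matching the pattern listed for right skew.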
Measures of Dispersion (Variability)
Measures of dispersion indicate how much values deviate from the center of the distribution.
Range
- Formula: Range = Max − Min
- The simplest measure, but considers only the two extreme values and is very sensitive to outliers
Variance
- Sample variance formula: s² = Σ(xᵢ − x̄)² / (n − 1)
- Shows the average squared deviation from the mean
- Division by (n − 1) instead of n is Bessel's correction for unbiased estimation of population variance
Standard Deviation
- Formula: s = √s²
- The most commonly used measure of dispersion, expressed in the same units as the original data
- Interpretation: the larger the SD, the greater the scatter of data; small SD indicates values are concentrated near the mean
Interquartile Range (IQR)
- Formula: IQR = Q3 − Q1 (difference between the 75th and 25th percentiles)
- Resistant to outliers, often used along with the median
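The four measures above can be computed with Python's standard library, as in the sketch below. The eight values are hypothetical. The quartiles use the default "exclusive" method of `statistics.quantiles`; other software may use a different quartile definition and return slightly different values.

```python
import statistics

# Hypothetical small sample.
values = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(values) - min(values)
sample_variance = statistics.variance(values)       # divides by n - 1 (Bessel's correction)
population_variance = statistics.pvariance(values)  # divides by n, shown for contrast
sample_sd = statistics.stdev(values)                # square root of the sample variance

q1, q2, q3 = statistics.quantiles(values, n=4)      # quartile cut points
iqr = q3 - q1

print(f"range={data_range}, s2={sample_variance:.3f}, s={sample_sd:.3f}, IQR={iqr}")
```

Dividing by n − 1 gives 32/7 ≈ 4.571 here, while dividing by n gives exactly 4; the gap shrinks as the sample grows.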
Measures of Distribution Shape
Skewness
- Indicates the degree to which a distribution deviates from symmetry
- Skewness = 0 — symmetrical distribution
- Skewness > 0 — right skew (tail stretched to the right, most values on the left)
- Skewness < 0 — left skew (tail stretched to the left)
- Rule of thumb: |Skewness| < 1 indicates moderate skew; |Skewness| > 1 indicates substantial skew (this describes magnitude, not statistical significance)
Kurtosis
- Describes the peakedness and, above all, the tail weight of a distribution compared to the normal
- Kurtosis = 0: normal distribution (mesokurtic); SPSS reports excess kurtosis, so the normal distribution scores 0 rather than 3
- Kurtosis > 0: peaked distribution (leptokurtic), heavy tails
- Kurtosis < 0: flat-topped distribution (platykurtic), light tails
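For readers who want to see where these numbers come from, the sketch below implements the bias-corrected sample formulas for skewness and excess kurtosis. To my understanding these are the formulas used by common statistical packages, but treat that as an assumption and compare against your own SPSS output. The function names and test values are illustrative.

```python
import statistics

def sample_skewness(values):
    """Bias-corrected sample skewness (adjusted Fisher-Pearson coefficient)."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample SD (n - 1 in the denominator)
    z3 = sum(((x - mean) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * z3

def sample_excess_kurtosis(values):
    """Bias-corrected sample excess kurtosis (about 0 for normal data)."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    z4 = sum(((x - mean) / s) ** 4 for x in values)
    first = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * z4
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return first - correction

symmetric = [1, 2, 3, 4, 5]      # hypothetical, perfectly symmetric
right_skewed = [1, 2, 2, 3, 10]  # hypothetical, long right tail

print(sample_skewness(symmetric), sample_excess_kurtosis(symmetric))
print(sample_skewness(right_skewed))
```

The symmetric sample has skewness 0 and negative excess kurtosis (a flat, uniform-like shape), while the sample with a long right tail has positive skewness.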
The Normal Distribution and Its Importance
The normal distribution is a fundamental concept in statistics. Its properties:
- Symmetrical about the mean
- Mean = Median = Mode
- 68.27% of values within ±1 SD from the mean
- 95.45% of values within ±2 SD
- 99.73% of values within ±3 SD
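These percentages follow directly from the cumulative distribution function of the standard normal distribution. A short check with Python's `statistics.NormalDist` (the helper name `coverage` is illustrative):

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def coverage(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within ±{k} SD: {coverage(k):.2%}")
```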
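Why important: many parametric tests (t-test, ANOVA, Pearson correlation, regression) assume that the data (or, for regression, the residuals) are approximately normally distributed. Violating this assumption can distort p-values and confidence intervals, especially in small samples.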
Checking normality in SPSS:
- Visually: histogram with normal curve, Q-Q plot
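- Statistically: the Shapiro-Wilk test, commonly recommended for small samples (a frequent rule of thumb is n < 50), and the Kolmogorov-Smirnov test (reported by SPSS with the Lilliefors correction) for larger samples; the Explore procedure prints both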
- Analyze → Descriptive Statistics → Explore → Plots → Normality plots with tests
Descriptive Statistics in SPSS
Method 1: Frequencies
Analyze → Descriptive Statistics → Frequencies
- Move variables to the list
- Click Statistics → select: Mean, Median, Mode, Std. Deviation, Variance, Skewness, Kurtosis, Minimum, Maximum
- Click Charts → select chart type (histogram with normal curve)
- OK
Method 2: Descriptives
Analyze → Descriptive Statistics → Descriptives
- More compact output: mean, standard deviation, minimum, maximum
- Option Save standardized values as variables — creates z-scores (standardized values)
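A z-score is a value's distance from the mean expressed in standard deviations, z = (x − x̄) / s. The sketch below reproduces that calculation on hypothetical ages, assuming the sample standard deviation (n − 1) is used.

```python
import statistics

# Hypothetical ages of six respondents.
ages = [19, 25, 31, 35, 40, 62]

mean_age = statistics.mean(ages)
sd_age = statistics.stdev(ages)  # sample SD, n - 1 in the denominator

# Standardize: subtract the mean, divide by the SD.
z_scores = [(age - mean_age) / sd_age for age in ages]

for age, z in zip(ages, z_scores):
    print(f"age={age:>2}  z={z:+.2f}")
```

By construction the standardized variable has mean 0 and standard deviation 1, so values from differently scaled variables become comparable.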
Method 3: Explore
Analyze → Descriptive Statistics → Explore
- Most comprehensive analysis: descriptive statistics, normality tests, boxplot, stem-and-leaf plot
- Allows dividing analysis by groups (Factor List)
Creating Graphs in SPSS
Histogram
- Graphs → Legacy Dialogs → Histogram or through Frequencies
- Shows the distribution of a continuous variable
- Option Display normal curve overlays the normal distribution curve for visual assessment of normality
Bar Chart
- Graphs → Legacy Dialogs → Bar → Simple
- Used for categorical variables
- Shows frequency or percentage of each category
Boxplot
- Graphs → Legacy Dialogs → Boxplot or via Explore
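- Displays: median (center line), Q1 and Q3 (edges of box), "whiskers" (extending to the most extreme values that lie within 1.5 × IQR of the box edges), outliers (points beyond the whiskers; SPSS marks values more than 3 × IQR from the box as extreme values with asterisks)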
- Ideal for comparing distributions between groups and identifying outliers
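The 1.5 × IQR rule behind the whiskers can be sketched in a few lines of Python. The data are hypothetical, and the quartiles come from the default method of `statistics.quantiles`; SPSS may compute the box edges with a different quartile definition, so borderline points can be classified differently.

```python
import statistics

# Hypothetical measurements with one unusually large value.
values = [12, 14, 15, 15, 16, 17, 18, 19, 20, 35]

q1, _, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

# Fences: points beyond them are drawn as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

inside = [x for x in values if lower_fence <= x <= upper_fence]
whisker_low, whisker_high = min(inside), max(inside)  # whiskers end at real data points
outliers = [x for x in values if x < lower_fence or x > upper_fence]

print(f"box: {q1} to {q3}, whiskers: {whisker_low} to {whisker_high}, outliers: {outliers}")
```

Note that the whiskers stop at the last observed value inside the fences (12 and 20 here), not at the fences themselves.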
Scatterplot
- Graphs → Legacy Dialogs → Scatter/Dot → Simple Scatter
- Visualizes the relationship between two quantitative variables
- Allows adding a trend line (regression line): double-click the chart to open the Chart Editor, then choose Elements → Fit Line at Total
Interpreting SPSS Output Tables
When performing analysis, SPSS displays results in the Output Viewer window. A typical descriptive statistics table contains:
| Statistic | Value | Interpretation |
|---|---|---|
| N | 150 | Number of valid observations |
| Mean | 35.40 | Average age value |
| Std. Deviation | 8.72 | Typical distance of values from the mean |
| Skewness | 0.45 | Slight right skew |
| Std. Error of Skewness | 0.198 | For assessing the significance of skewness |
| Kurtosis | −0.32 | Slightly platykurtic |
| Minimum | 19 | Minimum age |
| Maximum | 62 | Maximum age |
Tip: to assess the significance of skewness and kurtosis, divide each value by its standard error. If the absolute value of the ratio exceeds 1.96 (the two-tailed critical value at the 0.05 significance level), the deviation from normality is statistically significant.
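The standard errors in such a table depend only on the sample size. The sketch below uses the standard large-sample formulas for the standard errors of skewness and kurtosis (an assumption about what SPSS computes, though the result for N = 150 matches the 0.198 shown in the table) and applies the tip to the table's values.

```python
import math

def se_skewness(n):
    """Standard error of sample skewness for a sample of size n."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

def se_kurtosis(n):
    """Standard error of sample excess kurtosis for a sample of size n."""
    return math.sqrt(4 * (n * n - 1) * se_skewness(n) ** 2 / ((n - 3) * (n + 5)))

n = 150
skewness, kurtosis = 0.45, -0.32  # values from the table above

z_skew = skewness / se_skewness(n)
z_kurt = kurtosis / se_kurtosis(n)

print(f"SE skewness={se_skewness(n):.3f}, z={z_skew:.2f}, significant={abs(z_skew) > 1.96}")
print(f"SE kurtosis={se_kurtosis(n):.3f}, z={z_kurt:.2f}, significant={abs(z_kurt) > 1.96}")
```

With N = 150 the skewness of 0.45 is small in magnitude yet exceeds the 1.96 threshold, while the kurtosis does not. In large samples even slight skew becomes statistically significant, so the ratio is best read together with the histogram.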
Frequency Tables and Crosstabulation
Frequency Tables
Analyze → Descriptive Statistics → Frequencies
- Show count (Frequency), percentage (Percent), valid percentage (Valid Percent), and cumulative percentage (Cumulative Percent) for each value of a variable
- Especially useful for categorical variables
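The difference between Percent and Valid Percent only shows up when some responses are missing. A minimal sketch with hypothetical survey answers, where `None` stands for a missing value:

```python
from collections import Counter

# Hypothetical answers; None marks a missing response.
responses = ["low", "medium", "high", "medium", None,
             "medium", "low", "high", "medium", None]
order = ["low", "medium", "high"]  # display order of the categories

counts = Counter(r for r in responses if r is not None)
n_total = len(responses)          # all cases, including missing
n_valid = sum(counts.values())    # cases with a non-missing answer

rows = []
cumulative = 0.0
for category in order:
    freq = counts[category]
    percent = 100 * freq / n_total        # base: all cases
    valid_percent = 100 * freq / n_valid  # base: valid cases only
    cumulative += valid_percent           # running total of valid percent
    rows.append((category, freq, percent, valid_percent, cumulative))

print(f"{'Value':<8}{'Freq':>6}{'Percent':>9}{'Valid %':>9}{'Cum. %':>8}")
for category, freq, percent, valid_percent, cum in rows:
    print(f"{category:<8}{freq:>6}{percent:>9.1f}{valid_percent:>9.1f}{cum:>8.1f}")
```

With 2 of 10 answers missing, "medium" accounts for 40% of all cases but 50% of valid cases.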
Crosstabulation (Crosstabs)
Analyze → Descriptive Statistics → Crosstabs
- Shows the joint distribution of two categorical variables
- Rows: one variable; columns: the other
- Click Cells → choose Row percentages, Column percentages, or Total percentages for more informative analysis
- Click Statistics → choose Chi-square to test the association between variables
Example: crosstabulation "Gender × Satisfaction level" will show whether the distribution of satisfaction differs between males and females.
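The Pearson chi-square statistic compares the observed cell counts with the counts expected if the two variables were independent. The sketch below computes it by hand for a hypothetical 2 × 2 table (rows: gender, columns: satisfied / not satisfied). The p-value shortcut via `math.erfc` is valid only for 1 degree of freedom, and no continuity correction is applied.

```python
import math

# Hypothetical counts: rows = gender, columns = satisfied / not satisfied.
observed = [[30, 20],
            [20, 30]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n  # count expected under independence
        chi_square += (obs - expected) ** 2 / expected

df = (len(row_totals) - 1) * (len(col_totals) - 1)

# For df = 1 the upper-tail chi-square probability equals erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_square / 2)) if df == 1 else None

print(f"chi-square={chi_square:.2f}, df={df}, p={p_value:.4f}")
```

For these counts the statistic is 4.0 with 1 degree of freedom, which exceeds the 0.05 critical value of 3.841, so the association would be judged statistically significant.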
Practical Tasks
Task 1
Question: Scores of 12 students in a test: 45, 52, 58, 60, 62, 65, 65, 68, 70, 75, 82, 95. Manually calculate: mean, median, mode, range, and determine the type of skewness.
Solution:
- Mean: (45+52+58+60+62+65+65+68+70+75+82+95) / 12 = 797/12 = 66.42
- Median: 12 values → mean between 6th (65) and 7th (65) = 65
- Mode: 65 (it occurs twice; every other score occurs once)
- Range: 95 − 45 = 50
- Skewness: Mean (66.42) > Median (65) = Mode (65) → slight right skew (positive skewness), with the mean pulled upward by the highest score, 95
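The manual calculation can be double-checked with Python's standard library, using the twelve scores from the task:

```python
import statistics

scores = [45, 52, 58, 60, 62, 65, 65, 68, 70, 75, 82, 95]

mean = statistics.mean(scores)      # 797 / 12
median = statistics.median(scores)  # average of the 6th and 7th ordered values
mode = statistics.mode(scores)      # most frequent value
score_range = max(scores) - min(scores)

print(f"mean={mean:.2f}, median={median}, mode={mode}, range={score_range}")
print("mean > median, suggesting right skew:", mean > median)
```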
Task 2
Question: Describe step by step how to obtain descriptive statistics (mean, median, standard deviation, skewness, kurtosis) in SPSS for the variable "income" and build a histogram with the normal distribution curve.
Solution:
- Open your data file in SPSS
- Analyze → Descriptive Statistics → Frequencies
- Move the variable "income" to the Variable(s) list
- Click Statistics:
  - Check: Mean, Median, Std. Deviation, Skewness, Kurtosis
  - Click Continue
- Click Charts:
  - Choose Histograms
  - Check Show normal curve on histogram
  - Click Continue
- Click OK
- In the Output Viewer, analyze the statistics table and histogram
Task 3
Question: For the variable "income", SPSS reports Skewness = 1.85 and Std. Error of Skewness = 0.35. Is the skewness statistically significant? What recommendations would you give?
Solution:
- Calculate the z-score for skewness: z = 1.85 / 0.35 = 5.29
- Since |5.29| > 1.96, skewness is statistically significant (p < 0.05)
- Positive value (1.85 > 1) indicates substantial right skew
- Recommendations:
  - Use the median instead of the mean to describe central tendency
  - Consider a logarithmic transformation (LN or LG10 in SPSS) to reduce the skew; this requires strictly positive values
  - When using parametric tests, check the robustness of results with nonparametric alternatives (for example, the Mann–Whitney U-test instead of the t-test)
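The effect of a logarithmic transformation can be illustrated with a short sketch. The income values are hypothetical (the task does not provide raw data), and `sample_skewness` is an illustrative helper using the bias-corrected sample formula.

```python
import math
import statistics

def sample_skewness(values):
    """Bias-corrected sample skewness (adjusted Fisher-Pearson coefficient)."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)

# Significance check from the task: skewness divided by its standard error.
z_skew = 1.85 / 0.35
print(f"z = {z_skew:.2f}, significant: {abs(z_skew) > 1.96}")

# Hypothetical right-skewed incomes (in thousands); all values are positive.
incomes = [20, 22, 25, 27, 30, 33, 38, 45, 60, 120]
log_incomes = [math.log(x) for x in incomes]  # natural log, like LN in SPSS

skew_raw = sample_skewness(incomes)
skew_log = sample_skewness(log_incomes)
print(f"skewness before: {skew_raw:.2f}, after log: {skew_log:.2f}")
```

In this example the transformation lowers the skewness noticeably but does not remove it, which is why the transformed variable should be re-checked before running parametric tests.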