Descriptive Statistics and Graphs in SPSS

Measures of Central Tendency

Measures of central tendency indicate the "typical" or "central" value of a dataset. The appropriate measure depends on the type of data and the shape of its distribution.

Arithmetic Mean (Mean)

Formula: x̄ = Σxᵢ / n
When to use: for interval and ratio data with an approximately normal distribution
Limitations: sensitive to outliers; a single extreme value can shift the mean substantially
Example: the average salary of 10 employees is informative if there are no sharp outliers
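The mean formula and its sensitivity to outliers can be checked with a short Python sketch. It uses the standard-library `statistics` module; the salary figures are hypothetical, chosen only to show how replacing one value with an extreme one moves the mean.

```python
from statistics import mean

# Hypothetical monthly salaries of 10 employees (no sharp outliers)
salaries = [3100, 3200, 3300, 3300, 3400, 3500, 3500, 3600, 3700, 3800]

# Same group, but the highest earner is replaced by an extreme value (an outlier)
with_outlier = salaries[:-1] + [15000]

print(mean(salaries))      # 3440: x̄ = Σxᵢ / n
print(mean(with_outlier))  # 4560: one extreme value pulls the mean upward
```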

Median

Definition: the middle value of the data sorted in ascending order, which divides the ordered series exactly in half (for an even n, it is the mean of the two middle values)
When to use: in the presence of outliers, for skewed distributions, for ordinal data
Advantage: resistant to extreme values
Example: median income characterizes the "typical" income of a population better than the mean, since income distributions are typically right-skewed
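A minimal sketch of the median rule (middle value of the sorted data, or the mean of the two middle values when n is even), written out by hand and compared with `statistics.median`. The income values and the helper name `median_manual` are hypothetical.

```python
from statistics import median

def median_manual(values):
    """Middle value of the sorted data; for even n, the mean of the two middle values."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

# Hypothetical incomes (thousands); the last value is an extreme outlier
incomes = [18, 22, 25, 27, 30, 34, 40, 250]

print(median_manual(incomes))  # 28.5, no matter how large the outlier is
print(median(incomes))         # standard-library equivalent
```

Replacing 250 with any value above 30 leaves the median unchanged, which is what "resistant to extreme values" means in practice.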

Mode

Definition: the most frequently occurring value in a dataset
When to use: for nominal data (the only applicable measure of central tendency), for multimodal distributions
Features: there can be several modes (bimodal, multimodal distribution) or no mode at all
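Python's `statistics.multimode` returns every value that ties for the highest frequency, which mirrors the single-mode, multimodal and "no distinct mode" cases above. The data are hypothetical. Note that SPSS Frequencies, as far as we know, shows only the smallest mode and adds a footnote when several modes exist.

```python
from statistics import multimode

# Nominal data: hypothetical favourite colours
colours = ["blue", "red", "blue", "green", "red", "blue"]
print(multimode(colours))    # ['blue']: a single mode

# Bimodal data: two values tie for the highest frequency
scores = [2, 3, 3, 4, 5, 5, 6]
print(multimode(scores))     # [3, 5]

# Every value occurs once: all values are returned, i.e. there is no distinct mode
print(multimode([1, 2, 3]))  # [1, 2, 3]
```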

Comparison of Measures by Distribution Type

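Symmetric distribution: Mean ≈ Median ≈ Mode
Right skew (positive skewness): typically Mean > Median > Mode
Left skew (negative skewness): typically Mean < Median < Mode
These orderings are a rule of thumb for unimodal distributions; they can fail, especially for discrete data, so confirm them with the skewness statistic.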
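A quick illustration of the right-skew ordering on a hypothetical sample with a long right tail. The check only applies the rule of thumb above; it is not a substitute for the skewness statistic.

```python
from statistics import mean, median, mode

# Hypothetical right-skewed data (e.g. number of visits): long tail to the right
visits = [1, 2, 2, 2, 3, 3, 4, 5, 8, 12]

m, md, mo = mean(visits), median(visits), mode(visits)
print(m, md, mo)  # 4.2 3.0 2

if m > md > mo:
    print("Pattern consistent with right (positive) skew")
elif m < md < mo:
    print("Pattern consistent with left (negative) skew")
else:
    print("No clear ordering; check the skewness statistic instead")
```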

Measures of Dispersion (Variability)

Measures of dispersion indicate how much values deviate from the center of the distribution.

Range

Formula: Range = Max − Min
The simplest measure, but it uses only the two extreme values and is therefore very sensitive to outliers

Variance

Sample variance formula: s² = Σ(xᵢ − x̄)² / (n − 1)
Shows the average squared deviation from the mean
Dividing by (n − 1) instead of n (Bessel's correction) makes s² an unbiased estimator of the population variance

Standard Deviation

Formula: s = √s²
The most commonly used measure of dispersion, expressed in the same units as the original data
Interpretation: the larger the SD, the greater the spread of the data; a small SD indicates that values are concentrated near the mean
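The variance and standard deviation formulas, including Bessel's correction, can be sketched in Python as follows. The data and the helper name `sample_variance` are hypothetical; `statistics.variance` and `statistics.stdev` divide by n − 1, while `statistics.pvariance` divides by n.

```python
import math
from statistics import pvariance, stdev, variance

def sample_variance(values):
    """s² = Σ(xᵢ − x̄)² / (n − 1), i.e. with Bessel's correction."""
    n = len(values)
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

data = [4, 8, 6, 5, 3, 7]  # hypothetical measurements
s2 = sample_variance(data)
s = math.sqrt(s2)          # standard deviation, in the same units as the data

print(s2, variance(data))  # manual vs standard library (both divide by n − 1)
print(pvariance(data))     # divides by n: smaller, biased as an estimate of σ²
print(s, stdev(data))
```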

Interquartile Range (IQR)

Formula: IQR = Q3 − Q1 (difference between the 75th and 25th percentiles)
Resistant to outliers, often used along with the median
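To see why the range reacts to outliers and the IQR does not, the sketch below computes both for the Task 1 scores and for a copy in which the top score is replaced by a hypothetical extreme value. It relies on `statistics.quantiles` with its default "exclusive" method, which interpolates at position p(n + 1); as far as we know this matches SPSS's default weighted-average percentiles, but other software may report slightly different quartiles.

```python
from statistics import quantiles

def data_range(values):
    return max(values) - min(values)

def iqr(values):
    # quantiles(..., n=4) returns [Q1, Q2, Q3]
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1

scores = [45, 52, 58, 60, 62, 65, 65, 68, 70, 75, 82, 95]  # Task 1 scores
extreme = scores[:-1] + [150]  # hypothetical: top score replaced by an extreme value

print(data_range(scores), iqr(scores))    # 50 15.25
print(data_range(extreme), iqr(extreme))  # 105 15.25: only the range changes
```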

Measures of Distribution Shape

Skewness

Indicates the degree to which a distribution deviates from symmetry
Skewness = 0: symmetrical distribution
Skewness > 0: right skew (tail stretched to the right, most values on the left)
Skewness < 0: left skew (tail stretched to the left, most values on the right)
Rule of thumb: |Skewness| < 1 indicates mild to moderate skew; |Skewness| > 1 indicates substantial skew (this describes size, not statistical significance)
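A sketch of the sample skewness computation. It assumes SPSS reports the adjusted Fisher-Pearson coefficient G1 (the moment ratio m₃ / m₂^1.5 multiplied by a small-sample factor); other programs may use a slightly different formula. The function needs at least three values that are not all equal.

```python
import math

def skewness(values):
    """Adjusted Fisher-Pearson skewness G1 (assumed to match the SPSS statistic)."""
    n = len(values)
    x_bar = sum(values) / n
    m2 = sum((x - x_bar) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - x_bar) ** 3 for x in values) / n  # third central moment
    g1 = m3 / m2 ** 1.5
    return math.sqrt(n * (n - 1)) / (n - 2) * g1    # small-sample adjustment

print(skewness([1, 2, 3, 4, 5]))         # 0.0 for symmetric data
print(skewness([1, 2, 2, 3, 3, 3, 10]))  # positive: long right tail
```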

Kurtosis

Indicates how heavy the tails of a distribution are compared with the normal distribution (traditionally described as its "sharpness" or "flatness")
SPSS reports excess kurtosis, so Kurtosis = 0 corresponds to the normal distribution (mesokurtic)
Kurtosis > 0: heavy tails, often with a sharper peak (leptokurtic)
Kurtosis < 0: light tails, often with a flatter top (platykurtic)
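The corresponding sketch for kurtosis, under the same assumption that SPSS reports the adjusted sample excess kurtosis G2. Subtracting 3 from the moment ratio is what makes the normal distribution score about 0. At least four values that are not all equal are required; both samples are hypothetical.

```python
def excess_kurtosis(values):
    """Adjusted sample excess kurtosis G2 (assumed to match the SPSS statistic)."""
    n = len(values)
    x_bar = sum(values) / n
    m2 = sum((x - x_bar) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - x_bar) ** 4 for x in values) / n  # fourth central moment
    g2 = m4 / m2 ** 2 - 3                           # subtract 3: normal curve scores 0
    return ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3))

print(excess_kurtosis(list(range(1, 11))))    # negative: flat, light tails (platykurtic)
print(excess_kurtosis([0] * 8 + [-10, 10]))   # 4.5: heavy tails (leptokurtic)
```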

The Normal Distribution and Its Importance

The normal distribution is a fundamental concept in statistics. Its properties:

Symmetrical about the mean
Mean = Median = Mode
68.27% of values within ±1 SD from the mean
95.45% of values within ±2 SD
99.73% of values within ±3 SD
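The 68.27 / 95.45 / 99.73 percentages follow from the standard normal distribution: the share of values within ±k SD equals erf(k / √2). A short check with Python's `math.erf` (the helper name `share_within` is hypothetical):

```python
import math

def share_within(k):
    """P(|Z| < k) for a standard normal variable Z, via the error function."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"±{k} SD: {share_within(k):.2%}")
# ±1 SD: 68.27%
# ±2 SD: 95.45%
# ±3 SD: 99.73%
```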

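Why it matters: many parametric procedures (t-test, ANOVA, Pearson correlation, regression) assume normally distributed data; strictly, the assumption usually concerns the values within each group or the model residuals rather than the raw variable. Violating it can distort p-values and confidence intervals, especially in small samples.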

Checking normality in SPSS:

Visually: histogram with normal curve, Q-Q plot
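Statistically: the Shapiro-Wilk test is commonly recommended for small samples (n < 50) and the Kolmogorov-Smirnov test (reported by SPSS with the Lilliefors correction) for larger ones; in both, p < 0.05 indicates a significant deviation from normality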
Analyze → Descriptive Statistics → Explore → Plots → Normality plots with tests

Descriptive Statistics in SPSS

Method 1: Frequencies

Analyze → Descriptive Statistics → Frequencies

Move variables to the list
Click Statistics → select: Mean, Median, Mode, Std. Deviation, Variance, Skewness, Kurtosis, Minimum, Maximum
Click Charts → select chart type (histogram with normal curve)
Click OK

Method 2: Descriptives

Analyze → Descriptive Statistics → Descriptives

More compact output: by default, mean, standard deviation, minimum, and maximum
The option Save standardized values as variables creates z-scores (standardized values) as new variables in the dataset
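The z-scores saved by Descriptives follow z = (x − x̄) / s. The sketch below assumes the sample standard deviation (divisor n − 1) is used; the ages and the helper name `z_scores` are hypothetical. By construction, the resulting z-scores have mean 0 and SD 1.

```python
from statistics import mean, stdev

def z_scores(values):
    """z = (x − x̄) / s, using the sample SD (divisor n − 1)."""
    x_bar = mean(values)
    s = stdev(values)
    return [(x - x_bar) / s for x in values]

ages = [19, 25, 31, 35, 40, 62]  # hypothetical ages
z = z_scores(ages)
print([round(v, 2) for v in z])  # negative = below the mean, positive = above
```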

Method 3: Explore

Analyze → Descriptive Statistics → Explore

Most comprehensive analysis: descriptive statistics, normality tests, boxplot, stem-and-leaf plot
Allows splitting the analysis by groups (Factor List)

Creating Graphs in SPSS

Histogram

Graphs → Legacy Dialogs → Histogram or through Frequencies
Shows the distribution of a continuous variable
The Display normal curve option overlays a normal distribution curve for a visual assessment of normality

Bar Chart

Graphs → Legacy Dialogs → Bar → Simple
Used for categorical variables
Shows frequency or percentage of each category

Boxplot

Graphs → Legacy Dialogs → Boxplot or via Explore
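Displays: the median (line inside the box), Q1 and Q3 (edges of the box), "whiskers" extending to the most extreme values within 1.5 × IQR of the box, and outliers beyond the whiskers (SPSS marks values more than 1.5 × IQR from the box with circles and values more than 3 × IQR away with asterisks as extreme values)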
Ideal for comparing distributions between groups and identifying outliers
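The boxplot elements can be reproduced with a small Python sketch using Tukey's 1.5 × IQR fences. It computes quartiles with `statistics.quantiles`; SPSS may base its boxplot on a different quartile definition (for example Tukey's hinges), so points close to a fence can be classified differently. It also does not separate SPSS's extreme values (beyond 3 × IQR) from ordinary outliers. The data and the helper name `boxplot_summary` are hypothetical.

```python
from statistics import median, quantiles

def boxplot_summary(values):
    """Median, box edges, whisker ends and outliers using 1.5 × IQR fences."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in values if low_fence <= x <= high_fence]
    return {
        "median": median(values),
        "box": (q1, q3),
        # whiskers end at the most extreme observations still inside the fences
        "whiskers": (min(inside), max(inside)),
        "outliers": [x for x in values if x < low_fence or x > high_fence],
    }

print(boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
# 100 lies above the upper fence, so it is listed as an outlier
```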

Scatterplot

Graphs → Legacy Dialogs → Scatter/Dot → Simple Scatter
Visualizes the relationship between two quantitative variables
Allows adding a trend line (regression line): double-click the chart to open the Chart Editor, then choose Elements → Fit Line at Total
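With its linear fit method, the Fit Line at Total option draws an ordinary least-squares regression line. The slope and intercept of such a line can be sketched as follows; the paired data and the function name `least_squares_line` are hypothetical.

```python
def least_squares_line(x, y):
    """Ordinary least-squares fit y ≈ intercept + slope × x; returns (intercept, slope)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical paired observations: hours studied (x) and test score (y)
hours = [1, 2, 3, 4, 5, 6]
score = [52, 55, 61, 64, 70, 72]

intercept, slope = least_squares_line(hours, score)
print(f"score ≈ {intercept:.2f} + {slope:.2f} × hours")
```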

Interpreting SPSS Output Tables

When you run an analysis, SPSS displays the results in the Output Viewer window. A typical descriptive statistics table (here for the variable "age") contains:

Statistic	Value	Interpretation
N	150	Number of valid observations
Mean	35.40	Average age value
Std. Deviation	8.72	Average spread from the mean
Skewness	0.45	Slight right skew
Std. Error of Skewness	0.198	For assessing the significance of skewness
Kurtosis	−0.32	Slightly platykurtic
Minimum	19	Minimum age
Maximum	62	Maximum age

Tip: to assess the significance of skewness and kurtosis, divide each value by its standard error. If the absolute value of this ratio exceeds 1.96, the deviation from normality is statistically significant at the 0.05 level. In large samples even small deviations pass this threshold, so also judge the size of the statistic itself.
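The tip can be sketched in Python using the standard-error formulas commonly given for sample skewness and kurtosis (assumed here to be the ones SPSS uses; with n = 150 the skewness formula does reproduce the 0.198 in the table). With the example values, skewness gives z ≈ 2.27 and kurtosis z ≈ −0.81, which shows how even a slight skew can reach significance in a sample of 150. The last line checks the Task 3 figure.

```python
import math

def se_skewness(n):
    """Standard error of sample skewness for n observations."""
    return math.sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

def se_kurtosis(n):
    """Standard error of sample excess kurtosis for n observations."""
    return 2 * se_skewness(n) * math.sqrt((n * n - 1) / ((n - 3) * (n + 5)))

def z_ratio(statistic, std_error):
    """Statistic divided by its standard error."""
    return statistic / std_error

def verdict(z, critical=1.96):
    return "significant" if abs(z) > critical else "not significant"

n = 150  # N from the example table
z_skew = z_ratio(0.45, se_skewness(n))
z_kurt = z_ratio(-0.32, se_kurtosis(n))

print(round(se_skewness(n), 3))  # 0.198
print(f"skewness: z = {z_skew:.2f}, {verdict(z_skew)} at 0.05")
print(f"kurtosis: z = {z_kurt:.2f}, {verdict(z_kurt)} at 0.05")
print(f"Task 3: z = {z_ratio(1.85, 0.35):.2f}, {verdict(z_ratio(1.85, 0.35))}")
```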

Frequency Tables and Crosstabulation

Frequency Tables

Analyze → Descriptive Statistics → Frequencies

Frequency tables show the count (Frequency), the percentage of all cases (Percent), the percentage of non-missing cases (Valid Percent), and the cumulative valid percentage (Cumulative Percent) for each value of a variable
Especially useful for categorical variables
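The logic behind the four columns can be sketched as follows, with `None` standing in for a missing value (an assumption of this sketch; SPSS uses system- or user-missing codes). Percent divides by all cases, Valid Percent by non-missing cases, and Cumulative Percent accumulates Valid Percent. The survey answers and the helper name `frequency_table` are hypothetical, and the Missing row that SPSS would also print is omitted.

```python
from collections import Counter

def frequency_table(values):
    """Rows of (value, Frequency, Percent, Valid Percent, Cumulative Percent)."""
    total = len(values)
    valid = [v for v in values if v is not None]  # None marks a missing value
    counts = Counter(valid)
    rows = []
    cumulative = 0.0
    for value in sorted(counts):                  # rows ordered by value
        freq = counts[value]
        valid_pct = 100 * freq / len(valid)
        cumulative += valid_pct
        rows.append((value, freq, 100 * freq / total, valid_pct, cumulative))
    return rows

answers = ["Yes", "No", "Yes", None, "Yes", "No", "Maybe", "Yes", None, "No"]
for row in frequency_table(answers):
    print("{:<6} {:>3} {:>6.1f} {:>6.1f} {:>6.1f}".format(*row))
```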

Crosstabulation (Crosstabs)

Analyze → Descriptive Statistics → Crosstabs

Shows the joint distribution of two categorical variables
Rows: one variable; columns: the other
Click Cells → choose Row percentages, Column percentages, or Total percentages for more informative analysis
Click Statistics → choose Chi-square to test the association between variables

Example: crosstabulation "Gender × Satisfaction level" will show whether the distribution of satisfaction differs between males and females.
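The statistics behind Crosstabs can be sketched by hand: row percentages divide each cell by its row total, and Pearson's chi-square compares the observed counts O with the expected counts E = (row total × column total) / N, with df = (rows − 1)(columns − 1). The counts and the helper name `crosstab_summary` are hypothetical; SPSS also reports the p-value, which this sketch leaves out (for df = 2 the 0.05 critical value is about 5.99).

```python
def crosstab_summary(table):
    """Row percentages, Pearson chi-square and df for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    row_pct = [[100 * cell / rt for cell in row] for row, rt in zip(table, row_totals)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # E = row total × column total / N
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return row_pct, chi2, df

# Hypothetical counts: rows = gender (male, female), columns = satisfaction (low, medium, high)
observed = [[20, 30, 10],
            [10, 25, 25]]
row_pct, chi2, df = crosstab_summary(observed)
print([[round(p, 1) for p in row] for row in row_pct])
print(f"chi-square = {chi2:.2f}, df = {df}")
```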

Practical Tasks

Task 1

Question: The test scores of 12 students are: 45, 52, 58, 60, 62, 65, 65, 68, 70, 75, 82, 95. Calculate by hand the mean, median, mode, and range, and determine the type of skewness.

Solution:

Mean: (45+52+58+60+62+65+65+68+70+75+82+95) / 12 = 797 / 12 ≈ 66.42
Median: 12 values (even n) → mean of the 6th (65) and 7th (65) values = 65
Mode: 65 occurs twice, more often than any other value → 65
Range: 95 − 45 = 50
Skewness: Mean (66.42) > Median (65) = Mode (65) → slight right skew (positive skewness), driven mainly by the high score of 95
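The hand calculations can be double-checked with Python's `statistics` module:

```python
from statistics import mean, median, multimode

scores = [45, 52, 58, 60, 62, 65, 65, 68, 70, 75, 82, 95]

print(round(mean(scores), 2))     # 66.42
print(median(scores))             # 65.0
print(multimode(scores))          # [65]
print(max(scores) - min(scores))  # 50
```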

Task 2

Question: Describe step by step how to obtain descriptive statistics in SPSS (mean, median, standard deviation, skewness, kurtosis) for the variable "income" and how to build a histogram with a normal distribution curve.

Solution:

Open your data file in SPSS
Analyze → Descriptive Statistics → Frequencies
Move the variable "income" to the Variable(s) list
Click Statistics:
- Check: Mean, Median, Std. Deviation, Skewness, Kurtosis
- Click Continue
Click Charts:
- Choose Histograms
- Check Show normal curve on histogram
- Click Continue
Click OK
In the Output Viewer, analyze the statistics table and histogram

Task 3

Question: For the variable "income", Skewness = 1.85 and Std. Error of Skewness = 0.35. Is the skewness statistically significant? What would you recommend?

Solution:

Calculate the z-score for skewness: z = 1.85 / 0.35 = 5.29
Since |5.29| > 1.96, skewness is statistically significant (p < 0.05)
Positive value (1.85 > 1) indicates substantial right skew
Recommendations:
- Use the median instead of the mean to describe central tendency
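- Consider a logarithmic transformation (the LN or LG10 function in Transform → Compute Variable) to bring the distribution closer to normal; the logarithm requires positive values
- When using parametric tests, check the robustness of the results with nonparametric alternatives (for example, the Mann–Whitney U test instead of the independent-samples t-test)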
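The log-transformation recommendation can be illustrated with a short sketch on hypothetical, strictly positive incomes. It uses the G1 skewness formula (assumed to match SPSS) and applies the natural logarithm, the equivalent of SPSS's LN(); if the variable contains zeros, a shifted version such as LN(income + 1) is a common workaround.

```python
import math

def skewness(values):
    """Adjusted Fisher-Pearson skewness G1 (assumed to match SPSS output)."""
    n = len(values)
    x_bar = sum(values) / n
    m2 = sum((x - x_bar) ** 2 for x in values) / n
    m3 = sum((x - x_bar) ** 3 for x in values) / n
    return math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5

# Hypothetical right-skewed incomes (thousands); all values are positive
incomes = [20, 22, 25, 27, 30, 32, 35, 40, 48, 60, 85, 150]
log_incomes = [math.log(x) for x in incomes]  # natural log, like LN() in SPSS

print(round(skewness(incomes), 2), round(skewness(log_incomes), 2))
# the transformed variable is noticeably less skewed
```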