SPSS: Data Types and Working with Variables

SPSS Interface

IBM SPSS Statistics is one of the most widely used programs for statistical data analysis in the social and business sciences. After launching the program, you work in the Data Editor window, which has two views:

Data View

This is the main table for data input and review. It resembles an Excel spreadsheet:

Columns represent variables (for example, "Age", "Gender", "Income")
Rows represent observations (respondents, companies, cases)
Each cell contains a single value for a particular variable of a particular observation

Variable View

This view is for defining variables. Each row corresponds to one variable, and the columns hold its properties. To switch between the two views, use the tabs at the bottom of the window.

Data Types: Categorical and Numeric

Research data falls into two broad groups, each with two levels of measurement:

Categorical Data

Nominal — categories without any natural order.

Examples: gender (1 = male, 2 = female), city of residence, company sector
In SPSS: Measure = Nominal

Ordinal — categories with a defined order, but without equal intervals between them.

Examples: education level (1 = secondary, 2 = bachelor's, 3 = master's, 4 = PhD), Likert scale items
In SPSS: Measure = Ordinal

Numeric (Quantitative) Data

Interval — numeric data with equal intervals, but without an absolute zero point.

Examples: temperature in Celsius, year of birth, IQ score
In SPSS: Measure = Scale

Ratio data — numeric data with an absolute zero.

Examples: age, income, number of employees, years of work experience
In SPSS: Measure = Scale (SPSS does not differentiate between interval and ratio data)
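The mapping from the four measurement levels to the three SPSS Measure options can be written down as a small lookup. This is a minimal sketch in plain Python, not SPSS syntax; the names SPSS_MEASURE and spss_measure are hypothetical and exist only for this illustration.

```python
# Hypothetical lookup: the four textbook measurement levels
# mapped to the three options of the SPSS Measure column.
SPSS_MEASURE = {
    "nominal": "Nominal",
    "ordinal": "Ordinal",
    "interval": "Scale",  # SPSS groups interval and ratio together
    "ratio": "Scale",
}


def spss_measure(level):
    """Return the Measure setting for a measurement level name."""
    return SPSS_MEASURE[level.strip().lower()]


# Examples taken from the lists above
examples = {
    "gender": "nominal",
    "education level": "ordinal",
    "year of birth": "interval",
    "income": "ratio",
}
for variable, level in examples.items():
    print(f"{variable}: {spss_measure(level)}")
```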

Setting Up Variables in Variable View

Each variable in SPSS has a set of properties that are configured in Variable View. The ten most commonly used are listed below (recent versions also show a Role column):

Property	Description	Example
Name	Short variable name (no spaces, up to 64 characters)	vozrast (age), pol (gender), dohod (income)
Type	Data type: Numeric, String, Date, etc.	Numeric for numbers and coded categories
Width	Maximum number of characters	8
Decimals	Number of decimal places	0 for integers, 2 for decimals
Label	Full description of the variable (shows in tables)	"Respondent age"
Values	Value labels for coded variables	1 = "Male", 2 = "Female"
Missing	Definition of missing values	99 = missing value
Columns	Column width in Data View	8
Align	Cell data alignment	Right for numeric
Measure	Level of measurement	Nominal, Ordinal, or Scale

Step-by-Step Example: Setting Up the "Gender" Variable

Go to Variable View
In the row of the new variable, enter Name: pol
Type: Numeric
Width: 1, Decimals: 0
Label: Respondent’s gender
Values: click “...” → add 1 = “Male”, 2 = “Female” → OK
Missing: as needed (for example, 9 = not specified)
Measure: Nominal
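To make the role of each property concrete, the settings from the steps above can be collected in one record. This is a sketch in plain Python under stated assumptions: VariableSpec and classify are hypothetical names invented for the illustration, not part of SPSS or any SPSS API.

```python
from dataclasses import dataclass, field


@dataclass
class VariableSpec:
    """Illustrative stand-in for one row of Variable View (not an SPSS API)."""
    name: str
    var_type: str = "Numeric"
    width: int = 8
    decimals: int = 0
    label: str = ""
    values: dict = field(default_factory=dict)  # code -> value label
    missing: set = field(default_factory=set)   # codes declared as missing
    measure: str = "Scale"

    def classify(self, code):
        """Say how a raw code would be treated under this definition."""
        if code in self.missing:
            return "missing"
        if code in self.values:
            return "valid"
        # SPSS itself would still accept such a code; flagging it here
        # is only a data-checking convenience of this sketch.
        return "undefined"


pol = VariableSpec(
    name="pol",
    width=1,
    decimals=0,
    label="Respondent's gender",
    values={1: "Male", 2: "Female"},
    missing={9},  # the "not specified" code from the Missing step
    measure="Nominal",
)

for code in (1, 2, 9, 3):
    print(code, pol.classify(code))
```

A code such as 3, which has no label and is not declared missing, usually points to a data entry error and is worth checking before analysis.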

Entering and Importing Data

Manual Data Entry

Set up all variables in Variable View
Switch to Data View
Enter values in corresponding cells row by row (each row = one respondent)

Importing Data from Excel

File → Open → Data (in recent versions also File → Import Data → Excel)
Choose file type: Excel (*.xlsx)
Find and open the file
In the dialog, check "Read variable names from the first row of data" if the first row contains variable names
Click OK — data will be loaded into SPSS

Importing Data from CSV

File → Read Text Data (in recent versions: File → Import Data → Text Data or CSV Data)
Select the .csv file
Follow the step-by-step wizard (Text Import Wizard), specifying the delimiter (comma, semicolon), data format, and header presence
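The two choices that matter most in the wizard, the delimiter and whether the first row holds variable names, can be illustrated outside SPSS. The sketch below uses Python's standard csv module on made-up sample text; read_delimited and the V1, V2 placeholder names are assumptions for the example, not SPSS behavior.

```python
import csv
import io

# Made-up sample data: semicolon-delimited, variable names in the first row
raw_text = "id;vozrast;pol\n1;34;2\n2;51;1\n"


def read_delimited(text, delimiter=";", has_header=True):
    """Parse delimited text into (variable names, rows of strings)."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if has_header:
        return rows[0], rows[1:]
    # No header row: generate placeholder names (hypothetical scheme)
    names = [f"V{i}" for i in range(1, len(rows[0]) + 1)]
    return names, rows


names, rows = read_delimited(raw_text)
print(names)  # ['id', 'vozrast', 'pol']
print(rows)   # [['1', '34', '2'], ['2', '51', '1']]
```

Choosing the wrong delimiter is easy to spot: every row collapses into a single column instead of splitting into variables.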

Coding Categorical Variables

Categorical variables in SPSS are usually stored as numeric codes, with a text label assigned to each code (Value Labels).

Example of coding the "Education Level" variable:

1 = Secondary
2 = Bachelor
3 = Master's
4 = Doctorate (PhD)

To set this up, click the Values cell in Variable View. In the dialog that opens, enter the numeric code and the text label for each value, click Add after each pair, and then click OK.

After coding, in Data View you can switch between displaying codes (1, 2, 3, 4) and labels (Secondary, Bachelor, ...) via View → Value Labels or a button on the toolbar.
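The idea behind this toggle is that only the codes are stored, while the labels are applied at display time. A minimal Python sketch of that separation follows; the display function and the sample column are hypothetical.

```python
# Value labels from the education example above
EDUCATION_LABELS = {
    1: "Secondary",
    2: "Bachelor",
    3: "Master's",
    4: "Doctorate (PhD)",
}


def display(codes, value_labels, show_labels):
    """Render a column either as raw codes or as value labels."""
    if not show_labels:
        return [str(code) for code in codes]
    # Codes without a label are shown as they are
    return [value_labels.get(code, str(code)) for code in codes]


column = [2, 4, 1, 3]  # made-up stored codes
print(display(column, EDUCATION_LABELS, show_labels=False))
print(display(column, EDUCATION_LABELS, show_labels=True))

# Sorting works on the stored codes, so the ordinal order is kept
print(display(sorted(column), EDUCATION_LABELS, show_labels=True))
```

Because sorting happens on the codes rather than on the label text, the categories stay in their logical order instead of alphabetical order.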

Recoding Variables (Recode)

Recoding changes the values of a variable, for example to combine categories or to turn a continuous variable into a categorical one.

Recode into Same Variables

The original values are overwritten by the new ones, so the original data is lost unless you keep a backup copy of the file.

Transform → Recode into Same Variables
Select the variable → click Old and New Values
Specify old and new values → Add → Continue → OK

Recode into Different Variables

A new variable is created with the recoded values, and the original variable is left untouched. This method is recommended because a recoding mistake can then be corrected without any loss of data.

Transform → Recode into Different Variables
Select the source variable → enter the name and label for the new variable → Change
Click Old and New Values → set up correspondences → Continue → OK

Practical Example: Recoding age into age groups:

18–25 → 1 (Young)
26–40 → 2 (Middle-aged)
41–60 → 3 (Older)
61+ → 4 (Senior)

In the Old and New Values dialog, use the Range options to define the intervals, and "Range, value through HIGHEST" for the open-ended 61+ group.
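The logic of a range-based recode can be checked on a few boundary values before it is applied to real data. The following is a plain Python sketch of the age grouping above, not SPSS syntax; AGE_GROUPS and recode_age are hypothetical names. Values that match no range come back as None, loosely analogous to a system-missing value in the new variable.

```python
# (low, high, code) for each range in the example above
AGE_GROUPS = [
    (18, 25, 1),            # Young
    (26, 40, 2),            # Middle-aged
    (41, 60, 3),            # Older
    (61, float("inf"), 4),  # Senior, i.e. "61 through HIGHEST"
]


def recode_age(age):
    """Return the group code for an age, or None if no range matches."""
    if age is None:
        return None  # a missing age stays missing
    for low, high, code in AGE_GROUPS:
        if low <= age <= high:
            return code
    return None  # e.g. ages under 18 fall outside the scheme


ages = [18, 25, 26, 40, 41, 60, 61, 87, 17, None]
print([recode_age(a) for a in ages])
# [1, 1, 2, 2, 3, 3, 4, 4, None, None]
```

Testing the edges of every range (25 and 26, 40 and 41, 60 and 61) is a quick way to catch gaps or overlaps between intervals.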

Computing New Variables (Compute Variable)

The Compute Variable function allows you to create new variables based on arithmetic expressions or built-in functions.

Transform → Compute Variable
In the Target Variable field, enter the name of the new variable
In the Numeric Expression field, enter the formula

Formula examples:

Total score: total_score = q1 + q2 + q3 + q4 + q5
Average score: mean_score = MEAN(q1, q2, q3, q4, q5)
Log of income: log_income = LN(income)
Satisfaction index: sat_index = (sat1 + sat2 + sat3) / 3

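The MEAN() function in SPSS averages only the non-missing arguments, whereas an expression built with + and / returns a missing result as soon as any one of its arguments is missing. This makes MEAN() preferable for questionnaire data, where individual items are often skipped.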
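The difference between the two approaches is easiest to see on a respondent who skipped some questions. The sketch below imitates both behaviors in plain Python, with None standing in for a missing value; the function names and the sample answers are made up for the illustration.

```python
def arithmetic_mean(values):
    """Imitates (v1 + ... + vn) / n: one missing value makes the result missing."""
    if any(v is None for v in values):
        return None
    return sum(values) / len(values)


def mean_of_valid(values):
    """Imitates MEAN(v1, ..., vn): averages the non-missing values only."""
    valid = [v for v in values if v is not None]
    if not valid:
        return None  # every answer missing, so the result is missing too
    return sum(valid) / len(valid)


complete = [4, 5, 3, 4, 4]        # all five questions answered
partial = [4, None, 3, None, 5]   # two questions skipped

print(arithmetic_mean(complete), mean_of_valid(complete))  # 4.0 4.0
print(arithmetic_mean(partial), mean_of_valid(partial))    # None 4.0
```

Note that the second respondent's mean rests on only three answers, so it is worth deciding in advance how many valid answers are enough for the score to be meaningful.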

Practical Tasks

Task 1

Question: You are conducting an employee satisfaction survey. Set up the following variables in SPSS: employee ID, age, gender, department (sales, marketing, IT, HR), years of work experience, job satisfaction (scale 1–5).

Solution:

Name	Type	Label	Values	Measure
id	Numeric	Employee ID	—	Nominal
vozrast	Numeric	Age	—	Scale
pol	Numeric	Gender	1=Male, 2=Female	Nominal
otdel	Numeric	Department	1=Sales, 2=Marketing, 3=IT, 4=HR	Nominal
stazh	Numeric	Years of work experience	—	Scale
udovl	Numeric	Job Satisfaction	1=Very low...5=Very high	Ordinal

Task 2

Question: Create a new variable "Age Group" by recoding the "vozrast" variable: under 30 = "Young", 30–45 = "Middle", over 45 = "Older". Which recoding method should you use and why?

Solution: You should use Recode into Different Variables to preserve the original "vozrast" variable. Steps:

Transform → Recode into Different Variables
Move "vozrast" to the list → enter new variable name: vozr_group, label: "Age Group" → Change
Old and New Values:
- Range: Lowest through 29 → 1 → Add
- Range: 30 through 45 → 2 → Add
- Range: 46 through Highest → 3 → Add
Continue → OK
Then in Variable View set up Value Labels: 1 = Young, 2 = Middle, 3 = Older, and Measure = Ordinal
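The point of recoding into a different variable is that each case ends up with both the original value and the new code. A small Python sketch of that outcome follows, using made-up respondents; RULES and age_group are hypothetical names, and the ranges are the ones entered in the dialog above.

```python
import math

# The three ranges from the solution: (low, high, new code)
RULES = [
    (-math.inf, 29, 1),  # Lowest through 29
    (30, 45, 2),         # 30 through 45
    (46, math.inf, 3),   # 46 through Highest
]
VALUE_LABELS = {1: "Young", 2: "Middle", 3: "Older"}


def age_group(vozrast):
    """Return the code of the first matching range, or None."""
    if vozrast is None:
        return None
    for low, high, code in RULES:
        if low <= vozrast <= high:
            return code
    return None


# Made-up cases, one dict per respondent
respondents = [
    {"id": 1, "vozrast": 24},
    {"id": 2, "vozrast": 30},
    {"id": 3, "vozrast": 45},
    {"id": 4, "vozrast": 46},
    {"id": 5, "vozrast": None},
]

for row in respondents:
    # A new key is added; the original "vozrast" value is not changed
    row["vozr_group"] = age_group(row["vozrast"])
    print(row["id"], row["vozrast"], VALUE_LABELS.get(row["vozr_group"]))
```

Since vozrast is still present in every row, the grouping can be redone later with different cut-off points.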

Task 3

Question: You have 5 satisfaction questions (q1–q5) on a scale of 1–5. Calculate the average satisfaction score. Write the formula for Compute Variable.

Solution: In Transform → Compute Variable:

Target Variable: mean_satisfaction
Label (set via the Type & Label button): "Average satisfaction score"
Numeric Expression: MEAN(q1, q2, q3, q4, q5)

Using MEAN() instead of (q1+q2+q3+q4+q5)/5 is preferable: MEAN() averages the answers that are available, whereas the arithmetic expression yields a missing result for any respondent who skipped even one question.