SPSS: Data Types and Working with Variables
Introduction to Quantitative Research
SPSS Interface
IBM SPSS Statistics is one of the most widely used programs for statistical data analysis in the social sciences and business research. After launching the program, you work in the Data Editor window, which has two views:
Data View
This is the main table for data input and review. It resembles an Excel spreadsheet:
- Columns represent variables (for example, "Age", "Gender", "Income")
- Rows represent observations (respondents, companies, cases)
- Each cell contains a single value for a particular variable of a particular observation
Variable View
This view is used to define variables. Each row corresponds to one variable, and the columns set its properties. You switch between the two views using the tabs at the bottom of the window.
Data Types: Categorical and Numeric
Research data fall into two broad groups, each covering two levels of measurement:
Categorical Data
Nominal — categories without any natural order.
- Examples: gender (1 = male, 2 = female), city of residence, company sector
- In SPSS: Measure = Nominal
Ordinal — categories with a defined order, but without equal intervals between them.
- Examples: education level (1 = secondary, 2 = bachelor, 3 = master's, 4 = PhD), Likert scale
- In SPSS: Measure = Ordinal
Numeric (Quantitative) Data
Interval — numeric data with equal intervals, but without an absolute zero point.
- Examples: temperature in Celsius, year of birth, IQ score
- In SPSS: Measure = Scale
Ratio — numeric data with equal intervals and an absolute zero point, so statements such as "twice as much" are meaningful.
- Examples: age, income, number of employees, years of work experience
- In SPSS: Measure = Scale (SPSS does not differentiate between interval and ratio data)
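The practical difference between interval and ratio data shows up as soon as you divide two values. A short Python illustration (plain arithmetic, not SPSS): 20 °C is not "twice as warm" as 10 °C, which becomes obvious once both temperatures are converted to Kelvin, a scale whose zero is absolute. Income, a ratio variable, has no such problem.

```python
# Ratios are only meaningful on a scale with an absolute zero.
def celsius_to_kelvin(c: float) -> float:
    """Convert Celsius (interval scale) to Kelvin (ratio scale)."""
    return c + 273.15

warm_c, cool_c = 20.0, 10.0

# On the interval scale the ratio suggests "twice as hot"...
ratio_celsius = warm_c / cool_c
# ...but on the ratio scale the same temperatures differ by only about 3.5%.
ratio_kelvin = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

# Income is ratio data: 60,000 really is twice 30,000.
ratio_income = 60_000 / 30_000

print(f"Celsius ratio: {ratio_celsius:.2f}")  # 2.00
print(f"Kelvin ratio:  {ratio_kelvin:.3f}")   # 1.035
print(f"Income ratio:  {ratio_income:.2f}")   # 2.00
```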
Setting Up Variables in Variable View
Each variable in SPSS is described by the following properties, configured in Variable View (recent versions add an eleventh column, Role, which is not covered here):
| Property | Description | Example |
|---|---|---|
| Name | Short variable name (must start with a letter; no spaces; up to 64 characters) | vozrast (age), pol (gender), dohod (income) |
| Type | Data type: Numeric, String, Date, etc. | Numeric for numeric data |
| Width | Maximum number of characters | 8 |
| Decimals | Number of decimal places | 0 for integers, 2 for decimals |
| Label | Full description of the variable (shows in tables) | "Respondent age" |
| Values | Value labels for coded variables | 1 = "Male", 2 = "Female" |
| Missing | User-defined codes to be treated as missing | 99 = missing value |
| Columns | Column width in Data View | 8 |
| Align | Cell data alignment | Right for numeric |
| Measure | Level of measurement | Nominal, Ordinal, or Scale |
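The naming rules in the Name row can be expressed as a small validator. This Python sketch is a simplification under stated assumptions: it accepts only ASCII letters, digits, underscores and periods (SPSS also allows some other characters), rejects the reserved keywords SPSS does not accept as variable names, and treats a trailing period, which SPSS advises against, as invalid. The function name `is_valid_name` is hypothetical.

```python
import re

# Reserved keywords that SPSS does not accept as variable names.
RESERVED = {"ALL", "AND", "BY", "EQ", "GE", "GT", "LE", "LT",
            "NE", "NOT", "OR", "TO", "WITH"}

# Simplified rule: a letter first, then letters, digits, "_" or "."; no spaces.
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_.]*")

def is_valid_name(name: str) -> bool:
    """Check a candidate name against simplified SPSS naming rules (sketch)."""
    if len(name) > 64:
        return False
    if name.upper() in RESERVED:
        return False
    if name.endswith("."):  # discouraged by SPSS; rejected in this sketch
        return False
    return NAME_PATTERN.fullmatch(name) is not None

for candidate in ["vozrast", "pol", "job satisfaction", "2nd_wave", "AND", "q1"]:
    print(f"{candidate!r}: {is_valid_name(candidate)}")
```

Names such as "job satisfaction" (contains a space) and "2nd_wave" (starts with a digit) fail; the readable description belongs in the Label property instead.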
Step-by-step Example — Setting up the "Gender" Variable:
- Go to Variable View
- In the first empty row, enter Name: pol
- Type: Numeric
- Width: 1, Decimals: 0
- Label: Respondent’s gender
- Values: click “...” → add 1 = “Male”, 2 = “Female” → OK
- Missing: as needed (for example, 9 = not specified)
- Measure: Nominal
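To make these properties concrete, the sketch below models one Variable View row as a Python dataclass and fills it in with the "pol" settings from the steps above. `VariableDef` and its method are hypothetical illustrations, not an SPSS API; the point is that code 9 is declared missing once and is then excluded whenever valid answers are counted.

```python
from dataclasses import dataclass, field

@dataclass
class VariableDef:
    """Hypothetical model of one Variable View row (not an SPSS API)."""
    name: str
    var_type: str = "Numeric"
    width: int = 8
    decimals: int = 2
    label: str = ""
    values: dict = field(default_factory=dict)  # value labels: code -> text
    missing: set = field(default_factory=set)   # user-defined missing codes
    measure: str = "Scale"

    def is_missing(self, value) -> bool:
        # An empty cell (None) or a declared missing code counts as missing.
        return value is None or value in self.missing


pol = VariableDef(
    name="pol", width=1, decimals=0, label="Respondent's gender",
    values={1: "Male", 2: "Female"}, missing={9}, measure="Nominal",
)

answers = [1, 2, 9, None, 2]
valid = [v for v in answers if not pol.is_missing(v)]
print(len(valid))  # 3
```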
Entering and Importing Data
Manual Data Entry
- Set up all variables in Variable View
- Switch to Data View
- Enter values in corresponding cells row by row (each row = one respondent)
Importing Data from Excel
- File → Open → Data (or File → Import Data)
- Choose file type: Excel (*.xlsx)
- Find and open the file
- In the dialog, check "Read variable names from the first row of data" if the first row contains variable names
- Click OK — data will be loaded into SPSS
Importing Data from CSV
- File → Read Text Data
- Select the .csv file
- Follow the Text Import Wizard, specifying the delimiter (comma or semicolon), the data format, and whether the first row contains variable names
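The decisions the Text Import Wizard asks for (delimiter, header row, how empty fields are read) can be mirrored with Python's standard `csv` module. This is a sketch only: the sample data, the `read_survey` helper, and the placeholder names V1, V2, ... used when there is no header are all hypothetical, and empty fields are turned into None to stand in for missing values.

```python
import csv
import io

# Hypothetical export from a survey tool: semicolon-delimited, with a header row.
raw = """id;vozrast;pol
1;34;2
2;27;1
3;;2
"""

def read_survey(text: str, delimiter: str = ";", has_header: bool = True):
    """Parse delimited text into named numeric records (illustrative sketch)."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if has_header:
        names, data_rows = rows[0], rows[1:]
    else:
        names, data_rows = [f"V{i + 1}" for i in range(len(rows[0]))], rows
    records = []
    for row in data_rows:
        # Empty fields become None, the analogue of a missing value.
        records.append({n: (float(v) if v.strip() else None)
                        for n, v in zip(names, row)})
    return names, records

names, records = read_survey(raw)
print(names)       # ['id', 'vozrast', 'pol']
print(records[2])  # {'id': 3.0, 'vozrast': None, 'pol': 2.0}
```

Choosing the wrong delimiter (a comma for this file) would leave each line as a single field, which is the same symptom you see in SPSS when every row lands in one column.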
Coding Categorical Variables
Categorical variables in SPSS are usually stored as numeric codes, each with an assigned text label (Value Labels).
Example of coding the "Education Level" variable:
- 1 = Secondary
- 2 = Bachelor
- 3 = Master's
- 4 = Doctorate (PhD)
To set up: in Variable View, click the "..." button in the Values cell → in the dialog, enter a numeric code (Value) and its text label (Label) → click Add → repeat for each category → OK.
After coding, in Data View you can switch between displaying codes (1, 2, 3, 4) and labels (Secondary, Bachelor, ...) via View → Value Labels or a button on the toolbar.
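Toggling value labels changes only what is displayed, never the stored codes. The Python sketch below imitates that toggle for the education example; `show_column` is a hypothetical helper, and code 5 is deliberately left without a label so that it is displayed as the bare number.

```python
# Value labels for the "Education Level" variable.
EDUCATION_LABELS = {1: "Secondary", 2: "Bachelor", 3: "Master's", 4: "Doctorate (PhD)"}

def show_column(codes, labels, show_labels: bool):
    """Return what would be displayed for a column of codes (sketch)."""
    if not show_labels:
        return [str(c) for c in codes]
    # Codes without a label are displayed as the number itself.
    return [labels.get(c, str(c)) for c in codes]

education = [2, 3, 1, 4, 5]
print(show_column(education, EDUCATION_LABELS, show_labels=False))
print(show_column(education, EDUCATION_LABELS, show_labels=True))
# The underlying list `education` is never modified.
```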
Recoding Variables (Recode)
Recoding allows you to change the values of a variable—for example, combine categories or transform a continuous variable into a categorical one.
Recode into Same Variables
The original values are overwritten with the new ones.
- Transform → Recode into Same Variables
- Select the variable → click Old and New Values
- Specify old and new values → Add → Continue → OK
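A minimal Python sketch of what Recode into Same Variables does to the data, assuming the variable is held as a plain list (not how SPSS stores it). The mapping that collapses a 1–5 satisfaction item into three categories is a hypothetical illustration; codes not listed in the mapping are left as they are.

```python
def recode_same(values, mapping):
    """Overwrite values in place; codes not in the mapping are left unchanged."""
    for i, v in enumerate(values):
        if v in mapping:
            values[i] = mapping[v]

udovl = [5, 2, 3, 4, 1, 9]                  # 9 = "not specified", deliberately unmapped
COLLAPSE = {1: 1, 2: 1, 3: 2, 4: 3, 5: 3}   # illustrative 5 -> 3 category mapping

recode_same(udovl, COLLAPSE)
print(udovl)  # [3, 1, 2, 3, 1, 9]
```

After the call there is no way to tell a former 4 from a former 5, which is exactly the information loss that recoding into a different variable avoids.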
Recode into Different Variables
A new variable is created with the recoded values, and the original variable is left unchanged. This is the recommended method, because you can always check the new variable against the source or recode it again.
- Transform → Recode into Different Variables
- Select the source variable → enter the name and label for the new variable → Change
- Click Old and New Values → define each old/new value pair → Add → Continue → OK
Practical Example: Recoding age into age groups:
- 18–25 → 1 (Young)
- 26–40 → 2 (Middle-aged)
- 41–60 → 3 (Older)
- 61+ → 4 (Senior)
In the Old and New Values dialog, use the Range options (a value through a value, Lowest through a value, a value through Highest) to define the intervals.
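The age-group example can be sketched in Python as a range-based recode into a new list, with `math.inf` standing in for "Highest". This is an illustration, not SPSS itself: the rules are checked in order and the first matching range is used, and in this sketch any age outside all ranges (such as 16) becomes None, standing in for a missing value in the new variable.

```python
import math

# Age-group rules from the example: (low, high, new code), checked in order.
AGE_RULES = [
    (18, 25, 1),        # Young
    (26, 40, 2),        # Middle-aged
    (41, 60, 3),        # Older
    (61, math.inf, 4),  # Senior ("61 through Highest")
]

def recode_into_different(values, rules):
    """Build a new variable; values matching no rule become None (missing)."""
    result = []
    for v in values:
        new = None
        if v is not None:
            for low, high, code in rules:
                if low <= v <= high:
                    new = code
                    break
        result.append(new)
    return result

vozrast = [19, 25, 26, 40, 41, 67, 16, None]
age_group = recode_into_different(vozrast, AGE_RULES)
print(age_group)  # [1, 1, 2, 2, 3, 4, None, None]
print(vozrast)    # unchanged: [19, 25, 26, 40, 41, 67, 16, None]
```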
Computing New Variables (Compute Variable)
The Compute Variable function allows you to create new variables based on arithmetic expressions or built-in functions.
- Transform → Compute Variable
- In the Target Variable field, enter the name of the new variable
- In the Numeric Expression field, enter the formula → OK
Formula examples:
- Total score: total_score = q1 + q2 + q3 + q4 + q5
- Average score: mean_score = MEAN(q1, q2, q3, q4, q5)
- Log of income: log_income = LN(income)
- Satisfaction index: sat_index = (sat1 + sat2 + sat3) / 3
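The MEAN() function averages whichever of its arguments are valid. With plain arithmetic such as (sat1 + sat2 + sat3) / 3, a single missing answer makes the whole result system-missing. This makes MEAN() preferable for questionnaire data. To require a minimum number of valid answers, add a suffix: MEAN.4(q1, q2, q3, q4, q5) returns a value only for respondents with at least four valid answers.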
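The contrast between plain arithmetic and MEAN() is easy to reproduce. In this Python sketch (an imitation, not SPSS), None plays the role of a missing answer; `arith_sum` propagates it the way "+" does, `spss_like_mean` averages only valid values, and its `min_valid` parameter imitates the MEAN.n suffix. Both helper names are hypothetical.

```python
def arith_sum(*vals):
    """Plain '+': any missing operand makes the result missing."""
    if any(v is None for v in vals):
        return None
    return sum(vals)

def spss_like_mean(*vals, min_valid=1):
    """Average of the valid values, like MEAN(); min_valid mimics MEAN.n."""
    valid = [v for v in vals if v is not None]
    if len(valid) < min_valid:
        return None
    return sum(valid) / len(valid)

q = [4, 5, None, 3, 4]  # this respondent skipped q3

total_score = arith_sum(*q)
manual_mean = None if total_score is None else total_score / 5
mean_score = spss_like_mean(*q)                # like MEAN(q1, ..., q5)
strict_mean = spss_like_mean(*q, min_valid=5)  # like MEAN.5(q1, ..., q5)

print(total_score, manual_mean)  # None None
print(mean_score)                # 4.0
print(strict_mean)               # None
```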
Practical Tasks
Task 1
Question: You are conducting an employee satisfaction survey. Set up the following variables in SPSS: employee ID, age, gender, department (sales, marketing, IT, HR), years of work experience, job satisfaction (scale 1–5).
Solution:
| Name | Type | Label | Values | Measure |
|---|---|---|---|---|
| id | Numeric | Employee ID | — | Nominal |
| vozrast | Numeric | Age | — | Scale |
| pol | Numeric | Gender | 1=Male, 2=Female | Nominal |
| otdel | Numeric | Department | 1=Sales, 2=Marketing, 3=IT, 4=HR | Nominal |
| stazh | Numeric | Years of work experience | — | Scale |
| udovl | Numeric | Job Satisfaction | 1=Very low...5=Very high | Ordinal |
Task 2
Question: Create a new variable "Age Group" by recoding the "vozrast" variable: under 30 = "Young", 30–45 = "Middle", over 45 = "Older". Which recoding method should you use and why?
Solution: You should use Recode into Different Variables to preserve the original "vozrast" variable. Steps:
- Transform → Recode into Different Variables
- Move "vozrast" to the list → enter new variable name: vozr_group, label: "Age Group" → Change
- Old and New Values (assuming age is recorded in whole years):
- Range: Lowest through 29 → 1 → Add
- Range: 30 through 45 → 2 → Add
- Range: 46 through Highest → 3 → Add
- Continue → OK
- Then in Variable View set up Value Labels: 1 = Young, 2 = Middle, 3 = Older, and Measure = Ordinal
Task 3
Question: You have 5 satisfaction questions (q1–q5) on a scale of 1–5. Calculate the average satisfaction score. Write the formula for Compute Variable.
Solution: In Transform → Compute Variable:
- Target Variable: mean_satisfaction
- Label: "Average satisfaction score"
- Numeric Expression: MEAN(q1, q2, q3, q4, q5)
Using MEAN() instead of (q1+q2+q3+q4+q5)/5 is preferable: the arithmetic version returns a missing result for anyone who skipped even one question, whereas MEAN() averages the answers that are available.