Module III·Article II·~7 min read

SPSS: Data Types and Working with Variables

Introduction to Quantitative Research

SPSS Interface

IBM SPSS Statistics is one of the most widely used programs for statistical data analysis in the social and business sciences. After launching the program, you work in the Data Editor window, which has two views:

Data View

This is the main table for data input and review. It resembles an Excel spreadsheet:

  • Columns represent variables (for example, "Age", "Gender", "Income")
  • Rows represent observations (respondents, companies, cases)
  • Each cell contains a single value for a particular variable of a particular observation
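
The same layout can be pictured as a small in-memory table. The following is a minimal Python sketch, not SPSS code; the variable names, the values, and the helper functions `column` and `cell` are invented for illustration.

```python
# Each dict is one row of Data View (one respondent);
# each key is one column (one variable). All values are made up.
rows = [
    {"age": 34, "gender": 1, "income": 52000},
    {"age": 27, "gender": 2, "income": 41000},
    {"age": 45, "gender": 2, "income": 63000},
]

def column(data, variable):
    """Return all values of one variable, in row order."""
    return [row[variable] for row in data]

def cell(data, row_index, variable):
    """Return the single value stored for one observation and one variable."""
    return data[row_index][variable]

ages = column(rows, "age")               # one column = one variable
second_income = cell(rows, 1, "income")  # one cell = one value
```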

Variable View

This is the view for setting up variables. Here, each row corresponds to one variable, and the columns define its properties. Switch between the two views using the tabs at the bottom of the window.

Data Types: Categorical and Numeric

All data in research is divided into two large groups:

Categorical Data

Nominal — categories without any natural order.

  • Examples: gender (1 = male, 2 = female), city of residence, company sector
  • In SPSS: Measure = Nominal

Ordinal — categories with a defined order, but without equal intervals between them.

  • Examples: education level (1 = secondary, 2 = bachelor, 3 = master's, 4 = PhD), Likert scale
  • In SPSS: Measure = Ordinal

Numeric (Quantitative) Data

Interval — numeric data with equal intervals, but without an absolute zero point.

  • Examples: temperature in Celsius, year of birth, IQ score
  • In SPSS: Measure = Scale

Ratio — numeric data with equal intervals and an absolute zero point.

  • Examples: age, income, number of employees, years of work experience
  • In SPSS: Measure = Scale (SPSS does not differentiate between interval and ratio data)
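
The correspondence between the four levels of measurement and the three Measure options can be summarized as a lookup table. This is a small Python sketch for illustration only; the names `SPSS_MEASURE` and `spss_measure` are hypothetical, and the example variables are taken from the lists above.

```python
# Four classical levels of measurement mapped to the three options
# offered in the Measure column of Variable View.
SPSS_MEASURE = {
    "nominal": "Nominal",
    "ordinal": "Ordinal",
    "interval": "Scale",  # interval and ratio share one setting
    "ratio": "Scale",
}

def spss_measure(level):
    """Return the Measure setting for a given level of measurement."""
    return SPSS_MEASURE[level.lower()]

# Example variables from the text and their levels of measurement.
examples = {
    "city of residence": "nominal",
    "education level": "ordinal",
    "temperature in Celsius": "interval",
    "income": "ratio",
}
settings = {name: spss_measure(level) for name, level in examples.items()}
```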

Setting Up Variables in Variable View

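Each variable in SPSS has a set of properties that are configured in Variable View. The ten described here are listed below (recent versions also show a Role column, which is not covered in this article):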

| Property | Description | Example |
|---|---|---|
| Name | Short variable name (no spaces, up to 64 chars) | vozrast, pol, dohod |
| Type | Data type: Numeric, String, Date, etc. | Numeric for numeric data |
| Width | Maximum number of characters | 8 |
| Decimals | Number of decimal places | 0 for integers, 2 for decimals |
| Label | Full description of the variable (shows in tables) | "Respondent age" |
| Values | Value labels for coded variables | 1 = "Male", 2 = "Female" |
| Missing | Definition of missing values | 99 = missing value |
| Columns | Column width in Data View | 8 |
| Align | Cell data alignment | Right for numeric |
| Measure | Level of measurement | Nominal, Ordinal, or Scale |

Step-by-step Example — Setting up the "Gender" Variable:

  1. Go to Variable View
  2. In the row of the new variable, enter Name: pol
  3. Type: Numeric
  4. Width: 1, Decimals: 0
  5. Label: Respondent’s gender
  6. Values: click “...” → add 1 = “Male”, 2 = “Female” → OK
  7. Missing: as needed (for example, 9 = not specified)
  8. Measure: Nominal
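
To make the role of each property concrete, the "Gender" setup above can be written down as a plain data record. This is a Python sketch under stated assumptions, not an SPSS API: the `VariableDef` class, its defaults, and its `display` method are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class VariableDef:
    """Hypothetical record mirroring a few Variable View columns."""
    name: str
    var_type: str = "Numeric"
    width: int = 8
    decimals: int = 0
    label: str = ""
    values: dict = field(default_factory=dict)  # code -> value label
    missing: set = field(default_factory=set)   # codes treated as missing
    measure: str = "Scale"

    def display(self, code):
        """Return the value label for a code, or None if the code is missing."""
        if code in self.missing:
            return None
        # Codes without a label are shown as the code itself.
        return self.values.get(code, str(code))

# The "Gender" variable from the step-by-step example.
pol = VariableDef(
    name="pol",
    width=1,
    decimals=0,
    label="Respondent's gender",
    values={1: "Male", 2: "Female"},
    missing={9},  # 9 = not specified
    measure="Nominal",
)

labels = [pol.display(code) for code in (1, 2, 9)]
```

Here the code 9 is excluded from display because it is declared as missing, which is the same idea as entering it in the Missing column.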

Entering and Importing Data

Manual Data Entry

  1. Set up all variables in Variable View
  2. Switch to Data View
  3. Enter values in corresponding cells row by row (each row = one respondent)

Importing Data from Excel

  1. File → Open → Data (or File → Import Data)
  2. Choose file type: Excel (*.xlsx)
  3. Find and open the file
  4. In the dialog, check "Read variable names from the first row of data" if the first row contains variable names
  5. Click OK — data will be loaded into SPSS

Importing Data from CSV

  1. File → Read Text Data (in recent versions: File → Import Data → Text Data or CSV Data)
  2. Select the .csv file
  3. Follow the step-by-step wizard (Text Import Wizard), specifying the delimiter (comma, semicolon), data format, and header presence
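
The choices made in the wizard (delimiter and header presence) determine how each line of the file is split into variables. The Python sketch below illustrates that logic with the standard `csv` module; it is not how SPSS itself is implemented. The function `read_delimited`, the sample content, and the generic names V1, V2, ... used when there is no header row are assumptions made for this example.

```python
import csv
import io

def read_delimited(text, delimiter=",", has_header=True):
    """Parse delimited text into (names, rows) according to the given choices."""
    lines = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    if has_header:
        # The first row supplies the variable names.
        names, body = lines[0], lines[1:]
    else:
        # No header row: assign generic names (an assumption of this sketch).
        names = [f"V{i + 1}" for i in range(len(lines[0]))]
        body = lines
    rows = [dict(zip(names, line)) for line in body]
    return names, rows

# Invented sample standing in for a semicolon-delimited .csv file.
sample = "id;vozrast;pol\n1;34;1\n2;27;2\n"
names, rows = read_delimited(sample, delimiter=";", has_header=True)
```

Note that every value is still text at this point; choosing the data format for each column is a separate step.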

Coding Categorical Variables

Categorical variables in SPSS are usually stored as numeric codes, with a value label attached to each code.

Example of coding the "Education Level" variable:

  • 1 = Secondary
  • 2 = Bachelor
  • 3 = Master's
  • 4 = Doctorate (PhD)

To set up: in Variable View, click the Values cell → a dialog opens → enter the numeric code and the text label for each value → click Add → OK.

After coding, in Data View you can switch between displaying codes (1, 2, 3, 4) and labels (Secondary, Bachelor, ...) via View → Value Labels or a button on the toolbar.
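
The distinction between stored codes and displayed labels can be sketched in a few lines of Python. This is an illustration only; `render` is a hypothetical helper, and the stored values are invented.

```python
# Value labels for the "Education Level" variable from the example above.
EDUCATION_LABELS = {
    1: "Secondary",
    2: "Bachelor",
    3: "Master's",
    4: "Doctorate (PhD)",
}

def render(codes, value_labels, show_labels):
    """Return what the column displays: labels if requested, else raw codes."""
    if show_labels:
        # A code without a label is shown unchanged.
        return [value_labels.get(code, code) for code in codes]
    return list(codes)

stored = [2, 4, 1, 3]  # what is actually saved in the data
as_codes = render(stored, EDUCATION_LABELS, show_labels=False)
as_labels = render(stored, EDUCATION_LABELS, show_labels=True)
```

Toggling the display never changes `stored`: the labels are a presentation layer on top of the numeric codes.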

Recoding Variables (Recode)

Recoding allows you to change the values of a variable—for example, combine categories or transform a continuous variable into a categorical one.

Recode into Same Variables

Original data is replaced by new values.

  • Transform → Recode into Same Variables
  • Select the variable → click Old and New Values
  • Specify old and new values → Add → Continue → OK

Recode into Different Variables

A new variable is created with the recoded values, and the original variable is left unchanged. This is the recommended method, because a recoding mistake can be corrected without having lost the source data.

  • Transform → Recode into Different Variables
  • Select the source variable → enter the name and label for the new variable → Change
  • Click Old and New Values → define each old-to-new mapping and click Add → Continue → OK

Practical Example: Recoding age into age groups:

  • 18–25 → 1 (Young)
  • 26–40 → 2 (Middle-aged)
  • 41–60 → 3 (Older)
  • 61+ → 4 (Senior)

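In the Old and New Values dialog, use the Range options to define each interval; for the open-ended last group (61+), use the "Range, value through HIGHEST" option.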
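
The age-group recoding can be sketched in Python to show the range logic and why the original variable survives. This is an illustration under stated assumptions, not SPSS code: `AGE_BANDS` and `recode` are hypothetical names, the sample ages are invented, and values that fall in no band are returned as `None` to stand in for a missing value.

```python
# Age bands from the example: (low, high, code). None means open-ended.
AGE_BANDS = [
    (18, 25, 1),    # Young
    (26, 40, 2),    # Middle-aged
    (41, 60, 3),    # Older
    (61, None, 4),  # Senior
]

def recode(value, bands):
    """Return the code of the first band containing value, else None."""
    if value is None:
        return None
    for low, high, code in bands:
        above_low = low is None or value >= low
        below_high = high is None or value <= high
        if above_low and below_high:
            return code
    return None  # value matches no band

ages = [18, 25, 26, 40, 41, 60, 61, 83, 17, None]
# Recoding into a different variable: build a new list, leave `ages` untouched.
age_groups = [recode(age, AGE_BANDS) for age in ages]
```

The boundary values (25 and 26, 40 and 41, 60 and 61) are worth checking explicitly, since off-by-one errors in the ranges are the most common recoding mistake.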

Computing New Variables (Compute Variable)

The Compute Variable function allows you to create new variables based on arithmetic expressions or built-in functions.

  • Transform → Compute Variable
  • In the Target Variable field, enter the name of the new variable
  • In the Numeric Expression field, enter the formula

Formula examples:

  • Total score: total_score = q1 + q2 + q3 + q4 + q5
  • Average score: mean_score = MEAN(q1, q2, q3, q4, q5)
  • Log of income: log_income = LN(income)
  • Satisfaction index: sat_index = (sat1 + sat2 + sat3) / 3

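The MEAN() function in SPSS computes the mean of the non-missing arguments only. An arithmetic expression such as (sat1 + sat2 + sat3) / 3, by contrast, returns a missing result whenever any of its items is missing. This makes MEAN() preferable for questionnaire data, where skipped questions are common.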
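
The difference between the two approaches can be shown with a short Python sketch. It imitates the behavior described above rather than calling SPSS; the function names and the sample answers are invented, and `None` stands in for a missing value.

```python
def mean_of_valid(values):
    """Mean of the non-missing values; None if every value is missing."""
    valid = [v for v in values if v is not None]
    return sum(valid) / len(valid) if valid else None

def sum_then_divide(values):
    """Plain arithmetic on the items: any missing item makes the result missing."""
    if any(v is None for v in values):
        return None
    return sum(values) / len(values)

complete = [4, 5, 3, 4, 4]       # all five questions answered
partial = [4, None, 3, 2, None]  # two questions skipped

mean_complete = mean_of_valid(complete)    # 4.0
mean_partial = mean_of_valid(partial)      # 3.0, the mean of 4, 3, 2
naive_complete = sum_then_divide(complete) # 4.0
naive_partial = sum_then_divide(partial)   # None: the respondent is lost
```

With complete answers both approaches agree; they differ only when some items are missing.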

Practical Tasks

Task 1

Question: You are conducting an employee satisfaction survey. Set up the following variables in SPSS: employee ID, age, gender, department (sales, marketing, IT, HR), years of work experience, job satisfaction (scale 1–5).

Solution:

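| Name | Type | Label | Values | Measure |
|---|---|---|---|---|
| id | Numeric | Employee ID | (none) | Nominal |
| vozrast | Numeric | Age | (none) | Scale |
| pol | Numeric | Gender | 1=Male, 2=Female | Nominal |
| otdel | Numeric | Department | 1=Sales, 2=Marketing, 3=IT, 4=HR | Nominal |
| stazh | Numeric | Years of work experience | (none) | Scale |
| udovl | Numeric | Job Satisfaction | 1=Very low...5=Very high | Ordinal |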

Task 2

Question: Create a new variable "Age Group" by recoding the "vozrast" variable: under 30 = "Young", 30–45 = "Middle", over 45 = "Older". Which recoding method should you use and why?

Solution: You should use Recode into Different Variables to preserve the original "vozrast" variable. Steps:

  1. Transform → Recode into Different Variables
  2. Move "vozrast" to the list → enter new variable name: vozr_group, label: "Age Group" → Change
  3. Old and New Values:
    • Range: Lowest through 29 → 1 → Add
    • Range: 30 through 45 → 2 → Add
    • Range: 46 through Highest → 3 → Add
  4. Continue → OK
  5. Then in Variable View set up Value Labels: 1 = Young, 2 = Middle, 3 = Older, and Measure = Ordinal

Task 3

Question: You have 5 satisfaction questions (q1–q5) on a scale of 1–5. Calculate the average satisfaction score. Write the formula for Compute Variable.

Solution: In Transform → Compute Variable:

  • Target Variable: mean_satisfaction
  • Label: "Average satisfaction score"
  • Numeric Expression: MEAN(q1, q2, q3, q4, q5)

Using the MEAN() function instead of (q1+q2+q3+q4+q5)/5 is preferable, as MEAN() correctly handles missing values, calculating the mean based on available answers.
