Module VII·Article II·~7 min read
Secondary Data and Their Use
Advanced Quantitative Methods
What Are Secondary Data?
Secondary data are data that were previously collected by someone else for a different purpose but that a researcher can use to answer their own research question. In contrast to primary data, which the researcher collects "first hand" (via surveys, interviews, experiments), secondary data already exist in ready-made form.
Primary data — data collected by the researcher specifically for the current research. They precisely fit the research objectives but require significant time and resource investments.
Secondary data — data collected by other individuals or organizations for their purposes. The researcher adapts them to their research question. They are available faster and at lower cost, but may not fully meet the needs of the current project.
Sekaran and Bougie (2016) emphasize that using secondary data is an important stage of any research: even if the researcher plans to collect primary data, they should first examine available secondary sources to formulate hypotheses and contextualize the problem.
Sources of Secondary Data
1. Government Statistics
Government agencies regularly publish extensive datasets:
- Statistical bureaus — data on population, employment, incomes, prices, industrial production (for example, Rosstat in Russia, ONS in the United Kingdom, BLS in the USA)
- Central banks — financial and macroeconomic statistics (interest rates, inflation, money supply)
- Ministries and departments — sectoral data (education, healthcare, trade)
- International organizations — World Bank, IMF, UN, OECD publish cross-country comparative data
2. Corporate Sources
- Annual company reports — financial indicators, strategic initiatives
- Internal databases — records of sales, customers, enquiries, HR data
- Industry associations — market reviews, benchmarking
- Commercial databases — Bloomberg, Thomson Reuters, Statista
3. Academic and Research Sources
- Academic journals and publications — previously collected data from other researchers
- Dissertations and theses — appendices with data
- Data repositories — UK Data Archive, ICPSR, Harvard Dataverse
- Surveys and monitoring studies — World Values Survey, Eurobarometer, Global Entrepreneurship Monitor
4. Media and Archival Sources
- Newspapers and magazines — for content analysis
- Corporate archives — historical documents, meeting minutes
- Internet sources — websites, social networks, forums (subject to ethical guidelines)
Advantages of Using Secondary Data
| Advantage | Description |
|---|---|
| Time-saving | The data are already collected; the entire collection process need not be repeated |
| Cost-saving | Significantly cheaper than conducting one's own large-scale research |
| Large samples | Government surveys often cover thousands of respondents |
| Longitudinal comparisons | Trends can be tracked over long periods (e.g., data for 10–20 years) |
| High collection quality | Large organizations typically apply strict methodological standards |
| Possibility of cross-country comparisons | International databases allow comparison of countries and regions |
| Reproducibility | Other researchers can verify results by using the same data |
Disadvantages and Limitations of Secondary Data
| Limitation | Description |
|---|---|
| Mismatch with objectives | Data were collected for other purposes and may lack the required variables |
| Obsolescence | Data may be too old for the current study |
| Unknown quality | The researcher did not control the collection process and may be unaware of errors in the data |
| Differences in definitions | Operationalization of concepts may differ from what is needed |
| Limited access | Some data must be paid for or are available only upon request |
| Aggregation | Data may be provided only in an aggregated form without access to individual responses |
| Lack of control | It is impossible to change the data collection instrument or add variables |
Assessing the Quality of Secondary Data
Before using secondary data, the researcher must critically assess their quality based on the following criteria:
1. Who collected the data? The authority of the source: government agencies and large research centers usually provide higher quality than little-known organizations.
2. Why were the data collected? The purpose of collection may introduce bias. For example, data collected by a company to promote its product may be biased.
3. How were the data collected? It is important to study the methodology: sampling method, collection instrument, sample size, response rate. The absence of methodological documentation is a serious warning sign.
4. When were the data collected? The relevance of the data depends on the research subject. For rapidly changing markets (technology, fashion), data from two years ago may be outdated.
5. How do the data align with other sources? Comparison with similar sources helps detect anomalies and increases confidence in the data's reliability.
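The fifth criterion can be illustrated with a minimal Python sketch. It assumes the same indicator is available from two sources keyed by year. The function name `flag_discrepancies`, the 5% tolerance, and all figures are hypothetical and chosen only for illustration.

```python
def flag_discrepancies(source_a, source_b, tolerance=0.05):
    """Return keys whose values differ by more than `tolerance` (relative difference)."""
    flagged = {}
    # Compare only the keys that both sources report
    for key in sorted(source_a.keys() & source_b.keys()):
        a, b = source_a[key], source_b[key]
        baseline = max(abs(a), abs(b))
        rel_diff = abs(a - b) / baseline if baseline else 0.0
        if rel_diff > tolerance:
            flagged[key] = round(rel_diff, 3)
    return flagged


# Invented illustrative figures (not real statistics): one indicator by year
agency_a = {2019: 4.6, 2020: 5.8, 2021: 4.8}
agency_b = {2019: 4.5, 2020: 7.1, 2021: 4.8}

# Only the year with a large gap between the two sources is reported
print(flag_discrepancies(agency_a, agency_b))
```

A flagged year is not proof of an error. It is a prompt to check whether the two sources use different definitions or collection methods.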
Using Secondary Data in SPSS
When working with secondary data in SPSS, the following algorithm is recommended:
- Importing data: File → Open → Data (recent SPSS versions also offer File → Import Data for CSV and Excel files). SPSS supports the .sav, .csv, and .xlsx formats. When importing from CSV, specify the delimiter and encoding correctly.
- Checking structure: Go to Variable View and check the variable names, data types, measurement levels (nominal, ordinal, scale; SPSS groups interval and ratio variables under "scale"), and value labels.
- Data cleaning: Use Analyze → Descriptive Statistics → Frequencies to detect missing values and outliers. Check for logical consistency in the responses.
- Recoding variables: If a variable's operationalization does not match yours, use Transform → Recode into Different Variables to bring the data to the required format.
- Merging files: If you need to merge data from several sources, use Data → Merge Files. Add Cases appends new observations (rows) with the same variables; Add Variables joins new variables (columns) for the same observations, usually matched on a key variable.
- Weighting: If the dataset includes a weighting variable (which is typical for large surveys), activate it via Data → Weight Cases.
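For readers who want to see the logic behind the menu steps, the import, cleaning, recoding, and weighting steps above can be sketched in plain Python. This is an illustration under stated assumptions, not SPSS syntax: the semicolon-delimited file, the column names, the plausible age range, and the three-category recoding scheme are all hypothetical, and every value is invented.

```python
import csv
import io

# Hypothetical semicolon-delimited export; all values are invented
RAW = """id;age;satisfaction;weight
1;34;4;1.2
2;51;;0.8
3;29;5;1.0
4;230;2;1.0
"""


def load_rows(text, delimiter=";"):
    """Import step: parse delimited text with an explicitly specified delimiter."""
    return list(csv.DictReader(io.StringIO(text), delimiter=delimiter))


def clean(rows):
    """Cleaning step: convert types, map blanks to None, drop implausible ages."""
    cleaned = []
    for row in rows:
        age = int(row["age"])
        cleaned.append({
            "id": row["id"],
            "age": age if 0 < age < 120 else None,  # assumed plausible range
            "satisfaction": int(row["satisfaction"]) if row["satisfaction"] else None,
            "weight": float(row["weight"]),
        })
    return cleaned


def recode_satisfaction(score):
    """Recoding step: collapse a 1-5 scale into three groups (assumed scheme)."""
    if score is None:
        return None
    return "low" if score <= 2 else "medium" if score == 3 else "high"


def weighted_mean(rows, variable):
    """Weighting step: weighted mean over rows where the variable is present."""
    pairs = [(r[variable], r["weight"]) for r in rows if r[variable] is not None]
    return sum(v * w for v, w in pairs) / sum(w for _, w in pairs)


rows = clean(load_rows(RAW))
for r in rows:
    r["satisfaction_group"] = recode_satisfaction(r["satisfaction"])

print([r["age"] for r in rows])                 # the implausible age becomes None
print([r["satisfaction_group"] for r in rows])  # the blank answer stays None
print(weighted_mean(rows, "satisfaction"))
```

The recoded values go into a new field rather than overwriting the original, which mirrors the reason for choosing Recode into Different Variables: the source variable stays available for checking.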
Practical Assignments
Assignment 1
Question: A researcher wants to study the relationship between the unemployment rate and crime rate in Russian regions over the last 10 years. Identify suitable secondary data sources and assess their advantages for this research.
Solution: Suitable sources:
- Rosstat — data on the unemployment rate by region for each year
- Ministry of Internal Affairs of Russia — statistics on registered crimes by region
- Unified Interdepartmental Information and Statistical System (EMISS) — aggregated data from various agencies
Advantages for this research:
- The longitudinal nature allows trends to be analyzed over 10 years
- Broad coverage (all Russian regions) ensures representativeness
- Standardized data collection methodology ensures data comparability
- Time-saving: primary data collection on such a scale would require enormous resources
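Once both sources are obtained, they have to be matched by region and year before any relationship can be examined. The sketch below shows that matching and a Pearson correlation in plain Python. The region names and all figures are invented placeholders, not real Rosstat or Ministry of Internal Affairs data, and the helper names `merge_on_key` and `pearson` are hypothetical.

```python
from math import sqrt

# Invented illustrative values, NOT real official statistics
unemployment = {("Region A", 2020): 5.1, ("Region A", 2021): 4.4,
                ("Region B", 2020): 7.9, ("Region B", 2021): 7.0,
                ("Region C", 2020): 3.2}
crime_rate = {("Region A", 2020): 1310, ("Region A", 2021): 1250,
              ("Region B", 2020): 1720, ("Region B", 2021): 1650,
              ("Region C", 2021): 990}


def merge_on_key(left, right):
    """Keep only region-year keys present in both sources (an inner join)."""
    return {k: (left[k], right[k]) for k in sorted(left.keys() & right.keys())}


def pearson(pairs):
    """Pearson correlation coefficient for a list of (x, y) pairs."""
    xs, ys = zip(*pairs)
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


merged = merge_on_key(unemployment, crime_rate)
print(len(merged), "matched region-years")  # Region C has no matching year
print(pearson(list(merged.values())))
```

Region-years that appear in only one source drop out of the merged set, so it is worth counting matched and unmatched keys before interpreting any coefficient.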
Assignment 2
Question: You are provided with a set of secondary data on employee satisfaction at a large company for 2019. The data contain 500 observations, but methodological documentation is lacking. List the risks of using these data and the steps to take before analysis.
Solution: Risks:
- The sampling method is unknown: it may have been probability or convenience sampling
- The response rate is unknown — there may be systematic non-response bias
- The wording of questions is unknown — leading questions may be present
- Data from 2019 may not reflect the current situation (especially after the COVID-19 pandemic)
- Operationalization of "satisfaction" may not match the researcher's definition
Steps before analysis:
- Contact the data authors and request methodological documentation
- Conduct descriptive analysis in SPSS (Frequencies, Descriptives) to identify anomalies
- Check for missing values and patterns of missingness
- Compare distributions of key variables with similar studies
- Document all limitations when interpreting the results
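The check for missing values and missingness patterns can be sketched as follows. The records, variable names, and helper functions are hypothetical stand-ins for the survey file, with `None` marking a missing answer.

```python
from collections import Counter

# Invented records standing in for the survey file; None marks a missing answer
records = [
    {"dept": "Sales", "tenure": 3, "satisfaction": 4},
    {"dept": "Sales", "tenure": None, "satisfaction": None},
    {"dept": "IT", "tenure": 7, "satisfaction": 5},
    {"dept": "IT", "tenure": 2, "satisfaction": None},
    {"dept": "Sales", "tenure": None, "satisfaction": None},
]


def missing_share(rows):
    """Share of missing values for each variable."""
    n = len(rows)
    return {var: sum(r[var] is None for r in rows) / n for var in rows[0]}


def missing_patterns(rows):
    """Count how often each combination of missing variables occurs."""
    return Counter(tuple(var for var in r if r[var] is None) for r in rows)


print(missing_share(records))
print(missing_patterns(records).most_common())
```

If the same variables are repeatedly missing together, or missingness clusters in one department, the gaps are probably not random and should be reported as a limitation.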
Assignment 3
Question: Which type of data (primary or secondary) would you recommend for the following research questions? Justify your answer.
a) "How has the structure of consumer expenditures in Russia changed over the past 5 years?" b) "Which motivation factors are most important for employees of a specific IT company?"
Solution: a) Secondary data: Rosstat regularly publishes data on the structure of household consumer expenditures. Collecting primary data on such a scale would be impractical. Government longitudinal data provide reliable time series for trend analysis.
b) Primary data — the question concerns a specific company, and universal secondary data will not reflect the specifics of its corporate culture. It is necessary to conduct your own employee survey using validated motivation scales, adapted to the context of the IT sector.