Module VII·Article I·~4 min read

Sampling and Questionnaire Design

Advanced Quantitative Methods

Secondary Data

Secondary data are data that have been collected by other researchers or organizations for other purposes, but can be used in your research.

Sources of secondary data:

  • Government statistics — Rosstat, Eurostat, OECD, World Bank
  • Corporate reports — annual reports, financial statements
  • Databases — COMPUSTAT, Bloomberg, Bureau van Dijk
  • Previous research — published datasets

Advantages of secondary data:

  • Saves time and resources
  • Access to large volumes of data
  • Possibility of longitudinal analysis
  • Data are often of high quality (collected by professional organizations)

Disadvantages:

  • Data may not fully match your research questions
  • No control over data collection quality
  • Data may be outdated or no longer relevant to the period being studied
  • Limited information about collection methodology

Sampling Methods

Probability Sampling

Each element of the population has a known, non-zero probability of being included in the sample.

Simple Random Sampling

  • Each element has an equal probability of selection
  • Uses a random number generator
  • Requires a complete list of the population (sampling frame)
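As a minimal sketch of simple random sampling, the snippet below draws a sample without replacement from a sampling frame. The frame of employee IDs 1 to 1000, the function name `simple_random_sample`, and the seed are assumptions made for illustration only.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct elements from the sampling frame, each with equal probability."""
    if n > len(frame):
        raise ValueError("sample size cannot exceed the size of the sampling frame")
    rng = random.Random(seed)    # a fixed seed makes the draw reproducible
    return rng.sample(frame, n)  # sampling without replacement


# Hypothetical sampling frame: employee IDs 1..1000
frame = list(range(1, 1001))
sample = simple_random_sample(frame, 100, seed=42)
print(len(sample), len(set(sample)))  # 100 100 (no element is drawn twice)
```

Fixing the seed is a design choice that makes the selection auditable: anyone with the same frame and seed can reproduce the sample.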

Systematic Sampling

  • Every k-th element from the list is selected
  • $k = N / n$ (population size / desired sample size)
  • Example: from 1000 employees, sample 100 $\rightarrow$ $k = 10$, select every 10th
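The example above can be sketched in a few lines. Two assumptions go beyond the text: the starting position is chosen at random within the first $k$ elements (a common convention), and $N$ is taken to be a multiple of $n$ so that integer division gives the interval. The function name `systematic_sample` is hypothetical.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Select every k-th element of the frame, starting from a random offset."""
    k = len(frame) // n  # sampling interval k = N / n (integer division)
    if k < 1:
        raise ValueError("sample size cannot exceed the size of the sampling frame")
    rng = random.Random(seed)
    start = rng.randrange(k)  # assumed convention: random start in positions 0..k-1
    return [frame[start + i * k] for i in range(n)]


# Hypothetical frame: 1000 employee IDs, sample of 100, so k = 10
employees = list(range(1, 1001))
sample = systematic_sample(employees, 100, seed=1)
print(len(sample), sample[1] - sample[0])  # 100 10
```

If the list is ordered in a way that repeats with period $k$ (for example, every 10th employee is a team lead), the sample can be biased, so the ordering of the frame is worth checking first.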

Stratified Sampling

  • The population is divided into homogeneous subgroups (strata)
  • Elements are randomly selected from each stratum, usually in proportion to the stratum's share of the population
  • Example: 60% men and 40% women $\rightarrow$ in a sample of 100 people, 60 men and 40 women
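A possible sketch of proportional stratified sampling for the 60/40 example follows. The population of 600 men and 400 women, the labels, and the function name `proportional_stratified_sample` are illustrative assumptions. Rounding is used for the stratum sizes, so with less convenient proportions the total could differ from $n$ by one or two.

```python
import random


def proportional_stratified_sample(strata, n, seed=None):
    """strata maps a stratum name to the list of its members."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Stratum sample size is proportional to the stratum's share of the population
        n_h = round(n * len(members) / population_size)
        sample[name] = rng.sample(members, n_h)  # simple random sample within the stratum
    return sample


# Hypothetical population: 600 men and 400 women
population = {
    "men": [f"M{i}" for i in range(600)],
    "women": [f"W{i}" for i in range(400)],
}
sample = proportional_stratified_sample(population, 100, seed=0)
print({name: len(members) for name, members in sample.items()})  # {'men': 60, 'women': 40}
```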

Cluster Sampling

  • The population is divided into groups (clusters), usually geographically
  • Entire clusters are randomly selected
  • Example: randomly select 10 out of 50 company branches and survey all employees in the selected branches
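The branch example can be sketched as follows. The branch names, the 20 employees per branch, and the function name `cluster_sample` are made up for illustration; real clusters usually differ in size.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters and return every member of the chosen ones."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # select clusters, not individuals
    respondents = [person for name in chosen for person in clusters[name]]
    return chosen, respondents


# Hypothetical company: 50 branches with 20 employees each
branches = {
    f"branch_{b:02d}": [f"b{b:02d}_emp{e}" for e in range(20)]
    for b in range(1, 51)
}
chosen, respondents = cluster_sample(branches, 10, seed=7)
print(len(chosen), len(respondents))  # 10 200
```

Note the contrast with stratified sampling: here randomness operates at the level of groups, and everyone inside a selected group is surveyed.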

Non-Probability Sampling

The probability of an element being included in the sample is unknown.

Convenience Sampling — selection of the most accessible participants. Fast, but with high risk of bias.

Quota Sampling is the non-probability analogue of stratified sampling: quotas are set for chosen characteristics (for example, age or gender), but participants within each quota are selected non-randomly.

Snowball Sampling — each participant refers the next. Used for hard-to-reach groups.

Questionnaire Design

Types of Questions:

Closed-ended questions — the respondent chooses from the given options:

  • Dichotomous (Yes/No)
  • Multiple choice
  • Likert scale (1-5 or 1-7)
  • Ranking

Open-ended questions — the respondent answers freely. Provide rich data, but are complex to analyze.

Likert Scale

One of the most widely used response scales in quantitative survey research:

  • 1 = Strongly disagree
  • 2 = Disagree
  • 3 = Neutral
  • 4 = Agree
  • 5 = Strongly agree

Principles of a Good Questionnaire:

1. Clarity of wording — questions should be clear and unambiguous
2. Avoid double-barreled questions — “Are you satisfied with your salary and working conditions?” → two separate questions
3. Avoid leading questions — “Don’t you think that management is doing an excellent job?” → biased question
4. Logical sequence — from simple to complex, from general to specific
5. Piloting — testing the questionnaire on a small group before the main study

Validity and Reliability

Validity — does the instrument measure what it is intended to measure?

  • Content validity — does the instrument cover all aspects of the concept?
  • Construct validity — does the instrument measure this construct specifically?
  • Criterion validity — do the results correlate with an external criterion?

Reliability — does the instrument yield consistent results upon repeated use?

  • Cronbach’s Alpha — measure of internal consistency. $\alpha \geq 0.7$ is considered acceptable.
  • Test-retest — consistency of results upon repeated testing.
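To make the internal-consistency measure concrete, here is a minimal sketch that computes Cronbach's Alpha with the standard formula $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i s_i^2}{s_T^2}\right)$, where $k$ is the number of items, $s_i^2$ the variance of item $i$, and $s_T^2$ the variance of respondents' total scores. The response data are invented for illustration, and the function name `cronbach_alpha` is hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """responses: one row per respondent, each row a list of k item scores."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("at least two items are required")
    items = list(zip(*responses))                           # one tuple of scores per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in responses])
    # Undefined when all respondents have the same total score (total_variance == 0)
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Invented data: 5 respondents answering 3 Likert items (1-5)
responses = [
    [4, 5, 4],
    [3, 3, 4],
    [5, 5, 5],
    [2, 3, 2],
    [4, 4, 5],
]
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.2f}, acceptable: {alpha >= 0.7}")
```

The sketch uses sample variances throughout; the ratio is the same with population variances as long as one convention is applied to both the items and the totals.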

Practical Assignments

Assignment 1

Question: A company with 5000 employees wants to conduct a satisfaction survey. Of these, 60% work in the office and 40% in production. Which type of sampling would you recommend, and why? Calculate the sample size for each stratum, given a total sample size of 200 people.

Solution: Stratified sampling is recommended:

  1. Two strata: office workers and production workers
  2. Proportional allocation:
    • Office: $200 \times 0.60 = \textbf{120 people}$
    • Production: $200 \times 0.40 = \textbf{80 people}$
  3. Justification: stratification ensures that both groups are adequately represented. Working conditions and satisfaction factors can differ significantly between office and production.
  4. Within each stratum, systematic sampling can be used: for office $k = 3000/120 = 25$ (every 25th employee), for production $k = 2000/80 = 25$ (also every 25th)
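The arithmetic in this solution can be checked with a short sketch. The variable names are illustrative, and shares are stored as whole percentages so that the calculation stays in exact integers.

```python
population_size = 5000
total_sample = 200
shares_percent = {"office": 60, "production": 40}  # shares given in the assignment

plan = {}
for stratum, pct in shares_percent.items():
    stratum_size = population_size * pct // 100  # N_h: employees in the stratum
    stratum_sample = total_sample * pct // 100   # n_h: proportional allocation
    interval = stratum_size // stratum_sample    # k_h = N_h / n_h for systematic sampling
    plan[stratum] = {"N_h": stratum_size, "n_h": stratum_sample, "k": interval}

for stratum, values in plan.items():
    print(stratum, values)
# office {'N_h': 3000, 'n_h': 120, 'k': 25}
# production {'N_h': 2000, 'n_h': 80, 'k': 25}
```

Under proportional allocation the interval is the same in every stratum, because $N_h / n_h = N / n = 5000 / 200 = 25$.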

Assignment 2

Question: Identify problems in the following survey question: “Don’t you think that our company provides excellent opportunities for career growth and training?”

Solution: Problems:

  1. Leading question — the word “excellent” prompts a positive response
  2. Negative phrasing — “Don’t you think” may confuse the respondent
  3. Double-barreled question — combines career growth AND training (the respondent may be satisfied with one, but not the other)

Revised versions (two separate questions with neutral wording):

  • “Please rate your satisfaction with career growth opportunities in the company” (scale 1-5)
  • “Please rate your satisfaction with learning and development opportunities in the company” (scale 1-5)
