Is cluster sampling a good fit for your survey?
2/20/14 / Matt Bruce
Here at Corona, we strive to help our clients maximize the value of their research budgets, often by suggesting solutions that get the job done faster, better, or at a reduced cost. In survey research, developing an accurate sampling frame (i.e., a list of the study population and their contact information) is instrumental for success, but sometimes developing or acquiring a sampling frame can be time-consuming, expensive, or impractical. Cluster sampling is one potential solution that can save time or money while maintaining the integrity of the research and results.
What is cluster sampling? Cluster sampling, as the name implies, groups your total study population into many small clusters, typically defined by a proximity variable. For example, street blocks in a neighborhood are clusters of households and residents; schools are clusters of employees who work in the same school district. The main difference between simple random sampling and cluster sampling is that, instead of selecting a random sample of individuals, you select a random sample of clusters. When enough clusters are selected at random, this approach provides a representative sample that is appropriate for inferential statistics that draw conclusions about the broader population.
How to use cluster sampling: First, make sure the nature of your research question is compatible with cluster sampling; if your analysis will require completely independent respondents, then this is probably not the best approach. Second, consider the configuration of your population; you must be able to group people by defined boundaries, such as city blocks or office building floors. After grouping your population into small clusters, use a random number generator to draw a random sample of clusters (rather than a sample of individuals). Typically, every individual in the selected clusters is sampled, although you can combine your sampling plan with other techniques such as stratified or systematic sampling. As long as 1) you can match every person in the population with a cluster, 2) you have an appropriate person-to-cluster ratio, and 3) you have a complete list of clusters, you can use these groupings as a sampling shortcut.
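For readers who like to see the steps spelled out, here is a minimal Python sketch of the one-stage approach described above: group people into clusters, draw a random sample of clusters, then survey everyone in the chosen clusters. The function name cluster_sample, the household and block labels, and the choice of 4 blocks are all made up for illustration; they are assumptions, not part of any particular survey tool.

```python
import random


def cluster_sample(cluster_of, n_clusters, seed=None):
    """One-stage cluster sample (illustrative sketch).

    cluster_of: dict mapping each person (or household) ID to its cluster ID.
    n_clusters: how many clusters to draw at random.
    Returns (chosen_clusters, sampled_people).
    """
    rng = random.Random(seed)

    # Step 1: group the population into clusters.
    clusters = {}
    for person, cluster in cluster_of.items():
        clusters.setdefault(cluster, []).append(person)

    if n_clusters > len(clusters):
        raise ValueError("cannot draw more clusters than exist")

    # Step 2: draw a simple random sample of clusters, not of individuals.
    chosen = rng.sample(sorted(clusters), n_clusters)

    # Step 3: include every individual from the selected clusters.
    sampled = [person for c in chosen for person in clusters[c]]
    return chosen, sampled


# Hypothetical neighborhood: 20 street blocks with 5 households each.
households = {
    f"house-{b}-{h}": f"block-{b}" for b in range(20) for h in range(5)
}
blocks, sample = cluster_sample(households, n_clusters=4, seed=42)
print(blocks)        # the 4 randomly chosen blocks
print(len(sample))   # 4 blocks x 5 households = 20
```

Note that only the list of blocks needs to be complete before sampling; the individual households matter only inside the blocks that get drawn, which is where the time savings come from.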
When might cluster sampling be useful? Cluster sampling is useful when you don’t have enough resources to develop a complete sampling frame or when it takes significant effort to distribute or collect surveys (such as going door-to-door). For example, if we wanted to survey bus riders within a city, it would be impractical to develop a list of all bus riders on any given day, let alone to find our random sample of individuals and give them all surveys. Cluster sampling allows us to select a random sample of bus routes and times, and then survey everyone on those buses. Although individual clusters may not be representative of the population as a whole, when you select enough clusters at random, your sample as a whole will be representative.
Potential problems: Cluster sampling should be applied with caution, and it has some disadvantages compared to a simple random approach. It is better to sample many small clusters than a few large ones. For example, for a nationwide survey it is better to cluster by counties than by states. If your clusters are too few and too large, you might draw a sample that does not adequately represent the population. The size and homogeneity of each cluster and your desired final sample size also affect the viability of cluster sampling.
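One common way to put numbers on the size and homogeneity concern is the design effect, a standard survey-statistics approximation (not something specific to Corona's approach): 1 + (average cluster size - 1) x intraclass correlation. The sketch below assumes roughly equal cluster sizes, and the intraclass correlation of 0.05 is a purely hypothetical value chosen for illustration.

```python
def design_effect(avg_cluster_size, icc):
    """Approximate design effect for equal-sized clusters.

    icc is the intraclass correlation: how alike people in the
    same cluster are (0 = no more alike than strangers).
    """
    return 1 + (avg_cluster_size - 1) * icc


def effective_sample_size(n, avg_cluster_size, icc):
    """Size of the simple random sample with about the same precision."""
    return n / design_effect(avg_cluster_size, icc)


# Hypothetical comparison: 1,000 completed surveys either way.
few_large = effective_sample_size(1000, avg_cluster_size=50, icc=0.05)
many_small = effective_sample_size(1000, avg_cluster_size=10, icc=0.05)
print(round(few_large))   # about 290
print(round(many_small))  # about 690
```

Under these assumed numbers, the same 1,000 surveys carry far more information when they are spread across many small clusters, which is the reasoning behind preferring counties to states.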
At Corona, we start fresh with each research project, and we are full of solutions that can help maximize the value of your research budget and resources. If you are struggling with how to reach your population of interest, give us a call; maybe we can shed some light on the situation.