Effective Sample Size

Consider the case where you have IQ observations for six individuals. Let's say that three of the individuals are from California and three are from Florida. Assume the following data structure:

California: 90, 110, 130
Florida: 95, 100, 120

In this case, the unweighted sample mean IQ is 107.5, the sample variance is 237.5, and the standard error of the mean is 6.29. The sample size is of course 6.
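As a quick check on these figures, here is a minimal Python sketch that recomputes them from the six observations. It assumes the sample variance uses the n - 1 denominator and that the standard error is the square root of the variance divided by n; the variable names are purely illustrative.

```python
import math
import statistics

# IQ observations from the example above
california = [90, 110, 130]
florida = [95, 100, 120]
scores = california + florida

n = len(scores)                          # sample size
mean = statistics.mean(scores)           # unweighted sample mean
variance = statistics.variance(scores)   # sample variance (n - 1 denominator)
std_error = math.sqrt(variance / n)      # standard error of the mean

print(f"n = {n}, mean = {mean}, variance = {variance}, SE = {std_error:.2f}")
```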

In practice, however, about twice as many people live in California as in Florida. Thus, to get a national estimate (assuming for the moment that the U.S. only has 2 states), we need to weight these observations so that each California observation counts twice as much as each Florida observation. To achieve this, each California observation receives a weight of 2/9 and each Florida observation receives a weight of 1/9, so that the six weights sum to 1.
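To make the weighting concrete, the following sketch computes the weighted mean for the six observations under the 2/9 and 1/9 weights described above. It is only an illustration of the arithmetic; the variable names are assumptions, not part of any particular survey package.

```python
# IQ observations from the example above
california = [90, 110, 130]
florida = [95, 100, 120]
scores = california + florida

# Each California observation counts twice as much as each Florida observation
weights = [2 / 9] * 3 + [1 / 9] * 3

# Weighted mean: sum of weight * value, divided by the sum of the weights
weighted_mean = sum(w * x for w, x in zip(weights, scores)) / sum(weights)

print(f"sum of weights = {sum(weights):.3f}")
print(f"weighted mean  = {weighted_mean:.2f}")
```

Because the California scores are slightly higher on average, the weighted mean (about 108.3) sits a little above the unweighted mean of 107.5.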

When we change the weights, however, the “effective sample size” changes. Although weighting does not affect the number of observations in the sample, some of the observations (namely those in Florida) contribute less to the estimate. In the extreme case, if Florida observations received a weight of 0, the effective (and actual) sample size would be 3. Since the sample size is used to calculate the standard error, this is an important consideration.

How does one determine the effective sample size?

One can calculate the design effect. The design effect quantifies the extent to which the sampling error in a survey departs from the sampling error that can be expected under simple random sampling. Thus, one can calculate the effective sample size as follows:

Effective sample size = n/D_eff

where n is the actual sample size and D_eff is the design effect of the sampling and weighting scheme relative to simple random sampling.
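The relationship between the design effect and the effective sample size can be sketched in a few lines of Python. The function name and the design effect of 1.5 used in the example are hypothetical values chosen for illustration; they do not come from the California/Florida data.

```python
def effective_sample_size(n, design_effect):
    """Return n / D_eff, the effective sample size for a given design effect."""
    if design_effect <= 0:
        raise ValueError("design effect must be positive")
    return n / design_effect


# Hypothetical example: 6 observations and an assumed design effect of 1.5
n_eff_example = effective_sample_size(6, 1.5)
print(n_eff_example)
```

A design effect of 1 leaves the sample size unchanged, while a design effect above 1 shrinks it.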

One option is to use Kish’s effective sample size. Kish’s effective sample size gives the approximate size of an equal probability sample which would be equivalent in precision to the unequal probability sample used. It is given by the following formula, where ω_i is the weight on observation i:

n_eff = [Σ ω_i]² / [Σ (ω_i²)]

If the weights are equal, then n_eff = n. This workbook provides an example of how to calculate the Kish effective sample size for different weighting schemes for the California/Florida example above. As the weights become more concentrated in fewer observations, the effective sample size shrinks.
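As a rough illustration, separate from the workbook mentioned above, the sketch below applies Kish’s formula to a few weighting schemes for the six California/Florida observations. The function name and the particular schemes (other than the 2/9 and 1/9 weights from the example) are assumptions made for this sketch.

```python
def kish_effective_sample_size(weights):
    """Kish's n_eff = (sum of weights)^2 / (sum of squared weights)."""
    sum_sq = sum(w * w for w in weights)
    if sum_sq == 0:
        raise ValueError("at least one weight must be non-zero")
    return sum(weights) ** 2 / sum_sq


# First three weights are California, last three are Florida
schemes = {
    "equal weights": [1] * 6,
    "California counts 2x": [2 / 9] * 3 + [1 / 9] * 3,
    "California counts 5x": [5] * 3 + [1] * 3,   # illustrative scheme
    "Florida weight of 0": [1] * 3 + [0] * 3,
}

for label, weights in schemes.items():
    n_eff = kish_effective_sample_size(weights)
    deff = len(weights) / n_eff   # implied design effect, n / n_eff
    print(f"{label:22s} n_eff = {n_eff:.2f}  D_eff = {deff:.2f}")
```

With the 2/9 and 1/9 weights, the effective sample size is 5.4 rather than 6. Note that only the relative sizes of the weights matter: rescaling all weights by a constant leaves n_eff unchanged.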

For more information see:

Kish, Leslie (1965). Survey Sampling. New York: Wiley
