Unbiased Analysis of Today's Healthcare Issues

Determining Population Size

Written By: Jason Shafrin - Apr• 26•11

How do ecologists determine the size of a population? One method is mark and recapture (a.k.a. the capture/recapture method). This method relies on two separate trials that capture (either physically or in data) members of a certain population, and it estimates the population size from the proportion of specimens captured in both trials.

The key assumption of the capture/recapture method is that the probability of capturing any given specimen is independent across trials. If, in a study of birds, fat and old birds were easier to capture, then a bird caught in the first trial would be more likely to be caught again in the second. This would inflate m, the number of specimens captured in both trials, and thus the estimate of the population would be too low.
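To see the direction of this bias, here is a minimal simulation sketch. It assumes the simple ratio estimator N ≈ M × n / m (often called the Lincoln-Petersen estimator), where M and n are the numbers caught in the first and second trial and m is the number caught in both. The population size, the capture probabilities, and the function names are all made up for illustration.

```python
import random

def ratio_estimate(M, n, m):
    # Simple capture/recapture estimate: N is roughly M * n / m.
    return M * n / m

def simulate(capture_probs, seed=0):
    # capture_probs[i] is the chance that specimen i is caught in a trial.
    # The same probability is used in both trials, so easy-to-catch
    # specimens are easy to catch twice.
    rng = random.Random(seed)
    first = [rng.random() < p for p in capture_probs]
    second = [rng.random() < p for p in capture_probs]
    M = sum(first)
    n = sum(second)
    m = sum(a and b for a, b in zip(first, second))
    return ratio_estimate(M, n, m)

TRUE_N = 10_000  # hypothetical true population size

# Every specimen equally easy to catch.
equal = [0.5] * TRUE_N
# Half the specimens easy to catch, half hard to catch.
unequal = [0.8] * (TRUE_N // 2) + [0.2] * (TRUE_N // 2)

print("equal catchability:  ", round(simulate(equal)))
print("unequal catchability:", round(simulate(unequal)))
```

With equal catchability the estimate should land near the true 10,000. With unequal catchability the expected counts are M = n = 5,000 and m = 3,400, so the estimate sits near 5,000 × 5,000 / 3,400 ≈ 7,353, well below the true size.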

One application of the capture/recapture method is McClish et al. (1997)'s examination of the size of the elderly cancer population in Virginia. The authors estimate the likelihood that cancer patients appear in both the Virginia Cancer Registry (VCR) and the Medicare claims files (MEDPAR) for Virginia residents 65 and older.

“Capture-recapture techniques were used to estimate the actual cancer population size, based on the concordance and discordance of the data sources. If VCR identifies M cases and MEDPAR identifies n cases, m of which are common to both sources, then the estimated number of cases in the entire population of cases at reporting hospitals will be N = [(M + 1) × (n + 1)/(m + 1)] - 1. With this estimate of the population, the sensitivity of each source alone, as well as those of the combined sources, was estimated.”
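As a rough illustration of the quoted formula, the sketch below computes N from M, n, and m and then derives sensitivities. The counts are hypothetical and are not taken from the paper. Treating sensitivity as "cases found by a source divided by estimated N" is an assumption about how the authors computed it, and the function names are invented here.

```python
def chapman_estimate(M, n, m):
    # N = (M + 1)(n + 1)/(m + 1) - 1, the form quoted above
    # (often attributed to Chapman).
    return (M + 1) * (n + 1) / (m + 1) - 1

def sensitivities(M, n, m):
    # Assumed definition: share of the estimated population that each
    # source, or either source, identifies.
    N = chapman_estimate(M, n, m)
    return {
        "source_1": M / N,
        "source_2": n / N,
        "combined": (M + n - m) / N,  # cases found by at least one source
    }

# Hypothetical counts: 100 cases in one source, 80 in the other, 40 in both.
N_hat = chapman_estimate(100, 80, 40)
print(round(N_hat, 1))  # 198.5
for name, value in sensitivities(100, 80, 40).items():
    print(name, round(value, 3))
```

Note that the combined count M + n - m subtracts m so that cases appearing in both sources are not counted twice.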

The variance of the population estimate N is approximately:

  • var(N) = [(M+1)(n+1)(M-m)(n-m)]/[(m+1)(m+1)(m+2)]
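A short sketch of how this variance might be used in practice follows. The normal-approximation interval (estimate ± 1.96 standard errors) is an assumption added here for illustration and is not described in the post. The counts are the same kind of made-up numbers a reader might plug in.

```python
import math

def chapman_estimate(M, n, m):
    # N = (M + 1)(n + 1)/(m + 1) - 1
    return (M + 1) * (n + 1) / (m + 1) - 1

def chapman_variance(M, n, m):
    # var(N) = (M+1)(n+1)(M-m)(n-m) / [(m+1)^2 (m+2)]
    numerator = (M + 1) * (n + 1) * (M - m) * (n - m)
    denominator = (m + 1) ** 2 * (m + 2)
    return numerator / denominator

def approximate_interval(M, n, m, z=1.96):
    # Assumed normal approximation: estimate +/- z standard errors.
    estimate = chapman_estimate(M, n, m)
    std_error = math.sqrt(chapman_variance(M, n, m))
    return estimate - z * std_error, estimate + z * std_error

# Hypothetical counts: M = 100, n = 80, m = 40.
print(round(chapman_variance(100, 80, 40), 1))  # 278.1
low, high = approximate_interval(100, 80, 40)
print(round(low), round(high))
```

The variance shrinks as the overlap m grows relative to M and n, which matches the intuition that more cases shared between the two sources pin down the population size more tightly.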

What explains the discrepancy between the VCR and MEDPAR data? The study claims:

Cases not reported to the VCR were more likely to have their cancer diagnosis found in the second through fifth position in the MEDPAR record rather than in the first. That implies that cancer may not have been the primary reason for the hospitalization, so that MEDPAR may have identified a prevalent rather than an incident case. While the method used here has been used by others, some cases termed incident may have been prevalent cases seen as a recurrence after more than 2 years. More complex methods to identify incident cases that look at the position of cancer diagnostic code and the surgical interventions might improve this, but at the expense of missing more cases.

Source: Donna Katzman McClish, Lynne Penberthy, Martha Whittemore, Craig Newschaffer, Diane Woolard, Christopher E. Desch, and Sheldon Retchin. Ability of Medicare Claims Data and Cancer Registries to Identify Cancer Cases and Treatment. Am J Epidemiology. 1997, 145(3): 227-233.



ICD-9-CM diagnosis codes by cancer site:

  • Breast: 174-174.9; 233; 233.0
  • Colorectal: 153-153.9; 154; 154.0; 154.1; 230.3; 230.4
  • Lung: 162-162.9; 231; 231.2; 231.9
  • Prostate: 185-185.9; 233.4

ICD-9-CM procedure codes by cancer site:

  • Breast:
    • Definitive surgical therapy: 85.4-85.48; 85.21-85.23
    • Biopsy: 85.11; 85.12; 85.19
  • Colorectal:
    • Definitive surgical therapy: 45.4-45.43; 45.49; 45.7-45.79; 45.8; 48.3-48.35; 48.4-48.49; 48.5; 48.6-48.69
    • Biopsy: 45.23-45.29; 48.23-48.29
  • Lung:
    • Definitive surgical therapy: 32.0-32.2; 32.28-32.29
    • Biopsy: 33.22-33.29
  • Prostate:
    • Definitive surgical therapy: 60.2-60.69
    • Diagnostic procedure: 60.11; 60.12; 60.18; 60.2
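
The first four bullets above read as ICD-9-CM diagnosis codes grouped by cancer site. As a sketch of how such a list could be applied to a claims record, the code below maps a diagnosis code to a site and reports the position of the first cancer code, which relates to the quoted point about diagnoses appearing in the second through fifth position. The function names and the sample record are hypothetical, and this is not the authors' actual matching procedure.

```python
from decimal import Decimal, InvalidOperation

# Inclusive (low, high) ranges copied from the diagnosis list above.
# A single code is written as a range with equal endpoints.
DIAGNOSIS_RANGES = {
    "breast": [("174", "174.9"), ("233", "233")],
    "colorectal": [("153", "153.9"), ("154", "154"), ("154.1", "154.1"),
                   ("230.3", "230.3"), ("230.4", "230.4")],
    "lung": [("162", "162.9"), ("231", "231"), ("231.2", "231.2"),
             ("231.9", "231.9")],
    "prostate": [("185", "185.9"), ("233.4", "233.4")],
}

def cancer_site(code):
    # Return the cancer site for a diagnosis code, or None if no match.
    try:
        value = Decimal(code)
    except InvalidOperation:
        return None  # non-numeric codes (e.g. V codes) are not in the list
    for site, ranges in DIAGNOSIS_RANGES.items():
        if any(Decimal(low) <= value <= Decimal(high) for low, high in ranges):
            return site
    return None

def first_cancer_position(diagnoses):
    # Return (1-based position, site) of the first listed cancer code.
    for position, code in enumerate(diagnoses, start=1):
        site = cancer_site(code)
        if site is not None:
            return position, site
    return None

# Made-up record in which the cancer code is not the first diagnosis.
record = ["428.0", "162.9", "401.9"]
print(first_cancer_position(record))  # (2, 'lung')
```

Using Decimal rather than float keeps comparisons such as 233 versus 233.0 exact, and a code like 233.1 correctly matches no site.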
