Econometrics

Poisson Distribution Estimation

The Poisson distribution is often used in health economics to model counts of events.  Wikipedia has a nice basic summary of the Poisson distribution; Wolfram MathWorld gives a more sophisticated analysis.  The distribution is

f(k;\lambda)=\frac{e^{-\lambda} \lambda^k}{k!}

where ‘λ’ is the expected number of occurrences in a period.  The distribution gives the probability that a given number of events (‘k’) occurs in a fixed period of time when the events occur at a known average rate and independently of the time since the last event.  The mean and the variance of a Poisson distribution are equal (both are λ).  Healthcare economists can use the distribution to determine how different variables (e.g., income, smoking, medical treatments) affect the probability of observing a certain number of events (e.g., illnesses or deaths).
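
As a rough illustration of the formula above, the following Python sketch evaluates the Poisson probabilities and checks numerically that the mean and the variance coincide. The value of `lam` and the helper name `poisson_pmf` are assumptions chosen for illustration; they do not come from any dataset.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k events when the expected count is lam."""
    # Work in logs to avoid overflow in lam**k and k! for larger k.
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

lam = 3.0  # hypothetical expected number of events per period

# The support is infinite; truncating at k = 99 loses a negligible tail here.
probs = [poisson_pmf(k, lam) for k in range(100)]

total = sum(probs)
mean = sum(k * p for k, p in enumerate(probs))
variance = sum((k - mean) ** 2 * p for k, p in enumerate(probs))

print(f"P(k=0) = {probs[0]:.4f}")
print(f"sum = {total:.6f}, mean = {mean:.6f}, variance = {variance:.6f}")
```

Both the mean and the variance come out equal to `lam`, up to rounding.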

Let’s look at an example:

In the US in 2000, there were approximately 15 million people aged 55-59.  The death rate for this age group was approximately 750 deaths per year per 100,000 individuals.  Thus, the expected number of deaths (‘λ’) per year is 15,000,000 × 750 / 100,000 = 112,500.  What factors affect the lambda term?  An economist may be interested in whether increasing average real income for the cohort would increase or decrease the death rate.  We can model lambda as a function of covariates (‘x’), such as whether or not the individual has health insurance, whether they are a smoker, and their income level, together with their corresponding coefficients (‘β’).  Thus our new equation is:

  • f(k;x)=exp[-λ(x;β)] * [λ(x;β)]^k / k!
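
The arithmetic behind the baseline rate and the covariate-dependent probability can be sketched as follows. The population and death-rate figures are the ones quoted in the example. Everything else is an assumption made for illustration: the exponential form for λ(x;β), the coefficient values in `beta`, and the helper name `conditional_pmf`.

```python
import math

# Baseline rate from the example: 15 million people, 750 deaths per 100,000.
population = 15_000_000
deaths_per_100k = 750
baseline_lambda = population * deaths_per_100k / 100_000
print(baseline_lambda)  # 112500.0

def conditional_pmf(k, x, beta):
    """f(k; x) with lambda(x; beta) = exp(x . beta), an assumed functional form."""
    log_lam = sum(xj * bj for xj, bj in zip(x, beta))
    lam = math.exp(log_lam)
    return math.exp(-lam + k * log_lam - math.lgamma(k + 1))

# Hypothetical covariates: intercept, insured (0/1), smoker (0/1).
beta = [0.5, -0.2, 0.4]       # made-up coefficients, for illustration only
x_smoker = [1.0, 1.0, 1.0]
x_nonsmoker = [1.0, 1.0, 0.0]

# Probability of observing zero events for each covariate profile.
print(conditional_pmf(0, x_smoker, beta))
print(conditional_pmf(0, x_nonsmoker, beta))
```

With a positive coefficient on the smoker indicator, λ is larger for smokers, so the probability of observing zero events is smaller for them.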

If we have observations from multiple census years (and if we assume that the 55-59 age cohort is the same size each year), we can estimate these coefficients (β) by maximizing the log-likelihood function:

  • l_i(β) = k_i * log[λ(x;β)] - λ(x;β)
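
To make the log-likelihood concrete, here is a minimal Python sketch for the simplest case, in which there are no covariates and λ is a single constant. The counts are hypothetical and the function name `log_likelihood` is an assumption. In this special case the log-likelihood is maximized at the sample mean of the counts.

```python
import math

def log_likelihood(lams, counts):
    """Sum of l_i = k_i * log(lambda_i) - lambda_i over all observations."""
    return sum(k * math.log(lam) - lam for k, lam in zip(counts, lams))

counts = [2, 5, 3, 4, 1]  # hypothetical event counts, for illustration only
sample_mean = sum(counts) / len(counts)

# With no covariates, lambda is one constant; compare a few candidate values.
for lam in (2.0, sample_mean, 4.0):
    ll = log_likelihood([lam] * len(counts), counts)
    print(f"lambda = {lam:.1f}: log-likelihood = {ll:.4f}")
```

The candidate equal to the sample mean gives the largest value of the three.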

The ‘k!’ term drops out because it does not depend on the parameter β.  Each observation (‘i’) corresponds to the data from one census year in the sample.  The most common form for λ(x;β) is the exponential, λ(x;β)=exp(xβ), which guarantees that the expected count is positive.  If the variable x_j is continuous and we assume λ(x;β)=exp(xβ), then we can show:

  • ∂{E(k|x)} / ∂{x_j} = exp(xβ)*β_j
  • β_j = ∂{log [E(k|x)]} / ∂{x_j}
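
The two derivatives above can be checked numerically with central finite differences. This is only a sketch under the exponential form E(k|x) = exp(xβ). The coefficient values, the covariate vector, and the step size `h` are all hypothetical.

```python
import math

def expected_count(x, beta):
    """E(k | x) = exp(x . beta) under the exponential form."""
    return math.exp(sum(xj * bj for xj, bj in zip(x, beta)))

beta = [0.5, -0.2, 0.03]   # hypothetical: intercept, insured (0/1), income
x = [1.0, 1.0, 40.0]
j, h = 2, 1e-6             # differentiate with respect to income

x_up = list(x); x_up[j] += h
x_dn = list(x); x_dn[j] -= h

# Marginal effect: dE(k|x)/dx_j should equal exp(x . beta) * beta_j.
numeric_effect = (expected_count(x_up, beta) - expected_count(x_dn, beta)) / (2 * h)
analytic_effect = expected_count(x, beta) * beta[j]

# Semi-elasticity: d log E(k|x) / dx_j should equal beta_j.
numeric_semi = (math.log(expected_count(x_up, beta))
                - math.log(expected_count(x_dn, beta))) / (2 * h)

print(numeric_effect, analytic_effect)
print(numeric_semi, beta[j])
```

The numerical and analytic values agree up to the finite-difference error.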

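Once β_j has been estimated, the economist knows the impact that a covariate x_j (such as income, smoking, or health insurance) has on the average death rate (λ): β_j is the approximate proportional change in the expected number of deaths for a one-unit increase in x_j.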
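
For readers who want to see the estimation step end to end, the following is a minimal sketch of maximum-likelihood estimation by Newton-Raphson for a model with an intercept and a single covariate, λ_i = exp(b0 + b1*x_i). The data, the function name `fit_poisson`, and the fixed iteration count are assumptions for illustration. In practice a statistics package would normally be used instead.

```python
import math

def fit_poisson(xs, ks, iterations=25):
    """Newton-Raphson for lambda_i = exp(b0 + b1 * x_i); returns (b0, b1)."""
    b0, b1 = math.log(sum(ks) / len(ks)), 0.0   # start at the intercept-only fit
    for _ in range(iterations):
        lams = [math.exp(b0 + b1 * x) for x in xs]
        # Score: gradient of the log-likelihood with respect to (b0, b1).
        g0 = sum(k - lam for k, lam in zip(ks, lams))
        g1 = sum((k - lam) * x for k, lam, x in zip(ks, lams, xs))
        # Information matrix (negative Hessian), a symmetric 2x2 matrix.
        h00 = sum(lams)
        h01 = sum(lam * x for lam, x in zip(lams, xs))
        h11 = sum(lam * x * x for lam, x in zip(lams, xs))
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 times the score.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data, for illustration only: x is a covariate, k an event count.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ks = [1, 2, 4, 7, 12]

b0, b1 = fit_poisson(xs, ks)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
# A one-unit increase in x multiplies the expected count by exp(b1).
print(f"multiplier per unit of x = {math.exp(b1):.4f}")
```

At the maximum the score is zero, so the fitted expected counts sum to the observed total. The multiplier `exp(b1)` is the exact counterpart of the approximate proportional change given by the coefficient itself.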