Econometrics

Why not to use a normal distribution to approximate a binomial distribution

Most people know that, under the central limit theorem, the distribution of a sample mean approaches a normal distribution as the number of observations gets large.  The question is: if we have a series of discrete events and want to approximate the distribution of their mean with a continuous distribution, should we use a normal distribution?

For instance, let us assume we have 20 observations on patient admittance to the hospital, and in 3 of those cases the individual died.  We can use a binomial distribution to model the probability of observing r deaths in n admissions, where π is the probability of death:

  • $p(r \mid n, \pi) = \binom{n}{r}\,\pi^{r}(1-\pi)^{n-r}$
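
As a quick illustration, the binomial probability above can be evaluated for the example data. This is only a sketch: the function name `binomial_pmf` is made up for this post, and plugging in π = 0.15 (the sample proportion 3/20) is an assumption made for illustration.

```python
import math

def binomial_pmf(r, n, pi):
    """Probability of exactly r events in n trials with event probability pi."""
    return math.comb(n, r) * pi**r * (1 - pi) ** (n - r)

n, r = 20, 3
pi_hat = r / n  # assumed value of pi: the sample proportion 3/20

# Probability of seeing exactly 3 deaths in 20 admissions if pi = 0.15
prob_three_deaths = binomial_pmf(r, n, pi_hat)

# Sanity check: the probabilities over r = 0..n should sum to 1
total = sum(binomial_pmf(k, n, pi_hat) for k in range(n + 1))

print(prob_three_deaths, total)
```

With π = 0.15, observing exactly 3 deaths in 20 admissions has a probability of roughly 0.24.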

We can estimate π with 3/20 = 0.15.  For our prior distribution on π, we could fit a normal distribution.  A normal distribution, however, assigns positive probability to values less than 0 (and greater than 1), which are impossible for a probability.  This is especially problematic when the sample size is small (e.g., n=20), because the distribution is then wide relative to its distance from 0.  A truncated normal would solve the problem of negative values, but eliminating one portion of the distribution will change the distribution’s mean and variance.
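
To put rough numbers on this, the sketch below fits a normal distribution to the example and then truncates it to [0, 1]. It assumes the normal has mean 0.15 and standard deviation sqrt(p(1-p)/n), the usual binomial-proportion estimate, and it uses the standard closed-form mean and variance of a truncated normal. The variable names are illustrative only.

```python
import math
from statistics import NormalDist

n, r = 20, 3
p_hat = r / n
sigma = math.sqrt(p_hat * (1 - p_hat) / n)  # assumed spread: sqrt(p(1-p)/n)

# Untruncated normal: how much probability falls below zero?
normal = NormalDist(mu=p_hat, sigma=sigma)
mass_below_zero = normal.cdf(0.0)

# Truncate to [0, 1] and recompute mean and variance in closed form
std = NormalDist()               # standard normal
a = (0.0 - p_hat) / sigma        # standardized lower bound
b = (1.0 - p_hat) / sigma        # standardized upper bound
z = std.cdf(b) - std.cdf(a)      # probability mass kept after truncation
shift = (std.pdf(a) - std.pdf(b)) / z
trunc_mean = p_hat + sigma * shift
trunc_var = sigma**2 * (1 + (a * std.pdf(a) - b * std.pdf(b)) / z - shift**2)

print(mass_below_zero, trunc_mean, trunc_var, sigma**2)
```

Under these assumptions about 3% of the normal's probability lies below zero, and truncation pushes the mean above 0.15 while shrinking the variance, so the truncated distribution no longer matches the moments we started with.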

Another option is to use the beta distribution for the prior.  The beta distribution for the value of π is:

  • $p(\pi) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\,\pi^{\alpha-1}(1-\pi)^{\beta-1}$

If we apply Bayes’ theorem to the binomial data with a beta prior, we get:

  • $p(\pi \mid r, n) \propto \pi^{r}(1-\pi)^{n-r}\,\pi^{\alpha-1}(1-\pi)^{\beta-1}$
  • $p(\pi \mid r, n) \propto \pi^{\alpha+r-1}(1-\pi)^{\beta+n-r-1}$
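
Reading off the exponents of the last line, the kernel is that of a beta density with parameters α + r and β + n - r. The sketch below checks this numerically by normalizing likelihood × prior on a grid and comparing it with that beta density. The prior values α = 2 and β = 5 are hypothetical, chosen only for illustration, and the helper names are made up for this post.

```python
import math

def beta_pdf(x, a, b):
    """Beta(a, b) density at x, computed via log-gamma for stability."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def posterior_params(alpha, beta, r, n):
    """Conjugate update: Beta(alpha, beta) prior plus r events in n trials."""
    return alpha + r, beta + (n - r)

alpha, beta = 2.0, 5.0   # hypothetical prior parameters, for illustration only
r, n = 3, 20
a_post, b_post = posterior_params(alpha, beta, r, n)

# Normalize likelihood * prior with a midpoint rule on (0, 1)
grid = [(i + 0.5) / 10_000 for i in range(10_000)]
kernel = [x**r * (1 - x) ** (n - r) * beta_pdf(x, alpha, beta) for x in grid]
norm_const = sum(kernel) / len(grid)

# Largest pointwise gap between the normalized kernel and Beta(a_post, b_post)
max_gap = max(
    abs(k / norm_const - beta_pdf(x, a_post, b_post)) for x, k in zip(grid, kernel)
)
posterior_mean = a_post / (a_post + b_post)

print(a_post, b_post, posterior_mean, max_gap)
```

The gap should be negligible, which is the practical payoff of the beta prior: updating on new data only requires adding the event and non-event counts to α and β.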

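The posterior distribution is therefore Beta(α + r, β + n - r), since a Beta(a, b) density is proportional to $\pi^{a-1}(1-\pi)^{b-1}$.  We already know r and n, and can choose α and β with the method of moments, using the mean and variance of a Beta(α, β) random variable θ: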

  • $E(\theta) = \frac{\alpha}{\alpha+\beta}$
  • $\mathrm{var}(\theta) = \frac{\alpha\beta}{(\alpha+\beta)^{2}(\alpha+\beta+1)}$

Now we estimate E(θ) and var(θ) with the sample moments. If 3 of 20 people died, then we estimate E(θ) with 3/20 = 0.15. Further, with a binomial distribution, we can estimate var(θ) with p(1-p)/n = 0.15 × 0.85/20 = 0.00638. This means that the standard deviation is s(θ) = $0.00638^{1/2}$ ≈ 0.0798. We can then solve for α and β, since we have two equations and two unknowns.
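
The two moment equations can be solved in closed form. Dividing the variance by E(θ)(1 - E(θ)) gives 1/(α + β + 1), so α + β = E(θ)(1 - E(θ))/var(θ) - 1, and then α = E(θ)(α + β). A minimal sketch of that calculation, with a made-up function name `beta_from_moments`:

```python
def beta_from_moments(mean, var):
    """Solve E = a/(a+b) and var = ab/((a+b)^2 (a+b+1)) for (a, b)."""
    total = mean * (1 - mean) / var - 1   # this is a + b
    if total <= 0:
        raise ValueError("variance too large for a beta distribution with this mean")
    return mean * total, (1 - mean) * total

r, n = 3, 20
p_hat = r / n                      # estimate of E(theta)
var_hat = p_hat * (1 - p_hat) / n  # estimate of var(theta)

alpha, beta = beta_from_moments(p_hat, var_hat)

# Round trip: recompute the moments from the fitted parameters
mean_check = alpha / (alpha + beta)
var_check = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))

print(alpha, beta, mean_check, var_check)
```

For the example this gives α + β = 19, so α = 2.85 and β = 16.15. Because var(θ) was estimated as p(1-p)/n, the sum α + β always works out to n - 1 under this particular choice.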
