Heckman

Much health care data is characterized by a large cluster of observations at zero and a right-skewed distribution of the remaining outcomes. For instance, people who do not get sick generally use $0 of medical care. Those who do get sick use varying amounts of medical care, and many of them are outliers with extremely expensive care. How do health economists take these features into account?

David Madden looks at two alternatives for handling the shape of this distribution in his 2008 JHE paper: sample selection models and two-part models. Zero consumption can result from two different decisions: a participation decision and a consumption decision. For instance, in the case of smoking, some individuals will not smoke no matter how cheap cigarettes get (participation decision). On the other hand, some smokers may decide not to smoke during a given time period because cigarettes are very expensive or their income is low (consumption decision). Since people cannot smoke a negative number of cigarettes, there may still be a cluster of observations at zero.

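Assume that an individual’s utility from participation is equal to w=α’Z + v. If w>0, then d=1 (the individual participates), and if w<0, then d=0 (the individual does not participate). For consumption, individuals choose y**=max[0,y*], where y*= β’X + u. Consumption is therefore positive only when both w>0 and y*>0. A general model can be written as follows, where Π0 is the product over observations with zero consumption and Π+ is the product over observations with positive consumption: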

  • L0 = Π0 [1-P(v>-α'Z) P(u>-β'X |v>-α'Z)] Π+ P(v>-α’Z) P(u > -β’X|v>-α’Z) g(y|v>-α’Z,u > -β’X)

If u and v are independent, then we have the Cragg model:

  • L1 = Π0 [1-P(v>-α’Z) P(u>-β’X)] Π+ P(v>-α’Z) P(u > -β’X) g(y|u > -β’X)

If we assume that the participation constraint dominates the consumption constraint (which is likely in the smoking example, but maybe not for drinking), then we have P(y*>0|d=1)=1 and g(y*|y*>0,d=1)=g(y*|d=1). This means that if you are a smoker, you will have at least one cigarette per period. When the participation constraint dominates, we can ignore the consumption decision, and we have the following likelihood function, which corresponds to the Heckman selection model:

  • L2 = Π0 [1-P(v>-α’Z)] Π+ P(v>-α’Z) g(y|v>-α’Z)

If u and v are also assumed to be independent, then we are left with a probit for participation and OLS for consumption. This is the two-part model:

  • L3 = Π0 [1-P(v>-α’Z)] Π+ P(v>-α’Z) g(y)
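As a rough illustration of the two-part likelihood, the sketch below evaluates its logarithm in plain Python. It assumes that v is standard normal (so participation is a probit) and that g(y) is a normal density with mean β’X and standard deviation σ for positive outcomes. Both are assumptions made for this example, and the function name two_part_loglik and its arguments are hypothetical.

```python
import math

def norm_cdf(t):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def norm_pdf(t):
    """Standard normal density."""
    return math.exp(-0.5 * t * t) / math.sqrt(2.0 * math.pi)

def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(ai * bi for ai, bi in zip(a, b))

def two_part_loglik(alpha, beta, sigma, Z, X, y):
    """Log of the two-part likelihood (hypothetical helper).

    Zeros contribute log[1 - Phi(alpha'Z)].
    Positive outcomes contribute log Phi(alpha'Z) + log g(y),
    with g assumed to be N(beta'X, sigma^2).
    """
    total = 0.0
    for z_i, x_i, y_i in zip(Z, X, y):
        p = norm_cdf(dot(alpha, z_i))  # P(v > -alpha'Z) when v ~ N(0, 1)
        if y_i == 0:
            total += math.log(1.0 - p)
        else:
            resid = (y_i - dot(beta, x_i)) / sigma
            total += math.log(p) + math.log(norm_pdf(resid) / sigma)
    return total

# Toy data: one non-participant and one participant, intercept-only models.
Z = [[1.0], [1.0]]
X = [[1.0], [1.0]]
y = [0.0, 2.0]
ll = two_part_loglik([0.0], [2.0], 1.0, Z, X, y)
```

Because the participation terms and the g(y) terms share no parameters, the log-likelihood splits into a probit piece and a regression piece, which is why the two parts can be estimated separately.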

Which of these models works best empirically?

Results

Madden looks at the fit of regressions that model smoking and drinking behavior using a wide variety of covariates. In general, the two-part model seems to perform better in the data used for this study, but the author wisely notes that the choice between the Heckman selection model and the two-part model should be made on a case-by-case basis.


Traditional instrumental variables (IV) econometric methodologies often fail to take into account response heterogeneity. Response heterogeneity based on characteristics not observed by the researcher can create heterogeneity in the self-selection process. For instance, one group of people who elect to receive surgery may know of a family history in which surgery is typically successful, whereas another group may elect not to receive surgery because of a different family history. If this information is unobservable to the researcher, then an analysis of the average effect of surgery may be biased. In the medical context, traditional IV assumes that:

  1. treatment effects are constant conditional on observed characteristics, or
  2. if treatment effects are heterogeneous, patients or physicians cannot anticipate these effects and use this information to select the most beneficial treatment.

In traditional IV, the treatment parameter gives researchers a local average treatment effect (LATE). But can a researcher characterize a heterogeneous response using IV? A solution to this problem is presented by Basu, Heckman, Navarro-Lozano and Urzua in a 2007 Health Economics paper. They use a local IV to estimate marginal treatment effect (MTE) parameters.

Basic Econometrics Review

Let us assume that a person will have two different outcomes based on whether or not they are treated:

  • Y1 = μ1(X) + U1
  • Y0 = μ0(X) + U0
  • Δ = Y1 – Y0 = {μ1(X) -μ0(X)} + (U1 – U0)
  • Y=μ0(X)+D*{μ1(X)-μ0(X)} + {D(U1 – U0) + U0}

The variable Y1 represents the outcome if the person is treated and Y0 represents the outcome if they are not treated. We only have one observation per person, however, since we cannot observe the counterfactual. If we could observe the counterfactual, Δ would give us the effect of the treatment for each person. Unfortunately we only observe Y. The dummy variable D is equal to unity if the person is in the treatment group and zero otherwise. If there were a randomized trial where people are randomly placed into the treatment and control groups, it would be easy to estimate the treatment effect by comparing the mean outcomes of the treated and control groups. We could examine the mean outcomes for individuals with similar characteristics to determine the treatment parameter by subgroup. However, if individuals can select whether or not to be treated, the error term–which may be composed of unobserved heterogeneity in the effectiveness of the treatment–may be correlated with the regressors that impact the outcome.
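To make the self-selection problem concrete, here is a small simulation sketch. The data-generating process is invented for illustration: Y0 = U0 and Y1 = U0 + gain, where the gain has mean 1, so the true average treatment effect is 1. Under the hypothetical "selected" rule, people take the treatment only when their own gain is above average.

```python
import random

def simulate(n=20000, seed=0):
    """Difference in mean outcomes, treated minus untreated, under two rules.

    Assumed (illustrative) model: Y0 = U0, Y1 = U0 + gain, gain ~ N(1, 1).
    """
    rng = random.Random(seed)
    # Per rule: [sum of Y1 among treated, n treated, sum of Y0 among untreated, n untreated]
    sums = {"random": [0.0, 0, 0.0, 0], "selected": [0.0, 0, 0.0, 0]}
    for _ in range(n):
        u0 = rng.gauss(0.0, 1.0)
        gain = 1.0 + rng.gauss(0.0, 1.0)
        y0 = u0
        y1 = u0 + gain
        assignments = {
            "random": rng.random() < 0.5,  # coin flip, unrelated to the gain
            "selected": gain > 1.0,        # treated only if own gain is above average
        }
        for rule, treated in assignments.items():
            s = sums[rule]
            if treated:
                s[0] += y1
                s[1] += 1
            else:
                s[2] += y0
                s[3] += 1
    return {rule: s[0] / s[1] - s[2] / s[3] for rule, s in sums.items()}

estimates = simulate()
```

Under the coin-flip rule the difference in means should land near the true average gain of 1. Under self-selection it should be noticeably larger (about 1.8 in expectation for this particular setup), because the treated are exactly the people with above-average gains.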

The traditional solution to the endogeneity problem is IV. Let X be the set of regressors and Z represent the instruments. “LATE computes the mean gain to those induced to switch from no treatment to treatment by a change in Z from z to z′.”

  • LATE={E(Y|X=x, Z=z′)-E(Y|X=x, Z=z)} / {P(D=1| X=x, Z=z′) – P(D=1| X=x, Z=z)}
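For a binary instrument, the LATE formula is a ratio of two differences in means (often called the Wald estimator). The sketch below computes it on a tiny made-up data set; the function name wald_late and the data are hypothetical, and conditioning on X=x is assumed to have been done already by restricting the sample to one covariate cell.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)

def wald_late(y, d, z):
    """Wald estimate of LATE for a binary instrument (0 codes z, 1 codes z')."""
    y_hi = [yi for yi, zi in zip(y, z) if zi == 1]
    y_lo = [yi for yi, zi in zip(y, z) if zi == 0]
    d_hi = [di for di, zi in zip(d, z) if zi == 1]
    d_lo = [di for di, zi in zip(d, z) if zi == 0]
    first_stage = mean(d_hi) - mean(d_lo)  # change in treatment probability
    if first_stage == 0:
        raise ValueError("instrument does not shift treatment")
    return (mean(y_hi) - mean(y_lo)) / first_stage

# Made-up data: treated outcomes equal 3, untreated outcomes equal 1.
z = [0, 0, 0, 0, 1, 1, 1, 1]
d = [0, 0, 0, 1, 0, 1, 1, 1]
y = [1, 1, 1, 3, 1, 3, 3, 3]
late = wald_late(y, d, z)  # (2.5 - 1.5) / (0.75 - 0.25) = 2.0
```

Moving the instrument from z to z′ raises the treatment rate from 25% to 75% and the mean outcome from 1.5 to 2.5, so the implied gain for those induced to switch is 2.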

Marginal Treatment Effect (MTE)

Developed by Björklund and Moffitt (1987) and furthered by Heckman (1997), the MTE measures “the average gain to patients who are indifferent between receiving treatment 1 [the treatment] versus treatment 0 [the control] given X and Z.” The benefit of using MTE is that one can calculate the marginal treatment effect for different subgroups based on the propensity score. This places a high degree of reliance on the accuracy and precision of the propensity score in order to determine these subgroup treatment parameters.

Let V denote a latent variable which measures the difference in benefits from being in the treated and control groups. Treatment choice can be modeled as follows.

  • V= μv(Z,X) + Uv
  • E(Uv)=0
  • D=1(V>0)

The authors use a propensity score to determine the probability of selecting treatment.

  • P(z,x)=P(D=1|Z=z, X=x) = P(Uv > -μv(z,x)) = 1 – FUv(-μv(z,x))
  • FUv() is the cdf of Uv.

Now we can define MTE to be:

  • MTE(x,z) = E(Δ|X=x, Z=z, V=0)
  • = E(Δ|X=x, Z=z, Uv=-μv(z,x))
  • = μ1(x) - μ0(x) + E{U1 – U0|Uv=-μv(z,x)}
  • = μ1(x) - μ0(x) + E{U1 – U0|UD=FUv(-μv(z,x))}

where FUv(Uv)=UD. The conditioning event in the last line is a monotonic transformation of the conditioning event in the third line, so the two conditional expectations are equal.
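The link between the propensity score and UD can be checked numerically. The sketch below assumes Uv is standard normal, which is an assumption made only for this example; the helper names are hypothetical. For a person at the margin (V=0, so Uv=-μv), UD equals FUv(-μv), which is one minus the propensity score.

```python
import math

def norm_cdf(t):
    """Standard normal CDF, standing in for FUv (an assumption)."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def propensity(mu_v):
    """P(z,x) = P(Uv > -mu_v) = 1 - FUv(-mu_v)."""
    return 1.0 - norm_cdf(-mu_v)

def marginal_quantile(mu_v):
    """UD for someone indifferent between treatments: FUv(-mu_v) = 1 - P(z,x)."""
    return norm_cdf(-mu_v)

p = propensity(0.0)            # 0.5: treatment is a coin flip when mu_v = 0
u_d = marginal_quantile(0.0)   # 0.5
```

A larger μv raises the propensity score and lowers the value of UD at which a person is indifferent, so each propensity score picks out a different point on the distribution of unobserved resistance to treatment.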

Local IV (LIV)

The LIV estimates the derivative of the expected outcome conditional on observed characteristics and the probability of electing to be in the treatment group, E(Y|X=x, P(z,x)), with respect to the probability of treatment, P(z,x). The term E(Y|X=x, P(z,x)) is defined as follows:

  • E(Y|X=x, P(z,x)) = E{DY1 + (1-D)Y0 | X=x, P(Z,X)=P(z,x)}
  • = μ0(x) + P(z,x){μ1(x) - μ0(x)} + E{U0|P(Z,X)=P(z,x)} + P(z,x) E{U1 - U0 | P(Z,X)=P(z,x), D=1}
  • = μ0(x) + P(z,x){μ1(x) - μ0(x)} + K(P(z,x))

The term K(P(z,x)) is a general function of the propensity score, P(z,x). Often, K() will be a polynomial of the propensity score. The MTE can be computed mathematically as below:

  • MTE(x,z) = {∂E(Y|X=x, P(z,x)) / ∂P(z,x)} |UD=1-P(z,x)
  • = μ1(x) - μ0(x) + ∂K(P(z,x))/∂P(z,x)

The equation above “…is implemented by regressing the outcome Y on all covariates [X], the propensity score, the interaction of the propensity score with all covariates and a polynomial on the propensity score.” The paper applies this procedure to data on breast cancer patients and their choice between breast-conserving surgery with radiation and mastectomy.
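Once such a regression has been fitted, the MTE is simply its derivative with respect to the propensity score. The sketch below assumes a linear specification in which μ1(x) - μ0(x) is captured by the coefficients on the interaction terms and K() is a polynomial with no constant term. The function name mte, the argument names, and the coefficient values are all hypothetical and are not taken from the paper.

```python
def mte(x, p, interaction_coefs, poly_coefs):
    """Derivative of the fitted E(Y | X=x, P=p) with respect to p.

    interaction_coefs[j] multiplies p * x[j] in the regression.
    poly_coefs[k-1] multiplies p**k for k = 1, 2, ..., K.
    """
    # d/dp of p * (x'b) is x'b, the estimate of mu1(x) - mu0(x).
    linear_part = sum(b * xj for b, xj in zip(interaction_coefs, x))
    # d/dp of sum_k g_k * p**k is sum_k k * g_k * p**(k-1), the estimate of K'(p).
    k_prime = sum(k * g * p ** (k - 1) for k, g in enumerate(poly_coefs, start=1))
    return linear_part + k_prime

x = [1.0, 2.0]                   # intercept plus one covariate
interaction_coefs = [0.5, 0.25]  # made-up coefficients on p * x
poly_coefs = [1.0, -2.0]         # K(p) = p - 2 * p**2, so K'(p) = 1 - 4p

effect_low = mte(x, 0.25, interaction_coefs, poly_coefs)   # 1.0 + 0.0 = 1.0
effect_high = mte(x, 0.5, interaction_coefs, poly_coefs)   # 1.0 - 1.0 = 0.0
```

With these made-up numbers the marginal effect falls as the propensity score rises, which is the kind of heterogeneity a single LATE would average over.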
