Selection

Nobel laureate James Heckman has a nice summary of how applied econometricians and policy researchers should define causality. Some of the more interesting points I have excerpted below.

On the source of randomness in a sample

One reason why many statistical models are incomplete is that they do not specify the sources of randomness generating variability among agents, i.e., they do not specify why otherwise observationally identical people make different choices and have different outcomes given the same choice. They do not distinguish what is in the agent’s information set from what is in the observing statistician’s information set, although the distinction is fundamental in justifying the properties of any estimator for solving selection and evaluation problems. They do not distinguish uncertainty from the point of view of the agent whose behavior is being analyzed from variability as analyzed by the observing analyst. They are also incomplete because they are recursive. They do not allow for simultaneity in choices of outcomes of treatment that are at the heart of game theory and models of social interactions and contagion (see, e.g., Brock & Durlauf, 2001; Tamer, 2003).

Unbundling a treatment

Researchers often say that a policy change will cause a change in some outcome measure. However, a policy change is often made up of many components. Which components of the policy change actually influenced the outcomes? In Heckman’s words:

Many causal models in statistics are black-box devices designed to investigate the impact of “treatments”—often complex packages of interventions—on observed outcomes in a given environment. Unbundling the components of complex treatments is rarely done. Explicit scientific models go into the black box to explore the mechanism(s) producing the effects.

Outcomes vs. Utilities

Most researchers pick an outcome variable of interest and assume that if the outcome increases (for a beneficial outcome measure), then people are better off. This may not be the case, however. For instance, Bill Clinton’s welfare reform act (PRWORA) may have increased employment rates and income for single mothers, but the mothers’ utility may have decreased. The single mothers may (or may not) have valued spending time caring for their children more than working.

Problems with non-linearity

Issues such as “social interactions, contagion and general equilibrium effects” can complicate causal inference.

What are you measuring?

Let us assume that Y is the outcome variable of interest. Y depends on what state, s, you are in. For instance, in a treatment/no treatment world, Y(s) is the outcome if you are treated and Y(s’) is the outcome if you are not treated. D(s)=1 if you were actually treated and D(s)=0 if you did not receive treatment in the data. Thus, we can measure various things:

  • Average Treatment Effect (ATE): E[Y(s) − Y(s')]. This is the average effect if all individuals moved from an untreated to a treated state.
  • Treatment on the Treated (TT): E[(Y(s) − Y(s')) | D(s) = 1]. This looks at the average effect of treatment only on those who were treated. This is important if only certain individuals select into the treatment group, or if the policy change is only relevant for certain individuals.
  • Treatment on the Untreated (TUT): E[(Y(s) − Y(s')) | D(s) = 0]. This is the average effect that treatment would have on those who were not treated. It is distinct from spillover effects, where a program affects people who never receive it: for instance, instituting a work training program for treated individuals may reduce community college enrollment and thus may affect untreated individuals (e.g., if the community college closes from lack of enrollment).
  • Policy Relevant Treatment Effect (PRTE): E_p[Y(s)] − E_p'[Y(s)]. This parameter compares the average outcomes under two different policy choices, p and p'.
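
To make the difference between ATE, TT, and TUT concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration and is not taken from Heckman's paper: the function names, the normal distributions, and the rule that people with larger gains are more likely to select into treatment. Because both potential outcomes are simulated, all three parameters can be computed directly, which is never possible with real data.

```python
import random
from statistics import mean


def simulate_population(n=20000, seed=0):
    """Draw hypothetical (Y(s), Y(s'), D) triples with self-selection on gains."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y0 = rng.gauss(0.0, 1.0)    # Y(s'): outcome without treatment
        gain = rng.gauss(1.0, 1.0)  # Y(s) - Y(s'): individual treatment effect
        y1 = y0 + gain              # Y(s): outcome with treatment
        # Assumed selection rule: larger gains make take-up more likely.
        d = 1 if gain + rng.gauss(0.0, 0.5) > 1.0 else 0
        rows.append((y1, y0, d))
    return rows


def treatment_parameters(rows):
    """Return ATE, TT, TUT and the naive treated-vs-untreated comparison."""
    ate = mean(y1 - y0 for y1, y0, d in rows)
    tt = mean(y1 - y0 for y1, y0, d in rows if d == 1)
    tut = mean(y1 - y0 for y1, y0, d in rows if d == 0)
    # What real data reveal: Y(s) for the treated, Y(s') for the untreated.
    naive = (mean(y1 for y1, y0, d in rows if d == 1)
             - mean(y0 for y1, y0, d in rows if d == 0))
    return {"ATE": ate, "TT": tt, "TUT": tut, "naive": naive}


params = treatment_parameters(simulate_population())
for name, value in params.items():
    print(f"{name}: {value:.3f}")
```

In this design the people who gain most select into treatment, so TT exceeds ATE, which exceeds TUT. Selection here depends only on the gain and not on the untreated outcome, so the naive comparison lands near TT rather than ATE; other selection rules would give a different pattern.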

Heckman, James (2008) “Econometric Causality” NBER WP #13934.


Much health care data is characterized by a large cluster of observations at 0 and a right-skewed distribution of the remaining outcomes. For instance, people who do not get sick generally use $0 of medical care. Those who do get sick use varying amounts of medical care, and a sizable number of outliers have extremely expensive care. How do health economists take these features into account?

David Madden looks at two alternatives to correct for the shape of the distribution in his 2008 JHE paper: sample selection and two-part models. Zero consumption of medical care can result from two different decisions: a participation decision and a consumption decision. For instance, in the case of smoking, individuals may decide not to smoke no matter how cheap cigarettes get (participation decision). On the other hand, some smokers may decide not to smoke during a given time period because cigarettes are very expensive or they have low income (consumption decision). Since people cannot smoke a negative number of cigarettes, there may still be a cluster of observations at zero.

Assume that an individual’s utility from participation is w = α'Z + v. If w>0, then d=1 (the individual participates), and if w<0, then d=0 (the individual does not participate). For consumption, individuals choose y** = max[0, y*], where y* = β'X + u. Letting Π0 denote the product over observations with zero consumption and Π+ the product over observations with positive consumption, a general model can be written as follows:

  • L0 = Π0 [1-P(v>-α'Z) P(u>-β'X|v>-α'Z)] Π+ P(v>-α'Z) P(u>-β'X|v>-α'Z) g(y|v>-α'Z, u>-β'X)
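
The latent-variable setup behind this likelihood can be sketched as a small simulation. This is an illustrative sketch, not Madden's code: the function name, the scalar covariates, the standard normal errors, the correlation parameter rho, and the rule that observed consumption equals d times max[0, y*] are all assumptions made for the example. The point is that zeros in the observed data come from two distinct sources.

```python
import math
import random


def simulate_double_hurdle(n=10000, alpha=1.0, beta=1.0, rho=0.5, seed=0):
    """Simulate (d, y*, y) under an assumed bivariate-normal error structure."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)  # hypothetical participation covariate Z
        x = rng.gauss(0.0, 1.0)  # hypothetical consumption covariate X
        v = rng.gauss(0.0, 1.0)  # participation error
        # Consumption error with correlation rho with v (rho = 0: independence).
        u = rho * v + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        w = alpha * z + v        # utility from participation
        d = 1 if w > 0 else 0    # participation decision
        y_star = beta * x + u    # latent (desired) consumption
        y = d * max(0.0, y_star) # observed consumption, never negative
        rows.append((d, y_star, y))
    return rows


rows = simulate_double_hurdle()
zeros_nonparticipant = sum(1 for d, y_star, y in rows if d == 0)
zeros_corner = sum(1 for d, y_star, y in rows if d == 1 and y_star <= 0)
positives = sum(1 for d, y_star, y in rows if y > 0)
print("zeros from non-participation:", zeros_nonparticipant)
print("zeros from corner solutions: ", zeros_corner)
print("positive observations:       ", positives)
```

An analyst who sees only y cannot tell the two kinds of zero apart, which is why the zero-consumption term in the likelihood combines both probabilities.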

If u and v are independent, then we have the Cragg model:

  • L1 = Π0 [1-P(v>-α'Z) P(u>-β'X)] Π+ P(v>-α'Z) P(u>-β'X) g(y|u>-β'X)

If we assume that the participation constraint dominates the consumption constraint (which is likely in the smoking example, but maybe not for drinking), then we have P(y*>0|d=1)=1 and g(y*|y*>0,d=1)=g(y*|d=1). This means that if you are a smoker you will have at least one cigarette per period. When the participation constraint dominates, we ignore the consumption decision and we have the following likelihood function which corresponds to the Heckman Selection model.

  • L2 = Π0 [1-P(v>-α'Z)] Π+ P(v>-α'Z) g(y|v>-α'Z)

If independence between u and v is also assumed, then we are left with a probit for participation and OLS for consumption. This is the two-part model:

  • L3 = Π0 [1-P(v>-α'Z)] Π+ P(v>-α'Z) g(y)
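
The two-part likelihood separates cleanly into a participation piece and a consumption piece, which is why the two parts can be estimated separately. The sketch below evaluates its log under assumptions that are mine, not Madden's: scalar covariates, v distributed standard normal (so P(v>-α'Z) is the normal CDF at α'Z), and g(y) taken to be a normal density with mean β'X and standard deviation sigma. Applied work often models log(y) instead; the function names and the toy data are hypothetical.

```python
import math


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_logpdf(y, mu, sigma):
    """Log density of a normal distribution with mean mu and sd sigma."""
    return (-0.5 * math.log(2.0 * math.pi * sigma ** 2)
            - (y - mu) ** 2 / (2.0 * sigma ** 2))


def two_part_loglik(data, alpha, beta, sigma):
    """Log of the two-part likelihood for a list of (z, x, y) observations."""
    participation = 0.0  # probit part: uses every observation
    consumption = 0.0    # outcome part: uses positive observations only
    for z, x, y in data:
        p = norm_cdf(alpha * z)  # P(v > -alpha*z) when v ~ N(0, 1)
        if y == 0:
            participation += math.log(1.0 - p)
        else:
            participation += math.log(p)
            consumption += norm_logpdf(y, beta * x, sigma)
    return participation + consumption, participation, consumption


# Toy data for illustration only: (z, x, y) with y = 0 for non-consumers.
data = [(0.5, 1.0, 0.0), (1.2, 2.0, 3.1), (-0.3, 0.5, 0.0), (0.8, 1.5, 2.4)]
total, participation, consumption = two_part_loglik(data, 1.0, 1.5, 1.0)
print(f"total={total:.4f} probit part={participation:.4f} "
      f"outcome part={consumption:.4f}")
```

Since alpha enters only the first piece and beta and sigma only the second, maximizing the total is the same as maximizing each piece on its own, which corresponds to running a probit on all observations and a regression on the positive ones.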

Which of these models works best empirically?

Results

Madden looks at the fit of regressions modeling smoking and drinking behavior using a wide variety of covariates. In general, the two-part model seems to perform better in the data used for this study, but the author wisely notes that deciding between the Heckman selection model and the two-part model should be done on a case-by-case basis.
