Econometrics


If you are evaluating the treatment effect of a policy or medical intervention, does it matter if some of your subjects leave the sample? In many cases, the answer is ‘yes’.

The Problem

As outlined in Grasdal (2001), the effect of the treatment is simply:

  • Δ = E(Y|X, T=1) − E(Y|X, T=0)

However, in some cases we may not observe Y. For instance, if subjects leave the study (attrition), we do not observe their outcomes. Let A=1 indicate that a subject dropped out and A=0 that the subject remained in the sample. Each of the two components of the equation above can then be decomposed as follows:

  • E(Y|X, T=1) = p_T·E(Y|X, T=1, A=0) + (1−p_T)·E(Y|X, T=1, A=1)
  • E(Y|X, T=0) = p_C·E(Y|X, T=0, A=0) + (1−p_C)·E(Y|X, T=0, A=1)

where p_T = P(A=0|X, T=1) is the probability that someone in the treatment group remains in the sample and p_C = P(A=0|X, T=0) is the corresponding probability for the control group.

Rearranging terms we get:

  • Δ = [E(Y|X, T=1, A=0) − E(Y|X, T=0, A=0)] + (1−p_T)[E(Y|X, T=1, A=1) − E(Y|X, T=1, A=0)] − (1−p_C)[E(Y|X, T=0, A=1) − E(Y|X, T=0, A=0)]

The first term in brackets is what we observe. The second bracketed term is the difference in outcomes between those who leave and those who stay in the treatment group; the third is the same difference in the control group. Each is weighted by that group's attrition rate. If attrition is random, dropouts and stayers have the same expected outcome, so the second and third bracketed terms are both zero and the observed difference among those who remain is an unbiased estimate of the treatment effect.
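To make the decomposition concrete, here is a minimal Python sketch. The function name and all of the numbers are made up for illustration (they do not come from Grasdal's study), and it assumes we somehow know the outcomes of the dropouts, which in practice is exactly what is missing.

```python
def treatment_effect_with_attrition(e_t_stay, e_t_drop, p_t_stay,
                                    e_c_stay, e_c_drop, p_c_stay):
    """Return (true effect, observed effect among stayers, attrition bias).

    e_*_stay / e_*_drop: expected outcome of stayers (A=0) and dropouts (A=1)
    p_*_stay:            probability of remaining in the sample
    """
    # Full-sample means are weighted averages of stayers and dropouts.
    mean_t = p_t_stay * e_t_stay + (1 - p_t_stay) * e_t_drop
    mean_c = p_c_stay * e_c_stay + (1 - p_c_stay) * e_c_drop
    true_effect = mean_t - mean_c
    # What the researcher can actually compute: stayers only.
    observed = e_t_stay - e_c_stay
    return true_effect, observed, observed - true_effect


# Hypothetical case: treated dropouts do worse than treated stayers.
true_effect, observed, bias = treatment_effect_with_attrition(
    e_t_stay=10.0, e_t_drop=6.0, p_t_stay=0.8,
    e_c_stay=7.0, e_c_drop=7.0, p_c_stay=0.9,
)
print(round(true_effect, 3), round(observed, 3), round(bias, 3))
```

With these invented numbers the observed difference among stayers (3.0) overstates the true effect (2.2). If the dropouts in each group had the same expected outcome as the stayers, the bias term would be zero.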

Potential Solutions

If one knows the source of the attrition bias, one can model it explicitly. Explicit models are typically sample selection models in which two simultaneous regression models are estimated. “The first model is a regression model that addresses the research question, with the hypotheses of the study being examined by the regression of the dependent variable on the key independent variables in the study. The second model includes the variables that are causing attrition, with the dependent variable being a dichotomous variable indicating either continued participation or nonparticipation in the study. The error terms of the substantive dependent variable in the first regression model and the participation dependent variable in the second regression model are correlated. A significant correlation between the two error terms indicates attrition bias.”

If the source of the bias is unknown, one can use the Heckman selection model. The first step of the Heckman selection model “…not only tests for attrition bias but also creates an outcome variable, which Heckman calls λ (lambda). Thus, a λ value is computed for all cases in the study, and it represents the proxy variable that explains the causation of attrition in the study…The second step of Heckman’s procedure is to merge the λ value of each participant into the larger data set and then include it…in the regression equation that is used to test the hypotheses in the study. Including λ in the equation solves the problem of specification error and leads to more accurate regression coefficients.”

Empirical Investigation

A study by Grasdal looks at attrition in a randomized field trial of a rehabilitation programme designed to bring long-term sick listed workers with musculoskeletal problems back to work in Bergen, Norway. In this case, they found that “Both the parametric and the semi-parametric sample estimators that were considered indicated that sample attrition biased outcome data regarding posttreatment earnings, while the data regarding sick leave status remained unbiased. The sample selection estimators of post-treatment earnings perform quite well in terms of correcting for attrition bias and estimating treatment effects not very different from the experimental benchmark.”

“…The analysis also demonstrates an inherent paradox in the ‘common support’ approach, which prescribes exclusion from the analysis of observations outside of common support for the selection probability. The more important treatment status is as a determinant of attrition, the larger is the proportion of treated with support for the selection probability outside the range, for which comparison with untreated counterparts is possible.”


Suppose you look at health care spending in two different regions and observe a significant difference.  You may want to know what the cause of this difference is.  Is it because one region has a mix of people who are sicker, or is it because that region treats patients with a given disease more intensively?

One way to answer this question is to use the Oaxaca decomposition.  This approach was originally formulated by Ronald Oaxaca. This document provides a nice overview of how to use the Oaxaca Decomposition and I apply that framework to the health spending case.

Differences in Health Spending

Assume that there are two regions: Region A and Region B. The spending for the two regions can be modeled using a linear regression framework:

  • Y_A = β_A·X + ε_A
  • Y_B = β_B·X + ε_B

The Y term represents spending and the variable X represents the patient’s health status. Health status could be measured as a vector of factors or as a single indicator (e.g., healthy or sick). The term β describes how much a region spends on medical resources to treat a patient with health status X. Thus, the average difference in spending per person between the two regions is:

  • Y_A − Y_B = β_A·X_A − β_B·X_B

where Y_A and Y_B now denote average spending and X_A and X_B the average case mix in regions A and B.

Determinants of Health Spending Differentials

Now the question is whether case mix or spending practices conditional on case mix is the key driver of the differences in spending between regions A and B. One can differentiate these two components using the following Oaxaca Decomposition:

  • Y_A − Y_B = ΔX·β_B + Δβ·X_A
  • Y_A − Y_B = ΔX·β_A + Δβ·X_B

where ΔX = X_A − X_B and Δβ = β_A − β_B. In the first equation, the differences in health status (the X’s) are weighted by the coefficients for region B and the differences in the coefficients are weighted by the X’s from region A, whereas in the second, the differences in the X’s are weighted by the coefficients from region A and the differences in the coefficients are weighted by the X’s from region B.

There are basically three factors that affect the difference in health spending between the regions: i) differences in health status across regions, ii) differences in treatment patterns conditional on health status, and iii) the interaction of health status and conditional treatment effects. One can see this clearly below:

  • Y_A − Y_B = ΔX·β_B + Δβ·X_B + ΔX·Δβ
  • Y_A − Y_B = H + T + HT

The equations above show the health status effect (H = ΔX·β_B), the treatment effect (T = Δβ·X_B) and the interaction (HT = ΔX·Δβ).

The specification chosen for the Oaxaca decomposition determines whether the interaction effect is placed with the health status effect or the treatment effect.  More precisely:

  • Y_A − Y_B = ΔX·β_B + Δβ·X_A = H + (HT + T)
  • Y_A − Y_B = ΔX·β_A + Δβ·X_B = (H + HT) + T

In effect, the first decomposition specification incorporates the interaction term with the treatment effect whereas the second specification places the interaction term together with the health status effect.
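The decomposition can be checked numerically. The sketch below is only an illustration: the function names, the case-mix shares and the spending coefficients are all invented, and each region's X vector is assumed to be an intercept plus the share of patients who are sick.

```python
def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def oaxaca_threefold(x_a, x_b, beta_a, beta_b):
    """Split the spending gap into health status (H), treatment (T)
    and interaction (HT) components, using region B as the reference."""
    dx = [a - b for a, b in zip(x_a, x_b)]
    dbeta = [a - b for a, b in zip(beta_a, beta_b)]
    h = dot(dx, beta_b)      # differences in case mix
    t = dot(dbeta, x_b)      # differences in spending per case type
    ht = dot(dx, dbeta)      # interaction of the two
    return h, t, ht


# Hypothetical inputs: [intercept, share of patients who are sick]
x_a, x_b = [1.0, 0.30], [1.0, 0.20]
beta_a, beta_b = [2000.0, 10000.0], [1500.0, 9000.0]

h, t, ht = oaxaca_threefold(x_a, x_b, beta_a, beta_b)
gap = dot(beta_a, x_a) - dot(beta_b, x_b)
print(round(h, 2), round(t, 2), round(ht, 2), round(gap, 2))
```

In this made-up case the three pieces (900, 700 and 100) add up to the 1,700 gap. Assigning the interaction to T reproduces the first two-fold specification, and assigning it to H reproduces the second.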


Biases

All economists are familiar with the problem of selection bias.  In non-randomized samples, patients may choose to be in either the treatment or control group based on factors which are also related to the outcome of interest.  Even if researchers can design a study that fully controls for selection bias, robust studies must also account for other biases.  These include:

  • Recall bias: Patients in one group have better or worse memory of a given event.  If one wishes to compare changes in income for individuals who received certain workforce training, individuals who participated in the program may be more or less likely than non-participants to inflate their reported income levels over time.
  • Interviewer bias: If new data are being collected and researchers use separate interviewers for the treatment and control groups, the study results will be biased if one interviewer systematically over- or understates the interviewees' responses.
  • Observation bias: This problem is particularly acute in medical studies.  Observation bias occurs when physicians (or patients) in one group are more likely to detect a disease than those in another.  Thus, a study identifying how pollution affected disease rates may underestimate the impact of the pollution if those affected are less likely to detect any disease than those who are not.  For instance, if poor individuals are more likely to drink polluted water than rich individuals, but also less likely to go to the doctor, the disease incidence from polluted water would be underreported and the causal impact of water pollution would be underestimated.

Outside of purely statistical biases, the research community at large may suffer from other biases as well.  These include:

  • Funding bias: Researchers may be biased towards interpreting quantitative results in favor of the entity which funded their study.
  • Status quo bias: Survey respondents may anchor their opinions to the status quo, and researchers may interpret their results in a fashion more likely to coincide with the existing academic literature.
  • Publication bias: The tendency of researchers, editors, and pharmaceutical companies to handle the reporting of experimental results that are positive (i.e. showing a significant finding) differently from results that are negative (i.e. supporting the null hypothesis) or inconclusive, leading to bias in the overall published literature.
  • Hindsight bias: The inclination to see events that have already occurred as being more predictable than they were before they took place.

 


Understanding quantiles is fairly intuitive. A physician would rank in the τth quantile in terms of quality of care if he performs better than the proportion τ of the reference group of physicians and worse than the proportion (1–τ). For a physician at the median, half of physicians perform worse and half perform better.

Quantile regressions, however, offer the power to evaluate whether the predicted effect of selected explanatory variables on the outcome of interest differs depending on where in the distribution the individual is located. Koenker and Bassett (1978) introduced these regression models and based them on the same intuition used to calculate the median. Today I review how quantile regressions work compared to ordinary least squares (OLS).

Mean vs. Quantile

The simplest way to compare OLS against quantile regression is to compare optimization methods for the mean and quantiles (e.g., median). Most people know the mean and median formulas, but the following specifications detail how to calculate these values for any sample using optimization techniques.

  • Mean: min_{μ∈ℜ} Σ (y_i – μ)^2
  • Quantile: min_{ξ∈ℜ} Σ ρ_τ(y_i – ξ)

where the function ρ_τ(x) = x(τ – I(x<0)). In essence, the function ρ_τ tilts the absolute value function towards the quantile under investigation. For the mean, the goal is to pick the parameter (the mean) which minimizes the sum of squared deviations. For the quantile, the goal is to pick the parameter which minimizes a weighted sum of absolute deviations. For the median, positive and negative deviations are weighted equally, whereas for other quantiles positive deviations receive weight τ and negative deviations receive weight (1–τ).
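As a quick illustration of the optimization view, the following Python sketch finds a sample quantile by brute-force minimization of the check function. The function names and the data are made up. Because the objective is piecewise linear, it is enough to try each observed value as a candidate.

```python
import statistics


def check_loss(u, tau):
    """Check function: rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def sample_quantile(ys, tau):
    """Return the data point that minimizes the summed check loss."""
    return min(ys, key=lambda xi: sum(check_loss(y - xi, tau) for y in ys))


ys = [3, 1, 4, 1, 5, 9, 2, 6, 5]          # arbitrary illustrative sample
print(sample_quantile(ys, 0.5))            # agrees with statistics.median(ys)
print(statistics.median(ys))
print(sample_quantile(ys, 0.9))            # an upper quantile of the sample
```

With an odd number of observations the τ = 0.5 solution is the usual median. Changing τ simply tilts the weights placed on positive and negative deviations.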

I have created this spreadsheet to more clearly demonstrate how calculating quantiles can be done in practice.  Wikipedia also has a nice example.

OLS vs. Quantile Regression

Again, compare the mechanisms by which OLS and quantile regressions choose the coefficients (i.e., β) to optimize the equations below.

  • OLS: min_{β∈ℜ^k} Σ (y_i – x_i·β)^2
  • Quantile Regression: min_{β_τ∈ℜ^k} Σ ρ_τ(y_i – x_i·β_τ)

When you calculate the sample mean, you are estimating the unconditional population mean [i.e., E(y)]. When you run an OLS regression, you are estimating the conditional expectation function [i.e., E(y|X)]. Similarly, quantile regression estimates the conditional quantile of the dependent variable.

To run a quantile regression in SAS, one can use PROC QUANTREG. In Stata, one can use the qreg command.
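For readers without SAS or Stata, the idea can be sketched in plain Python. This is not how QUANTREG or qreg compute their estimates. It is a brute-force illustration for one regressor that relies on the fact that quantile regression is a linear program, so some optimal line passes through two data points. The function names and the data are invented.

```python
from itertools import combinations


def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - I(u < 0))."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def total_loss(a, b, xs, ys, tau):
    """Summed check loss of the line y = a + b*x."""
    return sum(check_loss(y - (a + b * x), tau) for x, y in zip(xs, ys))


def quantile_line(xs, ys, tau):
    """Single-regressor quantile regression by enumerating every line
    through two data points (exact, but only practical for small samples)."""
    best = None
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        if x1 == x2:
            continue  # a vertical line is not a regression line
        b = (y2 - y1) / (x2 - x1)
        a = y1 - b * x1
        loss = total_loss(a, b, xs, ys, tau)
        if best is None or loss < best[0]:
            best = (loss, a, b)
    return best[1], best[2]


# Made-up data whose spread grows with x
xs = list(range(1, 21))
shocks = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = [2 + x + x * shocks[(x - 1) % 5] for x in xs]

for tau in (0.1, 0.5, 0.9):
    a, b = quantile_line(xs, ys, tau)
    print(f"tau={tau}: intercept={a:.2f}, slope={b:.2f}")
```

For fan-shaped data like this, one would expect the fitted slope to differ across τ, which is exactly the kind of pattern that a single OLS slope cannot reveal.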

Quantile Regression in Practice

An example of a paper using Quantile Regression includes the following: Johar, M. and Katayama, H. (2011), Quantile regression analysis of body mass and wages. Health Economics, 20: n/a. doi: 10.1002/hec.1736. This paper uses the National Longitudinal Survey of Youth 1979, to explore the relationship between body mass and wages. The researchers use quantile regression to provide a broad description of the relationship across the wage distribution. “Our results find that for female workers body mass and wages are negatively correlated at all points in their wage distribution. The strength of the relationship is larger at higher-wage levels. For male workers, the relationship is relatively constant across wage distribution but heterogeneous across ethnic groups.”


How can you estimate an individual’s total lifetime cost of medical care?  For people who die in your sample, this is simple.  In most data sets, however, not all individuals will die during the period of observation.  Thus, the data set is censored for those who do not die.

In addition, many standard hazard models do not allow for researchers to disaggregate the effects of covariates on survival and the intensity of utilization.  Both factors have an effect on cost.

Assuming that censoring is random, Basu and Manning (2010) describe a method to calculate expected lifetime costs for each individual as follows:

Read the rest of this entry »


The Problem

Researchers often transform the dependent variable of a regression before estimating parameter values.  Performing the transformation, however, complicates the calculation of the expected value of the dependent variable on the untransformed scale.  Let Y_i be the dependent variable and assume the function g is used to transform it as follows:

  • η_i = g(Y_i)
  • Y_i = h(η_i)
  • h = g^(-1)

The easiest way to picture these functions is to think of g as the ln function and h as the exp function. In health economics, researchers often use a log transformation to attenuate problems related to a heavily right-skewed distribution. In this case, one would estimate the following regression:

  • η_i = x_i·β + ε_i
  • ε_i ~ F (i.i.d.), E(ε_i) = 0, Var(ε_i) = σ^2

One can estimate β consistently as follows:

  • β̂ = (X’X)^(-1) X’η

What is the predicted value of the dependent variable? Calculating this is not as easy as it seems:

  • E(Y_0) = E[h(x_0·β + ε)] ≠ h(x_0·β)

For instance, it is well known that under the log transformation with normally distributed errors, the expected value of the dependent variable is equal to exp(x_0·β + σ^2/2). When we do not know the true distribution of the error term, however, calculating the expected value of the dependent variable is more difficult. The solution is Duan’s Smearing Estimate.
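The retransformation problem is easy to see in a small simulation. The sketch below uses invented data and the simplest possible model (an intercept only, estimated on the log scale). The correction shown is the usual form of the smearing estimate for a log transformation: the naive retransformed prediction multiplied by the average of the exponentiated residuals.

```python
import math
import random

random.seed(0)

# Hypothetical log-scale outcomes: eta = 1.0 + eps, with eps ~ N(0, 0.8^2)
eta = [1.0 + random.gauss(0.0, 0.8) for _ in range(5000)]
y = [math.exp(e) for e in eta]

# Intercept-only OLS on the log scale: the fitted value is the mean of eta
eta_hat = sum(eta) / len(eta)
residuals = [e - eta_hat for e in eta]

# Naive retransformation ignores the error term entirely
naive = math.exp(eta_hat)

# Smearing factor: average of the retransformed residuals
smear = sum(math.exp(r) for r in residuals) / len(residuals)
smearing_estimate = naive * smear

sample_mean = sum(y) / len(y)
print(f"naive={naive:.3f} smearing={smearing_estimate:.3f} mean(y)={sample_mean:.3f}")
```

The naive prediction falls below the sample mean of y, while in this intercept-only case the smearing estimate reproduces it. No normality assumption is needed for the correction, only that the errors do not depend on x.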

The Solution

Read the rest of this entry »


Asymptotic theory has played a large role in the development of many recent econometric methods. For instance, the central limit theorem states that the mean of a large sample is approximately normally distributed. Asymptotic theory, however, generally assumes that the population is infinite or that sampling occurs with replacement. In the real world, populations are finite and sampling usually occurs without replacement.

To take into account these real-world challenges, a finite population correction (fpc) factor is needed. One can express the fpc mathematically as:

  • fpc = {(N-n)/(N-1)}^(1/2)

where n is the sample size and N is the population size. For instance, one can calculate the standard error of the sample mean and of a sample proportion for finite populations as:

  • σ_X̄ = (σ / n^(1/2)) · {(N-n)/(N-1)}^(1/2)
  • σ_p̄ = {p(1-p)/n}^(1/2) · {(N-n)/(N-1)}^(1/2)

This website has some examples of how to apply fpc‘s in practice.
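A minimal Python sketch of these formulas follows. The function names and the sample and population sizes are illustrative choices, not taken from any particular study.

```python
import math


def fpc(N, n):
    """Finite population correction for a sample of n drawn from N."""
    if N < 2 or not 1 <= n <= N:
        raise ValueError("need N >= 2 and 1 <= n <= N")
    return math.sqrt((N - n) / (N - 1))


def se_mean(sigma, n, N):
    """Standard error of the sample mean with the fpc applied."""
    return sigma / math.sqrt(n) * fpc(N, n)


def se_proportion(p, n, N):
    """Standard error of a sample proportion with the fpc applied."""
    return math.sqrt(p * (1 - p) / n) * fpc(N, n)


# Sampling every member of a small population leaves no sampling error,
# while sampling 30% of a larger one shrinks the standard error modestly.
print(fpc(50, 50))
print(round(fpc(1000, 300), 4))
print(round(se_proportion(0.5, 300, 1000), 4))
```

The correction only matters when the sample is a sizeable share of the population. When N is very large relative to n, the factor is close to one.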

One potential application for fpc is physician ratings. Physicians who treat lots of patients eligible for a given quality measure certainly can have an accurate score. Some physicians, however, treat only a handful of patients eligible for any given quality metric. In this case, should the physician be punished if he happens to have one bad quality score among the very few observations? Can fpc correct for this problem?

A paper by Elliott, Zaslavsky and Cleary argues that fpc is not appropriate for adjusting physicians’ scores or confidence intervals to take into account a physician’s small sample size. They cite work by Birnbaum who argues that profiling of hospitals is essentially an attempt to make inferences about future performance at the same facility if nothing changes, and involves “a theoretically infinite population.” The authors give the following example to explain why the fpc is not appropriate for rating physicians.

…suppose we had 50 responses out of 50 total patients at a small hospital, and 300 responses out of 1000 total patients at a larger hospital. Under the finite population model, there is no sampling variability at the smaller hospital (because we have information for all patients) but considerable sampling variability at the larger hospital, which is the appropriate inference if all we care about is the experience of those 50 and 1000 patients. However, to tell a new group of patients what their experiences are likely to be like at each hospital, we have much more information about the large hospital.

Our concern, then, is that FPSM-based approaches would under-represent the uncertainty in data for small facilities with high (possibly 100%) sampling rates, misleading users into thinking that such a facility would be likely to provide below-average (or above-average) care to them.

To correct for the small sample size problem for providers treating few patients eligible for quality scores each year, the authors believe using a moving average score over multiple years would be more appropriate. The benefit of this method is that the sample size increases, but the drawback is that provider quality improvements will not be fully reflected in the provider’s score for a number of years.


Oftentimes, people use the following rule of thumb: if the dependent variable is continuous, use OLS; if it is binary, use a logit or probit.  But what should you do if your dependent variable is a fraction between 0 and 1?  To use a logit or probit, one would have to unnecessarily transform the dependent variable into binary form.  If one uses OLS, the estimated coefficients are likely to be misleading: because the dependent variable is bounded between 0 and 1, the effect of any explanatory variable x_j cannot be constant throughout its entire range. Additionally, the predicted values from an OLS regression often fall outside the range of 0 to 1.

A paper by Papke and Wooldridge (1996) examines potential econometric alternatives when your dependent variable is fractional.

LOG-ODDS RATIO

One option for modeling a fractional response variable is to transform the dependent variable into a log-odds ratio.  For instance:

  • E(log[y/(1-y)]|x) = xβ

This model is simple and can be estimated with OLS once the dependent variable is transformed.  It only works, however, when the dependent variable is strictly between 0 and 1. [If y=0 then you have log(0), which is −∞, and if y=1 then you get log(1/0), which is ∞.]   Additionally, using this framework, it is difficult to recover E(y|x).  Under the model specified above:

  • E(y|x) = ∫ {exp(xβ+ν)/[1+exp(xβ+ν)]} · f(ν|x) dν

where ν is the error term of the log-odds regression and f(ν|x) is its conditional density. If the residuals are independent of the explanatory variables (i.e., ν ⊥ x), one can use Duan’s (1983) smearing technique, which replaces f(•) with the empirical distribution of the estimated residuals.   If not, one must make functional form assumptions regarding the distribution of the error terms.

QUASI-LIKELIHOOD METHODS

Papke and Wooldridge support using quasi-likelihood methods. Assume the following relationship:

  • E(y|x) = G(xβ)

where 0 < G(z) < 1 for all z∈ℜ. The most popular choice for G(z) is the logistic function G(z)=exp(z)/[1+exp(z)]. In this model, one can estimate the parameters β by maximizing the following Bernoulli log-likelihood function:

  • l_i(β) ≡ y_i·log[G(x_i·β)] + (1-y_i)·log[1-G(x_i·β)]

This method has several advantages.  First, it is fairly easy to estimate.  Second, because the Bernoulli log-likelihood is a member of the linear exponential family, the quasi-MLE of β is consistent and asymptotically normal as long as the conditional mean E(y|x) = G(xβ) is correctly specified, whatever the true distribution of y.  A common assumption for the conditional variance in this setting is:

  • Var(y_i|x_i) = σ^2 · G(x_i·β)[1-G(x_i·β)]

Papke and Wooldridge (1996) also describe how to compute the asymptotic variance of the estimator of β.
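To show what the quasi-likelihood approach looks like in practice, here is a small Python sketch of a fractional logit with one regressor, fit by Newton's method. It is an illustration under simplifying assumptions rather than Papke and Wooldridge's own code: the function names and the data are invented, and it reports no standard errors.

```python
import math


def logistic(z):
    """G(z) = exp(z) / [1 + exp(z)]."""
    return 1.0 / (1.0 + math.exp(-z))


def fractional_logit(xs, ys, iterations=25):
    """Maximize the Bernoulli quasi-log-likelihood for E(y|x) = G(b0 + b1*x).

    ys may take any value in [0, 1], including exactly 0 and 1.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = logistic(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p            # score with respect to b0
            g1 += (y - p) * x      # score with respect to b1
            h00 += w               # entries of the negative Hessian
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: add (negative Hessian)^(-1) times the score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Made-up fractional outcomes, including the boundary values 0 and 1
xs = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
ys = [0.0, 0.10, 0.15, 0.30, 0.45, 0.70, 0.80, 0.95, 1.0]

b0, b1 = fractional_logit(xs, ys)
fitted = [logistic(b0 + b1 * x) for x in xs]
print(round(b0, 3), round(b1, 3))
print([round(f, 3) for f in fitted])
```

Unlike the log-odds transformation, this handles observations at exactly 0 or 1, and every fitted value stays strictly inside the unit interval.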


What is power?  Merriam Webster defines power as the “possession of control, authority, or influence over others.”  The power I will talk about today, however, is statistical power.  Statistical power is the probability that a statistical test rejects the null hypothesis when the null is in fact false.  For instance, in the U.S. judicial system, the null hypothesis is that the defendant is innocent.  Trials that more reliably convict defendants who are in fact guilty have more power.

In statistics, there are two types of errors: Type I and Type II. The probability of a Type II error, a false negative, is represented by the symbol β.  Thus, the probability of correctly rejecting the null (i.e., the power) is 1-β.

The larger the magnitude of the hypothesized effect, the higher the power.  It is much easier to detect a large effect than a small effect.  Also, as the size of the sample increases, so does a test’s statistical power.

The more variation that exists in the data, however, the lower the power.  If there is a lot of variation in the data, it is difficult to determine whether the null hypothesis is false or whether an observation that contradicts the null is simply due to the variability in the data.  On the other hand, if the variability (i.e., standard deviation) is low, then one can more confidently conclude that the null hypothesis is false, since the low variability indicates that the anomaly is unlikely to be caused by normal variation in the data.
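These relationships can be made concrete with a one-sided z-test of a mean, for which power has a closed form. The sketch below assumes a known standard deviation and a 5% significance level. The function name and the example values are mine, chosen purely for illustration.

```python
import math
from statistics import NormalDist


def power_one_sided_z(effect, sigma, n, alpha=0.05):
    """Power of a one-sided z-test of H0: mean = 0 against mean = effect > 0."""
    z_crit = NormalDist().inv_cdf(1 - alpha)   # rejection threshold under H0
    shift = effect * math.sqrt(n) / sigma      # mean of the test statistic under H1
    return 1 - NormalDist().cdf(z_crit - shift)


print(round(power_one_sided_z(0.2, 1.0, 50), 3))   # small effect
print(round(power_one_sided_z(0.5, 1.0, 50), 3))   # larger effect
print(round(power_one_sided_z(0.2, 1.0, 200), 3))  # larger sample
print(round(power_one_sided_z(0.2, 2.0, 50), 3))   # noisier data
```

Holding everything else fixed, power rises with the effect size and the sample size and falls as the standard deviation grows. With a zero effect, the rejection probability is just the significance level.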


Regression Discontinuity is an econometric method that has become popular in recent years.  Let me give you an example where regression discontinuity would be valid.  

Let us say that all students who score 1000 or more on their SATs matriculate at Ivy U and all students who score below 1000 attend college at State U.  The research question is what impact going to Ivy U has on wages.

If we simply compare the average salaries of those at Ivy U and those at State U, this will likely not reveal the true effect that Ivy U had on its graduates.  Those at Ivy U were likely smarter and more motivated than those at State U.  Thus, the impact of Ivy U’s education is confounded with the individual’s own talent and motivation.  

Regression discontinuity, however, can solve this problem.  If we compare individuals who scored just above and just below 1000, these individuals are likely very similar in terms of intelligence and motivation.  The only difference would be the impact of Ivy U’s education and networking possibilities against State U’s.  We could compare the average wages of those who scored just above and just below the 1000 mark.  Alternatively, we could regress wages on a polynomial function of test scores with a discrete jump term at 1000.  Mathematically, this means the following:

  • Effect = lim_{x↓c} E[Y_i|X_i=x]  -  lim_{x↑c} E[Y_i|X_i=x]
  • In this example, Y_i is wages, X_i is the test score, and the cutoff value, c, is 1000.
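The Ivy U example can be sketched in a few lines of Python. Everything here is hypothetical: the data are generated with a built-in jump of 5 (say, thousands of dollars) at the cutoff, the bandwidth of 100 points is arbitrary, and separate straight lines on each side stand in for the more careful local linear or polynomial fits used in practice.

```python
def ols_line(xs, ys):
    """Intercept and slope of a simple least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def rd_estimate(scores, wages, cutoff, bandwidth):
    """Jump in fitted wages at the cutoff, using a line on each side."""
    # Center scores at the cutoff so each intercept is the fit at the cutoff
    left = [(s - cutoff, w) for s, w in zip(scores, wages)
            if cutoff - bandwidth <= s < cutoff]
    right = [(s - cutoff, w) for s, w in zip(scores, wages)
             if cutoff <= s <= cutoff + bandwidth]
    a_left, _ = ols_line(*zip(*left))
    a_right, _ = ols_line(*zip(*right))
    return a_right - a_left


# Made-up data: wages rise smoothly with the score, plus a jump of 5 at 1000
scores = list(range(800, 1201, 10))
wages = [30 + 0.02 * (s - 1000) + (5 if s >= 1000 else 0)
         + (0.1 if (s // 10) % 2 == 0 else -0.1)      # small deterministic noise
         for s in scores]

above = [w for s, w in zip(scores, wages) if s >= 1000]
below = [w for s, w in zip(scores, wages) if s < 1000]
naive = sum(above) / len(above) - sum(below) / len(below)
rd = rd_estimate(scores, wages, cutoff=1000, bandwidth=100)
print(round(naive, 2), round(rd, 2))
```

The naive comparison of all Ivy U and State U students mixes the jump with the underlying relationship between scores and wages, while the estimate at the cutoff lands close to the jump that was built into the data.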

Can we use Regression Discontinuity to estimate the impact of school districts on student achievement?  We could compare houses on each side of a school district boundary and then see whether children living in these similar houses have different test scores.  However, this will likely not produce reliable results if parents choose their house based on the school district.  Thus, even if two identical houses are right next to each other, if high achieving parents always choose the better school district, then there will be perfect sorting between school districts.

David S. Lee and Thomas Lemieux (2009)  have a great “user guide” about how to use Regression Discontinuity in practice.  Some of their top tips are the following:

RD designs can be invalid if individuals can precisely manipulate the “forcing variable”. 

  • In the school district choice example, where parents can precisely choose their school district, RD may be invalid.

If individuals – even while having some influence – are unable to precisely manipulate the forcing variable, a consequence of this is that the variation in treatment near the threshold is randomized as though from a randomized experiment. 

  • Intuitively, when individuals have imprecise control over the forcing variable, even if some are especially likely to have values of X near the cutoff, every individual will have approximately the same probability of having an X that is just above (receiving the treatment) or just below (being denied the treatment) the cutoff – similar to a coin-flip experiment.  This is the case of people who score around 1000 on the SAT and thus have an approximately equal probability of getting into Ivy U or State U.

RD designs can be analyzed – and tested – like randomized experiments.

  • If variation in the treatment near the threshold is approximately randomized, then it follows that all “baseline characteristics” – all those variables determined prior to the realization of the forcing variable – should have the same distribution just above and just below the cutoff.

Non-parametric estimation does not represent a “solution” to functional form issues raised by RD designs. It is therefore helpful to view it as a complement to – rather than a substitute for – parametric estimation.

  • Parametric functions are what are traditionally used.  These are generally polynomials in the X variable on which the dependent variable of interest is regressed.  In my example, this would be a polynomial regression with future wages as the dependent variable and test scores as the independent X variable.  Non-parametric estimation techniques include local linear regression.

Goodness-of-fit and other statistical tests can help rule out overly restrictive specifications.

  • Although there is no simple formula that works in all situations and contexts for weeding out inappropriate specifications, it seems reasonable, at a minimum, not to rely on an estimate resulting from a specification that can be rejected by the data when tested against a strictly more flexible specification. For example, it seems wise to place less confidence in results from a low-order polynomial model, when it is rejected in favor of a less restrictive model (e.g., separate means for each discrete value of X). Similarly, there seems little reason to prefer a specification that uses all the data, if using the same specification but restricting to observations closer to the threshold gives a substantially (and statistically) different answer. 

