A common rule of thumb is: if the dependent variable is continuous, use OLS; if it is binary, use a logit or probit. But what should you do if your dependent variable is a fraction between 0 and 1? Using a logit or probit would require needlessly collapsing the dependent variable into binary form, which throws away information. OLS has its own problems. Because the dependent variable is bounded between 0 and 1, the effect of any explanatory variable x_j cannot be constant throughout its entire range, so a linear specification for the mean is at best an approximation. Additionally, the predicted values from an OLS regression often fall outside the range of 0 to 1.
A paper by Papke and Wooldridge (1996) examines potential econometric alternatives when your dependent variable is fractional.
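To make the problem with OLS concrete, here is a minimal sketch on simulated data. The data generating process (a logistic mean with noise), the sample size, and the helper name ols_fit are all assumptions chosen for illustration; none of them come from the paper.

```python
import math
import random

def ols_fit(x, y):
    """Return (intercept, slope) of a simple least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

random.seed(0)
# Hypothetical data: the outcome is a fraction strictly inside (0, 1)
x = [random.uniform(-4, 4) for _ in range(500)]
y = [1 / (1 + math.exp(-(xi + random.gauss(0, 0.5)))) for xi in x]

intercept, slope = ols_fit(x, y)
fitted = [intercept + slope * xi for xi in x]

# Count fitted values that are impossible for a fraction
n_outside = sum(1 for f in fitted if f < 0 or f > 1)
print(f"slope = {slope:.3f}; fitted values outside [0, 1]: {n_outside} of {len(x)}")
```

Every simulated y lies between 0 and 1, yet the fitted line keeps rising at a constant rate and eventually leaves that interval at the extremes of x.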
LOG-ODDS RATIO
One option for modeling a fractional response variable is to transform the dependent variable into a log-odds ratio. For instance:
- E(log[y/(1-y)]|x) = xβ
This model is simple and can be estimated with OLS once the dependent variable is transformed. It only works, however, when the dependent variable is strictly between 0 and 1. [If y=0, then you have log(0), which is -∞, and if y=1, then you have log(1/0), which is +∞.] Additionally, using this framework, it is difficult to recover E(y|x). Writing the model as log[y/(1-y)] = xβ + ν with E(ν|x) = 0, we have:
- E(y|x)=∫ {exp(xβ+ν)/[1+exp(xβ+ν)]} * f(ν|x)dν
If the residuals are independent of the explanatory variables (i.e., ν⊥x), one can use Duan’s (1983) smearing technique, which replaces f(•) with the empirical distribution of the OLS residuals. If not, one must make functional form assumptions regarding the distribution of the error terms.
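The log-odds approach with a smearing correction can be sketched as follows. This is an illustration under stated assumptions, not the paper's procedure: the simulated data, the coefficient values 0.5 and 1.0, and the helper names ols_fit and smeared_mean are all hypothetical, and the errors are drawn independently of x so that smearing is valid.

```python
import math
import random

def expit(z):
    """Logistic function exp(z) / [1 + exp(z)]."""
    return 1 / (1 + math.exp(-z))

def ols_fit(x, y):
    """Return (intercept, slope) of a simple least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

random.seed(1)
# Hypothetical data: log-odds = 0.5 + 1.0 * x + nu, with nu independent of x
x = [random.uniform(-2, 2) for _ in range(400)]
y = [expit(0.5 + 1.0 * xi + random.gauss(0, 1)) for xi in x]

# Step 1: transform y to log-odds and run OLS (requires 0 < y < 1)
log_odds = [math.log(yi / (1 - yi)) for yi in y]
b0, b1 = ols_fit(x, log_odds)
residuals = [lo - (b0 + b1 * xi) for xi, lo in zip(x, log_odds)]

def smeared_mean(x_new):
    """Smearing estimate of E(y | x = x_new): average over the OLS residuals."""
    return sum(expit(b0 + b1 * x_new + r) for r in residuals) / len(residuals)

naive = expit(b0 + b1 * 1.0)   # ignores the error term entirely
smeared = smeared_mean(1.0)    # integrates over the empirical residuals
print(f"naive = {naive:.3f}, smeared = {smeared:.3f}")
```

Simply plugging xβ into the logistic function ignores the integral over ν, so the naive and smeared predictions differ. The gap grows with the variance of the residuals.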
QUASI-LIKELIHOOD METHODS
Papke and Wooldridge support using quasi-likelihood methods. Assume the following relationship:
- E(y|x) = G(xβ)
where 0 < G(z) < 1 for all z∈ℜ. The most popular choice for G(z) is the logistic function, G(z)=exp(z)/[1+exp(z)]. In this model, one can estimate the parameters β by maximizing the following Bernoulli log-likelihood function:
- l_i(β) ≡ y_i log[G(x_iβ)] + (1-y_i) log[1-G(x_iβ)]
This method has several advantages. First, it is fairly easy to estimate. Second, because the Bernoulli log-likelihood is a member of the linear exponential family, the quasi-MLE of β is consistent whenever the conditional mean E(y|x) = G(xβ) is correctly specified, regardless of the true distribution of y, and the estimator is asymptotically normal. A common accompanying assumption about the conditional variance is:
- Var(y_i|x_i) = σ² * G(x_iβ)[1-G(x_iβ)]
Papke and Wooldridge (1996) also describe how to compute the asymptotic variance of the estimator of β.
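As a rough sketch of the quasi-likelihood approach, the Bernoulli log-likelihood with a logistic G can be maximized by Newton-Raphson. The code below assumes a single regressor plus an intercept; the simulated data (fractions of 20 hypothetical trials), the true coefficients -0.5 and 1.0, and the function name fit_fractional_logit are illustrative assumptions, and no standard errors are computed.

```python
import math
import random

def expit(z):
    """Logistic function G(z) = exp(z) / [1 + exp(z)]."""
    return 1 / (1 + math.exp(-z))

def fit_fractional_logit(x, y, n_iter=25):
    """Maximize sum_i { y_i log G_i + (1 - y_i) log(1 - G_i) } by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g = [expit(b0 + b1 * xi) for xi in x]
        # Score: sum of (y_i - G_i) times (1, x_i)
        s0 = sum(yi - gi for yi, gi in zip(y, g))
        s1 = sum((yi - gi) * xi for xi, yi, gi in zip(x, y, g))
        # Negative Hessian: sum of G_i (1 - G_i) times (1, x_i)(1, x_i)'
        w = [gi * (1 - gi) for gi in g]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system explicitly
        b0 += (h11 * s0 - h01 * s1) / det
        b1 += (h00 * s1 - h01 * s0) / det
    return b0, b1

random.seed(2)
trials = 20
x = [random.uniform(-3, 3) for _ in range(500)]
# Hypothetical fractional outcome: share of successes out of 20 trials,
# so E(y|x) = G(-0.5 + 1.0 * x) and y can equal exactly 0 or 1
y = []
for xi in x:
    p = expit(-0.5 + 1.0 * xi)
    y.append(sum(1 for _ in range(trials) if random.random() < p) / trials)

b0_hat, b1_hat = fit_fractional_logit(x, y)
n_boundary = sum(1 for yi in y if yi in (0.0, 1.0))
print(f"b0 = {b0_hat:.3f}, b1 = {b1_hat:.3f}, observations at 0 or 1: {n_boundary}")
```

Unlike the log-odds transformation, nothing here breaks when y equals 0 or 1, because y enters the objective only as a weight on log G and log(1-G).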
- Papke LE and Wooldridge JM (1996) “Econometric methods for fractional response variables with an application to 401(k) plan participation rates”, Journal of Applied Econometrics, v11:619-632.

Non linearities and the Black Swan
September 25, 2009 in Books, Econometrics
The Black Swan by Nassim Nicholas Taleb is an interesting book about probability outside of the traditional Gaussian framework and about how paradigm changes often arise. The highlight of the book is its philosophy of the black swan and the unknown unknown. The book also includes a discussion of behavioral economics and tries to discredit Gaussian statistics. The book is interesting but rambles somewhat. Further, Taleb writes in a condescending manner, disparaging other intellectuals and experts. Taleb does make some good points, but the negative tone becomes tiresome.
The Turkey Problem
The crux of the book can be understood by looking at the following series. This series represents the weight of the turkey over 30 days.
Assume you are a turkey. What would you predict would happen to your weight over the next 15 days? Using ordinary least squares, one would predict that the turkey would continue to grow at 1/4 pound per day. Let us see what happened in reality.
We see that a “black swan” event has occurred, one that was outside the paradigm one would establish based on past data. On day number 41, the turkey is slaughtered. This is a huge paradigm shift from the point of view of the turkey. Relying on past data to predict the future will be highly inaccurate in the presence of these black swans.
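The turkey's forecasting error can be reproduced with a small sketch. The 1/4 pound per day growth rate, the 30 observed days, and the slaughter on day 41 come from the text above; the 5 pound starting weight and the helper name ols_fit are assumptions made up for illustration.

```python
def ols_fit(x, y):
    """Return (intercept, slope) of a simple least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical series: 5 lb starting weight (assumed), gaining 1/4 lb per day
observed_days = list(range(1, 31))
observed_weight = [5.0 + 0.25 * d for d in observed_days]

intercept, slope = ols_fit(observed_days, observed_weight)

# Extrapolate the fitted line 15 days ahead
future_days = list(range(31, 46))
predicted = {d: intercept + slope * d for d in future_days}
# Assumed "true" path: growth continues until the slaughter on day 41
actual = {d: (5.0 + 0.25 * d if d < 41 else 0.0) for d in future_days}

for d in (40, 41):
    print(f"day {d}: predicted {predicted[d]:.2f} lb, actual {actual[d]:.2f} lb")
```

The fit is perfect in sample and remains perfect through day 40, which is exactly why the forecast inspires confidence right up to the point where it fails completely.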
Other Non-linearities
Let us look at another seemingly linear series.
How would you predict the series would continue into the future? Using linear extrapolation techniques, one would predict the series would increase linearly ad infinitum. However, let us examine the true data generating process.
We can see that the data come from a sine function.
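A short sketch shows how a sine function can pass for a straight line over a narrow window. The sampling window from 0 to 0.5, the evaluation point 3π/2, and the helper name ols_fit are assumptions for illustration; the original series may have used a different frequency or range.

```python
import math

def ols_fit(x, y):
    """Return (intercept, slope) of a simple least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Observe sin(x) only on a short window where it looks nearly linear
x_obs = [i * 0.01 for i in range(51)]      # 0.00, 0.01, ..., 0.50
y_obs = [math.sin(v) for v in x_obs]

intercept, slope = ols_fit(x_obs, y_obs)

# Extrapolate far outside the observed window
x_far = 3 * math.pi / 2
linear_forecast = intercept + slope * x_far
truth = math.sin(x_far)
print(f"slope = {slope:.3f}, forecast = {linear_forecast:.2f}, truth = {truth:.2f}")
```

Within the observed window the line fits almost perfectly, but far outside it the linear forecast keeps climbing while the true series has turned around and reached its minimum.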
The key insight of Taleb’s book is that these non-linearities, paradigm shifts and black swans occur all the time. Further, they are responsible for most of the innovations and important events in history. Thus, ignoring black swans can be perilous. Taleb’s message is one of humility. It is exceedingly difficult to predict the future. A sure thing is rarely such. Thus, we should view expert opinion with some skepticism and embrace, rather than reject, uncertainty.