Econometrics

Serial Correlation and the Durbin-Watson Statistic

What is the effect of a country’s GDP on health? What about the effect of the country’s literacy rate on infant mortality? Researchers often try to answer these questions using time-series data. With time-series data, we have observations of a few units (e.g., countries or individuals) over many years.

Let the subscript i represent the individual or country and the subscript t indicate the year. We can write a regression framework as follows:

  • y_{it} = β x_{it} + ε_{it}

As long as cov(x_{it}, ε_{it}) = 0, ordinary least squares (OLS) will provide an unbiased estimate of β.

One frequent problem which occurs with time series data is that there will be serial correlation. Serial correlation (or autocorrelation) occurs when the error terms are correlated over time. For instance,

  • ε_{it} = ρ ε_{it-1} + ν_{it}, where ν_{it} is a serially uncorrelated shock

Serial correlation means that if your predicted y value is overestimated in one period, it is likely to be overestimated in another period as well. This is often due to some persistent variable omitted from the regression. For instance, if we regressed test scores on a vector of explanatory variables, it is likely that a student who scored higher than their predicted test score in one period would also score higher than their predicted test score in another period.
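To make the idea concrete, here is a minimal sketch that simulates error terms following the AR(1) process described above and checks that neighboring errors move together. The function names, the seed, the standard normal shocks, and the value ρ = 0.7 are all illustrative assumptions, not part of the original discussion.

```python
import random


def simulate_ar1_errors(n, rho, seed=0):
    """Draw n errors with e_t = rho * e_{t-1} + shock_t (standard normal shocks).

    The starting value of 0 and the unit shock variance are assumptions
    made only for this illustration.
    """
    rng = random.Random(seed)
    errors, prev = [], 0.0
    for _ in range(n):
        prev = rho * prev + rng.gauss(0.0, 1.0)
        errors.append(prev)
    return errors


def lag1_autocorrelation(series):
    """Sample correlation between each value and the value one period earlier."""
    mean = sum(series) / len(series)
    dev = [s - mean for s in series]
    num = sum(dev[t] * dev[t - 1] for t in range(1, len(dev)))
    den = sum(d * d for d in dev)
    return num / den


# With rho = 0.7 the estimated lag-1 autocorrelation should land near 0.7;
# with rho = 0 it should land near 0.
print(round(lag1_autocorrelation(simulate_ar1_errors(5000, rho=0.7)), 2))
print(round(lag1_autocorrelation(simulate_ar1_errors(5000, rho=0.0)), 2))
```

A positive ρ is what produces the pattern in the test-score example: an error above zero in one period tends to be followed by another error above zero.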

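Fortunately, the OLS estimate of β is still unbiased even in the presence of serial correlation. However, OLS is no longer efficient, and the usual OLS standard errors are incorrect. With positive serial correlation they are typically too small, so significance tests overstate the precision of the estimate.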
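A small Monte Carlo experiment can illustrate both claims. The sketch below is only one possible setup: the true β of 1, ρ = 0.8 for both the regressor and the error, 100 periods, 500 replications, and all helper names are assumptions chosen for illustration. It compares the average standard error reported by the textbook OLS formula with the actual spread of the slope estimates across replications.

```python
import random
import statistics


def ar1_series(n, rho, rng):
    """Generate z_t = rho * z_{t-1} + shock_t with standard normal shocks."""
    z, out = 0.0, []
    for _ in range(n):
        z = rho * z + rng.gauss(0.0, 1.0)
        out.append(z)
    return out


def ols_slope_and_se(x, y):
    """Slope of y on x (with intercept) and its conventional OLS standard error."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    a = ybar - b * xbar
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    se = (rss / (n - 2) / sxx) ** 0.5
    return b, se


rng = random.Random(0)
beta, rho, T, reps = 1.0, 0.8, 100, 500  # illustrative values only
slopes, reported_ses = [], []
for _ in range(reps):
    x = ar1_series(T, rho, rng)      # persistent regressor
    eps = ar1_series(T, rho, rng)    # serially correlated error
    y = [beta * xi + ei for xi, ei in zip(x, eps)]
    b, se = ols_slope_and_se(x, y)
    slopes.append(b)
    reported_ses.append(se)

mean_slope = statistics.mean(slopes)              # close to beta: no bias
empirical_sd = statistics.stdev(slopes)           # true sampling variability
mean_reported_se = statistics.mean(reported_ses)  # what OLS claims

print("mean slope:", round(mean_slope, 3))
print("empirical sd of slope:", round(empirical_sd, 3))
print("mean reported OLS se:", round(mean_reported_se, 3))
```

Under these assumptions the average slope should sit close to the true β, while the spread of the estimates should be noticeably larger than the standard error that OLS reports.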

One way to test for serial correlation is to use the Durbin-Watson test. Let u_{it} be the residuals from an OLS regression (u_{it} = y_{it} - β_{OLS} x_{it}).

The Durbin-Watson statistic is:

  • d = [Σ_{t=2}^{T} (u_{it} - u_{it-1})^2] / [Σ_{t=1}^{T} u_{it}^2]
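A short worked approximation may help in reading this statistic. Expanding the square in the numerator shows that d is roughly twice one minus the first-order autocorrelation of the residuals. The symbol \hat{\rho} is notation introduced here for that sample autocorrelation; it does not appear in the original formula.

```latex
\begin{aligned}
d &= \frac{\sum_{t=2}^{T} (u_{it} - u_{it-1})^2}{\sum_{t=1}^{T} u_{it}^2}
   = \frac{\sum_{t=2}^{T} u_{it}^2 + \sum_{t=2}^{T} u_{it-1}^2
           - 2\sum_{t=2}^{T} u_{it}\,u_{it-1}}{\sum_{t=1}^{T} u_{it}^2} \\
  &\approx 2\,(1 - \hat{\rho}),
  \qquad
  \hat{\rho} = \frac{\sum_{t=2}^{T} u_{it}\,u_{it-1}}{\sum_{t=1}^{T} u_{it}^2}
\end{aligned}
```

The result is only approximate because each of the two sums of squares in the numerator omits a single end term. It implies that d lies between 0 and 4: a value near 2 suggests no first-order serial correlation, values toward 0 suggest positive serial correlation, and values toward 4 suggest negative serial correlation.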

With panel data we have:

  • d = [Σ_{i=1}^{N} Σ_{t=2}^{T} (u_{it} - u_{it-1})^2] / [Σ_{i=1}^{N} Σ_{t=1}^{T} u_{it}^2]
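Both formulas translate directly into code. The following is a minimal sketch in plain Python, assuming the residuals are already ordered by time and, for the panel case, grouped into one list per unit. The function names are hypothetical, not taken from any library.

```python
def durbin_watson(residuals):
    """Durbin-Watson d for a single series of residuals ordered by time."""
    num = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    den = sum(u ** 2 for u in residuals)
    return num / den


def panel_durbin_watson(residuals_by_unit):
    """Durbin-Watson d for a panel.

    residuals_by_unit holds one time-ordered list of residuals per unit i.
    Differences are taken only within a unit, never across units.
    """
    num = sum(
        (u[t] - u[t - 1]) ** 2
        for u in residuals_by_unit
        for t in range(1, len(u))
    )
    den = sum(x ** 2 for u in residuals_by_unit for x in u)
    return num / den


# Residuals that flip sign every period give a large d (negative correlation);
# residuals that never change give d = 0 (extreme positive correlation).
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))
print(durbin_watson([1.0, 1.0, 1.0, 1.0]))
print(panel_durbin_watson([[1.0, -1.0], [1.0, 1.0]]))
```

Keeping the differences within each unit matters in the panel version: the last residual of one country and the first residual of the next are not adjacent in time, so they should not enter the numerator.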

This page will help you interpret the statistic and decide whether or not to reject the null hypothesis of no serial correlation. If there is serial correlation in your data, you may want to include a lagged dependent variable as one of your right-hand-side variables. This will result in an AR(1) specification.
Yuting Wang of Notre Dame has a good explanation of the problems that occur with serial correlation.