Berkson’s Paradox Explained

What is Berkson’s paradox? Assume that there are two independent events, A and B. These events are not correlated when observed in nature. However, what if one conditions on the fact that either A or B occurs? Once one conditions on at least one of the two events having occurred, the events become correlated. If…
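The effect described above can be illustrated with a small simulation. This is only a sketch under assumed values: the function names, the sample size, and the 30% probabilities for A and B are hypothetical choices for illustration, not figures from the post.

```python
import random


def correlation(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def berkson_demo(n=100_000, p_a=0.3, p_b=0.3, seed=0):
    """Return (correlation in full sample, correlation given A or B)."""
    rng = random.Random(seed)
    # A and B are drawn independently, so they are uncorrelated by design.
    pairs = [(int(rng.random() < p_a), int(rng.random() < p_b)) for _ in range(n)]
    # Keep only the observations where at least one event occurred.
    selected = [(a, b) for a, b in pairs if a or b]
    full = correlation(*zip(*pairs))
    conditional = correlation(*zip(*selected))
    return full, conditional


if __name__ == "__main__":
    full, conditional = berkson_demo()
    print(f"full sample: {full:.3f}, given A or B: {conditional:.3f}")
```

In the full sample the correlation should be close to zero, while in the selected subsample it should be clearly negative: once we know at least one event happened, learning that A did not occur implies that B must have.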

Model Fit for a Logistic Regression

How do you know if your model fits the data well? When applying an OLS regression, the standard metric is the R-squared (i.e., R²). If you have a dependent variable that is binary, however, most researchers prefer a logistic regression. If you choose a logistic rather than an OLS approach, how do you know…

Local Instrumental Variables

What is the effect of a treatment on health outcomes? The real question is: can you be more specific? Researchers may measure the treatment effect in a variety of ways. Sensible research questions include: What is the average effect of the treatment across all individuals? What is the average treatment effect only among those who received…

Mahalanobis Distance

What is Mahalanobis distance? Most people know what Euclidean distance is…it is the shortest distance between any two points. In other words, it's what we typically think of when we think of distance: the distance we would measure with a ruler, and the one given by the Pythagorean formula. Unlike Euclidean distance, Mahalanobis distance…
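As a rough illustration of the contrast, the sketch below computes the standard Mahalanobis distance, sqrt((x − μ)ᵀ Σ⁻¹ (x − μ)), for two-dimensional points. The function name and the example covariance matrices are assumptions made for this example. When the covariance matrix is the identity, the result coincides with Euclidean distance.

```python
import math


def mahalanobis_2d(x, mean, cov):
    """Mahalanobis distance of a 2-D point x from mean, given a 2x2 covariance."""
    dx = x[0] - mean[0]
    dy = x[1] - mean[1]
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    # Quadratic form using the closed-form 2x2 inverse: (1/det) * [[d, -b], [-c, a]].
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.sqrt(quad)


if __name__ == "__main__":
    identity = ((1.0, 0.0), (0.0, 1.0))
    stretched = ((4.0, 0.0), (0.0, 1.0))  # variance of 4 along the first axis
    print(mahalanobis_2d((3.0, 4.0), (0.0, 0.0), identity))   # matches Euclidean
    print(mahalanobis_2d((2.0, 0.0), (0.0, 0.0), stretched))  # rescaled by variance
```

With the second covariance matrix, a point two units out along the high-variance axis is only one standard deviation away, so its Mahalanobis distance is smaller than its Euclidean distance.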

Kaplan-Meier Survival Curves

Survival analysis is used in many contexts.  Some examples include: Medical research: fraction of patients living for a certain amount of time after treatment. Economics: length of time people remain unemployed after a job loss. Engineering: time until failure of machine parts. Ecology: how long fleshy fruits remain on plants before they are removed by…

Is my distribution normal?

How can you tell if you have a normal distribution?  For instance, assume you have data on the results of a drug relative to a placebo.  You know the mean and standard deviation of the data, but that does not necessarily imply that the data is distributed in a normal fashion. How can you do…

What are we weighting for?

Weighting has a number of uses. For instance, one can use weighting to estimate population statistics from a sample. The Panel Study of Income Dynamics (PSID), for instance, oversamples households with low income. To get nationally representative mean values, one must reweight the PSID values, either using survey weights or matching to a nationally representative sample such as…
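To make the reweighting idea concrete, here is a minimal sketch of a survey-weighted mean. The incomes and weights are made-up numbers chosen so that low-income households are oversampled; they are not PSID data, and the function name is hypothetical.

```python
def weighted_mean(values, weights):
    """Mean of values where each observation counts in proportion to its weight."""
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


if __name__ == "__main__":
    # Hypothetical sample: three low-income households and one high-income household.
    incomes = [20_000, 20_000, 20_000, 80_000]
    # Hypothetical survey weights: each weight is the number of population
    # households that the sampled household represents.
    weights = [1, 1, 1, 9]
    print(sum(incomes) / len(incomes))      # unweighted sample mean
    print(weighted_mean(incomes, weights))  # reweighted estimate
```

Because the low-income households are overrepresented in this toy sample, the unweighted mean understates the population mean, and applying the weights corrects for that.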

How Missing Data affects Physicians’ P4P Bonuses

Pay-for-performance programs often offer bonuses (or penalties) for physicians, hospitals and other providers based on the quality of care patients receive.  Measuring quality of care, however, is often difficult.  For chronic conditions, for instance, many patients eligible for outcome measures may be lost to follow-up.  This issue can potentially affect provider evaluations and bonus payments.…