Unbiased Analysis of Today's Healthcare Issues

What are we weighting for?

Written By: Jason Shafrin - May• 07•13

Weighting has a number of uses.  For instance, one can use weighting to estimate population statistics from a non-representative sample.  The Panel Study of Income Dynamics (PSID), for instance, oversamples low-income households.  To get nationally representative mean values, one must reweight the PSID values, either using survey weights or by matching to a nationally representative sample such as the CPS or ACS.
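As a rough illustration of the reweighting idea, the sketch below draws a made-up population of incomes, oversamples the low-income half, and compares the unweighted sample mean with the mean weighted by the inverse probability of selection.  The lognormal income distribution, the selection probabilities and all variable names are assumptions chosen for the example; none of them come from the PSID.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical population of household incomes (the lognormal shape is an assumption).
population = rng.lognormal(mean=10.5, sigma=0.8, size=200_000)
median = np.median(population)

# Oversample low-income households; these selection probabilities are made up.
p_select = np.where(population < median, 0.5, 0.1)
sampled = rng.random(population.size) < p_select

incomes = population[sampled]
weights = 1.0 / p_select[sampled]  # inverse probability of selection

pop_mean = population.mean()
unweighted_mean = incomes.mean()
weighted_mean = np.average(incomes, weights=weights)

print(f"population mean:        {pop_mean:,.0f}")
print(f"unweighted sample mean: {unweighted_mean:,.0f}")
print(f"weighted sample mean:   {weighted_mean:,.0f}")
```

In this setup the unweighted mean understates the population mean because low-income households are over-represented, while the weighted mean should land close to it.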

Researchers also use weighting when estimating causal effects.  A recent working paper by Solon, Haider and Wooldridge (NBER 2013) examines whether weighting is useful in the following 3 applications: (1) to achieve precise estimates by correcting for heteroskedasticity, (2) to achieve consistent estimates by correcting for endogenous sampling, and (3) to identify average partial effects in the presence of unmodeled heterogeneity of effects.  I discuss each of these situations below.

Correcting Heteroskedasticity

Heteroskedasticity occurs when one subpopulation has more variability than another.  This heteroskedasticity can affect the precision of regression coefficients.  However, can one use weighting to correct heteroskedasticity?  The authors state:

Now suppose that one estimates that population regression by performing ordinary least squares (OLS) estimation of the regression of log earnings on the race dummy, years of schooling, and a quartic in potential earnings for black and white male household heads in the PSID sample…this estimate [the coefficient on the race dummy] might be distorted by the PSID’s oversampling of low-income households, which surely must lead to an unrepresentative sample with respect to male household heads’ earnings…one can apply a reverse funhouse mirror by using weights. In particular, instead of applying ordinary (i.e., equally weighted) least squares to the sample regression, one can use weighted least squares (WLS), minimizing the sum of squared residuals weighted by the inverse probabilities of selection.

Compared to Wyoming, California offers many more observations of the individual-level decision of whether or not to divorce, and therefore it seems at first that weighting by state population should lead to more precise coefficient estimation.  And yet, for the specification shown in Table 1, it appears that weighting by population harms the precision of estimation.

In many cases, however, using WLS actually harms the precision of the estimates.  This occurs because “…in many practical applications, the assumption that the individual-level error terms v_ij are independent is wrong.  Instead, the individual-level error terms within a group are positively correlated with each other because they have unobserved group-level factors in common.  In current parlance, the individual-level error terms are ‘clustered.’”  Thus the true individual error term may be better modelled as:

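  • v_ij = c_i + u_ij

where j indexes individuals and i indexes the groups. The group-level component c_i does not average out as the group gets larger, so the error variance of a group average falls much less with group size than weighting by group size assumes.  This cluster-level variance is what makes WLS relatively imprecise.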

What should one do to address heteroskedasticity in this case?

One way to go is to…use the OLS residuals to perform the standard heteroskedasticity diagnostics we teach in introductory econometrics.  For example, in this situation, the modified Breusch-Pagan test described in Wooldridge (2013, pp. 276-8) comes down to just applying OLS to a simple regression of the squared OLS residuals on the inverse within-group sample size 1/J_i, [where J_i is the size of group i.]  The significance of the t-ratio for the coefficient on 1/J_i indicates whether the OLS residuals display significant evidence of heteroskedasticity… A remarkable feature of this test is that the estimated intercept consistently estimates σ_c^2, and the estimated coefficient of 1/J_i consistently estimates σ_u^2.
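The diagnostic described in the quote can be sketched on simulated group averages.  The setup is an assumption made for illustration: each group-average error is c_i plus the mean of J_i individual errors, so its variance is σ_c^2 + σ_u^2 / J_i, and the group sizes, variances and coefficients below are invented.  The sketch only reports the two coefficient estimates and does not compute the t-ratio.

```python
import numpy as np

rng = np.random.default_rng(0)

G = 20_000                       # number of groups (assumed)
sigma_c2, sigma_u2 = 1.0, 4.0    # assumed group-level and individual-level variances

J = rng.integers(1, 21, size=G)  # within-group sample sizes, 1 to 20
x = rng.normal(size=G)           # a group-level regressor

c = rng.normal(scale=np.sqrt(sigma_c2), size=G)          # group component c_i
u_bar = rng.normal(scale=np.sqrt(sigma_u2 / J), size=G)  # mean of J_i draws of u_ij
y = 1.0 + 0.5 * x + c + u_bar                            # group-average outcome

# Step 1: OLS on the group averages, keep the residuals.
X = np.column_stack([np.ones(G), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta

# Step 2: regress the squared residuals on 1/J_i.
Z = np.column_stack([np.ones(G), 1.0 / J])
gamma, *_ = np.linalg.lstsq(Z, resid**2, rcond=None)

print(f"intercept (estimates sigma_c^2 = {sigma_c2}): {gamma[0]:.2f}")
print(f"slope on 1/J_i (estimates sigma_u^2 = {sigma_u2}): {gamma[1]:.2f}")
```

With these made-up values, the intercept should come out near 1 and the slope near 4, which is the sense in which the auxiliary regression recovers the two variance components.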

Other recommendations include:

  • Due to inevitable uncertainty about the true variance structure, report heteroskedasticity-robust standard error estimates.
  • Report both weighted and unweighted estimates, since the differences between OLS and WLS estimates can be used as a diagnostic for model misspecification or endogenous sampling.

Endogenous Sampling

Endogenous sampling occurs when the criteria used to create the sample are correlated with the error term of one’s regression.  For instance, if one regressed income on various (exogenous) factors using PSID data, the resulting coefficients would be inconsistent because income itself is used to determine which individuals participate in the survey. [The PSID oversamples low-income individuals].

In the presence of endogenous sampling, estimation that ignores the endogenous sampling generally will be inconsistent.  But if instead one weights the criterion function to be minimized (a sum of squares, a sum of absolute deviations, the negative of a log likelihood, a distance function for orthogonality conditions, etc.) by the inverse probabilities of selection, the estimation becomes consistent.

On the other hand, if the sampling probabilities are independent of the error term (for instance, if they vary only on the basis of the explanatory variables in the regression equation), then the unweighted estimates would be consistent.  In that case weighting would be unnecessary and could harm precision.

Conclusions:

  • If the sampling rate varies endogenously, estimation weighted by the inverse probabilities of selection is needed on consistency grounds.
  • The weighted estimation should be accompanied by robust estimation of standard errors.
  • When the variation in the sampling rate is exogenous, both weighted and unweighted estimation are consistent for the parameters of a correctly specified model, but unweighted estimation may be more precise.
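The first conclusion can be illustrated with a small simulation.  This is only a sketch under assumed values: the true slope of 2, the selection rule (everyone with y below 1 is sampled, others with probability 0.1) and the helper `fit_line` are all invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

N = 200_000
x = rng.normal(size=N)
y = 1.0 + 2.0 * x + rng.normal(size=N)   # assumed true model, slope = 2

# Endogenous sampling: the selection probability depends on the outcome y itself.
p_select = np.where(y < 1.0, 1.0, 0.1)
keep = rng.random(N) < p_select
xs, ys = x[keep], y[keep]
w = 1.0 / p_select[keep]                 # inverse probabilities of selection

def fit_line(x, y, w=None):
    """Return (intercept, slope) from least squares, optionally weighted by w."""
    X = np.column_stack([np.ones_like(x), x])
    if w is None:
        w = np.ones_like(x)
    sw = np.sqrt(w)                      # WLS = OLS on rows scaled by sqrt(weight)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef

ols_slope = fit_line(xs, ys)[1]
wls_slope = fit_line(xs, ys, w)[1]

print(f"unweighted slope: {ols_slope:.3f}")
print(f"weighted slope:   {wls_slope:.3f}")
```

Under this selection rule the unweighted slope should fall noticeably below 2, while the weighted slope should be close to 2.  If the selection probability were changed to depend only on x, both estimates would be consistent, which is the exogenous case described in the last bullet.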

Weighting to Estimate Partial Effects

Many times, the causal effect of one variable on another will vary across different subpopulations.  For instance, a drug trial compares average outcomes in the treatment and control arms.  However, if the drug’s effect on outcomes varies with age, one may want to estimate the average partial effect of the drug in the population.

Assume that the sample has more old people than young people relative to the population at large.  In this case, OLS would not recover the population average partial effect, since old people are over-represented in the sample.  Additionally:

In least squares estimation, observations with extreme values of the explanatory variables have particularly large influence on the estimates.  As a result, the weighted average of the rural and urban effects [in my example, young and old] identified by OLS depends not only on the sample shares of the two sectors, but also on how the within-sector variance of X differs between the two sectors… By reweighting the sample to get the sectoral shares in line with the population shares, WLS eliminates the first reason that OLS fails to identify the population average partial effect, but it does not eliminate the second.  As a result, the WLS estimator and the OLS estimator identify different weighted averages of the heterogeneous effects, and neither one identifies the population average effect.

Conclusions

  1. Do not believe that in the presence of unmodeled heterogeneous effects, weighting to reflect population shares generally identifies the population average partial effect.
  2. Contrasting the weighted and unweighted estimates can serve as a test for misspecification.  The failure to model heterogeneous effects is one sort of misspecification that can generate a significant contrast.
  3. Where heterogeneous effects are salient, study the heterogeneity; don’t just try to average it out.
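A small simulation makes the first two conclusions concrete.  It is a sketch with invented numbers: two groups (young and old) with effects of 1 and 3, equal population shares (so the population average partial effect is 2), a sample that over-represents the old, and a regressor that varies more among the old.

```python
import numpy as np

rng = np.random.default_rng(0)

# All values below are assumptions chosen for illustration.
effect = {"young": 1.0, "old": 3.0}           # heterogeneous effects of x on y
pop_share = {"young": 0.5, "old": 0.5}        # population shares
n_sample = {"young": 20_000, "old": 80_000}   # the old are over-represented
x_sd = {"young": 1.0, "old": 2.0}             # x varies more among the old

total = sum(n_sample.values())
xs, ys, ws = [], [], []
for g in effect:
    n = n_sample[g]
    x_g = rng.normal(scale=x_sd[g], size=n)
    y_g = effect[g] * x_g + rng.normal(size=n)
    xs.append(x_g)
    ys.append(y_g)
    # Weight = population share / sample share, which restores population shares.
    ws.append(np.full(n, pop_share[g] / (n / total)))
x, y, w = map(np.concatenate, (xs, ys, ws))

def slope(x, y, w):
    """Slope from a (weighted) least squares fit of y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef[1]

ols = slope(x, y, np.ones_like(x))
wls = slope(x, y, w)
ape = sum(pop_share[g] * effect[g] for g in effect)

print(f"population average partial effect: {ape:.2f}")
print(f"OLS slope: {ols:.2f}")
print(f"WLS slope: {wls:.2f}")
```

In this setup each estimator weights the group effects by (share × within-group variance of x), so OLS should come out near 2.88 and WLS near 2.6.  The two differ from each other, which is the contrast that can flag misspecification, and neither equals the population average of 2.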

In Summary

In situations in which you might be inclined to weight, it often is useful to report both weighted and unweighted estimates and to discuss what the contrast implies for the interpretation of the results.  And, in many of the situations we have discussed, it is advisable to use robust standard error estimates.


How Missing Data affects Physicians’ P4P Bonuses

Written By: Jason Shafrin - May• 06•13

Pay-for-performance programs often offer bonuses (or penalties) for physicians, hospitals and other providers based on the quality of care patients receive.  Measuring quality of care, however, is often difficult.  For chronic conditions, for instance, many patients eligible for outcome measures may be lost to follow-up.  This issue can potentially affect provider evaluations and bonus payments.

To examine the magnitude of the attrition problem for P4P programs, a paper by Ryan and Bao (2013) examines what would happen to a P4P program that measured patient remission rates once one accounts for the fact that patient follow-up data are not always complete.  To do this, the authors use a randomized controlled trial (RCT) called IMPACT (Improving Mood-Promoting Access to Collaborative Treatment) to generate parameters for a simulation. The IMPACT data include both a clinical registry used by care managers in the trial to document exposure to the intervention and to track patient outcomes (“registry data”) as well as longitudinal research interviews, which independently assessed patient outcomes at regular intervals (“research data”).

The authors use these data to examine whether patients with missing data are less likely to achieve remission.  They calculate “the rate of remission for those with missing registry data (0.232) and those without missing registry data (0.262) at the patient level. The difference between these rates (−.030) provides an estimate of the association between data missingness and remission (i.e., the effect of systematically missing data on remission).”

Using this and other parameters, the authors create a Monte Carlo simulation to examine how measured physician performance changes in the presence of missing data, where performance is measured on both relative (80th percentile of remission rate among all providers) and absolute (whether providers exceed a remission rate of 30 percent) scales.  Their results are as follows:

We found that, over a range of scenarios, relative profiling approaches had profiling error rates that were approximately 20 percent lower than absolute profiling approaches. Also, most of the profiling error in the simulations was a result of random sampling variation, not missing data: between 11 and 21 percent of total error was attributable to missing data for relative profiling, while between 16 and 33 percent of total error was attributable to missing data for absolute profiling. This finding, however, is based largely on the fact that the missing data were not strongly related to the remission outcome in the IMPACT data, and a stronger relationship would amplify the relationship between missing data and profiling error. We also found that absolute profiling approaches were much more sensitive to error from systematically missing data than relative profiling approaches. Finally, the risk of profiling error was extremely high, approximately 50 percent, for providers whose true quality was in the immediate proximity of incentive thresholds, but decreased sharply to approximately 10 percent for providers whose true quality was 5 percentage points from incentive thresholds, indicating that the risk of profiling error is disproportionately borne by providers whose true quality is close to incentive thresholds.
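The point about providers near the threshold can be seen even in a stripped-down simulation that ignores missing data and looks only at random sampling variation.  This is not the authors’ simulation: the caseload of 100 patients per provider and the two true remission rates are assumptions made for the sketch, and only the 30 percent absolute threshold comes from the description above.

```python
import numpy as np

rng = np.random.default_rng(0)

THRESHOLD = 0.30     # absolute standard: remission rate of 30 percent
N_PATIENTS = 100     # assumed caseload per provider (hypothetical)
N_SIMS = 100_000     # simulated measurement periods per provider

def misclassification_rate(true_rate):
    """Share of simulations in which a provider whose true remission rate is
    above the threshold is observed below it."""
    remissions = rng.binomial(N_PATIENTS, true_rate, size=N_SIMS)
    observed_rate = remissions / N_PATIENTS
    return np.mean(observed_rate < THRESHOLD)

err_near = misclassification_rate(0.305)  # true quality just above the threshold
err_far = misclassification_rate(0.35)    # true quality 5 points above it

print(f"error rate, true rate 30.5%: {err_near:.2f}")
print(f"error rate, true rate 35.0%: {err_far:.2f}")
```

Under these assumptions a provider barely above the threshold is misclassified close to half the time, while one five points above it is misclassified far less often.  Systematically missing data, which the paper models and this sketch does not, would add to both error rates.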


Stimulus Money Doesn’t Reach those Most in Need

Written By: Jason Shafrin - May• 03•13

…according to political scientists James G. Gimpel and Frances E. Lee of the University of Maryland, College Park, and Rebecca U. Thorpe, of the University of Washington, the areas of the country hit hardest by the downturn actually got a smaller share of the discretionary portion of the federal goodies than more fortunate regions.

How could this be the case? Was this due to politicians securing funds for their districts?

The broader reason why spending went haywire, the authors believe, is because Democrats used it to fund their pet policy initiatives. Ordinarily, progress on fronts such as infrastructure repair and scientific and medical research is painfully slow. ARRA offered an all-too-tempting “policy window” to change that.

Democrats showered the National Institutes of Health with $10 billion of new funding and the National Science Foundation with $3 billion. They created more than 30 new federal programs, leapfrogging the normal congressional authorization process. President Obama homed in on highways. “Because of this investment, nearly 400,000 men and women will go to work rebuilding our crumbling roads and bridges, repairing our faulty dams and levees,” he declared when he signed the stimulus into law.

Not surprisingly, Gimpel, Lee, and Thorpe found that counties that already had lots of roads and other public installations profited handsomely, harvesting an average of $50 more per capita than less endowed counties. “Prioritizing infrastructure favored areas with access to interstate highways, bodies of water, and national parks, regardless of local economic circumstances.”

HT: Wilson Quarterly.

Friday Links

Written By: Jason Shafrin - May• 03•13

Option Theory and Foul Trouble

Written By: Jason Shafrin - May• 02•13

Some non-healthcare reading from Wages of Wins as the NBA playoffs are upon us.  Should a coach bench a starter in foul trouble?  Doesn’t reducing a high-quality player’s aggregate minutes adversely affect the team, so that coaches should let players play regardless of how many fouls they have?

It turns out that the optimal strategy looks something like this: bench starters in foul trouble only if it is early in the game and the player has a strong backup on the bench. The specifics are of course in the paper but the basic idea can be explained using option pricing.

Option pricing is the most common example of derivatives valuation in the field of finance. It is also the most notable, being the source of the famous Black-Scholes pricing formula under specific assumptions about the underlying process for which Robert C. Merton and Myron S. Scholes won their Nobel prizes in 1998 (two of us worked with them at Long Term Capital Management, the hedge fund where they were partners)…

In the context of basketball, benching a player creates an option for the coach. The coach has the choice to put him back in the game at a critical juncture later in the game. In options terminology, the coach has the right to exercise his option early, by letting the player play through his foul trouble, but he risks that the player fouls out and will be unavailable at the end of the game, when matchups become important…

We had prior beliefs that benching was universally bad; we were wrong.

If you need longitudinal Census data…

Written By: Jason Shafrin - May• 01•13

one good place to go is IPUMS.  IPUMS is run by the Minnesota Population Center at the University of Minnesota.  The website contains microdata from a large number of surveys including the Census, American Community Survey, and Current Population Survey.  From the IPUMS website:

The Integrated Public Use Microdata Series (IPUMS-USA) consists of more than fifty high-precision samples of the American population drawn from fifteen federal censuses and from the American Community Surveys of 2000-2011. Some of these samples have existed for years, and others were created specifically for this database. These samples, which draw on every surviving census from 1850-2000, and the 2000-2011 ACS samples, collectively constitute our richest source of quantitative information on long-term changes in the American population. However, because different investigators created these samples at different times, they employed a wide variety of record layouts, coding schemes, and documentation. This has complicated efforts to use them to study change over time. The IPUMS assigns uniform codes across all the samples and brings relevant documentation into a coherent form to facilitate analysis of social and economic change.

IPUMS is not a collection of compiled statistics; it is composed of microdata. Each record is a person, with all characteristics numerically coded. In most samples persons are organized into households, making it possible to study the characteristics of people in the context of their families or other co-residents. Because the data are individuals and not tables, researchers must use a statistical package to analyze the millions of records in the database. A data extraction system enables users to select only the samples and variables they require.

IPUMS-International is the world’s largest collection of publicly available individual-level census data. IPUMS-International integrates samples from population censuses from around the world taken since 1960. Scholars interested only in the United States are better served using IPUMS-USA, which is optimized for U.S. research.

IPUMS-CPS is an integrated set of data from the March Current Population Survey (CPS), beginning in 1962 and continuing until the present. This harmonized dataset is also compatible with the data from the U.S. decennial censuses that are part of the IPUMS-USA. Researchers can take advantage of the relatively large sample size of IPUMS-USA at ten-year intervals and fill in information for the intervening years using IPUMS-CPS.

So IPUMS is just a collection of data sets…big deal. Where is the value-added? Again from the IPUMS website:

IPUMS data is integrated over time and across samples by assigning uniform codes to variables. This process itself adds value to the data by fully documenting all codes and compiling all variable documentation in a hyperlinked web format. But we do many other things as well:

IPUMS creates a consistent set of constructed variables on family interrelationships for all samples. The “pointer” variables indicate the location within the household of every person’s mother, father, and spouse.

IPUMS data also includes harmonized income and occupation variables. The Census Bureau has reorganized its occupational and industrial classification systems in almost every census administered since 1850. Although IPUMS retains the original occupation and industry codes, a variety of occupation and industry variables have been created for long-term analysis. More information on these variables can be found on the Occupation and Industry Variables page.

Cavalcade of Risk: West, Texas Edition

Written By: Jason Shafrin - May• 01•13

Jeff Root of RootFin hosts this week’s collection of risk-related posts, and dedicates it to the victims of the horrific explosion in West, Texas.

Why the internet is free…

Written By: Jason Shafrin - Apr• 30•13

Because 20 years ago today, a decision with significant ramifications was made.  The CERN website relates:

On 30 April 1993, CERN made the source code of WorldWideWeb available on a royalty-free basis, making it free software.

And how the world has changed (for the better?) because of it.

Reichenbach’s Common Cause Principle

Written By: Jason Shafrin - Apr• 30•13

How can you tell whether one event causes another?  Suppose you observe that when one event happens, another event is more likely to happen.  For instance, when it rains the ground gets wet, and rain generally causes the ground to get wet.  Use of umbrellas is also highly correlated with the ground getting wet.  Even though umbrella use and wet ground are correlated, umbrellas of course do not cause the ground to get wet.  In this example, one can use intuition and experience to determine which event causes another.  But how does one define causality mathematically?

Reichenbach’s Common Cause Principle states that a correlation between events A and B indicates either that A causes B, or that B causes A, or that A and B have a common cause.  Consider the case where two events tend to occur together.  Mathematically, this means that P(A,B) > P(A) × P(B). If there is a common cause, C, then this common cause has the following properties:

  1. P(A|C) > P(A|~C)
  2. P(B|C) > P(B|~C)
  3. P(A,B|C) = P(A|C)× P(B|C)
  4. P(A,B|~C) = P(A|~C)× P(B|~C)

The first condition states that event A is more likely to occur when the common cause, C, is present.  The second condition states that event B is also more likely to occur when C is present.  The third and fourth conditions state that once we know whether the common cause has occurred, A and B are independent.  In other words, C causes both A and B, and the correlation between A and B disappears once we know whether C has occurred.

Consider our example above, where the use of umbrellas is A and the ground getting wet is B.  These events are highly correlated.  Let’s check if the event “rain” meets the common cause criteria.  The use of umbrellas is more likely when rain occurs (condition 1) and the ground is more likely to be wet when it rains (condition 2).  Further, once we know that it is raining, the use of umbrellas does not affect the probability that the ground gets wet and, conversely, the ground being wet does not affect the probability that umbrellas are used.  In other words, umbrella use and the ground being wet are independent once we know whether (condition 3) or not (condition 4) it is raining.
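The four conditions can be checked with a few lines of arithmetic.  The probabilities below are invented purely for illustration; the sketch builds the joint distribution of A and B from conditions 3 and 4 and confirms that the two events are correlated overall even though they are independent once rain is known.

```python
# Assumed (illustrative) probabilities; none of these numbers come from the post.
p_c = 0.3                        # P(C): it rains
p_a = {True: 0.8, False: 0.1}    # P(A|C), P(A|~C): umbrellas are used
p_b = {True: 0.95, False: 0.05}  # P(B|C), P(B|~C): the ground is wet

# Conditions 1 and 2: each event is more likely when the common cause is present.
assert p_a[True] > p_a[False]
assert p_b[True] > p_b[False]

def p_ab_given(c):
    """P(A,B|C=c) under conditions 3 and 4 (conditional independence)."""
    return p_a[c] * p_b[c]

# Marginal probabilities via the law of total probability.
p_A = p_c * p_a[True] + (1 - p_c) * p_a[False]
p_B = p_c * p_b[True] + (1 - p_c) * p_b[False]
p_AB = p_c * p_ab_given(True) + (1 - p_c) * p_ab_given(False)

print(f"P(A,B)      = {p_AB:.4f}")
print(f"P(A) x P(B) = {p_A * p_B:.4f}")
print("correlated:", p_AB > p_A * p_B)
```

With these numbers P(A,B) = 0.2315 while P(A) × P(B) = 0.0992, so umbrellas and wet ground are positively correlated overall, and the whole correlation is produced by the common cause.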

Extensions of the Common Cause Principle

(more…)

A glitch in ACO beneficiary assignment?

Written By: Jason Shafrin - Apr• 29•13

For most managed care plans, beneficiaries elect to participate in the plan. In exchange, beneficiaries often have lower premiums, but also restricted access to providers (e.g., referral requirements, copayment differentials for out-of-network physicians).

Medicare also assigns beneficiaries to its Accountable Care Organizations (known as Shared Savings Plans (SSPs)). The SSPs are groups of providers that are responsible for a set of beneficiaries and can earn bonuses if they are able to save Medicare money and improve quality. Unlike managed care plans, however, beneficiaries do not choose to participate in an ACO; Medicare assigns beneficiaries to ACOs based on where they receive care. Also unlike managed care plans, beneficiaries do not face any in/out-of-network limitations or referral restrictions, even though ACOs are responsible for their care. Thus, it is crucial that the assignment of beneficiaries to SSPs is reliable.

A study by McWilliams et al. (2013), however, states that reliable attribution may be compromised in the current system. They write:

For the purpose of assignment, both the SSP and Pioneer program define primary care as specific sets of evaluation and management (E&M) services delivered not only in outpatient settings but also in skilled nursing facilities (SNFs) and nursing homes. This definition ensures that long-term nursing home residents no longer receiving primary care in the community are still eligible for assignment to ACOs that include nursing homes…

including nursing home E&M services in the assignment process appropriately allows for ACO contracts to apply to long-term nursing home residents if such partnerships should develop…

For community-dwelling beneficiaries, however, counting E&M services provided in SNFs as primary care services could transfer the locus of accountability from primary to post-acute care providers as an unintended consequence. Specifically among beneficiaries who receive both outpatient primary care and short-term post-acute care, the assignment rules may selectively assign the sickest patients requiring the most post-acute care away from community primary care providers, and thus away from ACOs that do not include SNFs in their contracting networks.

Will the inclusion of post-acute E&M services into the attribution method affect the SSP to which beneficiaries are assigned in practice? The authors find that the answer is yes.

Assignment shifts occurred for 27.6 percent of 25,992 community-dwelling beneficiaries with at least one post-acute skilled nursing facility stay, and they were more common for those incurring higher Medicare spending. Those whose assignment shifted constituted only 1.3 percent of all community-dwelling beneficiaries cared for by large ACO-eligible organizations (n = 535,138), but they accounted for 8.4 percent of total Medicare spending for this population.

Source:
(more…)