Many payers are moving towards value-based purchasing programs that reward efficient physicians with additional payments and punish inefficient physicians with decreased payments. Medicare’s Quality and Resource Use Reports (QRUR) are a step in this direction.
However, summarizing overall physician quality is a difficult prospect. First, the types of cases each physician treats are not homogeneous, even within a specialty. Second, within each treatment regimen, patients have different comorbidities. Third, physicians who have a single high-cost outlier case may score poorly. If these outliers are random and largely outside of the control of the physicians, then a composite quality measure may not adequately summarize the physician’s underlying efficiency level.
To address these issues, Metfessel and Greene (2012) propose a form of the Wilcoxon rank-sum (WRS) test. Their proposal uses the Episode Treatment Groups (ETG) episode grouper to measure physician efficiency within episodes of care. Under the WRS framework, physicians are ranked within and then across episode types. A Z-score determines whether the physician is statistically significantly better or worse than the average physician.
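To make the ranking idea concrete, here is a minimal sketch in Python. It is not the authors' exact statistic: it assumes each episode's cost is converted to a midrank percentile among all episodes of the same type, and that under the null hypothesis these percentiles are roughly uniform on (0, 1), with mean 0.5 and variance 1/12. The function names and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def percentile_ranks(episodes):
    """episodes: list of (physician, episode_type, cost) tuples.

    Returns (physician, percentile) pairs, where the percentile is the
    midrank of the episode's cost among all episodes of the same type.
    """
    costs_by_type = defaultdict(list)
    for _, etype, cost in episodes:
        costs_by_type[etype].append(cost)

    ranked = []
    for physician, etype, cost in episodes:
        peers = costs_by_type[etype]
        below = sum(c < cost for c in peers)
        ties = sum(c == cost for c in peers)  # includes the episode itself
        ranked.append((physician, (below + 0.5 * ties) / len(peers)))
    return ranked


def z_scores(episodes):
    """Z-score of each physician's mean percentile against 0.5.

    Assumption: percentiles are approximately uniform under the null,
    so the mean of n percentiles has variance 1 / (12 * n).
    """
    by_physician = defaultdict(list)
    for physician, pct in percentile_ranks(episodes):
        by_physician[physician].append(pct)

    result = {}
    for physician, pcts in by_physician.items():
        n = len(pcts)
        mean_pct = sum(pcts) / n
        result[physician] = (mean_pct - 0.5) / math.sqrt(1.0 / (12.0 * n))
    return result


# Hypothetical data: three physicians, one episode each of two types.
toy_episodes = [
    ("A", "pharyngitis", 100.0), ("B", "pharyngitis", 200.0), ("C", "pharyngitis", 300.0),
    ("A", "pneumonia", 1000.0), ("B", "pneumonia", 2000.0), ("C", "pneumonia", 3000.0),
]
toy_z = z_scores(toy_episodes)
```

A negative Z-score here means lower-than-typical cost ranks (more efficient), and a positive one means higher. Because only ranks enter the calculation, a single extreme-cost episode can move a physician's percentile for that episode to at most the top of the range, which is the source of the method's robustness to outliers.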
The authors show that, because it relies on percentile rankings rather than absolute spending levels, the WRS method produces physician efficiency measures that are more stable over time than currently popular methods. For instance, the WRS outperforms the observed-to-expected ratio (a.k.a. O:E ratio or “efficiency index”), where ratios greater than 1.0 indicate higher costs than an average practice pattern and ratios less than 1.0 indicate lower costs.
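For comparison, the O:E ratio can be sketched as follows. This is an illustration under a simple assumption, not a specific payer's formula: the "expected" cost of an episode is taken to be the average cost of all episodes of the same type in the data (including the physician's own). The function name and toy data are hypothetical.

```python
from collections import defaultdict


def oe_ratio(episodes, physician):
    """Observed-to-expected cost ratio for one physician.

    episodes: list of (physician, episode_type, cost) tuples.
    Assumption: expected cost = mean cost of that episode type across
    all physicians in the data.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _, etype, cost in episodes:
        totals[etype] += cost
        counts[etype] += 1

    observed = 0.0
    expected = 0.0
    for who, etype, cost in episodes:
        if who == physician:
            observed += cost
            expected += totals[etype] / counts[etype]
    return observed / expected


# Hypothetical data: three physicians, one episode each of two types.
oe_episodes = [
    ("A", "pharyngitis", 100.0), ("B", "pharyngitis", 200.0), ("C", "pharyngitis", 300.0),
    ("A", "pneumonia", 1000.0), ("B", "pneumonia", 2000.0), ("C", "pneumonia", 3000.0),
]
oe_a = oe_ratio(oe_episodes, "A")
oe_b = oe_ratio(oe_episodes, "B")
oe_c = oe_ratio(oe_episodes, "C")
```

Because observed costs are summed in dollars, one very expensive episode can dominate the numerator, which is why this kind of measure tends to be less stable from year to year than a rank-based one.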
There are two main drawbacks of this paper. First, the authors do not risk adjust within an episode. A treatment episode is defined by the ETG, severity, treatment indicator, and pharmacy benefit status; within these cells, however, patient comorbidities are ignored. The second main drawback of this approach is that it measures physician efficiency using percentiles rather than levels. Thus, if a physician is very efficient for low-cost episode types but inefficient for high-cost episode types, the WRS method would estimate that the physician is of average efficiency, even though the physician is expensive in terms of total cost per person relative to expected. The authors admit that “Our WRS application gives the same weight to less costly episodes, such as pharyngitis, as more costly episodes, such as pneumonia.”
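The equal-weighting problem is easy to see with a small numerical sketch. All figures below are made up for illustration: one physician treats a cheap pharyngitis episode at well below the peer average and an expensive pneumonia episode at well above it.

```python
# Hypothetical figures for a single physician:
# (episode type, physician's cost, peer average cost, percentile rank of cost)
physician_episodes = [
    ("pharyngitis", 50.0, 100.0, 0.10),   # cheap episode, very efficient
    ("pneumonia", 9000.0, 5000.0, 0.90),  # costly episode, inefficient
]

# Rank-based view: every episode counts equally, whatever its cost.
mean_percentile = sum(e[3] for e in physician_episodes) / len(physician_episodes)

# Dollar-based view: total observed cost relative to total expected cost.
total_observed = sum(e[1] for e in physician_episodes)
total_expected = sum(e[2] for e in physician_episodes)
dollar_ratio = total_observed / total_expected
```

The mean percentile comes out at 0.5, which looks exactly average, while total spending is 9,050 against an expected 5,100, or roughly 77 percent above expected.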
The Healthcare Economist has created an example that shows how the Metfessel-Greene framework works in practice, as well as a case where the WRS produces problematic rankings.
- Metfessel, B. A. and Greene, R. A. (2012), A Nonparametric Statistical Method That Improves Physician Cost of Care Analysis. Health Services Research. doi: 10.1111/j.1475-6773.2012.01415.x.