Econometrics


How do ecologists determine the size of a population?  One method is mark and recapture (a.k.a. the capture/recapture method).  This method relies on two separate trials that capture (either physically or in data) members of a given population, and it estimates the population size from the proportion of specimens captured in both trials.

The key assumption for the capture/recapture method is that the probability of capturing any given specimen is independent across trials.  If, for example, fat and old birds were easier to capture, the same birds would be more likely to be caught in the second trial as well.  This would inflate m, the number of specimens captured in both trials, and thus the estimate of the population would be too low.
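To see why unequal capture probabilities bias the estimate downward, here is a minimal sketch that works with expected counts rather than a real study. It uses the simple ratio estimate M × n / m (the Lincoln-Petersen form), where M and n are the numbers caught in the first and second trials and m is the number caught in both. The group sizes and capture probabilities are hypothetical values chosen only for illustration.

```python
def expected_estimate(groups):
    """groups: list of (group_size, capture_probability) pairs.

    Each individual is caught independently in each trial with its
    group's probability. Returns the ratio estimate M * n / m computed
    from expected counts (an illustration, not a simulation).
    """
    M = sum(size * p for size, p in groups)      # expected catch, trial 1
    n = sum(size * p for size, p in groups)      # expected catch, trial 2
    m = sum(size * p * p for size, p in groups)  # expected caught in both
    return M * n / m


# Hypothetical population of 1,000 with the same average capture rate (0.25).
homogeneous = expected_estimate([(1000, 0.25)])
heterogeneous = expected_estimate([(500, 0.40), (500, 0.10)])

print(f"equal capture probability:   {homogeneous:.0f}")
print(f"unequal capture probability: {heterogeneous:.0f}")
```

With equal capture probabilities the expected-count estimate recovers the true 1,000; with the easy-to-catch and hard-to-catch groups it falls to roughly 735, because the easy group is over-represented among the recaptures.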

One application of the capture/recapture method is McClish et al.’s (1997) examination of the size of the elderly cancer population in Virginia.  The authors estimate the likelihood that cancer patients appear in both the Virginia Cancer Registry (VCR) and the Medicare claims files (MEDPAR) for Virginia residents 65 and older.

“Capture-recapture techniques were used to estimate the actual cancer population size, based on the concordance and discordance of the data sources. If VCR identifies M cases and MEDPAR identifies n cases, m of which are common to both sources, then the estimated number of cases in the entire population of cases at reporting hospitals will be $N = \frac{(M+1)(n+1)}{m+1} - 1$. With this estimate of the population, the sensitivity of each source alone, as well as those of the combined sources, was estimated.”

The variance of the estimated population size is:

  • $\operatorname{var}(N) = \frac{(M+1)(n+1)(M-m)(n-m)}{(m+1)^2 (m+2)}$
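As a quick worked example, the sketch below evaluates the estimator quoted above together with the variance, whose denominator is (m+1)²(m+2). The counts M = 100, n = 80 and m = 20 are hypothetical and are not taken from McClish et al.

```python
import math


def chapman_estimate(M, n, m):
    """Bias-corrected capture/recapture estimate and its variance.

    M: cases found by the first source
    n: cases found by the second source
    m: cases found by both sources
    """
    n_hat = (M + 1) * (n + 1) / (m + 1) - 1
    variance = ((M + 1) * (n + 1) * (M - m) * (n - m)) / ((m + 1) ** 2 * (m + 2))
    return n_hat, variance


# Hypothetical counts, for illustration only.
n_hat, variance = chapman_estimate(M=100, n=80, m=20)
std_error = math.sqrt(variance)

print(f"estimated population: {n_hat:.1f}")
print(f"standard error:       {std_error:.1f}")
```

The square root of the variance gives a standard error that can be used to build an approximate confidence interval around the estimate.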


Cost effectiveness and quality analysis of the treatment of cancer has long been a goal of health services researchers.  In particular, researchers aim to determine whether various treatments provide cost-effective methods to improve longevity and quality.  Physicians, however, use different treatments depending on the patient’s cancer stage.  Although most cost-effectiveness researchers want to take into account patient cancer stage in their analyses, these data are not available in many administrative data files, such as the Medicare claims files.

To overcome this problem, recent studies have examined how to develop accurate algorithms to account for cancer stage in studies using claims data.  A paper by Cooper et al. provided an initial attempt to accomplish this feat, but a more recent paper by Smith et al. (2010) offers an alternative.  Today, I will review the Smith paper.

Methods

The initial study population consisted of 150,764 women (age ≥ 65 years) diagnosed with breast cancer between 1992 and 2002, identified through Surveillance, Epidemiology, and End Results (SEER)-Medicare.  From this population, the authors excluded beneficiaries characterized by:

  • Unknown SEER stage history
  • In situ rather than invasive cancer
  • Lack of continuous enrollment in Medicare FFS, including any Medicare Advantage coverage between 12 months before and 9 months after diagnosis
  • Age less than 66 to ensure a complete year of history
  • Death

To determine the cancer stage, physicians typically use the following heuristic:

  • If a distant tumor is observed, then the patient is stage IV.
  • If the patient is not stage IV, then the patient is classified into stages based on tumor size and the extent of the disease.

This spreadsheet explains the cancer stage classification according to the American Joint Committee on Cancer (AJCC).

The study relied on demographic, tumor, and treatment characteristics to identify the cancer stage.  One of the key variables in the breast cancer algorithm was axillary lymph node involvement.  This spreadsheet also lists all the covariates included in the prediction algorithm.

To test the accuracy of the algorithm, the authors relied on four metrics: sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV).  The authors calibrated the model on a baseline sample of the SEER data and tested the accuracy using a validation sample.
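For readers less familiar with these four metrics, here is a minimal sketch of how they are computed from a two-by-two table of predicted versus true status (for example, "algorithm predicts stage IV" versus "SEER records stage IV"). The counts are hypothetical and are not results from Smith et al.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Accuracy metrics from confusion-matrix counts.

    tp: predicted positive, truly positive
    fp: predicted positive, truly negative
    fn: predicted negative, truly positive
    tn: predicted negative, truly negative
    """
    return {
        "sensitivity": tp / (tp + fn),  # share of true positives detected
        "specificity": tn / (tn + fp),  # share of true negatives detected
        "ppv": tp / (tp + fp),          # share of positive calls that are right
        "npv": tn / (tn + fn),          # share of negative calls that are right
    }


# Hypothetical validation-sample counts, for illustration only.
metrics = diagnostic_metrics(tp=80, fp=30, fn=20, tn=870)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Sensitivity and specificity describe the algorithm itself, while PPV and NPV also depend on how common the stage is in the sample.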

One drawback of the Smith et al. algorithm is that it requires both retrospective and prospective data for up to 1 year prior to and 1 year after the date of diagnosis.  Further, patients have to be continuously enrolled in Medicare FFS for the algorithm to work properly.  Those who join a Medicare Advantage plan are dropped from the sample.

Conclusion

The authors claimed the following results:

“A claims-based algorithm was utilized to predict breast cancer stage, and was particularly successful when used to identify early stage disease. These prediction equations may be applied in future studies of breast cancer patients, substantially improving the utility of claims-based studies in this group. This method may similarly be employed to develop algorithms permitting claims-based epidemiologic studies of patients with other cancers.”


Medicare administrative data provide a rich source for health services research.  Researchers who wish to use these data, however, often have to restrict their sample population in order to obtain similar types of patients and consistent data.  Today I review some popular methods to restrict Medicare samples for research use.

Restrictions to consider for Medicare population:

  1. Continuously enrolled in Parts A and B.  This is needed for data completeness.  New individuals could enroll in Medicare throughout the year and thus the data on their spending and treatment patterns may be incomplete.
  2. Medicare Advantage Enrollment.  Researchers may want to exclude beneficiaries who receive Medicare services through managed care, because there is currently no claim-level detail for MA-enrolled beneficiaries, although this will change in 2012.
  3. U.S. Residence.  Individuals residing outside the U.S. may receive their care through other countries’ medical systems and thus Medicare claims may not capture the full range of services they receive.
  4. The Working Aged.  The working aged are individuals for whom a private group health insurance plan was the primary payer.  Medicare may not have a complete set of claims for working aged beneficiaries.
  5. Hospice Beneficiaries.  Managed care plans cover all Parts A & B services except for hospice.  If one wants to predict Medicare Advantage spending, one would need to exclude hospice spending, since hospice services are not covered by MA, only by Medicare FFS.
  6. ESRD Beneficiaries.  Beneficiaries with End Stage Renal Disease (ESRD) are covered by Medicare through a specific carve out.  Anyone with ESRD qualifies regardless of age.  Thus, researchers may want to exclude/include these beneficiaries.
  7. Medicaid Status.  Individuals who are dual eligible have much lower Medicare cost sharing and can also receive services covered by Medicaid.
  8. Disability Status.  Some individuals qualify for Medicare before age 65 because they are disabled and receive Social Security Disability Insurance (SSDI).  A researcher could restrict the sample to: i) individuals under 65 who are eligible for Medicare through SSDI, or ii) individuals who are eligible for Medicare because they are over 65, but who were already enrolled in Medicare before 65 because of their disability status.
  9. Death.  For researchers conducting cost analyses, patients who die cause a problem.  Patients who die are low cost; excluding these patients from the sample, however, may assign low-cost status to certain providers for whom a large number of patients die.  Thus, one does not want to reward providers whose patients die at a high rate and identify them as low-cost providers.
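Several of the restrictions listed above can be expressed as a simple filter over beneficiary-year records. The sketch below is one possible way to do this; the record layout, field names, and toy records are hypothetical and do not correspond to actual Medicare file variables.

```python
from dataclasses import dataclass


@dataclass
class Beneficiary:
    # Hypothetical fields; real Medicare enrollment files use other names.
    bene_id: str
    months_part_a: int
    months_part_b: int
    months_medicare_advantage: int
    us_resident: bool
    working_aged: bool
    has_esrd: bool


def keep_for_ffs_analysis(b, exclude_esrd=True):
    """Return True if the record passes the sample restrictions."""
    if b.months_part_a < 12 or b.months_part_b < 12:
        return False  # not continuously enrolled in Parts A and B
    if b.months_medicare_advantage > 0:
        return False  # no claim-level detail while in managed care
    if not b.us_resident:
        return False  # care may be delivered outside Medicare
    if b.working_aged:
        return False  # private plan is primary payer; claims incomplete
    if exclude_esrd and b.has_esrd:
        return False  # optional restriction, depends on the study
    return True


# Toy records, for illustration only.
sample = [
    Beneficiary("A", 12, 12, 0, True, False, False),
    Beneficiary("B", 12, 7, 0, True, False, False),
    Beneficiary("C", 12, 12, 3, True, False, False),
    Beneficiary("D", 12, 12, 0, True, False, True),
]
kept = [b.bene_id for b in sample if keep_for_ffs_analysis(b)]
kept_with_esrd = [
    b.bene_id for b in sample if keep_for_ffs_analysis(b, exclude_esrd=False)
]
print(kept, kept_with_esrd)
```

Keeping each restriction as its own condition makes it easy to report how many beneficiaries each step removes, and the `exclude_esrd` switch reflects that some restrictions are a research choice rather than a data requirement.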


Medicare spending changes over time for multiple reasons.  First, the individuals in any cohort age as time passes.  As their age increases, expected medical expenditure will also rise.  Second, an individual will likely receive different medical services as the standards of care change over time.  The standards of care can change due to improved technology, policy, cultural factors, medical education, and other causes.

On the other hand, one could examine trends in Medicare spending for a certain age group over time.  For instance, how much did Medicare spend on care for 70-year-olds in 1970, 1990 and 2010?  In this case, medical expenditure changes as the standards of care change over time.  In addition, certain cohorts may be more or less healthy than others when they reach a certain age, or the cohort may have different preferences over medical treatment.

Today’s post reviews a method for describing differential growth rates in Medicare expenditures.  For cohort c at age α, the following equation describes medical expenditures y.

  • $y(c,\alpha) = g(c,\alpha) + u_{c,\alpha}$


Oftentimes, researchers use dummy variables to determine how observations classified into different categorical groups affect the dependent variable of interest.  One drawback of this approach is that using too many dummy variables can create small cell sizes, creating an identification problem.  Alternatively, using broad groupings for dummy variables may give the appearance that the effect of the covariate is homogeneous within the category when this is not the case.

An alternative to using simple categorical dummy variables is to use overlap polynomials.  For instance, Lakdawalla, Goldman, and Bhattacharya have a working paper where they rely on the difference of normal cumulative distribution functions (CDFs) to create a flexible form to build these overlapping polynomials.  In particular, they use the following specification:

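  • $g(\text{age}_i;\beta) = \sum_{j=0}^{K} \left\{ \Phi\left[\frac{\text{age}_i - k_{j+1}}{\sigma}\right] - \Phi\left[\frac{\text{age}_i - k_j}{\sigma}\right] \right\} p_j(\text{age}_i;\beta)$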

Here is the equation from the paper in larger type.

Below I describe how this function works in practice.

Read the rest of this entry »


How can you estimate an individual’s total lifetime cost of medical care?  For people who die in your sample, this is simple.  In most data sets, however, not all individuals will die during the period of observation.  Thus, the data set is censored for those who do not die.

In addition, many standard hazard models do not allow for researchers to disaggregate the effects of covariates on survival and the intensity of utilization.  Both factors have an effect on cost.

Assuming that censoring is random, Basu and Manning (2010) describe a method to calculate expected lifetime costs for each individual as follows:

Read the rest of this entry »


To evaluate providers based on the health outcomes or the cost of care, one must attempt to evaluate dimensions of care which are strictly within the provider’s control. For instance, if a physician treats two patients with breast cancer, but one patient has a more advanced form of breast cancer, one should take this difference into account. Patient comorbidities also affect the prognosis for a successful recovery from illness.

One method to take into account the patient’s health conditions upon presentation at a provider’s facility is to use risk adjustment methods. Risk adjustment methods take into account factors such as patient demographics (e.g., age, gender), health status (e.g., prior diagnosis, current illness severity), prior utilization (e.g., previous hospitalizations) and other factors to predict the expected outcome for a typical patient. Risk adjustment, however, is never perfect. A paper by Garber, MaCurdy and McClellan (1998) reviews some of the problems with using risk adjustment in the health care setting.


The Problem

Many times, researchers wish to transform the dependent variable of a regression in order to estimate parameter values.  Performing the transformation, however, complicates the calculation of the expected value of the dependent variable on the untransformed scale.  Assume that Yi is the dependent variable and that the function g is used to transform it as follows:

  • $\eta_i = g(Y_i)$
  • $Y_i = h(\eta_i)$
  • $h = g^{-1}$

The easiest way to picture these functions is to think of g as the ln function and h as the exp function. In health economics, researchers often use a log transformation to attenuate problems related to a heavily right-skewed distribution. In this case, one would estimate the following regression:

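  • $\eta_i = x_i\beta + \varepsilon_i$
  • $\varepsilon_i \overset{iid}{\sim} F,\quad E(\varepsilon_i) = 0,\quad \operatorname{Var}(\varepsilon_i) = \sigma^2$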

One can estimate β consistently as follows:

  • $\hat{\beta} = (X'X)^{-1}X'\eta$

What is the predicted value of the dependent variable? Calculating this is not as easy as it seems:

  • $E(Y_0) = E[h(x_0\beta + \varepsilon)] \neq h(x_0\beta)$

For instance, it is well known that under the log transformation with normally distributed errors, the expected value of the dependent variable is equal to $\exp(x_0\beta + \sigma^2/2)$. In cases where we do not know the true distribution of the error term, however, calculating the expected value of the retransformed dependent variable is more difficult. The solution is Duan’s Smearing Estimate.

The Solution
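As a rough illustration of the idea for the log case, the smearing estimate replaces the unknown expectation of exp(ε) with the average of the exponentiated OLS residuals and multiplies the naive retransformed prediction by that factor. The sketch below assumes a single regressor plus an intercept; the ages and costs are made-up numbers, and the helper names are my own.

```python
import math


def ols_fit(x, y):
    """Intercept and slope of a one-regressor least squares fit."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def smearing_predict(x, y, x0):
    """Predict E(Y | x0) on the raw scale from a regression of ln(Y) on x."""
    log_y = [math.log(v) for v in y]
    intercept, slope = ols_fit(x, log_y)
    residuals = [ly - (intercept + slope * xi) for xi, ly in zip(x, log_y)]
    # Smearing factor: sample average of exp(residual).
    smear = sum(math.exp(e) for e in residuals) / len(residuals)
    naive = math.exp(intercept + slope * x0)  # ignores the error term
    return naive, smear, naive * smear


# Made-up data, for illustration only.
ages = [66, 70, 74, 78, 82, 86]
costs = [1200.0, 900.0, 2500.0, 1800.0, 5200.0, 3100.0]
naive, smear, adjusted = smearing_predict(ages, costs, x0=75)
print(f"naive: {naive:.0f}, smearing factor: {smear:.3f}, adjusted: {adjusted:.0f}")
```

Because the OLS residuals average to zero when an intercept is included, the smearing factor is at least one, so the naive prediction exp(x0β) understates the expected value on the raw scale.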

Read the rest of this entry »


Many states rely on managed care organizations (MCOs) to provide medical services for their Medicaid beneficiaries.  Contracting out medical services to private providers relies on the government’s capacity to accurately predict expected cost of care for each beneficiary.  This is typically done through risk-adjusted capitation rates.

Which risk adjustment strategy works best?  The answer of course depends on the context.  A paper by Yu and Dick (2010) examines five predictor specifications for predicting future expenditures for Medicaid-eligible children.  I list each of the five specifications and their performance (measured as the R²) below:

  • Age/Gender only: 0.2%
  • Age/Gender + subjective health status measure: 3.9%
  • Age/Gender + CSHCN: 7.3%
  • Age/Gender + HCC: 12.1%
  • Age/Gender + prior year expenditure: 43.5%

One can clearly see that the best predictor of a child’s current year expenditures is the child’s prior year’s expenditures.


Asymptotic theory has played a large role in the development of many recent econometric methods. For instance, the central limit theorem states that the mean of a sufficiently large sample is approximately normally distributed, whatever the underlying distribution (provided its variance is finite). Asymptotic theory, however, generally assumes that the population is infinite or that sampling occurs with replacement. In the real world, populations are finite and sampling typically occurs without replacement.

To take into account these real-world challenges, a finite population correction (fpc) factor is needed. One can express the fpc mathematically as:

  • $\text{fpc} = \sqrt{\frac{N-n}{N-1}}$

where n is the sample size and N is the population size. For instance, one can calculate the standard errors of the sample mean and the sample proportion for finite populations as:

  • $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}$
  • $\sigma_{\bar{p}} = \sqrt{\frac{p(1-p)}{n}} \sqrt{\frac{N-n}{N-1}}$
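The following minimal sketch evaluates these formulas, with the standard error of the mean taken as σ divided by the square root of n, times the fpc. The sample sizes echo the small-hospital and large-hospital example quoted later in this post (50 of 50 patients and 300 of 1,000 patients); the proportion p = 0.5 is an assumption made only for illustration.

```python
import math


def fpc(N, n):
    """Finite population correction for a sample of n from a population of N."""
    return math.sqrt((N - n) / (N - 1))


def se_mean(sigma, N, n):
    """Standard error of the sample mean with the fpc applied."""
    return sigma / math.sqrt(n) * fpc(N, n)


def se_proportion(p, N, n):
    """Standard error of the sample proportion with the fpc applied."""
    return math.sqrt(p * (1 - p) / n) * fpc(N, n)


# p = 0.5 is an assumed value, for illustration only.
se_large = se_proportion(0.5, N=1000, n=300)
se_small = se_proportion(0.5, N=50, n=50)

print(f"fpc, 300 of 1000: {fpc(1000, 300):.3f}")
print(f"SE,  300 of 1000: {se_large:.4f}")
print(f"SE,  50 of 50:    {se_small:.4f}")
```

When the whole population is sampled (n = N), the fpc is zero and so is the standard error, which is exactly the behavior that becomes controversial for small providers.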

This website has some examples of how to apply fpcs in practice.

One potential application for fpc is physician ratings. Physicians who treat lots of patients eligible for a given quality measure certainly can have an accurate score. Some physicians, however, treat only a handful of patients eligible for any given quality metric. In this case, should the physician be punished if he happens to have one bad quality score among the very few observations? Can fpc correct for this problem?

A paper by Elliott, Zaslavsky and Cleary argues that fpc is not appropriate for adjusting physicians’ scores or confidence intervals to take into account a physician’s small sample size. They cite work by Birnbaum, who argues that profiling of hospitals is essentially an attempt to make inferences about future performance at the same facility if nothing changes, and involves “a theoretically infinite population.” The authors give the following example to explain why the fpc is not appropriate for rating physicians.

…suppose we had 50 responses out of 50 total patients at a small hospital, and 300 responses out of 1000 total patients at a larger hospital. Under the finite population model, there is no sampling variability at the smaller hospital (because we have information for all patients) but considerable sampling variability at the larger hospital, which is the appropriate inference if all we care about is the experience of those 50 and 1000 patients. However, to tell a new group of patients what their experiences are likely to be like at each hospital, we have much more information about the large hospital.

Our concern, then, is that FPSM-based approaches would under-represent the uncertainty in data for small facilities with high (possibly 100%) sampling rates, misleading users into thinking that such a facility would be likely to provide below-average (or above-average) care to them.

To correct for the small sample size problem for providers treating few patients eligible for quality scores each year, the authors believe using a moving average score over multiple years would be more appropriate. The benefit of this method is that the sample size increases, but the drawback is that provider quality improvements will not be fully reflected in the provider’s score for a number of years.
