The safety group can also greatly influence a hospital's overall star rating, the analysis concluded. And that could have huge ramifications. Star ratings are used by payers to negotiate contracts and help consumers decide where to seek care.
The statistical model the CMS uses likely caused the miscalculation. The model, called latent variable modeling, isn't appropriate for measuring clinical outcomes, said David Levine, senior vice president of advanced analytics and informatics at consultancy Vizient. "We have expressed our deep concerns about this methodology because it changes the weight every time—that doesn't really make sense," he said.
Rush University wrote to the CMS in May to express this concern, after the agency informed the system that it had fallen to a three-star rating, down from the five stars it had received in every previous release. The CMS gives hospitals about two months to preview their ratings before they are published on the Hospital Compare website.
Rush officials exclusively disclosed their analysis and correspondence to Modern Healthcare. Rush's findings likely prompted the CMS to announce this week that it would postpone the upcoming release of star ratings, originally scheduled for July.
A CMS spokesman denied on Friday that Rush's analysis influenced the agency's decision to postpone the star ratings. He added, "Since the inception of Hospital Compare's quality star ratings, CMS has made the preliminary review of quality ratings by participating hospitals part of its process. Participating hospitals have always had the ability to seek a review of their ratings through an open and transparent process."
The agency calculates star ratings by weighting how hospitals perform in seven categories: mortality, safety of care, readmission, patient experience, effectiveness of care, timeliness of care and efficient use of medical imaging. The three outcome groups (readmissions, safety and mortality) and patient experience carry the most weight, at 22% each.
Measures within each group are supposed to be evenly weighted when calculating the hospital's performance in that area. For example, the mortality group comprises seven measures that together account for the group's 22% share of the overall score.
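As a rough illustration (not the CMS' actual code), that two-level scheme can be sketched in a few lines of Python. The 22% weights come from the article; the assumption that the remaining 12% is split evenly, at 4% each, across the other three groups, along with all of the measure scores, is an illustrative placeholder.

```python
# Illustrative sketch of the two-level CMS weighting scheme described above.
# The four heavily weighted groups get 22% each; the remaining 12% is
# ASSUMED here to be split evenly (4% each) across the other three groups.
GROUP_WEIGHTS = {
    "mortality": 0.22,
    "safety_of_care": 0.22,
    "readmission": 0.22,
    "patient_experience": 0.22,
    "effectiveness_of_care": 0.04,
    "timeliness_of_care": 0.04,
    "efficient_use_of_imaging": 0.04,
}

def group_score(measure_scores):
    """Within a group, measures are meant to be weighted evenly,
    so the group score is a simple average of its measure scores."""
    return sum(measure_scores) / len(measure_scores)

def overall_score(groups):
    """Overall summary score: group scores combined by the group weights."""
    return sum(GROUP_WEIGHTS[name] * group_score(scores)
               for name, scores in groups.items())

# Hypothetical standardized measure scores (0 = average, higher = better).
hospital = {
    "mortality": [0.4, 0.1, -0.2, 0.3, 0.0, 0.2, 0.1],          # seven measures
    "safety_of_care": [0.5, -0.1, 0.2, 0.0, 0.3, 0.1, -0.2, 0.4],  # eight measures
    "readmission": [0.2, 0.1, 0.0],
    "patient_experience": [0.3, 0.2],
    "effectiveness_of_care": [0.1],
    "timeliness_of_care": [-0.1],
    "efficient_use_of_imaging": [0.0],
}
print(f"summary score: {overall_score(hospital):.3f}")
```

Rush's complaint, in these terms, is that the even within-group weighting never actually held for the safety group.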
But Rush found that hospitals' performance on one measure in the safety-of-care group almost entirely determined how they performed in that group. Rush also found that the safety-of-care group dramatically affected overall star ratings. That has been true in every release since the CMS first published the ratings in July 2016.
"Given the disproportionate weighting of the safety scores over time, they did not represent a composite measure," said Dr. Omar Lateef, an author of the analysis and Rush's senior vice president and chief medical officer.
Academic medical centers like Rush oppose star ratings because they say the rankings put them at a disadvantage when compared to specialty and community hospitals that see healthier patients with less complex conditions.
Levine said that issue, too, can be blamed on the latent variable model, a statistical approach that looks for variability in the data it is given. The CMS relies entirely on the computer model to make this determination.
The CMS supports the model because it accounts for the similarities and differences between measures. A CMS spokesman said the agency "developed the methodology for stars with multiple technical expert panels and public comment periods."
In the first four iterations of the CMS star ratings, the PSI 90 measure (the Patient Safety and Adverse Events Composite) showed the most variation in performance between hospitals, so it was weighted most heavily. In the proposed July ratings, the PSI 90 measure was calculated using new ICD-10 codes. The CMS told Rush officials this changed how the latent variable model calculated the group score, and more weight was subsequently given to the hip and knee replacement complication measure.
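The CMS fits its latent variable model by maximum likelihood, but the behavior Levine and Rush describe can be approximated with a simpler stand-in: take the first principal component of the unstandardized covariance of hospitals' measure scores, and the measure with the most between-hospital variation soaks up nearly all of the weight. A minimal sketch with simulated data (the variances and measure names are made up for illustration, and principal components are a proxy for, not a replica of, the CMS model):

```python
import numpy as np

rng = np.random.default_rng(0)
n_hospitals = 500

# Simulated safety measures: one (a PSI 90 stand-in) with far more
# between-hospital variation than the rest.
psi90 = rng.normal(0.0, 5.0, n_hospitals)        # high-variance measure
others = rng.normal(0.0, 0.5, (n_hospitals, 4))  # low-variance measures
X = np.column_stack([psi90, others])

# First principal component of the (unstandardized) covariance matrix:
# a rough stand-in for the loadings a one-factor latent variable model
# assigns when it "looks for variability in the data."
cov = np.cov(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
loadings = np.abs(eigvecs[:, -1])                # component with largest eigenvalue
weights = loadings / loadings.sum()

for name, w in zip(["PSI90", "m1", "m2", "m3", "m4"], weights):
    print(f"{name}: {w:.1%}")
# The PSI 90 stand-in ends up with the overwhelming share of the weight,
# mirroring Rush's finding that one measure dominated the safety group.
```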
Levine said the latent variable model isn't designed to examine the nuance involved in clinically driven measures like medical errors. "This requires human touch," he said, adding that it's like rolling dice.
Lateef said he and his colleagues at Rush were alarmed by their ratings drop because they have improved performance on five of the eight safety measures since the December release.
But Lateef said the CMS was initially dismissive of its concerns. So Rush re-created the CMS' methodology.
Rush compared the proposed July release with the December release to see how the methodology had changed. The analysts found that, for the July ratings, the model most heavily weighted Rush's performance on the complication rate following elective hip and knee replacements.
In the December 2017 release, the methodology most heavily weighted Rush's performance on the PSI 90 measure. PSI 90 tracks the occurrence of various safety events, such as pressure ulcers and sepsis.
Additionally, Rush's analysis found that the weight given to the PSI 90 measure was far greater than that given to the seven other measures. Specifically, PSI 90 was weighted 1,010 times more heavily than the catheter-associated urinary tract infection measure, 81 times more heavily than the C. difficile infection measure, 51 times more heavily than the central line-associated bloodstream infection measure and 20 times more heavily than the surgical site infection measure.
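Taken at face value, those ratios imply the four infection measures together carried only a sliver of the weight. A back-of-the-envelope check, treating the reported ratios as relative weights and omitting the safety measures for which no ratio was reported:

```python
# Relative weights implied by Rush's reported ratios, with CAUTI set to 1.
# Measures without a reported ratio are omitted, so these shares describe
# only this five-measure subtotal, not the full eight-measure safety group.
relative = {
    "PSI 90": 1010.0,
    "CAUTI": 1.0,              # 1,010x weaker than PSI 90
    "C. difficile": 1010 / 81,
    "CLABSI": 1010 / 51,
    "SSI": 1010 / 20,
}
total = sum(relative.values())
for name, w in relative.items():
    print(f"{name}: {w / total:.1%}")

infections = total - relative["PSI 90"]
print(f"all four infection measures combined: {infections / total:.1%}")
# PSI 90 alone accounts for over 90% of this subtotal, consistent with
# Rush's finding that dropping the infection measures barely moved scores.
```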
In fact, Rush concluded in its analysis that if all the infection measures were removed from the safety domain, 96% of hospitals would have seen no change in their performance in the safety group.
Leaders at UChicago Medicine, the University of Virginia Health System and the Association of American Medical Colleges also contributed to Rush's analysis.
As Modern Healthcare has reported, because of the methodology, a hospital that performs below average on any of the heavily weighted groups, which include safety of care, must perform above average in the other heavily weighted groups to earn four stars or higher. That explains why Rush saw a two-star drop in its overall rating in the proposed July release.
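A stylized version of that arithmetic, assuming standardized group scores and the 22% weights described earlier (the CMS actually assigns stars by clustering summary scores, so the break-even point below is illustrative only):

```python
# Stylized offset arithmetic for the four heavily weighted groups (22% each).
# Scores are standardized (0 = average); the CMS assigns stars by clustering
# summary scores, so this break-even logic is an illustration, not the method.
W = 0.22

safety = -1.0          # one standard deviation below average on safety of care
deficit = W * safety   # safety's contribution to the summary score: -0.22

# To pull the summary score back to at least average, the other three
# heavily weighted groups must together offset that deficit:
needed_avg = -deficit / (3 * W)
print(f"each other heavy group must average {needed_avg:+.2f} SD")  # +0.33
```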
Vizient's Levine said that, as they currently stand, star ratings don't provide an accurate depiction of safety-of-care performance because they rely so heavily on complications after hip and knee replacements.
"That is not the only thing that hospitals do," he said.
But hip and knee replacements are a popular and lucrative service provided by hospitals. More than 1 million joint replacements are performed every year in the U.S., and by 2030 that number is expected to increase to more than 4 million.
Lateef said the CMS' current measurement of safety could dissuade practitioners from taking on complex cases.
"Quality and safety reporting is incredibly difficult. This experience illustrated how hospitals could share their knowledge with the government to play a small part in optimizing the system," Lateef said, adding that the CMS has come around since Rush presented its analysis.
No date has been set for when the new ratings will be released.
An edited version of this story can also be found in Modern Healthcare's June 18 print edition.