Report says surgical-outcomes data unreliable for comparing hospitals

The American College of Surgeons claims that by implementing its National Surgical Quality Improvement Program, individual hospitals can prevent 250 to 500 complications, save 12 to 36 lives, and reduce costs by millions of dollars annually.

But a new report posted on the JAMA Surgery website concluded that outcomes data in the NSQIP registry were unreliable measures of hospital performance.

In a National Institutes of Health-funded study, researchers—led by surgeons from the University of Michigan Health System in Ann Arbor—examined 2009 NSQIP complication and mortality data for six common surgical procedures performed on 55,466 patients at 199 hospitals. Few hospitals met data thresholds for “reliability,” defined as quantifying “the proportion of provider performance variation explained by true quality differences,” they found.
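To make the quoted definition concrete, here is a minimal sketch of one common signal-to-noise way of expressing reliability: the share of observed variation that comes from true differences between hospitals rather than from chance. It is an illustration only, not the method used in the study. The function name, the variance inputs, and the example numbers are all hypothetical.

```python
def reliability(between_var: float, within_var: float, n_cases: int) -> float:
    """Share of observed variation attributable to true hospital differences.

    Assumed formulation (not taken from the study):
        reliability = signal / (signal + noise)
    where signal is the variance in true rates between hospitals and
    noise is the per-patient variance divided by the hospital's caseload.
    """
    if n_cases <= 0:
        raise ValueError("n_cases must be positive")
    noise = within_var / n_cases
    return between_var / (between_var + noise)


# Hypothetical inputs: a 10% complication rate gives a per-patient
# variance of 0.10 * 0.90 = 0.09; the between-hospital variance of
# 0.0004 is invented purely for illustration.
low_volume = reliability(0.0004, 0.09, n_cases=100)
high_volume = reliability(0.0004, 0.09, n_cases=1000)
print(f"100 cases:  {low_volume:.2f}")
print(f"1000 cases: {high_volume:.2f}")
```

Under these assumed numbers, the same outcome measure is far less reliable at a hospital with 100 cases than at one with 1,000, which is consistent with the concern that sampled or low-volume data can obscure real differences in performance.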

“As quality measurement platforms are increasingly used for public reporting and value-based purchasing, it has never been more important to have reliable performance measures,” wrote lead author Dr. Robert Krell and his colleagues. “We have demonstrated that commonly used outcome measures have low reliability for hospital profiling for a diverse range of procedures.”

The procedures studied were colon resection, pancreatic resection, laparoscopic gastric bypass, ventral hernia repair, abdominal aortic aneurysm repair and lower extremity bypass.

The authors argued that low reliability “can mask both poor and outstanding performance relative to benchmarks,” leading underperforming institutions to assume they are doing fine and causing average or well-performing hospitals “to be spuriously labeled as poor performers.”

The researchers recommended that data registries avoid sampling and instead collect information on 100% of patients. They also praised the ACS NSQIP as being “among the leaders in implementing best practices to increase the reliability of outcome measures.”

In an accompanying commentary, surgeons from the Stanford University School of Medicine wrote that “Krell and colleagues elegantly assess the reliability” of NSQIP measures. They also noted that in 2009 only 5% of hospitals were participating in the ACS program, which may be too small a sample to reliably measure quality.

“The findings are cautionary to ranking systems that use observed to expected ratios as a surrogate for surgical quality,” the Stanford surgeons wrote. “Until the hospital cohort reflects the well-documented variation that occurs across the country, quality as determined by ACS-NSQIP should be interpreted with healthy skepticism.”

In response, Dr. Clifford Ko, director of ACS NSQIP, said that the JAMA Surgery report “contains some noticeable shortcomings that result in improper conclusions.” He added that Krell and colleagues appear to be referring to “some generic quality profiling program” in their evaluation of model reliability.

“Essentially, the authors are identifying weakness that may exist in some programs, to some degree or another, but not in any specific program,” Ko said. “While they use ACS NSQIP data to build these demonstration models, their conclusions do not apply to ACS NSQIP, though this is not made explicit to the reader until the last page of the paper where they specifically single out ACS NSQIP as being ahead of the curve in using best statistical practice strategies.”

The authors of the commentary do not recognize that the report's findings “do not directly apply to the ACS NSQIP,” Ko also noted.

“It must be recognized that all of us in this arena are continuously working to improve the way we use data to improve surgical quality,” Ko said. “ACS NSQIP has internal data to suggest levels of reliability for many current ACS NSQIP models are greater than those reported in this study based on 2009 data. Nevertheless, available methodology is not always sufficient.”

While small models may not provide the most reliable data, Ko argued that “limited information is often times better than no information” and can still be useful in reaching quality-improvement goals.

Follow Andis Robeznieks on Twitter: @MHARobeznieks


