Now we're talking

Speech-recognition software increasingly replacing transcription in more physician specialties

For more than two decades, speech-recognition software has held bright promise for busy physicians looking for a better way to get what was in their heads onto a printed page or into a computerized health record.
Those expectations, however, largely had gone unfulfilled because, for most of those years, speech recognition amounted to little more than a tantalizing technological frippery, aside from practitioners in a few cloistered medical subspecialties and a relative handful of technology zealots in general practice. The bottom line: for most physicians, the software remained too clunky or error-prone to play a major role in patient-record documentation.
That's changing, according to more than a dozen health information technology experts contacted for this story, including 10 physicians of various specialties, seven of whom regularly use speech-recognition software on the job in the emergency room, pathology, radiology, other medical departments and outpatient office environments.
One satisfied daily user is Brian Zimmerman, director of the urgent-care unit in the busy emergency department at 712-bed Miami Valley Hospital in Dayton, Ohio, home to a Level I trauma center. There, he and all of his colleagues use speech-recognition software in tandem with the hospital's electronic health-record system.
Zimmerman says about four years ago he purchased a version of Dragon NaturallySpeaking, now the dominant brand of speech-recognition software in healthcare. He thought it might save him time dictating e-mails and notes, he says. He was wrong.
“It was kind of a nightmare,” Zimmerman says, but adds he has subsequently changed his opinion. “If you haven't seen it in a couple of years, you should take a second look. It's just much better than you've seen in the past.”
“The administration likes it because our transcription costs went to zero,” Zimmerman says. Previously, transcription services cost the hospital $1.4 million a year for the ER alone, he says.
Speech recognition established early beachheads in radiology and pathology, medical subspecialties where free-text-based reports are key work products. Even there, its efficacy has improved dramatically in recent years, physician users say.
Keith Dreyer is vice chairman of radiology at 907-bed Massachusetts General Hospital in Boston and a user of speech-recognition software for a decade. Until recently, improvement in the technology had been gradual but steady, Dreyer says.
“I used to say, ‘If you don't like speech recognition, just wait a year—it will get better,' ” Dreyer says. Then, suddenly, in about 2004, Dreyer says, “We really saw a big difference. They really optimized it for radiology.”
Paul Valenstein, a pathologist at 529-bed St. Joseph Mercy Hospital in Ypsilanti, Mich., adds that a newer version of Dragon geared to his medical specialty “produces dramatic improvements in throughput.”
And Larry Garber, director of medical informatics at the Fallon Clinic, Worcester, Mass., says a recent work study found the clinic saves about $7,000 per physician per year by switching from dictation to speech-recognition software. “The writing is on the wall for transcription departments around the country,” Garber says.
Speech-recognition development has had a long, complex, storied and occasionally even notorious history. One of the pioneers of the field is inventor, serial entrepreneur, author and futurist Raymond Kurzweil, whose company, Kurzweil Computer Products, founded in 1974 and later sold to Xerox, first developed a multifont optical character-recognition system.
The copying machine giant subsequently spun off its Kurzweil unit to create ScanSoft, which would go on to acquire scandal-plagued speech-recognition system developer Lernout & Hauspie, or L&H, buying it out of bankruptcy in late 2001. The Belgium-based company made business page headlines with a financial scandal involving falsified financial records and a reported $100 million worth of missing money.
Before it imploded, though, L&H had acquired several of its key rivals—most notably Dragon Systems, developer of the popular Dragon NaturallySpeaking brand, added in 2000; and, in 1997, Kurzweil Applied Intelligence, another firm that pioneered speech-recognition development with a focus on healthcare. Kurzweil Applied Intelligence, too, was visited by an accounting scandal, in the run-up to a 1993 initial public offering. Founder and co-CEO Kurzweil was not implicated, but two other top executives were sentenced to prison for securities fraud.
In 2000, L&H also had bought out the venerable dictation system provider, Dictaphone, which traces its roots through patents back to the first voice recorder developed in 1881 by famed telephone inventor Alexander Graham Bell. ScanSoft then acquired Nuance Communications in 2005, but the acquiring company dropped its own name and kept Nuance instead. Nuance, now headquartered in Burlington, Mass., was founded in 1994 as a spinoff from the Speech Technology and Research Laboratory at SRI International, earlier known as the Stanford Research Institute, Menlo Park, Calif.
While many companies are still active in speech recognition, including software giant Microsoft Corp., few healthcare industry competitors remain. Pittsburgh-based, privately held M-Modal supplies speech-recognition technology in healthcare, primarily as a work aid to the medical transcription industry and for picture archiving and communication/radiology information systems.
Publicly traded Nuance, however, has become “sort of the 800-pound gorilla of speech recognition” in healthcare, according to informaticist Robert Budman, the physician-executive liaison to electronic health-record system developer Medsphere Systems Corp., Carlsbad, Calif. Nuance continues to market its Dragon NaturallySpeaking line of speech-recognition products and offers several other speech-recognition products for radiology branded under different names.
Last fall, Nuance acquired Philips Speech Recognition Systems, a unit of Royal Philips Electronics of the Netherlands, for $96.1 million, buying up a major competitor in radiology. And in January, Nuance announced it had entered into a joint development and marketing relationship with another healthcare industry competitor, IBM Corp.
According to a joint company statement, the two former rivals agreed to share each other's speech-recognition technology. As part of the deal, the two companies also agreed to incorporate IBM technology into Nuance's speech solutions, with the first products featuring the combined technology expected to be available within two years. While IBM said it will continue to service its own speech-recognition product customers, as part of the deal IBM agreed to sell speech-related patents to Nuance.
Keith Belton, senior director of product marketing at Nuance, says both the speed and accuracy of the company's Dragon systems for medicine have increased dramatically in the past two years. The Version 8 family of medical products produced in 2005 and 2006 had accuracy rates in the 80% to low 90% range and included medical vocabularies targeted toward eight medical specialties, Belton says.
Version 10, the latest in the series, released last October, “is 20% more accurate than Version 8 and twice as fast,” Belton says, and is optimized for more than 20 medical specialties. It also includes several new “regional accent wizards” that enable non-native English speakers and Americans with regional accents to more quickly “train” the software, creating individual “voice profiles” that improve system speed and accuracy.
The wizards now cover unaccented North American English; American English as spoken in the Deep South; native Australian, British, Scottish and Welsh accents; and non-native English speakers with Hispanic, South Central Asian and Southeast Asian accents, Belton says.
“If you speak with an Indian accent or a Hispanic accent or no accent, it loads some additional knowledge about that speech and your accuracy is going to be well into the 90s,” Belton says. “Thirty percent of the doctors in this country were not born in this country, so that was a huge improvement for us.”

One arm's-length measure of whether speech recognition has arrived as a fully functional EHR interface is whether vendors have enough confidence in the technology to use it when they submit their systems for testing by the Certification Commission for Health Information Technology, or CCHIT. And that's happened.
CCHIT Marketing Director C. Sue Reber says the commission doesn't specify which input device a vendor must use to get its system to work, only that it perform the functions required under the certification regime.
“As you know, speech recognition is not a requirement for certification; however, we have seen about half a dozen ambulatory vendors demonstrate this technology during their inspections,” Reber says. They used speech recognition in place of a keyboard for data entry and created a note in free text, Reber says, but it is “not a high number out of the hundreds” of EHRs CCHIT has tested since it launched its certification program in 2006.

The utility of any tool is always in the eyes of the user, so we asked a number of physicians what they thought about speech-recognition software as an alternative to a keyboard, mouse or even dictation and transcription as input systems. All at least acknowledged the technology has made giant leaps forward in recent years.
Dreyer, the Massachusetts General radiologist, serves on the technology assessment committee of the American College of Radiology. Speech recognition was first used in radiology at the hospital in 1996 and has been fully deployed in the department since 1999, he says.
“The reason why it works so well for radiology is because we sit alone in a quiet room,” Dreyer says. Another reason, he says, is “our language model is pretty small. Our total vocabulary is 50,000 words, but any radiologist will probably only use 7,000 words, which dramatically helps increase the accuracy.”
The downfall of market leader Lernout & Hauspie eight years ago “really took the speech-technology market down for several years,” Dreyer says. But by 2004, Dreyer says, a software developer, Commissure, made a giant leap with RadWhere, a speech-enabled radiology workflow product. Commissure, based in New York, was acquired by Nuance in 2007.
The next frontier for the technology, at least in radiology, Dreyer says, will be in adapting natural language processing so that it automatically structures radiologists' reports to accommodate the advent of quality-based reimbursements.
Pathologist George Birdsong, director of anatomic pathology at Grady Health System in Atlanta, says three of the seven full- and part-time pathologists who work in his department are exclusive users of speech-recognition software. One is an occasional user. Birdsong places himself in a category of enthusiasts he describes this way: “Once you get used to it, it's great.”
Birdsong and his colleagues began using Version 8 of Dragon software about three years ago and upgraded to Version 9 to take advantage of its enhanced powers to deal with non-native English speakers.
“Unless you count a Texan as being a non-native speaker, I didn't notice much of a difference” after the upgrade, Birdsong says. “But I have a colleague from Egypt who went from saying he was not able to use it to being able to use it.” Birdsong says he sees no reason to upgrade further anytime soon.

Birdsong was at his workstation when he took the call to be interviewed for this story.
“Right now, I'm looking at the corner of two tables,” he says. “On my right is a microscope and to the left on the other end of the table is my computer with a headset. I can be looking at the microscope with the headset on and dictating what I see. It works well enough that I have confidence in it that I can dictate without looking at the computer screen.”
Birdsong and his colleagues have created oral shortcuts akin to keyboard “macros” to help them be more efficient. “I just say ‘acute appendicitis macro.' It brings in the whole diagnosis for appendicitis,” he says.
Birdsong says his laboratory handles about 12,000 surgical pathology specimens a year. Hospital leaders “always speak highly of the savings we got” after Grady replaced its aging dictation system and invested in speech-recognition technology. Birdsong says the new software system was much cheaper than the price quoted for an upgraded dictation system.
After converting, “One of our transcriptionists resigned, and so one of the positions just disappeared,” Birdsong says. The other dictation/transcription workers who “still do a small amount of transcription” have been reassigned to do other things, he says. Other financial gains have come from more timely completion of lab reports, he says.
“In some cases, a patient may even be released from the hospital a day sooner,” he says. “It may be only 5%, but that's a potential savings.”
The increased efficiency, Birdsong says, “comes from being able to get those (reports) out somewhat sooner and not having the back-and-forth with transcription. With Dragon, when I get finished dictating, I'm done.” Actually, after dictation he's done most of the time, he says. The system still obdurately mixes up the words “and” and “in.” “Maybe it's my Texas accent,” Birdsong says.
Valenstein, the pathologist from St. Joseph Mercy in Michigan, says he has been a user of speech-recognition software for about four years. He says the system becomes so familiar with your voice, “after a while, it becomes your friend. Now, when I have a terrible cold and no one can understand me, it can understand me just fine.”
Valenstein says the system may take a little longer than using old-fashioned dictation, but on balance, “it saves me time because I only touch a case once and I get the case out to the caregiver much faster.”
“I've had physicians calling me up within two or three minutes of my filing a report,” Valenstein says. “You can look at improvements in technology in a number of ways—economic return. Does it save me time? But another way to look at it is: Does it save other people time? And when you're in my line of work—supporting other people's work—saving time is a gift.”
The College of American Pathologists also is placing a bet on speech recognition. The medical specialty society is developing what it calls a “diagnostic workstation,” a package using off-the-shelf hardware and a combination of home-grown and proprietary software.
The workstation is designed for use by pathologists across a range of environments, from teaching hospitals to physicians' homes, according to James MacDonald, a medical imaging consultant hired by the college to work on the project. For starters, the system will use speech-recognition software for physician inputs on transcription/dictation as well as the completion of structured reports and a cancer checklist, MacDonald says.
Tim Quigley, executive vice president of business development for the healthcare group at Perot Systems Corp., Plano, Texas—which installs enterprise EHR systems from multiple vendors—says his company has not experienced a recent spike in orders or customer specifications for speech-recognition software in its EHR implementation contracts. Instead, Quigley says, “What we've seen is a steady upward march.”
One key to a successful implementation of any speech-recognition system is managing expectations, Quigley says.
“It's unclear if it will ever reach the point where we'll see 100% accuracy,” Quigley says. Speech recognition “can reduce some of your transcription costs,” he says, but it will come up short of perfection. “If you go in there and expect something that's 100% accurate, you're going to be disappointed. If you go in there expecting you're going to save transcription costs, but know you're going to have to get in there some and edit, then you'll be really happy.”