Part two of a two-part series.
There has been a significant shakeout in the once-crowded market for speech-recognition technology in healthcare.
While many companies outside of healthcare remain active in the speech recognition field, including software giant Microsoft Corp., few healthcare industry competitors remain. Privately held M-Modal is one notable exception. The Pittsburgh-based developer supplies speech-recognition technology to the medical transcription industry and for picture archiving and communication systems and radiology information systems.
Publicly traded Nuance Communications, however, has become “sort of the 800-pound gorilla of speech recognition” in healthcare, according to informaticist Robert Budman, the physician-executive liaison to electronic health-record system developer Medsphere Systems Corp., Carlsbad, Calif. Nuance continues to market its Dragon NaturallySpeaking line of speech-recognition products and offers several other speech recognition products for radiology branded under different names.
Last fall, Nuance acquired Philips Speech Recognition Systems, a unit of Royal Philips Electronics of the Netherlands, for $96.1 million, buying up a major competitor in radiology. And in January, Nuance announced it had entered into a joint development and marketing relationship with another healthcare industry competitor, IBM Corp.
According to a joint company statement, the two former rivals agreed to share each other's speech-recognition technology and to incorporate IBM technology into Nuance's speech solutions, with the first products featuring the combined technology expected to be available within two years. IBM said it will continue to service its own speech-recognition product customers, but as part of the deal agreed to sell speech-related patents to Nuance.
Keith Belton, senior director of product marketing at Nuance, says both the speed and accuracy of the company's Dragon systems for medicine have increased dramatically in the past two years. The Version 8 family of medical products produced in 2005 and 2006 had accuracy rates in the 80% to low 90% range and included medical vocabularies targeted toward eight medical specialties, Belton says.
Version 10, the latest in the series, released last October, “is 20% more accurate than Version 8 and twice as fast,” Belton says, and is optimized for more than 20 medical specialties. It also includes several new “regional accent wizards” that enable non-native English speakers and Americans with regional accents to more quickly “train” the software, creating individual “voice profiles” that improve system speed and accuracy.
The wizards now cover unaccented North American English; American English as spoken in the Deep South; native Australian, British, Scottish and Welsh accents; and non-native English speakers with Hispanic, South Central Asian and Southeast Asian accents, Belton says.
“If you speak with an Indian accent or a Hispanic accent or no accent, it loads some additional knowledge about that speech and your accuracy is going to be well into the 90s,” Belton says. “Thirty percent of the doctors in this country were not born in this country, so that was a huge improvement for us.”
One arm's-length measure of whether speech recognition has arrived as a fully functional EHR interface is whether vendors have enough confidence in the technology to use it when they submit their systems for testing by the Certification Commission for Health Information Technology, or CCHIT. And that's happened.
CCHIT Marketing Director C. Sue Reber says the commission doesn't specify which input device a vendor must use to get its system to work, only that it perform the functions required under the certification regime.
“As you know, speech recognition is not a requirement for certification, however, we have seen about half a dozen ambulatory vendors demonstrate this technology during their inspections,” Reber says. They used speech recognition to replace a keyboard for data entry and created a note in free text, Reber says, but it is “not a high number out of the hundreds” of EHRs CCHIT has tested since it launched its certification program in 2006.
The utility of any tool is always in the eyes of the user, so we asked a number of physicians what they thought about speech-recognition software as an alternative to a keyboard, mouse or even dictation and transcription as input systems. All at least acknowledged the technology has made giant leaps forward in recent years.
Dreyer, the Massachusetts General radiologist, serves as a member of the technology assessment committee for the American College of Radiology. Dreyer also says he has been a user of speech recognition for about 10 years. The technology was first used in radiology at the hospital in 1996 and has been fully deployed in the department since 1999, he says.
“The reason why it works so well for radiology is because we sit alone in a quiet room,” Dreyer says. Another reason, he says, is “our language model is pretty small. Our total vocabulary is 50,000 words, but any radiologist will probably only use 7,000 words, which dramatically helps increase the accuracy.”
The downfall of market leader Lernout &amp; Hauspie eight years ago “really took the speech-technology market down for several years,” Dreyer says. But by 2004, Dreyer says, software developer Commissure made a giant leap with RadWhere, a speech-enabled radiology workflow product. Commissure, based in New York, was acquired by Nuance in 2007.
The next frontier for the technology, at least in radiology, Dreyer says, will be in adapting natural language processing so that it automatically structures radiologists' reports to accommodate the advent of quality-based reimbursements.
Pathologist George Birdsong, director of anatomic pathology at Grady Health System in Atlanta, says three of the seven full- and part-time pathologists who work in his department are exclusive users of speech-recognition software. One is an occasional user. Birdsong counts himself among the enthusiasts: “Once you get used to it, it's great.”
Birdsong and his colleagues began using Version 8 of Dragon software about three years ago and upgraded to Version 9 to take advantage of its enhanced powers to deal with non-native English speakers.
Birdsong says that after the upgrade, “Unless you count a Texan as being a non-native speaker, I didn't notice much of a difference. But I have a colleague from Egypt who went from saying he was not able to use it to being able to use it.” Birdsong says he sees no reason to upgrade further anytime soon.
Birdsong was at his workstation when he took the call to be interviewed for this story.
“Right now, I'm looking at the corner of two tables,” he says. “On my right is a microscope and to the left on the other end of the table is my computer with a headset. I can be looking at the microscope with the headset on and dictating what I see. It works well enough that I have confidence in it that I can dictate without looking at the computer screen.”
Birdsong and his colleagues have created oral shortcuts akin to keyboard “macros” to help them be more efficient. “I just say ‘acute appendicitis macro.' It brings in the whole diagnosis for appendicitis,” he says.
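Under the hood, such a spoken macro is essentially a trigger phrase mapped to a block of boilerplate text that replaces the phrase in the recognized transcript. A minimal sketch of that idea follows; the trigger name and the expanded diagnosis text here are invented for illustration, not taken from any actual Dragon configuration.

```python
# Illustrative sketch of a voice-macro expander (not Nuance's implementation).
# A macro maps a spoken trigger phrase to stored boilerplate text.
MACROS = {
    # Both the trigger and the expansion below are hypothetical examples.
    "acute appendicitis macro": (
        "Appendix, appendectomy: acute appendicitis with periappendicitis."
    ),
}

def expand(transcript: str) -> str:
    """Replace any recognized trigger phrase with its stored boilerplate."""
    for trigger, boilerplate in MACROS.items():
        transcript = transcript.replace(trigger, boilerplate)
    return transcript

print(expand("Diagnosis: acute appendicitis macro"))
```

The point is that the dictation system never needs to transcribe the full diagnosis word by word; it only has to recognize the short trigger phrase, which is faster and less error-prone.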
Birdsong says his laboratory handles about 12,000 surgical pathology specimens a year. Hospital leaders “always speak highly of the savings we got” after Grady replaced its aging dictation system and invested in speech-recognition technology. Birdsong says the new software system was much cheaper than the price quoted for an upgraded dictation system.
After converting, “One of our transcriptionists resigned, and so one of the positions just disappeared,” Birdsong says. The other dictation/transcription workers who “still do a small amount of transcription” have been reassigned to do other things, he says. Other financial gains have come from more timely completion of lab reports, he says.
“In some cases, a patient may even be released from the hospital a day sooner,” he says. “It may be only 5%, but that's a potential savings.”
The increased efficiency, Birdsong says, “comes from being able to get those (reports) out somewhat sooner and not having the back-and-forth with transcription. With Dragon, when I get finished dictating, I'm done.” Actually, after dictation he's done most of the time, he says. The system still obdurately mixes up the words “and” and “in.” “Maybe it's my Texas accent,” Birdsong says.
Valenstein, the pathologist from St. Joseph Mercy in Michigan, says he has been a user of speech recognition software for about four years. He says the system becomes so familiar with your voice, “after a while, it becomes your friend. Now, when I have a terrible cold and no one can understand me, it can understand me just fine.”
Valenstein says the system may take a little longer than using old-fashioned dictation, but on balance, “it saves me time because I only touch a case once and I get the case out to the caregiver much faster.”
“I've had physicians calling me up within two or three minutes of my filing a report,” Valenstein says. “You can look at improvements in technology in a number of ways—economic return. Does it save me time? But another way to look at it is: Does it save other people time? And when you're in my line of work—supporting other people's work—saving time is a gift.”
The College of American Pathologists also is placing a bet on speech recognition. The medical specialty society is developing what it calls a “diagnostic workstation,” a package using off-the-shelf hardware and a combination of home-grown and proprietary software.
The workstation is designed for use by pathologists across a range of environments, from teaching hospitals to physicians' homes, according to James MacDonald, a medical imaging consultant hired by the college to work on the project. For starters, the system will use speech-recognition software for physician inputs on transcription/dictation as well as the completion of structured reports and a cancer checklist, MacDonald says.
Tim Quigley, executive vice president of business development for the healthcare group at Perot Systems Corp., Plano, Texas—which installs enterprise EHR systems from multiple vendors—says his company has not experienced a recent spike in orders or customer specifications for speech-recognition software in its EHR implementation contracts. Instead, Quigley says, “What we've seen is a steady upward march.”
One key to a successful implementation of any speech-recognition system is managing expectations, Quigley says.
“It's unclear if it will ever reach the point where we'll see 100% accuracy,” Quigley says. Speech recognition “can reduce some of your transcription costs,” he says, but it will come up short of perfection. “If you go in there and expect something that's 100% accurate, you're going to be disappointed. If you go in there expecting you're going to save transcription costs, but know you're going to have to get in there some and edit, then you'll be really happy.”