A Google software engineer took to an official company blog last Friday to announce that the tech firm was changing its Flu Trends tool for the 2014-15 flu season. That's potentially good news for healthcare for at least two reasons. First, it presumably means the healthcare community will get more accurate data about flu incidence. Second, it provides a good opportunity to think about some of the challenges posed by so-called big data, massive data sets combed through by computer power rather than human brain power, for healthcare's future.
Google's Flu Trends tool was based on the idea that the company could predict the incidence of influenza faster and more locally than the Centers for Disease Control and Prevention, which typically sent out its data with a lag. The Google tool initially used a group of 50 to 300 search queries that it believed was correlated with flu incidence and extrapolated from there. The company has since expanded the tool to 29 countries, and launched Dengue Trends in 10.
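The mechanics of that extrapolation are worth seeing concretely. The sketch below, in Python with invented numbers standing in for the real training data, fits a model of the general form the Google team published: a linear relationship between the log-odds of the flu-related search fraction and the log-odds of doctor visits for influenza-like illness.

```python
import numpy as np

def logit(p):
    """Log-odds transform, used to linearize proportions."""
    return np.log(p / (1 - p))

def inv_logit(x):
    return 1 / (1 + np.exp(-x))

# Hypothetical training data: weekly fraction of searches matching
# flu-related queries, and the CDC's reported fraction of physician
# visits for influenza-like illness (ILI) in the same weeks.
query_fraction = np.array([0.002, 0.004, 0.009, 0.015, 0.011, 0.005])
ili_fraction   = np.array([0.010, 0.018, 0.035, 0.055, 0.042, 0.021])

# Fit logit(ILI) = b0 + b1 * logit(query fraction) by least squares.
b1, b0 = np.polyfit(logit(query_fraction), logit(ili_fraction), 1)

# "Nowcast" this week's flu incidence from this week's searches alone,
# without waiting for the CDC's lagged report.
this_week_queries = 0.012
estimate = inv_logit(b0 + b1 * logit(this_week_queries))
print(f"Estimated ILI visit rate this week: {estimate:.1%}")
```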
The thinking behind the tool was similar to the logic Google uses for its business model, namely that people often want or are experiencing what they search for. That makes it easy to sell advertisements against searches and is the same insight that's driving its virtual visits trial: if you search “knee pain,” chances are decent you have knee pain and would like to do something about it.
The problem, noted a March 2014 paper in Science, was that the tool over-predicted the prevalence of the flu in the 2011-12 and 2012-13 seasons. Hence the latest change from Google. The paper's authors aren't sure why the tool failed, but they speculate that the underlying data set, Google searches, is not stable. The company is constantly changing how search works, which means the assumptions built into the old version of Flu Trends might no longer hold.
Fortunately, it's possible to improve predictions by combining data from the CDC and Google, as an Oct. 29 paper in Royal Society Open Science showed.
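The intuition behind the improvement is straightforward: CDC surveillance is accurate but arrives late, while search data is immediate but noisy, so a hybrid model can anchor on last week's CDC figure and correct it with this week's searches. Here's a rough sketch of such a hybrid; the weekly numbers are made up, and a plain least-squares fit stands in for the paper's actual method.

```python
import numpy as np

# Hypothetical weekly series: CDC ILI rates (available with a one-week
# lag) and a same-week Google search signal for flu-related queries.
cdc_ili = np.array([0.010, 0.014, 0.022, 0.035, 0.051, 0.047, 0.030])
search_signal = np.array([0.30, 0.42, 0.65, 0.95, 1.00, 0.71, 0.44])

# Design matrix: last week's CDC figure plus this week's search signal.
X = np.column_stack([
    np.ones(len(cdc_ili) - 1),  # intercept
    cdc_ili[:-1],               # lagged CDC surveillance data
    search_signal[1:],          # current search activity
])
y = cdc_ili[1:]

# Ordinary least squares fit of the hybrid model.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Nowcast the current week from last week's CDC report plus this
# week's (hypothetical) search activity.
current_search = 0.52
nowcast = coef @ np.array([1.0, cdc_ili[-1], current_search])
print(f"Nowcast ILI rate: {nowcast:.1%}")
```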
This effort turns out to be a good case study for big data use in healthcare generally. Advocates of large data sets combined with computer tools are very excited about their potential in healthcare. Last week, digital healthcare accelerator Rock Health put out a report on the future of predictive analytics, noting that $1.9 billion had been invested in companies using those tools since 2011.
Attention is being paid at the hospital and provider level as well. Last week's College of Healthcare Information Management Executives forum featured at least 10 events either explicitly or implicitly focused on that potential combination.
Vinod Khosla, the prominent venture capitalist, has predicted that the doctor of the future will handle about 20% of the tasks that doctors do now, with computing power handling much of the rest. That prediction rests on the consistency and comprehension of digital technology: computers, unlike humans, are dispassionate, and they can digest vast data sets rapidly.
Those computers, he predicts, will handle data such as continuous blood pressure, glucose and skin temperature readings, along with genomics, proteomics and other basics.
All that data, Rock Health Managing Director Malay Gandhi said in a webinar presenting his report, means that “We know we're going to reach the point where the number of data inputs are so wide that it would be impossible for a human being to process them, and thus the algorithm will be better at doing it.” In that future, doctors might be required to use such software.
The frequent analogy for the interplay of big data and analytics is the airline industry. “No one actually expects humans to fly a complex airplane except in exceptional circumstances,” Khosla writes. “Autopilot machines do the vast majority of flying.”
The Rock Health report uses the same analogy, with one executive saying that, like pilots in airplanes, “Physicians will be monitoring algorithms.”
It's here that Google Flu Trends' experience becomes relevant. “Monitoring” is precisely the right word to use. While the March Science paper criticizing Google Flu Trends bemoaned the lack of transparency and replicability of the closed data set Google used, at least the tool's predictions could be observed and tested from the outside.
Many other healthcare algorithms are not so easily overseen; they're not transparent. And one executive at a big data vendor said in an interview during CHIME's forum that hospital executives are starting to demand more proof and validation for analytics tools.
Gandhi acknowledged that problem in the webinar, saying that “What you can't see, you can't trust.” But, he cautioned, algorithms could also be too transparent: trying to show every factor that went into a recommendation could make it “entirely too complex,” and thus difficult to assess. And what if, as with Google, the underlying data set is changing underneath providers' feet?
So that transparency needs to be built in, somehow.
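One conceivable middle path, sketched below with entirely invented weights and patient values, is to surface only the handful of inputs that most drove a given recommendation, rather than all of them or none:

```python
# Hypothetical linear risk model: weights and inputs are invented
# for illustration, not drawn from any real clinical tool.
weights = {
    "age": 0.04,
    "systolic_bp": 0.02,
    "glucose": 0.03,
    "skin_temp": 0.01,
    "prior_admissions": 0.35,
}
patient = {
    "age": 67,
    "systolic_bp": 151,
    "glucose": 130,
    "skin_temp": 37.1,
    "prior_admissions": 2,
}

# Each input's contribution to the overall score.
contributions = {k: weights[k] * patient[k] for k in weights}
score = sum(contributions.values())

# Report only the top three drivers of the recommendation,
# rather than every weighted factor.
top = sorted(contributions.items(), key=lambda kv: -abs(kv[1]))[:3]
print(f"Risk score: {score:.2f}")
for name, value in top:
    print(f"  {name}: {value:+.2f}")
```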
While the Food and Drug Administration hasn't unveiled its regulatory framework for clinical decision-support software, experts and industry participants have used phrases like “substantial dependence” to describe systems that need tighter regulation. That is, a system needs closer oversight if its user depends on its recommendations, whether because there isn't much time to second-guess them or because the software doesn't reveal how it reaches them.
The airline industry metaphor deployed by many advocates also turns out to be apt for another reason: the worry that such autopilot systems degrade human pilots' skills, leaving them less able to fend for themselves when the extraordinary does occur. In a Vanity Fair article examining a 2009 plane crash in the Atlantic Ocean, one Boeing executive asks: if autopilot systems handle 98% of flying, how do you train pilots for the 2% of situations designers can't predict? It may prove similarly difficult to train providers to monitor these systems accurately.
Healthcare will need to figure out how to watch the digital watchmen if it intends to leave large parts of the system to analytics and large data sets.
Follow Darius Tahir on Twitter: @dariustahir