The Shaming of Watson

IBM’s AI tool for healthcare is solving problems, and getting better every year. So why is everyone acting like it’s a failure?

From Hero to Has-Been in Just 4 Years

If you’re at all interested in technology and healthcare, by now you’ve probably heard about IBM Watson, the artificial intelligence technology that went from winning on Jeopardy in 2011 to being marketed to healthcare organizations for a variety of purposes. 

One of the earliest implementations was at MD Anderson Cancer Center (MDA) in Houston, where Watson was to help oncologists solve a big problem: too much data. From a press release in October 2013 entitled “MD Anderson Taps IBM Watson to Power ‘Moon Shots’ Mission”:

MD Anderson has accumulated an unprecedented breadth and depth of clinical oncology data and knowledge… Watson’s cognitive capability has been shown to be a powerful tool to extract valuable insights from such complex data and MD Anderson’s Oncology Expert Advisor capability can generate a more comprehensive profile of each cancer patient… MD Anderson’s Oncology Expert Advisor can provide evidence-based treatment and management options that are personalized to that patient, to aid the physician’s treatment and care decisions. 

Pretty ambitious. Fast forward just 4 years to 2017, though, and the picture has changed.

So in only 4 years, MD Anderson went from christening the project to . . . shutting it down completely. That’s a shockingly short period of time to even get a project running, much less to be able to evaluate whether it’s working. Makes you think that Watson must have been a complete disaster!

Well, not so much. In fact, the program was closed down for contracting irregularities, according to an audit done by the University of Texas (the parent university of MD Anderson). Contracts were made without proper signatures and approval, money earmarked for the Watson program was spent elsewhere, and on and on.

The only thing that wasn’t a problem, according to that audit: Watson. 

In fact, the auditors noted that “Medical oncology staff also told us that internal pilot testing of [Watson’s work with lung cancer treatment] achieved an accuracy of prediction near 90 percent, but advised that significant updating is needed before [Watson] can be tested further.” 

The medical staff also told the auditors that Watson was not in any way integrated with the hospital’s electronic medical record (EMR) system. That is not surprising, since one of the main characteristics of EMR systems nationwide is the difficulty in getting them to connect to other systems.

The point is that there is no indication that anyone on the medical staff at MD Anderson felt that Watson itself was a problem, or overhyped, or failing to perform up to expectations.

Meanwhile, Progress

As a counterpoint to the MD Anderson collaboration, one can look to IBM’s work with its more than 230 partnering healthcare organizations worldwide. 

One example is Memorial Sloan Kettering Cancer Center (MSKCC) in New York City, where medical staff have been working with IBM since 2012, using the AI technology in a variety of ways.

These systems are in wide use and have been found to be highly concordant with physician recommendations in studies in Korea, Thailand, Mexico, Arkansas, North Carolina, and elsewhere. UNC provides a particularly promising example:

In a study UNC conducted with 1,000 actual patient cases to compare Watson’s genomic analysis with the analysis of the center’s tumor board, the investigators found that Watson identified the same potential therapies as the tumor board 99% of the time. But what was more extraordinary, in about 300 patients, Watson found clinically actionable information that the tumor board had not identified.

For a variety of reasons, the systems recommending treatments are unlikely to achieve full concordance, particularly in international settings (the systems are trained on US data, for example, and US treatment protocols can differ significantly from those in other countries), but the results are undeniably promising.

Still, the narrative has shifted from favorable to failure.

Online articles mentioning “IBM”, “Watson”, “Health”, and “Fail” (or “Failure”)

Watson is Bad

Leading the drumbeat of bad news on Watson has been STAT News, an online journal “about life sciences and the fast-moving business of making medicines”. In 2017 and 2018, it published a series of unflattering articles about Watson, with the most damning (“IBM pitched its Watson supercomputer as a revolution in cancer care. It’s nowhere close”) coming out in September 2017.

Some of the criticisms strike me as frankly silly. For example, STAT notes that “the actual capabilities of Watson for Oncology are not well-understood by the public…”, but I’m not quite sure why the public would be expected to have any in-depth understanding of an oncology data system.

STAT also says that Watson “is still struggling with the basic step of learning about different forms of cancer,” which should surprise no one. Cancer AI isn’t like self-driving cars, where at some point the systems may be good enough that the AI won’t need further training, because the system will know everything it needs to know. In medicine, and particularly in oncology, we do not know (and do not expect to ever know) everything we need to know.

Like they say, it’s a journey, not a destination.

Even When It’s Good

But the most “underwhelming” aspect of Watson, per the STAT authors, was that it agreed with the doctors’ treatment ideas:

On a recent morning, the results for a 73-year-old lung cancer patient were underwhelming: Watson recommended a chemotherapy regimen the oncologists had already flagged… [One of the oncologists] said later that the background information Watson provided, including medical journal articles, was helpful, giving him more confidence that using a specific chemotherapy was a sound idea. But the system did not directly help him make that decision, nor did it tell him anything he didn’t already know.

So we’re supposed to be disappointed because a computer sitting on a desk provided a treatment recommendation for a particular patient, taking into account that patient’s history, labs, type of cancer, etc . . . and it was the same as the one picked by the medical specialist who had trained for more than a decade to do the same thing?

Yes, says STAT: “… showing that Watson agrees with the doctors proves only that it is competent in applying existing methods of care, not that it can improve them.” Ho-hum.

Don’t Believe the Hype

It seems to me that the one truly valid criticism of the Watson system is that IBM hyped it relentlessly (a process you might have cottoned on to once you noticed them hawking Watson on Jeopardy). Guilty as charged: IBM certainly has worked to build expectations. But looking past the hype, there is a there there: Watson for Oncology is a widely used system that most of its user-doctors seem to find useful, with no evidence at all of widespread opposition or objection in that same population of providers.

Does it need more refinement, and more data, and especially more clinical validation and more peer-reviewed reporting in the medical literature? Yes, yes, yes, and yes. But let’s not overlook the fact that even today Watson is an electronic system that can more often than not look at the patient data and give us the same treatment recommendations as a highly trained oncologist with years of experience, which is—make no mistake—a goddamn miracle of technology.

Further reading:

A Reality Check for IBM's AI Ambitions | MIT Technology Review

IBM's Watson proves useful at fighting cancer-except in Texas | Ars Technica

MD Anderson Benches IBM Watson In Setback For Artificial Intelligence In Medicine | Forbes

IBM Watson and Quest Diagnostics Launch Genomic Sequencing Service Using Data from MSK | Memorial Sloan Kettering Cancer Center press release

Mayo Clinic boosts clinical trials with IBM Watson artificial intelligence | Healthcare IT News

Oncologists Partner with Watson on Genomics | Cancer Discovery

Abstract S6-07: Double blinded validation study to assess performance of IBM artificial… | 2016 San Antonio Breast Cancer Symposium