[Photo: An African American man is tested during the Tuskegee Study of Untreated Syphilis in the Negro Male (1932–1972), a textbook example of an unethical study; researchers neither told the men they had syphilis nor offered them penicillin once it became the standard of care. The study's exposure led to sweeping changes in federal laws protecting study participants. (Centers for Disease Control and Prevention)]
Every day we make a series of choices that are all really the same choice underneath: between our favorite things and new ones. Do we go to the restaurant we adore, or the place down the road that just opened up? Get our “usual,” or try the special? Invite our best friend, or reach out to a new acquaintance we’d like to get to know better? We intuitively understand that life is a balance between novelty and familiarity, between the latest and the greatest, between taking chances and savoring what we know and love. But the unanswered question is: What is that balance?
It’s a question with higher stakes than we might realize. And it has a more explicit answer, from a field we don’t often turn to in moments of human indecision, though perhaps we should: computer science. Indeed, computer scientists have been working on finding this balance for more than 50 years. They even have a name for it: the “explore/exploit” tradeoff.
Today, algorithms derived from studying the explore/exploit tradeoff power a sizable fraction of the Internet economy, including the advertising business responsible for almost all of Google's revenue: they determine when to display the ad that has performed the best so far and when to experiment with potentially superior alternatives. Algorithms have also become essential to political campaigning, an indispensable part of honing messaging and donation appeals.
The single most important application of explore/exploit algorithms, however, is a domain where human lives are directly on the line. That domain is clinical trials, and a growing community of doctors, statisticians, and computer scientists think we’re doing them wrong.
In English, the words “explore” and “exploit” come loaded with opposite connotations. But to a computer scientist, these words have much more specific and neutral meanings. Simply put, exploration is gathering information, and exploitation is using the information you have to get a known good result. It’s fairly intuitive that never exploring is no way to live. But computer science shows that failing to exploit can be every bit as bad.
One of the fundamental maxims in medical ethics is the oath to “first, do no harm.” But this is not always as straightforward as it sounds. In clinical trials, doctors and scientists encounter directly the tension between acting on one’s best knowledge and gathering more. Do you give someone the best known conventional treatment, even if it’s not very good? Or do you give them an experimental treatment that might be significantly better—but might also be worse? Computer science is no substitute for ethics, but perhaps it can offer a degree of precision that ethics alone cannot.
The question that has arisen over the last several decades is whether the standard approach to conducting clinical trials really does minimize risk to patients. In a conventional clinical trial, patients are split into equal groups, and each group is assigned to receive a different treatment for the duration of the study. (Only in exceptional cases does a trial get stopped early.) This procedure focuses on decisively resolving the question of which treatment is better, rather than on providing the best treatment to each patient in the trial itself. Maybe this is a false choice. Doctors are gaining some information about which option is better while the trial proceeds—information that could be used to improve outcomes not only for future patients beyond the trial, but also for the patients currently in it.
In 1969, Marvin Zelen, a biostatistician who spent most of his career at Harvard, proposed conducting “adaptive” trials, in which the chance of using a given treatment increases with each success and decreases with each failure. In his proposal, you start with a hat that contains one ball for each of the two treatment options being studied. The treatment for the first patient is selected by drawing a ball at random from the hat. (The ball is put back afterward.) If the chosen treatment is a success, you put another ball of the same kind into the hat—now you have three balls, two of which are for the successful treatment. If it fails, however, then you instead put another ball for the other treatment into the hat, making it more likely you’ll choose the alternative.
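Zelen's urn scheme is simple enough to sketch in a few lines of Python. The treatment labels and success probabilities below are hypothetical, chosen purely for illustration; a real trial would layer many safeguards on top of this bare procedure.

```python
import random

def zelen_trial(success_prob, n_patients, seed=0):
    """Simulate Zelen's adaptive urn for two treatments, "A" and "B".

    success_prob: dict mapping treatment label -> (assumed) true success rate
    Returns (history, urn): the list of (treatment, succeeded) assignments,
    and the final contents of the urn.
    """
    rng = random.Random(seed)
    urn = ["A", "B"]  # start with one ball per treatment
    history = []
    for _ in range(n_patients):
        treatment = rng.choice(urn)  # draw a ball at random (with replacement)
        succeeded = rng.random() < success_prob[treatment]
        if succeeded:
            urn.append(treatment)    # success: add another ball of the same kind
        else:
            # failure: add a ball for the *other* treatment instead
            urn.append("B" if treatment == "A" else "A")
        history.append((treatment, succeeded))
    return history, urn
```

Run with an extreme (hypothetical) gap between the treatments and the effect is easy to see: if "A" always works and "B" always fails, every patient adds an "A" ball, so the urn, and hence future assignments, tilts rapidly toward the better treatment.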
Zelen’s algorithm was first used in a clinical trial 16 years later, for a study of extracorporeal membrane oxygenation, or “ECMO”—an audacious approach to treating respiratory failure in infants. Developed in the 1970s by Robert Bartlett of the University of Michigan, ECMO takes blood that’s heading for the lungs and routes it instead out of the body, where it is oxygenated by a machine and returned to the heart. It is a drastic measure, with risks of its own (including the possibility of embolism), but it offered a possible approach in situations where no other options remained. In 1975 ECMO saved the life of a newborn girl in Orange County, California, for whom even a ventilator was not providing enough oxygen. But in its early days the ECMO technology and procedure were considered highly experimental, and early studies in adults showed no benefit compared to conventional treatments.
From 1982 to 1984, Bartlett and his colleagues at the University of Michigan performed a study on newborns with respiratory failure. The team was clear that they wanted to address, as they put it, “the ethical issue of withholding an unproven but potentially lifesaving treatment,” and were “reluctant to withhold a lifesaving treatment from alternate patients simply to meet conventional random assignment technique.” Hence they turned to Zelen’s algorithm. The strategy resulted in one infant being assigned the “conventional” treatment and dying, and 11 infants in a row being assigned the experimental ECMO treatment, all of them surviving. Between April and November of 1984, after the end of the official study, 10 additional infants met the criteria for ECMO treatment. Eight were treated with ECMO, and all eight survived. Two were treated conventionally, and both died.
These are eye-catching numbers, yet shortly after the University of Michigan study on ECMO was completed, it became mired in controversy. Having so few patients in a trial receive the conventional treatment deviated significantly from standard methodology, and the procedure itself was highly invasive and potentially risky. After the publication of the paper, Jim Ware, professor of biostatistics at the Harvard School of Public Health, and his medical colleagues examined the data carefully and concluded that they “did not justify routine use of ECMO without further study.” So Ware and his colleagues designed a second clinical trial, still trying to balance the acquisition of knowledge with the effective treatment of patients but using a less radical design. They would randomly assign patients to either ECMO or the conventional treatment until a pre-specified number of deaths was observed in one of the groups. Then they would switch all the patients in the study to the more effective treatment of the two.
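Ware's two-phase design can also be sketched in code. Again the labels, success rates, and death threshold here are hypothetical stand-ins, not the parameters of the actual study:

```python
import random

def ware_trial(success_prob, death_limit, n_patients, seed=0):
    """Sketch of Ware's two-phase design for two arms, "A" and "B".

    Phase 1: randomize patients 50/50 until one arm accumulates
    `death_limit` deaths. Phase 2: assign every remaining patient
    to the other (apparently better) arm.
    """
    rng = random.Random(seed)
    deaths = {"A": 0, "B": 0}
    winner = None          # set once the stopping rule triggers
    history = []
    for _ in range(n_patients):
        arm = winner if winner else rng.choice(["A", "B"])
        survived = rng.random() < success_prob[arm]
        history.append((arm, survived))
        if not survived:
            deaths[arm] += 1
            if winner is None and deaths[arm] >= death_limit:
                winner = "B" if arm == "A" else "A"
    return history
```

The design trades some of Zelen's responsiveness for statistical convention: assignments stay balanced until the evidence crosses a pre-specified threshold, and only then does every patient get the better-performing treatment.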
In the first phase of Ware’s study, four of 10 infants receiving conventional treatment died, and all of the nine infants receiving ECMO survived. The four deaths were enough to trigger a transition to the second phase, where all 20 patients were treated with ECMO and 19 survived. Ware and colleagues were convinced, concluding that “it is difficult to defend further randomization ethically.”
But some had already concluded this before the Ware study, and were vocal about it. The critics included Don Berry, one of the world’s leading experts on the explore/exploit tradeoff. In a comment that was published alongside the Ware study, Berry wrote that “randomizing patients to non-ECMO therapy as in the Ware study was unethical. . . . In my view, the Ware study should not have been conducted.”
And yet even the Ware study was not conclusive for all in the medical community. In the 1990s yet another study on ECMO was conducted, enrolling nearly 200 infants in the United Kingdom. Instead of using adaptive algorithms, this study followed the traditional methods, splitting the infants randomly into two equal groups. The researchers justified the experiment by saying that ECMO’s usefulness “is controversial because of varying interpretation of the available evidence.” As it turned out, the difference between the treatments wasn’t as pronounced in the United Kingdom as it had been in the two American studies, but the results were nonetheless declared “in accord with the earlier preliminary findings that a policy of ECMO support reduces the risk of death.” The cost of that knowledge? Twenty-four more infants died in the “conventional” group than in the group receiving ECMO treatment.
Zelen’s algorithm offers one simple way to navigate the explore/exploit tradeoff, and in the decades since it was first proposed, computer scientists have developed a host of even more refined strategies for how best to gain new information while leveraging what’s been gained so far. These approaches are finally beginning to gain acceptance within the medical mainstream. Don Berry, for example, has joined the MD Anderson Cancer Center in Houston, where he uses methods developed by studying the explore/exploit tradeoff to design clinical trials for a variety of cancer treatments. And in 2010 and 2015, the FDA released a pair of draft “guidance” documents on “Adaptive Design Clinical Trials” for drugs and medical devices, which suggests that—despite a long history of sticking to an option they trust—they might at last be willing to explore alternatives.
We don’t think of a doctor choosing which treatment to give a patient as being in the same position as Google’s servers deciding which ad to show next. And we don’t think of Google’s ad servers when we waver over whether to get our favorite dish yet again or branch out. But computer science shows us that these are, in a fundamental and real way, the same problem. In so doing, it offers a profound glimpse into the structure of human decision-making. And a guide for us, in the times when it matters most.