"Why are breast cancer patients special?" asks Dr. Carolyn Compton, a pathologist and professor at Arizona State University. She has been pressuring her colleagues to improve tissue-handling standards for all types of cancer.
That means collecting samples quickly from the operating room, monitoring their condition, and getting them through the hospital to the pathology lab without delay.
"I don't think physicians think this way about their entire medical system," she says. The process of delivering tissue samples to the pathology lab "may be the weakest link in the quality chain."
This is not just a question about making sure each and every patient gets the most appropriate treatment. It affects the science behind new tests and treatments. Those tissue specimens often end up being preserved in huge collections called biobanks, which are a foundation for the field of precision medicine.
"I don't see how we're going to get precision medicine at the end of the day, when everything under the hood is so imprecise," Compton says.
She is not alone in her concerns.
"We need to be sure that the stuff they're looking at is valid, accurate, reliable and reproducible," says Dr. Richard Friedberg, who has just finished his term as president of the College of American Pathologists. He is also chairman of pathology at the University of Massachusetts Medical School-Baystate.
And he agrees with Compton that the quality of samples ending up in biobanks is simply unknown.
'Garbage in, garbage out'
This isn't just about surgical samples. For example, he says, you can't tell just by looking at a tube of blood whether it's OK. "If it was left on a window sill and hit 100 degrees, a lot of things change."
The fact is, you can still run a test and still get what looks like a valid result, "but if it's garbage in, it's garbage out," Friedberg says.
Compton's concerns also extend to the complex, expensive equipment at the core of precision medicine, including machines that read DNA sequences. Results from one test are not necessarily comparable to results from another, run on an identical sample. And because each company that makes a sequencer uses proprietary algorithms to identify genetic mutations, it's not possible for laboratory scientists to make an independent judgement about which reading is more likely to be correct, she says.
The good news is that DNA samples hold up better under poor treatment, so sample handling is less of an issue for genetic studies. But when there are problems, "you can't make up for a bad sample," Compton says. "So even with all of this technological magic, you can't turn straw into gold with this machine."
And results from pathology labs are not even the biggest source of error in the growing field of precision medicine. Patients' electronic medical records are littered with all sorts of errors, which further complicates efforts to extract reliable results from this pool of "big data."
But not everyone in the new world of precision medicine is so concerned about these quality-control issues.
"I am not a believer in garbage-in, garbage out at all," says Dr. Atul Butte, director of the Institute of Computational Health Sciences at the University of California-San Francisco. (He also holds a chair endowed by Mark Zuckerberg, who made himself rich by exploiting big data with Facebook.)
"I know that no one scientist, no one clinician or pathologist is perfect," Butte says. "But I'd rather take 10 or 100 so-called mediocre data sets and find out what's in common, than to take one who says they're perfect at doing this kind of measurement."
Finding meaning in the 'noise' of big data
It's easier to find real things in clean data. But in the real world, he says, data are always full of errors. So when you find something in noisy data, it's more useful in real-world settings.
In Butte's view, it's far more important to make lots of noisy data available to as many scientists as possible. "To me, I really want to see a world where I don't just see five or 10 or 20 companies working in particular area, I want a thousand drug companies. I want 10,000 drug companies."
Butte himself used noisy public data to identify genetic markers he says could be the basis for a test to help diagnose preeclampsia, a serious complication of pregnancy marked by high blood pressure. He founded a company to exploit that idea and then sold the company, so Butte considers his discovery a qualified success, even though human studies have not yet determined whether the test will be valid and useful.
Carolyn Compton says she's heard similar arguments about the value of noisy data from colleagues. She says one argued that if you have a pile of manure, maybe there's a pony in there somewhere. "I don't see bending over backwards to create algorithms to do all the work to sort through all the manure when, in fact, you could have a pile of diamonds from the outset," she says.
Funding agencies prize exploration over quality control research
Compton has been pushing for more rigorous standards in specimen handling since she was an official at the National Cancer Institute a decade ago. But she says funding agencies are more interested in providing support for exploring new ideas than in funding basic research on a seemingly boring subject, such as understanding the consequences of mishandling tissue samples.
Concerns about quality control of tissue samples were a major subject of conversation at the College of American Pathologists' annual meeting in October. "We are moving faster and faster and faster as this whole precision medicine train is moving down the track," says Dr. Tim Allen, a pathologist at the University of Texas Medical Branch in Galveston.
The struggle now is that there simply isn't enough data to make science-based recommendations, but "I suspect standardization of these things is going to become a reality much quicker than I would have expected even a few years ago," Allen says.
In the meantime, samples of unknown quality continue to flood into biobanks. That's problematic because discoveries based on noisy data often turn out to be just plain wrong. So this enterprise inevitably will generate many false leads. And the most expensive and time-consuming part of research is finding out whether something that looks great in the lab actually works in people.