The Rule of Three
Stanford University dermatologist Bernice Kwong specializes in skin conditions that tag along with cancer treatments. In her practice and on patient message boards, she's constantly on the lookout for symptoms that could be drug reactions.
In January 2017, a patient came to Kwong's office with an unusual complaint. "I've noticed that when I work out, I just get really hot," he told Kwong. "I don't sweat anymore, and I used to sweat so much." He was taking Tarceva (generic name erlotinib), a drug used to treat lung cancer.
At first, Kwong thought the problem might be hormonal. But soon after, two more of her patients at Stanford on the same drug reported that they'd also stopped sweating. "Anytime something hits three, I think, OK, I gotta look into this a little bit more," she says.
But she had never seen a report of hypohidrosis, the medical term for reduced sweating, as a side effect of Tarceva. And with only three patients, her sample was small. She'd need more data to figure things out.
From talking with patients and browsing online forums, Kwong knew that people discussed their treatments and side effects online. Hundreds of thousands of people participate in the support groups and communities she had looked at on the website Inspire. She partnered with the site, reasoning that its trove of patient reports could help connect the dots between hypohidrosis and Tarceva.
A Sharper Data Set
Inspire's focused groups are filled with patients' experiences with diseases and treatment, so analyzing posts requires less filtering than Facebook or Twitter data would, says Nigam Shah, a Stanford University bioinformatics specialist who collaborated with Kwong. It also helped that the skin reactions they were interested in are relatively easy for patients to describe.
Still, the posts on Inspire's boards are less precise than insurance claims and health records typically used for studies on side effects.
Take loss of sweating. Most doctors would call it hypohidrosis, so a records-based study could search for that term. On online message boards, the wording varies far more: one person's "I can't sweat anymore" might be another's "I'm overheating."
Kwong, Shah and their colleagues used a deep learning algorithm to process the phrases surrounding reports of symptoms, basically finding contextual clues to identify the different ways patients referred to side effects.
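To make the wording problem concrete, here is a toy comparison. It is not the deep learning model the Stanford team used; it swaps in plain keyword matching, and the sample posts and phrase list below are invented for illustration. Searching only for the clinical term misses most of the posts that describe the symptom.

```python
# Hypothetical posts, invented for illustration (not real Inspire data).
POSTS = [
    "Since starting Tarceva I can't sweat anymore, even at the gym.",
    "Doctor says my hypohidrosis might be from erlotinib.",
    "I'm overheating on runs and my shirt stays dry.",
    "Rash is better this week, fatigue still bad.",
]

# The term a records-based study might search for.
CLINICAL_TERM = "hypohidrosis"

# A hand-written, hypothetical list of lay phrasings. A learned model would
# instead infer such variants from the words surrounding symptom mentions.
LAY_PHRASES = ["hypohidrosis", "can't sweat", "don't sweat", "stopped sweating", "overheating"]


def matches_clinical_term(post: str) -> bool:
    """True if the post uses the clinical term (case-insensitive)."""
    return CLINICAL_TERM in post.lower()


def matches_any_phrase(post: str) -> bool:
    """True if the post contains any of the listed phrasings (case-insensitive)."""
    text = post.lower()
    return any(phrase in text for phrase in LAY_PHRASES)


clinical_hits = [p for p in POSTS if matches_clinical_term(p)]
lay_hits = [p for p in POSTS if matches_any_phrase(p)]

print(len(clinical_hits), "post(s) use the clinical term")
print(len(lay_hits), "post(s) match any listed phrasing")
```

Here the clinical term finds one post while the broader phrase list finds three. A fixed list still only catches phrasings someone thought to write down, which is the gap a context-based model is meant to close.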
Among 8 million posts on Inspire spanning 10 years, 4,909 users mentioned Tarceva or its generic name, erlotinib. Although clinical reports don't link the drug to hypohidrosis, 23 patients wrote about the medicine and loss of sweating in the same post, a statistically significant connection, Kwong says. The research group published its findings in JAMA Oncology in March.
Applying the same approach to posts about a different class of cancer immunotherapy drugs, the researchers found mentions of autoimmune blistering that predated clinical reports of that side effect.
Given the stakes of cancer treatment, Kwong says she's inclined to help patients manage side effects instead of stopping a given drug. But earlier alerts from systems like this could have made a difference in her practice. "If we had had this program already, I would've been looking out for [blistering] sooner and maybe I would've noticed it earlier in some patients," Kwong says.
How Clinical Trials Miss Side Effects
From numbers alone, it's no surprise that clinical trials for drugs don't pick up every side effect. The Food and Drug Administration first approved Tarceva in 2004 on the basis of a trial that enrolled 731 patients, 488 of whom received the drug. Uncommon effects might not show up in a group that size.
On Inspire's message boards, more than 10 times as many patients reported using Tarceva, so it's reasonable to imagine that online posts could include reports of rarer side effects.
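The arithmetic behind that intuition can be sketched in a few lines. This is a simplification that assumes each patient independently has the same chance of the side effect and that every case gets noticed and reported; the function names are illustrative. It also computes the statistical rule of thumb, unrelated to Kwong's heuristic but confusingly also called the "rule of three," which says that if zero cases appear among n patients, the true rate could still plausibly be as high as about 3/n. Note that the 4,909 Inspire figure counts users who mentioned the drug, which is only a rough stand-in for patients who took it.

```python
def prob_at_least_one(incidence: float, n: int) -> float:
    """Chance that at least one of n patients shows an effect with the given
    per-patient incidence, assuming patients are independent."""
    return 1.0 - (1.0 - incidence) ** n


def rule_of_three_upper_bound(n: int) -> float:
    """Approximate 95% upper bound on incidence when 0 cases are seen in n patients."""
    return 3.0 / n


# 488 = trial patients who received Tarceva; 4,909 = Inspire users who mentioned it.
GROUP_SIZES = {"trial arm": 488, "Inspire users": 4909}

for label, n in GROUP_SIZES.items():
    for incidence in (1 / 100, 1 / 1000, 1 / 10000):
        p = prob_at_least_one(incidence, n)
        print(f"{label:>13} (n={n}), incidence 1 in {round(1 / incidence)}: "
              f"P(at least one case) = {p:.1%}")

print(f"rule-of-three bound for the trial arm: {rule_of_three_upper_bound(488):.4f}")
```

Under these assumptions, a side effect affecting 1 in 1,000 patients has only about a 39% chance of showing up even once among 488 people, but better than a 99% chance among roughly 4,900. The rule-of-three bound for the trial arm is about 0.006, or roughly 1 in 160: a trial that saw no cases could not rule out an effect that common.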
And while drug trials do collect data on side effects, their overriding goal is to find out whether or not a drug works, says Dr. Aaron Kesselheim, a professor of medicine at Harvard University. "After a drug is approved, it is absolutely essential to continue to observe, follow and study the drug rigorously as it's used in a larger population to try to really get a handle on the safety of the drug," he says.
Collecting data about a drug from insurance claims and health records typically happens with quite a time lag. So mining the Internet and social media for casual patient reports is tempting, Kesselheim says, because of its potential scale and speed. But the approach also has drawbacks. "You just get this tidal wave of data, and it's hard to know how to assess it in a rigorous and thoughtful fashion," he says.
That hasn't stopped drug companies from wading in. Roche has sampled mentions of its products from Twitter, Tumblr, Facebook and blogs to learn more about drug safety. GlaxoSmithKline has tried it too, analyzing millions of mentions of drugs from Twitter and Facebook.
Much of the work published so far has focused on drug reactions. But scraping public social media data isn't just a matter of product safety. The company Synthesio touts its social data services for drugmakers as a way to answer customer questions, conduct market research and influence purchasing.
Extending such studies to mine even bigger networks, like Twitter or Facebook, for potential side effects raises issues of representation and privacy, Kesselheim says. As with any analysis, a deep learning model like the one Shah used on the Inspire message boards can only draw conclusions from the information it sees.
And it's hard to guarantee that message boards and social media represent all patients. In 2012, researchers gave 231 breast cancer patients in rural Michigan and Wisconsin computers, Internet access and training to use an online cancer support group. The researchers found that white women were much more likely to log on and post in the group than black women. Younger women were also more likely to post information.
The long-standing approach to post-approval drug studies, using health records and claims data, may be slower, but it is more established, Kesselheim says. "There are methodologies and tools that you can use in claims data to try to make sure that you are making conclusions that can be generalizable across different races and ethnicity and genders and parts of America," he says.
There's also the issue of privacy. Patients' health records are protected by the Health Insurance Portability and Accountability Act of 1996, whereas public data online aren't, Kesselheim says.
For Stanford researcher Shah, this wasn't an issue. Inspire's privacy statement tells patients that posts not marked private may be used for research, and Shah feels comfortable following common-sense rules when using public data. "As in, if somebody did [something] with my data and I would be upset, don't do that with someone else's data," he says.
But the newness of social media makes Kesselheim wary. "There are big questions that remain about how patient privacy is upheld in those social media contexts, and I think that's a really big issue to think about moving forward as people are trying to use those outlets to provide insight into drug safety and side effects."
As a patient, Ruddick isn't bothered by the idea of researchers and pharmaceutical companies studying data from social media and patient message boards, as long as the data are public or there's mention of data sharing in a privacy statement.
She works as a communications director in New York City, so she's thought a lot about the nature of information online. "If I'm putting something out there on the Internet, it's for the Internet. I know the world is going to see it," Ruddick says.