Lies, Damned Lies, and Statistics: How to Judge Research Study Results

A recent study making headlines around the world found that some early-stage breast cancer patients could avoid chemotherapy with only a slightly higher risk of the cancer recurring and spreading than those who underwent the treatment.

The study found that after five years, women with a particular genetic profile--uncovered in a genetic test--fared hardly any better after chemo than women who skipped the treatment.

The cancer failed to spread in 95.9 percent of the women who received chemotherapy, compared to 94.4 percent of those who went without it.

This appears to mean that if you have the right genetic profile, you would be taking on only about 1.5 percentage points of added risk by avoiding a costly treatment known for side effects such as hair loss, mouth sores, diarrhea and, in rare cases, leukemia or other diseases.

Deanna Attai, assistant clinical professor of surgery at the David Geffen School of Medicine at the University of California, Los Angeles, told Health News Review that one of her takeaways from the study was that it is "very clear that not all patients benefit from chemotherapy. Our old habit of recommending chemotherapy 'just to be sure' is not appropriate."

While that may be true for some clinicians and patients, a close look at the study results shows there is more potential variation in the risk rates than the headline number of 1.5 percent suggests.

That's because the narrow, 1.5 percent difference between the two groups could be just statistical noise--a random variation that would not be duplicated if the experiment were run again. If and when someone replicates this study, it could turn out that women who get chemo might do a bit worse or might do a bit better. And the same is true for women who don't get chemo.

How would you know that?

You'd have to look for two numbers that are sometimes buried deep in the data. They are two sides of the same coin, and together they can often give you some idea of how significant a result really is.

Are the Results Significant?

One way to tell if a result is significant--meaning unlikely to be due to chance alone--is by looking for the number that tells you how easily chance could have produced it. That number is called the p-value.

A p-value of 0.1 means that if there were no real effect, a result at least this large would still turn up by chance about 10 percent of the time. A p-value of 0.2 translates into 20 percent of the time, and so on. By convention, scientists in most fields call a result significant if the p-value is less than 0.05. (Though this threshold is not without controversy.)

So when trying to determine the significance of a particular result, you should look for the p-value that's associated with it. Ideally you want this number to be lower than 0.05. Anything much higher could be a red flag.

In the breast cancer study, it turns out that the p-value for the 1.5-percentage-point lower risk offered by chemotherapy is 0.27. That is very high for a scientific study. It means that even if chemo made no difference at all, a gap at least this large would turn up by chance about 27 percent of the time.

"That is pretty highly insignificant," Susan Wei, an assistant professor in the Division of Biostatistics at the University of Minnesota, told Health News Review.

In other words, the study can’t say with certainty that a woman who chooses not to have chemo actually takes on only 1.5 percentage points of extra risk. The risk could be somewhat higher or lower, and this may or may not make a difference in someone's treatment decisions.
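To see how a p-value depends on the size of the groups and not just the size of the gap, here is a rough sketch using a standard two-proportion z-test. The function name and the group sizes are assumptions made up for illustration; the article does not give the study's group sizes, and the study's own 0.27 came from its own analysis, which this sketch does not try to reproduce.

```python
import math

def two_proportion_p_value(rate_a, rate_b, n_a, n_b):
    """Two-sided p-value for the gap between two proportions (pooled z-test)."""
    pooled = (rate_a * n_a + rate_b * n_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 1.0  # no variation at all, so no evidence of a difference
    z = (rate_a - rate_b) / std_err
    # Probability of a gap at least this large, in either direction,
    # if the two groups truly had the same underlying rate.
    return math.erfc(abs(z) / math.sqrt(2))

CHEMO_RATE = 0.959               # from the article
NO_CHEMO_RATE = CHEMO_RATE - 0.015  # the 1.5-point gap from the article
HYPOTHETICAL_N = 750             # made-up group size, NOT from the study

p_small = two_proportion_p_value(CHEMO_RATE, NO_CHEMO_RATE,
                                 HYPOTHETICAL_N, HYPOTHETICAL_N)
p_large = two_proportion_p_value(CHEMO_RATE, NO_CHEMO_RATE,
                                 10 * HYPOTHETICAL_N, 10 * HYPOTHETICAL_N)
print(f"smaller groups: p = {p_small:.3f}")
print(f"larger groups:  p = {p_large:.6f}")
```

With these made-up sizes, the same 1.5-point gap lands above the 0.05 threshold in the smaller groups and below it in the larger ones, which is why a gap on its own says little about significance.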

How Confident Can You Be in the Results?

Another way to evaluate a result involves looking at how confident researchers are that they would get the same result if they repeated the experiment. This is another way of asking whether or not the study is relevant for a broader population.

This measure is called the "95 percent confidence interval," or CI.

The CI is a range of values the true result plausibly falls within: if the experiment were repeated many times, about 95 percent of the ranges calculated this way would contain the true value. Ideally, you want this range to be small. The wider the range of possible results, the less confidence the researcher can have in the particular result they got.

It's something like the margin of error in a poll: If it's too great, it will put the outcome in doubt.

Let's take a made-up poll that shows Hillary Clinton beating Donald Trump 40-33. That sounds very good for Clinton supporters. But if the margin of error were 10 percent, that would mean the poll results could actually be 50-23 in favor of Clinton or 30-43 in favor of Trump. When the margin of error is so large it can swing the outcome in the opposite direction, that means the poll results are statistically in question.

If the margin of error were only, say, three percent, the narrowest lead Clinton could have would be 37 to 36; there is no potential change in the outcome -- she still wins.
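The poll arithmetic can be written out as a short check. This is only a sketch of the reasoning in the made-up poll; the function name is an assumption chosen for illustration.

```python
def lead_can_flip(leader_pct, trailer_pct, margin_pts):
    """Return True if the margin of error is large enough to reverse the lead."""
    worst_case_leader = leader_pct - margin_pts    # leader at the bottom of the range
    best_case_trailer = trailer_pct + margin_pts   # trailer at the top of the range
    return best_case_trailer > worst_case_leader

print(lead_can_flip(40, 33, 10))  # True: 30 vs. 43, the outcome can swing
print(lead_can_flip(40, 33, 3))   # False: 37 vs. 36, the leader still wins
```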

Now let's look at the breast cancer study.

Of the women who received chemo, 95.9 percent of them made it to the 5-year mark with no breast cancer spreading.

But a more complete reporting of the results shows the CI for that group of women is a range between 94 and 97.2 percent. That means if someone repeats the experiment, the share of women who receive chemo and make it five years with no cancer spreading could plausibly be anywhere between 94 and 97.2 percent. So the benefit of getting chemo could be higher or lower than the study suggests.

For the women who didn't get chemo, the CI is between 92.3 and 95.9 percent.

Chemo: 94 to 97.2 percent

No Chemo: 92.3 to 95.9 percent

It means the study isn't able to say that if you have a particular genetic profile, you can avoid chemotherapy and do just as well after five years as women who get chemo. You can't know the answer because the two ranges are so wide, and overlap so much, that the outcome could even swing in the opposite direction.
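The overlap between the two ranges listed above can be checked in a few lines. This is a rough, eyeball-style comparison rather than a formal statistical test, and the function name is an assumption for illustration.

```python
def overlap(interval_a, interval_b):
    """Return the shared part of two (low, high) ranges, or None if they don't touch."""
    low = max(interval_a[0], interval_b[0])
    high = min(interval_a[1], interval_b[1])
    return (low, high) if low <= high else None

chemo = (94.0, 97.2)     # CI for women who received chemo, from the article
no_chemo = (92.3, 95.9)  # CI for women who skipped chemo, from the article

print(overlap(chemo, no_chemo))  # (94.0, 95.9)
```

Any value between 94 and 95.9 percent is plausible for both groups, which is why the study can't rule out that the two groups do equally well, or that either one does slightly better.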

Bottom line, the CI tells you the study can't say whether or not women with this particular genetic profile can safely avoid chemo and do almost as well five years later as those who got chemo. Researchers would need to repeat the experiment, maybe several times, maybe with a larger group or by making other changes, to get a more confident answer.
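Why would a larger group help? A rough sketch using the textbook normal approximation for a proportion shows the range narrowing as the group grows. The group sizes and the function name are made up for illustration, and the study's own intervals were calculated with its own methods, so these numbers will not match the published ones.

```python
import math

def approx_95_ci(rate, n):
    """Approximate 95 percent CI for a proportion (normal approximation)."""
    half_width = 1.96 * math.sqrt(rate * (1 - rate) / n)
    return (max(0.0, rate - half_width), min(1.0, rate + half_width))

RATE = 0.959  # share of chemo patients with no spread, from the article

# Hypothetical group sizes, NOT from the study
for n in (500, 2000, 8000):
    low, high = approx_95_ci(RATE, n)
    print(f"n = {n:>5}: {low * 100:.1f} to {high * 100:.1f} percent")
```

Each fourfold increase in group size cuts the width of the range roughly in half, so a bigger study leaves less room for the two groups' ranges to overlap.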

That doesn't make this a bad study; it still showed that a lot of women with a certain genetic profile did not have their cancer spread despite forgoing chemo. However, the results still need to be more firmly established.
