Capturing the Sound of Depression in the Human Voice

One reason for the lack of treatment for mental illness is that conditions like depression, anxiety and post‐traumatic stress disorder are difficult to diagnose.

There are no biological markers of mental illness that can be picked up in a blood test or a brain scan, so physicians must rely on patients’ self‐reports of their symptoms and on mental health questionnaires.

But self‐reporting and doctor observations can be highly subjective. Low energy, for example, can be a sign of depression, a normal response to a busy schedule, or an indicator of hypothyroidism. As a result, many physicians miss or misdiagnose psychiatric disorders; one study found that primary care doctors correctly identify depression in patients only 50 percent of the time.

However, there are a few consistent physiological changes that take place in the body when something is off in your brain. Some researchers think these types of changes could be used as objective flags for mental illness.

It’s Not What You Say, But How You Say It

When someone is depressed, the range of their pitch and volume drops, so they tend to speak lower, flatter and softer. Speech also sounds labored, with more pauses, starts and stops. Another key indicator is the tension or relaxation of the vocal cords, which can make speech sound strained or breathy. Too much tension or relaxation has been linked to depression and suicide risk. Depressed patients’ tongues and breath may also become uncoordinated, resulting in a slight slurring of speech.

These types of vocal traits, called paraverbal features, are detectable in other mental illnesses too, including bipolar disorder and post‐traumatic stress disorder.

Researchers have been studying these different paraverbal patterns for over a decade, but they’ve only recently been able to make use of them with the rise of advanced computer analytics. Some of the qualities are noticeable to a trained ear, but others are more subtle.

Computer algorithms can pick up on differences in tone that a human might miss, and they can also quantify them. This technology helps clinicians track patients over time by comparing individuals to their own baselines, particularly important if a person has a naturally deep or breathy voice that is not indicative of depression.
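
Comparing a patient to their own baseline can be illustrated with a simple calculation. The sketch below is a hypothetical example, not the method used by either company: it assumes a single vocal measurement (here, pitch range in semitones) has been taken from several earlier recordings, and it reports how far a new recording falls from that person's usual value. The function name and the sample numbers are invented for illustration.

```python
from statistics import mean, stdev


def baseline_deviation(baseline_values, new_value):
    """Return how many standard deviations new_value lies from the
    speaker's own baseline (a z-score against personal history)."""
    if len(baseline_values) < 2:
        raise ValueError("need at least two baseline recordings")
    mu = mean(baseline_values)
    sigma = stdev(baseline_values)  # sample standard deviation
    if sigma == 0:
        raise ValueError("baseline has no variation to compare against")
    return (new_value - mu) / sigma


# Hypothetical pitch-range measurements (semitones) from earlier recordings
# of a speaker whose voice is naturally flat.
usual_range = [4.0, 5.0, 6.0]

# A new recording with a narrower range than this speaker's norm.
print(baseline_deviation(usual_range, 3.0))
```

Because the comparison is made against the speaker's own history rather than a population average, a naturally deep or flat voice would not by itself produce a large deviation.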

“There’s no doubt that paraverbal features can be helpful in making clinical diagnoses,” says Danielle Ramo, an assistant professor of psychiatry at UCSF who is not affiliated with either company. “To the extent that machines are able to take advantage of paraverbal features in communication, that is a step forward in using machines to inform clinical diagnoses or treatment planning.”

Ellipsis used these vocal traits to create its tool. The program was initially trained by taking millions of conversations between nondepressed individuals and mining them for key features in speech patterns, such as pitch, cadence and enunciation. Data scientists then added conversations, data from mental health questionnaires and clinical information, all from depressed patients. That trained the software to identify vocal features indicative of depression.

One potential benefit of the Ellipsis program is that it could alert physicians if their patients may be depressed, even if that’s not the reason they went to the doctor in the first place.

“There are a lot of problems in the process of screening for mental health disorders,” says Mark Richman, medical director of the Disease Management Program at Northwell Health, a health network in Long Island and New York City that is in talks with Ellipsis to use its program. “Some of that has to do with limited sensitivity of the tools that are used to do that, some of it has to do with limited time to address and identify mental health disorders.”

Richman says he hopes a tool like the one from Ellipsis will help doctors catch patients who might otherwise fall through the cracks, but without taking up valuable face time in the exam room.

Ellipsis founder and CEO Mainul Mondal says that making doctor appointments more efficient was a key focus for the company.

“People like talking to their health teams. They feel loyalty and the trust is there, and they’re already having the conversations,” he says.

By capturing and utilizing the conversation — with the patient’s consent — clinicians could screen every patient for depression without taking up extra time, following up with only those who are flagged as high risk.

Integrating the software into the normal flow of a doctor’s visit is the next step, says Elizabeth Shriberg, the Ellipsis chief scientific officer. In addition to Northwell, Ellipsis is now in talks with other large health care providers around the country to introduce the program into exam rooms, the company says.

Monitoring From Afar

Other physicians want to use voice and behavior analysis technology to learn more about what happens to patients when they’re outside the exam room.

Image from the dashboard of Cogito's Companion app, meant to diagnose depression and other mental health conditions. (Cogito)

“We often get data only when people come into a clinic … and we know that there's a lot that goes on, obviously, outside of the clinic walls,” says David Ahern, director of the program in Behavioral Informatics and eHealth at Brigham and Women’s Hospital in Boston. “It's this huge unmet need of both understanding the nature of a … mental health disorder as it evolves over time and the experience of patients over time.”

Ahern leads a clinical trial testing the efficacy of Cogito's Companion app. Cogito was founded eight years ago as a spin-off from the MIT Media Lab in Boston. In addition to Companion, it makes software that gives real‐time feedback to customer call centers.

With patients' consent, the Companion app mines background metadata from their phone, including text frequency, call logs and geolocation. Using this data, the program creates a daily score for each patient, which is sent to their care team, alerting them if sudden changes in behavior might be linked to a decline in mental health. For example, if a patient starts to text less and has fewer or shorter calls, it may signal that they’re isolating themselves. Their caregiver can then reach out immediately to see if they are all right rather than having to wait until the next visit.
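
The idea of flagging a sudden drop in communication can be sketched in a few lines. This is an illustrative assumption about how such a check might work, not Cogito's actual scoring: the `DayActivity` fields, the 50 percent threshold and the sample numbers are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class DayActivity:
    texts_sent: int
    call_minutes: float


def relative_drop(history, today):
    """Fraction by which today's value falls below the recent average
    (0.0 if today is at or above the average)."""
    avg = mean(history)
    if avg <= 0:
        return 0.0
    return max(0.0, (avg - today) / avg)


def withdrawal_flag(history, today, threshold=0.5):
    """Flag the day if both texting and calling fell by at least
    `threshold` relative to the patient's own recent average."""
    text_drop = relative_drop([d.texts_sent for d in history], today.texts_sent)
    call_drop = relative_drop([d.call_minutes for d in history], today.call_minutes)
    return text_drop >= threshold and call_drop >= threshold


# Hypothetical recent days for one patient.
recent = [DayActivity(20, 30.0), DayActivity(30, 50.0), DayActivity(40, 40.0)]

print(withdrawal_flag(recent, DayActivity(10, 10.0)))  # sharp drop in both
print(withdrawal_flag(recent, DayActivity(28, 35.0)))  # close to usual
```

A real system would presumably combine many more signals and weigh them differently; the point here is only that the alert depends on change relative to the patient's own recent behavior.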

Patients also record a short audio diary a few times a week, which the app analyzes for nonverbal markers of depression, such as tenseness or breathiness, or low pitch, volume or range. These results are also included in the patient’s daily score, giving the care team several objective measures of the patient’s mental state. In addition to depression, the app is being tested on patients with bipolar disorder and post‐traumatic stress disorder, with the goal of predicting the onset of acute psychiatric episodes and suicide risk.

Skyler Place, vice president of behavioral science at Cogito, says he hopes doctors will use the system to be more proactive, not just in patients’ mental health care, but in their overall quality of life. In another trial for veterans at risk for post‐traumatic stress disorder and suicide, clinicians were able to detect major lifestyle changes through the app, as when one person lost his job and another became homeless.

“While the original goal was suicide prevention, in addition to being able to capture that risk, it’s really able to provide an overall risk score for the veteran population, and the clinicians are able to then provide the right service to these veterans in the moment when they need them.”

Replacing Clinicians? Not so Fast

As with the adoption of many new technologies, users may be reasonably concerned about privacy. Both Cogito and Ellipsis say that all the proper precautions are taken to store and protect the data, and in many cases the content of the conversations or voice recordings is irrelevant and even discarded.

The voice screening software is also not perfect; Cogito’s is currently about 75 percent accurate at flagging mental illness as compared to clinical interviews with mental health professionals. Ellipsis declined to state how accurate its software is.

Adam Miner, a clinical psychologist and instructor at Stanford University, says, “Clinicians regularly take into account patients’ tone and vocal patterns when making diagnostic decisions. If a new technology can help measure, or compare patterns over time, there is the potential to add value.” However, he cautions, there are “risks in oversimplifying the complexity of medical diagnoses.”

The two companies were quick to emphasize that the technology is not a replacement for human clinicians, but simply an aid or tool, akin to a blood test or an electrocardiogram.

“It's a buddy for health teams,” says Mondal. “It is an adviser for behavioral health; you flag patients so health systems can intervene.” After a patient is flagged as needing additional attention, either before or after diagnosis, it is still up to the clinician to administer care, an interaction that hasn’t been disrupted by artificial intelligence.
