Screening for Depression Through Voice Recordings

A group of researchers, led by Elke Rundensteiner, has developed a highly effective technology that screens voice recordings for signs that a speaker is depressed, an important advance that could alert physicians and other clinicians to people who need help.

Audio-assisted Bidirectional Encoder Representations from Transformers (AudiBERT), the system developed by the researchers, leverages the words a speaker uses as well as the speaker’s tone, says Rundensteiner, William Smith Dean’s Professor of Computer Science and founding director of WPI’s Data Science Program.

“Clinicians can detect depression and other mental ailments based on the content and tone of interviews with patients,” Rundensteiner says. “With deep learning data science techniques, we have developed a digital technology that examines a speaker’s words and tone for signs of depression. If widely deployed, this tool could dramatically expand mental health screening at low costs.”

The researchers’ innovation was selected for presentation in November 2021 at the Association for Computing Machinery Conference on Information and Knowledge Management, where it received the Best Applied Research Award. The authors are Rundensteiner; Ermal Toto ’21 (PhD), previously a graduate student in computer science with Rundensteiner and now WPI assistant director of academic research computing; and ML Tlachac, a PhD student in data science with Rundensteiner.

AudiBERT also addresses a critical research challenge: Relatively few voice data sets have been labeled for indicators of depression. This limits the amount of data available for training deep learning models. Deep learning is a type of machine learning that automatically analyzes raw digital data to produce a model that can make predictions, and more data generally leads to better models.

“Voice recording technologies are everywhere, from our smartphones to digital home assistants, but privacy concerns about recordings mean that it’s difficult to find large voice data sets that label spoken words as signs of mental ailments,” Tlachac says. “We set out to innovate a depression-screening solution that could be trained, even using small data sets. In addition, we wanted to demonstrate that voice is an excellent modality for screening.”
