Computational Connections

When one imagines the essential apparatus for biological and biomedical research, laboratory equipment like mass spectrometers, electron microscopes, and DNA sequencers comes to mind. But in the hands of computational scientists like Dmitry Korkin, the computer has become one of the most powerful and versatile tools for studying problems in fields as diverse as molecular biology, virology, and neuroscience. In fact, Korkin has shown that computers can take researchers places where traditional life sciences and medical studies cannot go, and at the same time help direct and accelerate the work of those in the lab and the clinic.

Korkin, Harold L. Jurist ’61 and Heather E. Jurist Dean’s Professor of Computer Science, was director of the university’s interdisciplinary program in bioinformatics and computational biology before beginning a sabbatical leave this academic year. Bioinformatics, he says, develops and applies data science tools to very large sets of biological and clinical data, searching for patterns and relationships hidden within the mass of numbers. Computational biology uses advanced computing methods to uncover details about biological mechanisms and biomolecular structures. Both fields offer advantages over non-computational methods.

“The cost of computational methods is orders of magnitude smaller than experimental methods,” Korkin says. “And with computational methods, we can make discovery faster.”

As one recent example, he points to work his lab undertook early in the COVID-19 pandemic. After the Chinese government released details about the genetic makeup of the newly discovered SARS-CoV-2, the coronavirus that causes COVID-19, Korkin and his students used computational tools to assemble molecular models of the proteins in the viral envelope, including the spike proteins that became targets for vaccine developers.

Korkin’s team also identified the likely interactions between the viral proteins and proteins in human host cells, and pinpointed key differences between SARS-CoV-2 and SARS-CoV-1, which caused a global outbreak of SARS (severe acute respiratory syndrome) in 2003. The knowledge, which they quickly shared online in February 2020 and expanded on in a March 2020 article in the journal Viruses, would prove helpful to researchers rushing to develop COVID-19 vaccines and treatments.

“We were able to develop and publish these models in a matter of weeks,” he says; in fact, his computational models of SARS-CoV-2 were available nearly three months before comparable laboratory results. The knowledge he and other computational scientists gained during the COVID-19 pandemic about the structure and function of viral proteins—and about the strengths and weaknesses of current modeling methods—will help them prepare for the inevitable next global pandemic.

Following the Isoforms

A native of the former Soviet republic of Kazakhstan, Korkin received bachelor’s and master’s degrees in applied mathematics from Moscow State University, then earned a PhD in computer science at the University of New Brunswick in Canada. During a postdoctoral research appointment at the University of California, San Francisco, he took a detour into the field of structural bioinformatics.

Working with Andrej Sali, a pioneer in computational biology, he learned about homology, or comparative, modeling, in which researchers use a protein with a well-understood structure as a template for working out the structure of a similar protein. This was among the techniques Korkin used to determine the structure of the SARS-CoV-2 envelope proteins.
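The template idea can be illustrated with a toy sketch. It covers only the first step, choosing the known protein most similar to the target, and it assumes the sequences are already aligned. The sequences and the names `percent_identity` and `pick_template` are invented for illustration; they are not taken from Korkin's work or from any modeling package, and real comparative modeling goes on to build and refine a three-dimensional structure.

```python
def percent_identity(a, b):
    """Share of matching positions between two equal-length, pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    # A gap character ("-") never counts as a match.
    matches = sum(x == y and x != "-" for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def pick_template(target, templates):
    """Return the name of the template whose sequence is most similar to the target."""
    return max(templates, key=lambda name: percent_identity(target, templates[name]))


# Made-up sequences, for illustration only.
target = "MKTAYIAKQR"
templates = {
    "template_1": "MKTAYLAKQR",  # differs from the target at one position
    "template_2": "MRSAYIGKHR",  # differs from the target at four positions
}

best = pick_template(target, templates)
print(best, percent_identity(target, templates[best]))  # template_1 90.0
```

The closer the template's sequence is to the target's, the more of its known structure can plausibly be carried over, which is why this selection step matters.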

Korkin joined the WPI faculty in 2014 after teaching for seven years at the University of Missouri-Columbia. Over the years, the ideas and principles of structural bioinformatics have underpinned research that has taken him deeper and deeper into the complex structures, mechanisms, and interconnections of biological systems. It is a quest that has spanned the full range of scales—from molecules, to cells, to organs, to patients—and has produced data connecting all of those levels into an intricate but increasingly well-understood web.

Korkin’s bioinformatics work also includes computational approaches to the various “omics” that encompass our expanding understanding of the molecular dance that plays out in living cells: genomics (the structure and function of genes); transcriptomics (the transcription of genetic information from DNA to RNA); interactomics (the interaction of molecules in the cell, particularly protein-protein interactions); and proteomics (the role and function of proteins in the cell).

Some of Korkin’s recent work, including a new study funded by a $1.3 million award from the National Institutes of Health, focuses on a phenomenon called alternative splicing. Scientists have long understood how sequences of bases, or genes, in the double-stranded DNA molecule are copied, or transcribed, into single-stranded RNA molecules; and how those sequences are then translated into the strings of amino acids that make up proteins.

Early on, it was assumed that each gene coded for a single protein. But more recently it has become clear that it is possible for a single gene to produce hundreds of different proteins. In between the transcription and translation steps, the order of information in the genetic code can be shuffled, with each reordering producing a different protein, or isoform. In this way, the 20,500 genes in the human genome can code for some 300,000 proteins.
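A rough sense of how one gene can yield many proteins comes from a simplified counting sketch. It models only one form of alternative splicing, in which optional segments (exons) are either kept or skipped while their order is preserved; the gene, the exon names, and the function `enumerate_isoforms` are hypothetical.

```python
from itertools import product


def enumerate_isoforms(exons, optional):
    """Yield every transcript obtained by keeping or skipping each optional exon.

    exons: ordered list of exon names.
    optional: set of exon names that may be skipped.
    """
    choices = [(True, False) if name in optional else (True,) for name in exons]
    for keep in product(*choices):
        yield tuple(name for name, kept in zip(exons, keep) if kept)


# Hypothetical gene with five exons, three of which are optional.
exons = ["E1", "E2", "E3", "E4", "E5"]
isoforms = list(enumerate_isoforms(exons, optional={"E2", "E3", "E4"}))
print(len(isoforms))  # 8, i.e. 2**3 keep-or-skip combinations

# Average implied by the figures quoted above.
print(round(300_000 / 20_500, 1))  # 14.6 proteins per gene
```

Because each optional exon doubles the number of possible transcripts, even a modest number of them is enough to account for an average of roughly 15 proteins per gene.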

“The study of alternative splicing is one of the newest and most exciting areas of computational biology and bioinformatics,” Korkin says. “It’s exciting, because it is such a flexible mechanism.”

Unlike mutations, he notes, which are mistakes introduced into the genetic code by radiation, chemicals, viruses, or other means, alternative splicing appears to be triggered by ambient conditions within the body and by the environment. “Mutations are changes we can inherit,” he says. “But alternative splicing can be a response to changes in your stress level or your diet. A number of complex diseases already have been linked to this mechanism.”

Korkin says that by using computational tools to track the products of alternative splicing over time, and by better understanding how the varying isoforms interact with other proteins in the cell and how those interactions may, in turn, change the course of a disease, it may be possible to develop diagnostic tools that detect those changes much earlier than is now possible. That may lead to treatments that counteract the effects of specific changes brought about by alternative splicing. “The implications are tremendous,” he says.

As an example, Korkin cites a study recently published by his group in the journal Cell Reports. The team used biomedical data mining to comb through the results of experiments in which genetically identical mice were fed different diets, with one group eating a high-fat, high-sugar diet that is known to trigger diabetes in the mice. Looking just at gene expression, Korkin’s team observed only changes in liver cells that are known to be linked to diabetes. But when they sorted through the isoforms produced by alternative splicing, they discovered that changes were also taking place in brain cells.

“Diet, just by itself, can have a profound impact on the molecular constitution of brain cells,” he says, “and, therefore, on their function. This is a big deal, because computational tools allowed us to reach levels of understanding that experimental science alone cannot reach.”

Korkin says that because the products of alternative splicing may be among the earliest indicators of the onset of a disease, tracking them has the potential to enable the earliest possible detection of diseases like cancer. (“Cancer is the ‘poster child’ for what is happening at the alternative splicing level,” he says.) It also may be a particularly powerful method for diagnosing highly complex conditions like autism spectrum disorder, “where things that are manifested at the alternative splicing level may not be detectable at the gene level.”

The Power of Collaboration

In the new NIH-funded study on alternative splicing, Korkin’s lab is collaborating with the lab of Gloria Sheynkman, assistant professor of molecular physiology and biological physics at the University of Virginia School of Medicine, to better understand the functions of individual protein isoforms and to see how stable they are over time and how they interact with other proteins. Sheynkman, with laboratory tools that include the CRISPR gene editing technique, will test Korkin’s predictions and generate new data to plug into his models.

“The idea is for computational methodologies to work in a symbiotic relationship with the experimental sciences to produce a feedback loop,” Korkin says. “Computational methods use experimental data to generate predictions that the experimentalists then confirm or reject. Their results are fed back into the computational methods, allowing us to make the next round of predictions more accurate.”

Korkin says collaborations like his partnership with Sheynkman are vital to his work. “Modern science is interdisciplinary, and it is impossible for a single lab to grasp the nuances and the expertise of multiple fields,” he says. “So, the majority of our interdisciplinary projects involve collaborations.”

Korkin has collaborated with many experimental scientists, including Elizabeth Blackburn, former professor of biology at the University of California, San Francisco, and recipient of the 2009 Nobel Prize in Medicine, as well as other computational biologists, among them his postdoctoral advisor Andrej Sali, now a member of the National Academy of Sciences.

Korkin has also begun partnering with practicing clinicians. A recent paper in the European Journal of Psychotraumatology described the result of a collaboration between Korkin’s lab and McLean Hospital, the main psychiatric facility at Harvard Medical School. The project aims at zeroing in on behavioral warning signs that may more accurately identify women who are at risk of attempting suicide. The study was funded by the Julia Kasparian Fund for Neuroscience Research. Established by Harry Kasparian ’73 in memory of his daughter, Julia, who died by suicide in 2016, the fund supports collaborative work between researchers at WPI and McLean that may lead to better prevention, diagnosis, and treatment of mental illness.

In the study, Korkin and his team started with data from lengthy patient questionnaires that clinicians use as tools for diagnosing mental health disorders and identifying patients at risk of suicide. Looking at the answers provided by 90 women with histories of childhood abuse, post-traumatic stress disorder, and dissociation (a condition marked by various levels of detachment from reality), and by 30 women in a control group, the Korkin team used AI methods to sort the data in ways that uncovered hidden connections among seemingly unrelated variables. Most important, they identified a handful of questions that reliably predicted suicidal ideation. All of them point to dissociation related to past traumas, a condition that Korkin says has been understudied and underdiagnosed.

“We are talking about two or three questions, from among hundreds,” he says, “that are equally powerful in predicting suicidal ideation in patients.”

In future work with McLean, Korkin says his team may factor in results of structural MRI and functional MRI scans of patients to connect the survey data to actual changes in the brain. “And in the long run,” he says, “we hope to broaden our understanding by connecting these results to what is happening at the molecular level.”

Korkin says he sees making connections between disparate kinds of data, and between the expertise of a diverse group of collaborators, as the future of his field. “In an ideal world, as we attack the grand challenges of science and medicine, we will see all of these collaborations integrated into one big collaboration, one where all the parties talk to each other and where we, as bioinformaticians, gather and analyze the data that connects all of the various parts.”

Filling in the Holes

Much of the work Korkin tackles with his bioinformatics tools involves massive amounts of data. But what happens when data sets contain massive holes? That is a problem he helped address in a new study published in the journal Nature Communications. In the study, collaborators across a number of fields, including computer science, electrical engineering, and mathematics, looked at the challenge of determining the shortest path between points in a large network when much of the network is poorly mapped; Korkin’s role was to explore how this new knowledge can be applied to molecular networks.

The team found that the shortest paths in complex networks are not random; rather, they follow fundamental rules of organization. With an understanding of these rules, shortest paths in largely incomplete networks can be accurately predicted, and the missing network data can be recovered. This result may inform the study of networks as diverse as the Internet, social media, and the web of protein-protein interactions that Korkin studies.
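To see why missing data matters for shortest paths, consider a small sketch. It is not the method from the study; it uses an ordinary breadth-first search on a made-up five-node network and shows how a single unmapped link changes the route that appears shortest. The node names and the helpers `build` and `shortest_path` are assumptions for illustration.

```python
from collections import deque


def build(edges):
    """Turn a list of undirected edges into an adjacency dictionary."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    return adj


def shortest_path(adj, source, target):
    """Breadth-first search; returns a list of nodes, or None if unreachable."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for nxt in adj.get(node, ()):
            if nxt not in parents:
                parents[nxt] = node
                queue.append(nxt)
    return None


# Made-up network: the true map, and a version with one link not yet observed.
full_edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E"), ("E", "D")]
observed_edges = [e for e in full_edges if e != ("E", "D")]

full = shortest_path(build(full_edges), "A", "D")
observed = shortest_path(build(observed_edges), "A", "D")
print(full)      # ['A', 'E', 'D']
print(observed)  # ['A', 'B', 'C', 'D']
```

In the incomplete map the apparent shortest route is one step longer than the true one. Predicting the true route without seeing the missing link is the kind of problem the study's organizing rules are meant to address.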

“For me,” Korkin says, “the big question is, does biology follow the same principles that we uncovered in this study?” If so, this new tool may help reveal the workings of complex networks in living systems even when our understanding of those systems is full of holes.
