AI has gotten something of a bad rap in recent years, but the Covid-19 pandemic illustrates how AI can do a world of good in the race to find a vaccine. AI is playing two important supporting roles in this quest: suggesting components of a vaccine by understanding viral protein structures, and helping medical researchers scour tens of thousands of relevant research papers at an unprecedented pace. Over the last few weeks, teams at the Allen Institute for AI, Google DeepMind, and elsewhere have created AI tools, datasets, and research results, and shared them freely with the global scientific community.
Oren Etzioni is the CEO of the nonprofit Allen Institute for AI, and a professor of computer science at the University of Washington. Nicole DeCario is Senior Assistant to the CEO at the Allen Institute for AI.
Vaccines imitate an infection, causing the body to produce defensive white blood cells and antibodies. There are three main types of vaccines: whole-pathogen vaccines, like those for the flu or MMR, use killed or weakened pathogens to elicit an immune response; subunit vaccines (e.g., pertussis, shingles) use only part of the germ, such as a protein; and nucleic acid vaccines inject genetic material of the pathogen into human cells to stimulate an immune response. The last is the type of vaccine targeting Covid-19 that began trials this week in the United States. AI is useful in accelerating the development of subunit and nucleic acid vaccines.
An essential part of viruses, proteins are made up of sequences of amino acids that determine their unique 3D shapes. Understanding a protein’s structure is essential to understanding how it works. Once the shape is understood, scientists can develop drugs that work with the protein’s unique shape. But it would take longer than the age of the known universe to examine all possible shapes of a protein before finding its unique 3D structure. Enter AI.
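The scale of that search can be sketched with a back-of-envelope calculation in the spirit of what is often called Levinthal's paradox. The chain length, the number of shapes per amino acid, and the sampling rate below are illustrative assumptions chosen for round numbers, not measured values for any particular protein.

```python
# Rough estimate of how long an exhaustive search over protein shapes would take.
# Every input here is an illustrative assumption, not a measurement.

RESIDUES = 100                    # assumed length of a small protein chain
CONFORMATIONS_PER_RESIDUE = 3     # assumed number of shapes each residue can adopt
SAMPLES_PER_SECOND = 1e13         # assumed (very generous) rate of trying shapes
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 1.38e10   # roughly 13.8 billion years


def exhaustive_search_years(residues, per_residue, samples_per_second):
    """Years needed to try every combination of per-residue shapes once."""
    total_conformations = per_residue ** residues
    return total_conformations / samples_per_second / SECONDS_PER_YEAR


years = exhaustive_search_years(
    RESIDUES, CONFORMATIONS_PER_RESIDUE, SAMPLES_PER_SECOND
)
ratio = years / AGE_OF_UNIVERSE_YEARS

print(f"Exhaustive search: about {years:.1e} years")
print(f"That is about {ratio:.1e} times the age of the universe")
```

Even with these generous assumptions the search time exceeds the age of the universe by many orders of magnitude, which is why prediction methods aim to avoid enumerating shapes one by one.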
In January, Google DeepMind introduced AlphaFold, a cutting-edge system that predicts the 3D structure of a protein based on its genetic sequence. In early March, the system was put to the test on Covid-19. DeepMind released protein structure predictions of several under-studied proteins associated with SARS-CoV-2, the virus that causes Covid-19, to help the research community better understand the virus.
At the same time, researchers from The University of Texas at Austin and the National Institutes of Health used cryo-electron microscopy, a popular structural biology technique, to create the first 3D atomic-scale map of the part of the virus that attaches to and infects human cells: the spike protein. The team responsible for this critical breakthrough had spent years working on other coronaviruses, including SARS-CoV and MERS-CoV. One of the structures predicted by AlphaFold closely matched this spike structure.
Another effort at the University of Washington’s Institute for Protein Design also used computer models to develop 3D atomic-scale models of the SARS-CoV-2 spike protein that closely match those discovered in the UT Austin lab. They are now building on this work by creating new proteins to neutralize the coronavirus. In theory, these proteins would stick to the spike protein, preventing viral particles from infecting healthy cells.
More broadly, scientific research on Covid-19 requires a Herculean effort to keep up with the results emerging from other labs. Learning about work at another lab can save months or even years of work by moving past a blind alley, avoiding re-inventing the wheel, or suggesting a shortcut. Labs report their work via published articles and increasingly via pre-print services like bioRxiv and medRxiv.
Several thousand papers relevant to Covid-19 have appeared in the first three months of 2020, and the scientific literature is growing rapidly. As a result, scientists struggle to find the papers relevant to their specific research, to review the breadth of recent findings, and to uncover insights. The first challenge is to collect the relevant literature and put it in a single, accessible location. In response, we at the Allen Institute for AI have partnered with several research organizations to produce the Covid-19 Open Research Dataset (CORD-19), a unique resource of over 44,000 scholarly articles about Covid-19, SARS-CoV-2, and related coronaviruses. It’s updated daily as new research is published. This freely available dataset is machine-readable, so researchers can create and apply natural-language processing algorithms, and hopefully accelerate the discovery of a vaccine.
A coalition including the White House, the Chan Zuckerberg Initiative, Georgetown University’s Center for Security and Emerging Technology, Microsoft Research, and the National Library of Medicine of the National Institutes of Health came together to provide this service. In addition, Google’s machine-learning and data science platform Kaggle is hosting the Covid-19 Research Challenge, which aims to provide a broad range of insights about the pandemic, including the natural history, transmission, and diagnostics for the virus; lessons from previous epidemiological studies; and more. The research challenge was released on March 16. Within five days it had already garnered over 500,000 views and had been downloaded more than 18,000 times. Recent findings from the research community are curated on a single webpage for quick reference.
The most tantalizing prospect for automated analysis of the scientific literature is that AI will connect the dots between studies to identify hypotheses and suggest experiments, and even treatments, that would otherwise be missed. Literature-based discovery is a class of analysis methods invented by the researcher Don R. Swanson in 1988. His automated system discovered a novel treatment for migraines: magnesium. Work on literature-based discovery has continued, and its potential impact has grown with the introduction of deep-learning-based NLP tools such as SciBERT.
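The core idea of connecting the dots can be illustrated with a small sketch of what is often called the ABC model of literature-based discovery: if one set of papers links a condition (A) to some intermediate terms (B), and a separate set links those same terms to a substance (C), then C is a candidate hypothesis for A even though no paper mentions both. This is a toy illustration, not Swanson's actual system; the "papers" below are invented sets of key terms, not real citations, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Toy "papers": each one is reduced to a set of key terms.
# These records are invented for illustration and are not real studies.
PAPERS = [
    {"migraine", "vascular spasm"},
    {"migraine", "platelet aggregation"},
    {"magnesium", "vascular spasm"},
    {"magnesium", "platelet aggregation"},
    {"migraine", "stress"},
    {"caffeine", "stress"},
    {"migraine", "caffeine"},
]


def build_cooccurrence(papers):
    """Map each term to the set of terms it shares at least one paper with."""
    links = defaultdict(set)
    for terms in papers:
        for a, b in combinations(sorted(terms), 2):
            links[a].add(b)
            links[b].add(a)
    return links


def abc_candidates(papers, start):
    """Find terms reachable from `start` only through an intermediate term."""
    links = build_cooccurrence(papers)
    direct = links[start]
    candidates = defaultdict(set)
    for bridge in direct:
        for other in links[bridge]:
            # Skip the start term and anything already directly linked to it.
            if other != start and other not in direct:
                candidates[other].add(bridge)
    # Rank by the number of distinct bridging terms, then alphabetically.
    return sorted(candidates.items(), key=lambda kv: (-len(kv[1]), kv[0]))


results = abc_candidates(PAPERS, "migraine")
for term, bridges in results:
    print(term, "via", sorted(bridges))
```

In this toy data, "caffeine" is not proposed because it already co-occurs with "migraine" in one paper; only indirect links surface as hypotheses. Real systems work over millions of abstracts and need far more careful term extraction and ranking than simple co-occurrence.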
In addition to supporting the research community in its efforts to understand the virus and develop treatments, AI has played a vital role in the Covid-19 outbreak since day one. AI startup BlueDot detected a cluster of unusual pneumonia cases in Wuhan in late December and accurately predicted where the virus might spread. Robots have been reducing human interaction by disinfecting hospital rooms, moving food and supplies, and delivering telehealth consultations. AI is being used to track and map the spread of infection in real time, diagnose infections, predict mortality risk, and more. And the potential for future innovation cannot be overlooked.