By 10 pm EST on election night—when Florida slipped decisively out of the Biden–Harris ticket's reach—the venomous backlash against the pollsters had already begun. An especially large share of the fury was directed at Nate Silver, America's former forecaster laureate, who had mostly repaired his image since the infamous 2016 presidential election misfire and had millions of people hanging on his every word. As the 2020 election season rolled around, Silver's predictive model had reestablished itself as the standard.
The work of poll wonks like Silver, and the very idea of election prediction, has been criticized for valid reasons, and some critics have offered insightful takes on why the forecasts went wrong in 2020. But in the midst of this backlash, the professional side of me—the one that builds mathematical models of epidemics like Covid-19—couldn't help but empathize with Silver and his kind. There are many parallels between society's response to seemingly errant predictive models of elections and its response to models of the trajectory of infectious diseases. And in discussing features of each, we can learn why the forecasters aren't to blame for our disappointment.
Most forecasting models of epidemics are mechanistic. In March, very early in the pandemic, mathematical epidemiologist Neil Ferguson and colleagues at Imperial College London developed a model that offered dire predictions for the number of individuals who might be infected and die in the US and UK (around 2 million deaths). Another model, developed by the Institute for Health Metrics and Evaluation at the University of Washington, was the center of controversy after it changed its predictions to suggest that, in many places, the US was closer to the peak than we had realized.
These are but two of the many Covid-19 forecasts based on a presumed understanding of how the epidemic actually works. The scientists construct a version of the world, encoded in equations and bits, colored by details of how infectious the virus is, how people interact with one another, and other variables.
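To make the mechanistic idea concrete, here is a minimal sketch of an SIR (susceptible–infectious–recovered) compartmental model, the textbook ancestor of forecasts like Ferguson's. Every parameter value below is arbitrary and illustrative; real Covid-19 models are vastly more detailed and are calibrated to data.

```python
# Toy SIR compartmental model: an illustration of "mechanistic"
# epidemic modeling, NOT a calibrated Covid-19 forecast.

def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Integrate the SIR equations with a simple Euler scheme.

    beta  -- transmission rate (contacts per day * infection probability)
    gamma -- recovery rate (1 / average infectious period in days)
    """
    s, i, r = s0, i0, r0
    n = s0 + i0 + r0  # total population, conserved throughout
    history = [(0.0, s, i, r)]
    for step in range(1, int(days / dt) + 1):
        new_infections = beta * s * i / n * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((step * dt, s, i, r))
    return history

# Hypothetical scenario: 1,000,000 people, 100 initial cases,
# basic reproduction number R0 = beta / gamma = 3.
trajectory = simulate_sir(beta=0.3, gamma=0.1,
                          s0=999_900, i0=100, r0=0, days=180)
peak_infectious = max(i for _, _, i, _ in trajectory)
```

The point of the sketch is the structure, not the numbers: the modeler's assumptions about transmission and recovery are baked directly into the update rules, so the forecast can only be as good as that encoded picture of the world.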
Many popular pollsters' algorithms—like the one used by Silver at FiveThirtyEight—are based on an array of opinion surveys of likely voters. Their overall probabilities are built from an aggregation of these polls, which are weighted by quality, sample size, and other features. After the 2016 debacle, election forecasters became more vigilant about correcting for voters' educational attainment, which helped explain some of that year's discrepancies. Some methods, like the one used by The Economist, use a combination of polling data and economic factors to make predictions.
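The aggregation step can be sketched as a weighted average, where each poll's weight reflects its sample size and a pollster-quality grade. This is a loose simplification in the spirit of such aggregators; the grades and weighting formula below are invented for illustration and are not FiveThirtyEight's actual methodology.

```python
import math

# Toy poll aggregation: a weighted average over hypothetical polls.
# The quality grades and the weighting formula are invented for
# illustration -- NOT the real FiveThirtyEight method.

QUALITY_WEIGHT = {"A": 1.0, "B": 0.7, "C": 0.4}  # hypothetical grades

def aggregate(polls):
    """polls: list of (candidate_share_pct, sample_size, quality_grade)."""
    num = 0.0
    den = 0.0
    for share, n, grade in polls:
        # Weight grows with sample size, but with diminishing returns.
        w = QUALITY_WEIGHT[grade] * math.sqrt(n)
        num += w * share
        den += w
    return num / den

# Three hypothetical polls of the same race.
polls = [
    (52.0, 1200, "A"),
    (49.0, 800, "B"),
    (55.0, 400, "C"),
]
estimate = aggregate(polls)
```

Even in this toy version, the modeler's judgment leaks in everywhere: who gets an "A" grade, how sharply sample size matters, and which polls are included at all. Those judgment calls are exactly where systematic errors, like the education skew of 2016, can hide.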
Models of epidemics are often imprecise because they take on the impossible burden of trying to capture all of an epidemic's complexity. No computer could tabulate all the meaningful detail underlying infections on vacation cruises, superspreading events during choir practice, or maskless politicians at a Rose Garden ceremony. Math and computers may be able to capture subtle features of any one of these events, but the most popular models of Covid-19 are supposed to tell us something about how an epidemic plays out in aggregate, for millions of people, in different settings. And these aggregate models are often the ones we use in policy discussions.
The accuracy of predictive models of elections is similarly undermined by the vagaries of human behavior, social structure, and other factors we just don't understand.
We might account for the voting trends of individuals of Latinx descent but underestimate the large differences (including in political preferences) among Afro-Latinos in the Bronx, Cuban Americans in Miami, and Mexican Americans in El Paso.
We might weight an election forecasting model based on what we think we know about rural, white voters in the Rust Belt but fail to account for voters who don't bother participating in polls.