Modern statistical displays of data—grids of scatterplots for inspecting correlations, for example—succeed by being transparent and allowing trends in the data to stand out. In contrast, classic data visualizations often succeed, paradoxically, by being a bit opaque: a puzzle that a reader figures out.
Consider the visualization created by information designer Will Burtin in 1951 to summarize the effectiveness of three antibiotics (penicillin, neomycin, and streptomycin) in treating 16 bacteria. The bacterial species are arrayed in a circular layout, with three bars for each bacterial infection representing the amount of each antibiotic needed to treat it. An inversion of the scale means that longer bars represent more effective antibiotics, matching the intuitive reading that bigger is better, while shading behind the bars neatly sorts the bacteria into two groups according to whether they test positive or negative on the Gram stain.
In science we are delighted by unexpected brilliance, which we immediately try to systematize. The same goes for visualization: When we see a new and revelatory graph, we want to take it apart and see how it works. Burtin’s sunburst design immediately captures our attention and provokes curiosity about the intention behind its mandala-like form. We feel beckoned to participate in, even celebrate, the scientific discoveries it implies. The circular design also happens to make it more difficult to find the best antibiotic for treating any particular infection, or to perceive any structure in the relationship between the bacteria and the treatments. But for most viewers, these limitations become apparent only later, if at all.
We can liken this experience to narrative, a lens through which many great (and lesser) works of art have been interpreted. Narrative involves an interplay between plot and perspective, events and interpretation, storyline and characters. Similarly, the practice of science can be viewed as the interplay between data and models. Data are the facts. Models are the characters whose perspectives and assumptions shape what we take away from the story. At the simplest level, the way we choose to visualize data shapes the viewer’s experience of those data by promoting certain comparisons over others. It’s a character choice, a choice of model.
Andrew Gelman is a professor of statistics and political science at Columbia University. His most recent book is Regression and Other Stories. Jessica Hullman is an associate professor of computer science and journalism at Northwestern University. Follow her on Twitter at @JessicaHullman.
Understanding visualization design as a form of model selection, one that highlights certain comparisons, can be very useful. It helps us reverse-engineer existing visualizations and methods of graphical display. It also helps us develop more effective visualizations, so we can do a better job of telling our stories.
The comparison implied by Burtin’s graphic focuses on a simple question, perhaps unsurprising given the excitement around antibiotics as the “wonder drugs” of the time: Which bacteria can they treat? By noting the colors of the longest bars as we scan the perimeter of the circle, we use the graphic to compare the antibiotics by their effectiveness. To discover these intended comparisons, a viewer must actively engage in a process of discovery not unlike that of the scientists who produced the data.
Much has been written about how different forms of narrative involve the reader in different ways, from the relatively passive engagement of viewers of a film, to the more active involvement of those following a serial television drama, to the experience of people reading novels who must in a sense create entire movies in their heads.