Book Review

Distant Horizons: Digital Evidence and Literary Change. Ted Underwood. Chicago: University of Chicago Press, 2019. Pp. xxii+206.

Ted Underwood’s Distant Horizons: Digital Evidence and Literary Change is an insightful account of the applications of computational methods in literary studies and a striking demonstration of the labor that goes into making them work. The book digests and draws implications from the more technical studies that Underwood has conducted during the past decade with collaborators in literature and computer science, putting them together into a coherent account of an analytical approach with which most humanists are only generally familiar. Along the way, it also offers thought-provoking accounts of periodization, genre, theme, and gender in modern English literature based on Underwood’s computational research. The book’s main goal is to explore the approach to literature that Franco Moretti calls “distant reading”; thus, its unifying claims relate to method rather than to the history of English literature per se.

Distinct as they are, computational methods, Underwood argues, neither contradict nor supersede conventional literary critical approaches. “Distant reading is simply a new scale of description,” writes Underwood. “It doesn’t conflict with close reading any more than an anatomical diagram of your hand would conflict with the chemical reactions going on inside your cells” (xvii). And Underwood works hard to meet traditionally trained literary critics on their own ground, engaging recognized arguments in literary history as hypotheses to be tested computationally. In his study of the linguistic markers of gender in nineteenth-century literature, for example, Underwood finds confirmation for Nancy Armstrong’s “well-known thesis that subjectivity was to begin with ‘a female domain’ in the novel” (122). Underwood makes a similar connection in his computational study of the relationship between elite and popular writing, engaging Andreas Huyssen’s argument for a “great divide” during the modern period (71). In contrast to Huyssen, Underwood finds the markers of elite and popular writing to be generally stable during the nineteenth century; instead of a “great divide,” then, he postulates “a gradual expansion of the reading audience, leading to a slow diversification of the market,” similar to what Richard Altick described in his classic 1957 study, The English Common Reader (72). On issues of genre and literary judgment, Underwood makes parallel gestures.

But, as Underwood knows well, advances such as his have not often been reciprocated by traditionally trained literary critics, among whom numbers remain a hard sell. This presents a challenge. “Can distant readers write quantitative literary history that is nevertheless detailed enough, streamlined enough, and lively enough to interest a wide range of readers?” Underwood asks. “If we can’t, then no argument will save us: what we are doing may be important, but it will belong in the social sciences. I hope to show that numbers can also be at home in the humanities. But I cannot prove that in advance. I can only aspire to demonstrate it by writing a book that uses statistical models to tell a suspenseful story of broad human interest” (xxii).

Does Underwood pass his own test? In one respect, clearly, yes: Distant Horizons is a suspenseful story of broad interest. But, as already noted, the main story of the book is less about literature than it is about the methods of distant reading and the potentially “vertiginous” insights that they may offer (166). Underwood’s readings of his data are invariably excellent and offer a superb model for this kind of work. In this regard, it is interesting how rich a feel for the (distant) text they convey. The details that Underwood’s models pick up often place us right in the warp of the texts themselves. From his computational analysis, for example, Underwood discovers that in nineteenth-century fiction, “Women smile and laugh, but midcentury men, apparently, can only grin and chuckle” (124). Meanwhile, one of the surest markers of masculine identity in modern literature is the possession of a pocket: “In the twentieth century,” Underwood observes of fictional men, “they are constantly putting things in it” (125).

Does Underwood hit the mark in his literary readings? In most ways, yes. These too are nuanced, and his engagement with older studies such as those previously mentioned is often ingenious. Yet, by design, his literary readings principally function as demonstrations rather than applications of method in service of a larger literary historical quarry, and the most interesting of his discoveries, as he himself points out, tend not to come from direct arguments with earlier scholars but from the effort to model the old problems in this different analytical idiom.

For a book sitting atop so much technical and computational labor, Underwood’s prose is remarkably accessible, modeling the kind of conversation that computationally minded humanists ought to be having with their home-department colleagues. Readers in the humanities who want to push their limits a little bit can follow the trail of crumbs that Underwood leaves in his appendices on data and methodology and then follow his references to the collaborative work he has published in journals with computer scientists such as David Bamman to see how well Underwood has fitted this work to the framework of the humanities.

These earlier studies are also worth seeking out to see, for example, Judith Butler’s arguments about performative gender explored in a computational linguistic context. Returning here in a more humanist-friendly guise, those arguments are, to me, the most successful and flexible experiments in the book, and the most fully engaged with contemporary criticism. Ironically, some of Underwood’s best insights on these subjects—for example, his speculations on the decline in the percentage of women novelists over the course of the nineteenth century—derive from his wide reading of literature and literary history rather than directly from his computational work.

Underwood’s computational examples are a pleasure to read. His analogy to detective fiction is not idle: he does a great job showing what a clue looks like to a distant reader. His diagrams are simple to interpret, and his main terminology (logistic regression, predictive modeling, correlation coefficient, etc.) is well explicated. Exceptions to this are terms from probability (empirical Bayes, naïve Bayes) and statistical modeling (support vector machines, or SVMs), which are discussed too briefly in an appendix (101). For a book that dings humanists for treating contemporary quantitative social scientists as just rebooted versions of “nineteenth-century determinists” (186), it seems only fair to include a page or two of crib notes on these subjects.

What are Underwood’s main arguments about literary history?

1.  The well-known historical transition from “telling” to “showing” in modern literature is part of a longer and broader process by which the language of fiction was differentiated from that of nonfiction.

2.  None of the currently favored models of genre change, including those of “generational succession” and “gradual consolidation,” offers a sufficiently complex explanation to apply across genres (40).

3.  Contrary to “the story of rapid generational reversal told in our textbooks and anthologies,” modern literature seems to move slowly in the direction of “prestigious examples” (xvi, 72).

4.  “The implicit gendering of character [in literature] grows steadily blurrier from 1840 to the present” (xvii).

This is an ambitious set of claims. Each is, moreover, presented through finely crafted readings of data, and, yes, also literature. As I am a historian, not a literary scholar, some of Underwood’s experiments interested me more than others. To me, for example, Underwood’s account of representations of gender is intrinsically more interesting than his account of the boundaries of literary genres and periods, subjects whose urgency he himself sometimes seems to doubt (106–7). Around the question of gender, aspects of the study design seem to me particularly clever. There, Underwood employs what he calls a “perspectival model,” the ultimate aim of which is not to distinguish between representations of masculinity and femininity, but rather to determine the relative strength or weakness of gender dimorphism over time and from different perspectives such as those of male and female writers (67).

Also very interesting from the point of view of method is Underwood’s examination of the changing relationship between language in fiction and nonfiction during the modern period. Here, Underwood builds on work by Stanford graduate students Ryan Heuser and Long Le-Khac to examine changing frequencies among the 1,100 most common words in 3,281 works of biography and fiction drawn from the HathiTrust Digital Library and ECCO-TCP (Eighteenth-Century Collections Online-Text Creation Partnership) covering the period 1700–2000. The original study by Heuser and Le-Khac upon which Underwood builds was smaller and more focused than his own, concerning only nineteenth-century English novels. In these, the authors found an interesting trend: over the course of the century, novels employed concrete terms in an ever-higher proportion relative to abstractions.
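The kind of measurement described above can be sketched, very roughly, in a few lines of Python. This is a minimal illustration under loudly stated assumptions: the CONCRETE and ABSTRACT word lists are invented toy stand-ins (the actual studies used their own, much larger lexicons and feature sets), the three-text corpus is made up, and the helper names are hypothetical. It shows only how a per-text ratio of concrete to abstract words can be averaged by genre and period, so that fiction and biography can be compared over time.

```python
import re
from collections import Counter

# Hypothetical toy lexicons, invented for illustration only. The studies the
# review discusses built their own, far larger word lists and feature sets.
CONCRETE = {"hand", "door", "stone", "tree", "red", "cold", "pocket"}
ABSTRACT = {"virtue", "honour", "duty", "sentiment", "reason", "passion"}


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def most_common_vocabulary(texts, n):
    """Return the n most frequent words across a whole corpus."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return [word for word, _ in counts.most_common(n)]


def concrete_share(text):
    """Concrete words as a share of all concrete-or-abstract tokens in one text.

    Returns None when the text contains no word from either list."""
    tokens = tokenize(text)
    concrete = sum(1 for t in tokens if t in CONCRETE)
    abstract = sum(1 for t in tokens if t in ABSTRACT)
    total = concrete + abstract
    return concrete / total if total else None


def mean_share_by_group(records):
    """Average concrete_share for each (genre, period) pair.

    records: iterable of (genre, period, text) tuples."""
    sums = {}
    for genre, period, text in records:
        share = concrete_share(text)
        if share is None:
            continue  # skip texts with nothing to measure
        total, count = sums.get((genre, period), (0.0, 0))
        sums[(genre, period)] = (total + share, count + 1)
    return {key: total / count for key, (total, count) in sums.items()}


# Invented three-text "corpus", standing in for thousands of volumes.
corpus = [
    ("fiction", 1750, "Her virtue and her duty were at war with passion."),
    ("fiction", 1850, "He put his cold hand on the red door."),
    ("biography", 1850, "His sense of duty and reason guided him."),
]
print(mean_share_by_group(corpus))
# {('fiction', 1750): 0.0, ('fiction', 1850): 1.0, ('biography', 1850): 0.0}
```

With real data the interesting signal would be a gap between the fiction and biography curves as the periods advance; a toy corpus this small can only show the mechanics.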

In itself, that claim, or something close to it, has a venerable history in literary criticism, resonating for example with Percy Lubbock’s Craft of Fiction (1921) and F. R. Leavis’s The Great Tradition (1950). Yet the thesis of Heuser and Le-Khac, Underwood argues, sounds “more familiar than it is” (13). In the older critical literature, this transition was “traced through … examples: writers like Flaubert and James, Conrad and Woolf, who experiment with point of view. By shifting attention from narrative perspective to a looser index of physical description, Heuser and Le-Khac profoundly change the meaning of their theme…. [T]hey also create a broader, longer story: the transformation they are tracing is already well under way in 1800” (13). That last claim is worth dwelling on because it certainly feels like something that might inspire a literary critic—perhaps a specialist in Flaubert or James—to respond, even if only to object. But the fact that such questions arise from Underwood’s book shows that it (like Heuser and Le-Khac’s pamphlet) is landing exactly where the computational humanities should be—not on the distant horizon of humanities work but right in the mix.

Importantly, Underwood doesn’t stop there. And what he does with the work of Heuser and Le-Khac offers a terrific example of what is so surprising about the computational approach for humanists raised in the traditional monastic arts. Intrigued by Heuser and Le-Khac’s observation about concrete and abstract in nineteenth-century fiction, Underwood employs additional corpora to extend their chronology by two centuries and to add biographical nonfiction as a comparator genre, greatly expanding the purchase of their observation and giving it a new dimension. One result of this maneuver is an impressively broad claim: “Many well-known changes in eighteenth-, nineteenth-, and twentieth-century fiction can be understood as parts of a single differentiating process that defined the subject, style, and pace of fiction through opposition to nonfiction” (xiii). Another nontrivial result is a freshly minted data set that, following the practices of the natural and social sciences, Underwood makes available for other researchers to use, both to critique and to extend his own work.

Of course, most literature professors do not as of now have the skills to do much with Underwood’s data, and most social and natural scientists have neither the interest nor the expertise to make heads or tails of Flaubert. But that’s precisely the point: if Underwood is right that computational methods can produce insights that humanists will want to engage with, then the potential for the computational approach is limitless. The fundamental question, says Underwood, is not whether literary scholars need advanced degrees in computer science—he sets the bar for initial entry at one semester of statistics—it is whether literary and other humanities scholars can learn enough about computational approaches to join in the conversation, and then, as an extension, whether that conversation can inspire new formations of training and of professional expectations to foster further work in the field (163). Here, I think Underwood’s hunch is right: the key is not demystifying statistics and programming, as important as that is; it is framing computational arguments in ways that bring their utility home to the humanities.

Underwood emphasizes that at its conceptual root his approach has little to do with computers or technology. Important early computational projects in the humanities were carried out with technology no more advanced than index cards (106). Moreover, the kinds of aggregate problems that Underwood takes up have deep roots in literary studies. As Underwood notes, already in the 1960s, Raymond Williams grappled with a longue durée in literature, and in the 1980s Janice Radway structured her trailblazing reader-response research as a “quantitative experiment” (164). For these very good reasons, Underwood does not label his approach with terms such as “digital humanities” and “big data,” which foreground technologies rather than methods. It is of course true that computer technology makes certain kinds of quantitative analysis strikingly simple. Witness, for example, the Voyant Tools website, or the more demanding but remarkably sophisticated graphical user interface for the MALLET topic modeling tool. Underwood’s point is that for this meeting of methods to matter, the question cannot just be, can humanists operate the tools; it must be, will humanists come to see the need for them?

On this main point, I agree strongly, though I disagree with some of the further implications that Underwood draws from it. As already noted, Underwood puts the burden of persuasion in these matters squarely on his own shoulders and on those of his other numerate colleagues. Generous a perspective as this is, I think it is wrong on three counts. First, while Underwood is uniquely well positioned to make the argument, if he is right, the burden of openness falls equally upon everyone, just as did the burdens of understanding structuralism, poststructuralism, or postcolonial theory, which, for the record, were no picnic either. Second, I find the distinction that Underwood draws between the humanities and the social sciences to be self-defeating both from an epistemological and an institutional point of view. If Underwood’s theoretical claims are right, the line between humanities and social sciences is precisely what ought to be in question, not the placement of the work in one field or the other. Finally, if Underwood is right about the implications of his own work, then his worry should be less for the fate of distant reading than for that of the humanistic fields that ignore it. In my view, the response of literary studies—and of the humanities more broadly—to computational approaches is likely to be a bellwether for the field as a whole, not because the humanities need computation per se, but because humanities scholars need to hack the old disciplines in a multitude of ways to create a new toolkit responsive to our changing representational times.

I might also say in passing that I find some elements of Underwood’s account of recent literary criticism unpersuasive. This is not the main concern of Underwood’s book, and, as someone out of field, I am ready to be corrected. Still, I find Underwood’s claim that “for the last sixty or seventy years, we have assumed that literary history can only be interesting and edifying insofar as it is a story about conflict” to be overstated (106–7). One difficulty here is that Underwood’s literary critical references, excepting those which deal specifically with computational questions, are mostly not of very recent vintage. Of course, Underwood is right that plenty of ink has been spilled in debates over period and genre; indeed, he too continues to spill ink on these matters. But—and here Underwood’s citations of Raymond Williams and Janice Radway are apposite—to say that “we” have treated such conflicts as “the whole story of literature” seems wrongly dismissive of scholars and approaches among whom he might find natural allies. I certainly find it hard to square with my own understanding of theoretical approaches such as structuralism, poststructuralism, and hermeneutics. I also find Underwood’s dismissal of the problem of the relationship between macro- and microanalysis to be a missed opportunity (64). So much of what is interesting about his approach arises precisely from the friction between perspectives generated at macro- and microscales. Rather than dismiss that friction or attempt to resolve it into coherence, I would encourage Underwood to engage it head on. It would make a great next book.

Much more persuasive is Underwood’s argument that, as a science of uncertainty, statistics lends itself particularly well to the sorts of critical concerns that have shaped recent literary criticism (186). In this, his work contrasting the perspectives of male and female writers on gender is exemplary. Ironically, “the strength of this method comes from something that might appear to be its weakness” (36). “As Benjamin Schmidt has expressed this: the goal is not to construct an unbiased sample but to understand each ‘source through its biases’” (178).

Among the great virtues of Underwood’s book is its demonstration that, in many—or perhaps most—applications germane to the humanities, both the algorithms run in predictive models and the data upon which they operate can themselves be productively read and understood by critics with a traditional literary background. Much of Underwood’s book is devoted to modeling—in that other sense—just these kinds of readings, and I think that scholars unfamiliar with distant reading will find his demonstrations surprisingly approachable. As Underwood shows, “the models created by machine learning are not mysterious black boxes: it is quite possible to crack them open and ask how they work” (49). The fact that learning to read these models may also provide insight into technologies such as recommendation machines and search engines is of not insubstantial benefit both from the point of view of our obligations as citizens and from that of humanists making the case for our disciplines. If humanities classes end up offering our students tools that can be applied to understanding Google, Facebook, and Amazon, all the better.
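To make the point about "cracking open" a model concrete, here is a minimal sketch in Python, assuming an invented six-item training set and an unregularized logistic regression fit by plain batch gradient descent. The names (train, featurize, train_logistic, predict_proba) are hypothetical, the data are made up (loosely echoing the smile/grin contrast quoted earlier), and this is not Underwood's pipeline, which works on real corpora with far more features. What it illustrates is only that a linear classifier's learned weights form a readable list: words with positive weights push a character toward one label, words with negative weights toward the other.

```python
import math

# Invented toy data: each string lists words attached to one fictional
# character; label 1 = character referred to as "she", 0 = as "he".
train = [
    ("smiled laughed", 1),
    ("smiled wept", 1),
    ("laughed", 1),
    ("grinned chuckled", 0),
    ("grinned pocket", 0),
    ("chuckled pocket", 0),
]


def featurize(text, vocab):
    """Binary bag-of-words vector: 1.0 if a vocabulary word occurs, else 0.0."""
    words = set(text.split())
    return [1.0 if w in words else 0.0 for w in vocab]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(examples, vocab, lr=1.0, epochs=2000):
    """Fit weights and a bias by batch gradient descent on mean log loss."""
    X = [featurize(text, vocab) for text, _ in examples]
    y = [label for _, label in examples]
    n = len(examples)
    w = [0.0] * len(vocab)
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(vocab)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one example: p(label 1) minus true label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(text, vocab, w, b):
    """Model's probability that a character with these words has label 1."""
    x = featurize(text, vocab)
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


vocab = sorted({word for text, _ in train for word in text.split()})
weights, bias = train_logistic(train, vocab)

# "Cracking open" the model: each weight says how strongly a word pushes a
# prediction toward label 1 (positive) or label 0 (negative).
for word, weight in sorted(zip(vocab, weights), key=lambda pair: pair[1]):
    print(f"{word:10s} {weight:+.2f}")
```

The design choice here is deliberate simplicity: with one weight per word and no hidden layers, reading the model amounts to reading a sorted word list, which is the kind of inspection a critic with a traditional literary background can interpret directly.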

It is remarkable to think that it has already been two decades since Franco Moretti published “The Slaughterhouse of Literature” in Modern Language Quarterly (2000). There, he envisioned a computational approach to literature capable of scanning downward and outward through an archive of texts outside the literary canon, thus revealing “a history very different from the one preserved in academic canons” (11). What has happened in the meantime? Not as much as one might have hoped or expected. That all these years later, Underwood still perceives the need to frame his own work as a defense of and/or an introduction to the approach tells us what we need to know in that respect: as a movement, distant reading—and in particular Underwood’s variety of predictive modeling—has moved only so far. On the other hand, Underwood’s book itself is persuasive evidence of the insight of which the best practitioners in the field are capable. At the start of the twenty-first century, when Moretti published his article, there was virtually no one in literary studies, Moretti included, capable of carrying out that work. As Underwood’s book shows definitively, both in its own analysis and in its network of references, the field has by now made it there and further.

As a case in point, Underwood offers what turns out, ironically, to be a devastating, statistically grounded refutation of Moretti’s original hypothesis. Moretti proved to be correct about the potential applicability of statistical methods to the great neglected archive of noncanonical works of fiction and poetry, and, for two decades, the Literary Lab that he founded at Stanford University has excelled in demonstrating the insights that such an approach can offer. One such insight offered by Underwood is that there never was a “slaughterhouse of literature,” at least not of the sort that Moretti imagined. Yes, many books have been forgotten, but, over the course of modern literary history, Underwood’s computational model shows that “prominent and obscure writers are traveling together in the same direction, albeit at slightly (and interestingly) different speeds” (11). Among the savvy demonstrations of the value of computational methods that Underwood offers, this critique is particularly intriguing. One measure of a good literary analytical method is its capacity to reveal its own errors and blind spots. In this regard, in Underwood’s hands, computational modeling measures up.