12: Distant Reading
Published on 2024-02-22
Preparatory Readings:
Table of contents
- Review
- Moretti: The Problem of Studying Literature and new Challenge
- Moretti: On Solving the Problem With Distant Reading
- Ascari’s concerns with “Distant Reading”
- More Concrete Examples of Distant Reading Applications:
Review
Thus far, we have discussed three main areas in which philosophers and theorists have claimed that the digital medium advances on the older medium:
- access immediacy (Bush)
- interconnectivity and context (Bush, Nelson)
- non-linearity or multi-sequence (Nelson, Landow)
Today, I want to look at a fourth category, which we might call:
- machine accessibility or machine assisted analysis
Here I mean that the text is not merely accessible to human readers, but is also available in a form that can be “acted upon” by machines, with the result that the machine can assist in new forms of analysis (or is this a type of “reading”?).
We will try to get a glimpse of this approach and its potential benefits by looking at some of the writings by one of the pioneers of a controversial approach to literature, Franco Moretti. Moretti calls this approach “distant reading”.
Moretti: The Problem of Studying Literature and new Challenge
According to Moretti in “Graphs, Maps, Trees 1” (2003), p. 67, the study of literature faces a problem.
“It’s an idea that occurred to me some years ago, when the study of national bibliographies made me realize what a minimal fraction of the literary field we all work on: a canon of two hundred novels, for instance, sounds very large for nineteenth-century Britain (and is much larger than the current one), but is still less than one per cent of the novels that were actually published: twenty thousand, thirty, more, no one really knows—and close reading won’t help here, a novel a day every day of the year would take a century or so . . . And then, a field this large cannot be understood by stitching together separate bits of knowledge about individual cases, because it isn’t a sum of individual cases: it’s a collective system, that should be grasped as such, as a whole…” (Moretti, 2003, p. 67-68)
How does this concern resurface in his essay, “Conjectures on World Literature”?
What is world literature?
What does it aim to study?
What problems or concerns does it raise about national literature?
Here are a couple of passages I found that speak to the need for machine assisted analysis.
“But I actually think that it’s our greatest chance, because the sheer enormity of the task makes it clear that world literature cannot be literature, bigger; what we are already doing, just more of it. It has to be different. The categories have to be different. ‘It is not the “actual” interconnection of “things”’, Max Weber wrote, ‘but the conceptual interconnection of problems which define the scope of the various sciences. A new “science” emerges where a new problem is pursued by a new method.’ That’s the point: world literature is not an object, it’s a problem, and a problem that asks for a new critical method: and no one has ever found a method by just reading more texts. That’s not how theories come into being; they need a leap, a wager-a hypothesis, to get started.” (Moretti, “Conjectures”, 2000, p. 55)
The idea is that the study of “world literature” is a problem that cannot be solved by “more reading” within the normal paradigm. The scale is too big for any one reader to take in. Machine assisted analysis is a potential answer to this problem precisely because it scales to dimensions no human reader could ever reach. Thus it has the potential to offer us unique glimpses of a terrain that previously remained hidden.
And again in “The Slaughterhouse of Literature” (2000), Moretti says:
“The history of the world is the slaughterhouse of the world, reads a famous Hegelian aphorism; and of literature. The majority of books disappear forever—and “majority” actually misses the point: if we set today’s canon of nineteenth-century British novels at two hundred titles (which is a very high figure), they would still be only about 0.5 percent of all published novels.” (Moretti, “Slaughterhouse”, p. 207)
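The figures in these passages are easy to check with a little arithmetic. The sketch below is only a back-of-the-envelope illustration: the 200-title canon and the 0.5 percent share come from Moretti, the totals of 20,000 and 30,000 are the rough guesses he mentions, and the function names and the 365-day reading year are my own assumptions.

```python
# Back-of-the-envelope arithmetic for Moretti's argument about scale.
CANON_SIZE = 200  # Moretti's (generous) figure for the canon


def canon_share(published: int, canon: int = CANON_SIZE) -> float:
    """Return the canon as a percentage of all published novels."""
    return 100 * canon / published


def years_to_read(published: int, novels_per_day: float = 1.0) -> float:
    """Years needed to read every novel at a steady pace (365-day years)."""
    return published / (novels_per_day * 365)


# Moretti's two rough guesses at the number of published novels.
for total in (20_000, 30_000):
    print(f"{total:>6} novels: canon = {canon_share(total):.2f}%, "
          f"reading time = {years_to_read(total):.0f} years")

# Number of published novels implied by the "0.5 percent" share.
implied_total = CANON_SIZE / (0.5 / 100)
print(f"A 0.5% share implies about {implied_total:,.0f} published novels")
```

At one novel a day, 30,000 titles come to roughly 82 years of reading, which is the “century or so” Moretti has in mind.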
Moretti: On Solving the Problem With Distant Reading
What is “distant reading”? Wikipedia gives us a pretty succinct answer:
“The term “distant reading” is generally attributed to Franco Moretti and his 2000 article, Conjectures on World Literature.[1] In the article, Moretti proposed a mode of reading which included works outside of established literary canons, which he variously termed “the great unread”[2] and, elsewhere, “the Slaughterhouse of Literature”.[3] The innovation it proposed, as far as literary studies was concerned, was that the method employed samples, statistics, paratexts, and other features not often considered within the ambit of literary analysis. Moretti also established a direct opposition to the theory and methods of close reading: “One thing for sure: it cannot mean the very close reading of very few texts—secularized theology, really (‘canon’!)—that has radiated from the cheerful town of New Haven over the whole field of literary studies”.[4]” (https://en.wikipedia.org/wiki/Distant_reading)
So what does distant reading look like in practice? Let’s look at Moretti’s “First Experiment” (“Slaughterhouse”, p. 212).
Why does he start with detective stories? What ‘specific device’ of exceptional visibility do they contain? (“Slaughterhouse”, p. 212)
Clues. (Remember, Moretti is looking for “forms” in literature that can be abstracted from individual stories.) Clues are a visible feature of “form”. By using a very visible “formal feature”, he thinks it will be easier to assess what “distant reading” can tell us about a genre’s form. (A “form” also seems to be something that a machine could detect without necessarily needing to understand semantics.)
By focusing on clues he develops a kind of schematic with which to organize detective stories.
He suggests that the “tree” represents a new way of approaching literature. Consider the following quotation:
“The branches are the result of the twists and turns of a device, of a unit much smaller than the text. Conversely, the branches are also part of something much larger than any text, which is the genre: the tree of detective fiction. Devices and genres: two formal units. A very small formal unit and a very large one: these are the forces behind this figure—and behind literary history. Not texts. Texts are real objects— but not objects of knowledge. If we want to explain the laws of literary history, we must move to a formal plane that lies beyond them: below or above; the device, or the genre.” (“Slaughterhouse”, p. 217)
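To make the idea of a branching tree concrete, here is a rough sketch in Python. I am assuming the tree asks four yes/no questions about clues, in order (are they present, necessary, visible, decodable), which is how I read Moretti’s figure. The stories and their values are invented placeholders, and `Story`, `QUESTIONS`, and `branch` are names I made up for illustration.

```python
from dataclasses import dataclass

# Assumed branch questions, asked in order; the first "no" ends the path.
QUESTIONS = ("present", "necessary", "visible", "decodable")


@dataclass
class Story:
    title: str
    present: bool = False    # does the story contain clues at all?
    necessary: bool = False  # are the clues needed for the solution?
    visible: bool = False    # are the clues shown to the reader?
    decodable: bool = False  # could the reader use them to solve the case?


def branch(story: Story) -> str:
    """Return the path a story takes down the tree, stopping at the first 'no'."""
    path = []
    for question in QUESTIONS:
        answer = getattr(story, question)
        path.append(f"{'+' if answer else '-'}{question}")
        if not answer:
            break
    return " / ".join(path)


# Hypothetical stories, invented purely for illustration.
stories = [
    Story("Story A"),
    Story("Story B", present=True, necessary=True),
    Story("Story C", present=True, necessary=True, visible=True, decodable=True),
]

for story in stories:
    print(f"{story.title}: {branch(story)}")
```

Notice that the yes/no values are entered by hand here. The sketch only shows the sorting step, not how one decides whether a given story has a “decodable” clue.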
What does he conclude from his analysis of the “form” of clues across many different stories, abstracted from the stories themselves?
It seems that he thinks he has found a reason to explain why some detective stories have survived and joined the “canon” while others have not.
He also thinks he has found an interesting anomaly: some stories by Arthur Conan Doyle remained popular despite lacking the formal features needed for success. He thinks Doyle is stumbling through his writing, working by trial and error, not always conscious of the formal features required for success.
Moretti writes:
“He [Doyle] finds the epoch-making device but does not work it out? It didn’t make sense; the tree had to be wrong. But the tree was right — in the forty-odd stories Conan Doyle wrote after the Adventures, one finds exactly the same oscillations— and it actually highlighted an important Darwinian feature of literary history: in times of morphological change, like the 1890s for detective fiction, the individual writer behaves exactly like the genre as a whole: tentatively. During a paradigm shift no one knows what will work and what won’t; not Ashdown, not Pirkis, and not Conan Doyle; he proceeds by trial and error, making fewer errors early on, when the problems are simpler—and more errors later, when they are more complex. It makes perfect sense.” (“Slaughterhouse”, p. 215)
Ascari’s concerns with “Distant Reading”
As I read Ascari’s critique, the central thrust of his specific concern parallels a common critique of uses of big data in general (topic modelling included).
The concern is that by speaking of “data” rather than “literature”, one implicitly begins to feel that one is speaking of something suddenly objective, something pure, something free from interpretation and bias.
But Ascari is eager to point out that data in literature is never complete or exhaustive. Thus it is always already in some way selected and curated. Accordingly, it is in some way always already interpreted or biased.
Specifically, Ascari suggests that Moretti relies on S.S. Van Dine’s 1928 conclusions about what makes good detective fiction (namely, “functional clues” that the reader can use to solve the mystery). Moretti then organizes the data around this assumption: he goes looking for this “form” within the detective stories of the 19th century. The result of his search becomes his objective data.
But is this data really so “objective”? Or does it instead reflect what Moretti was already looking for as a result of the aesthetic tastes of early 20th century readers and critics?
From this collection of data, Moretti then begins drawing conclusions.
When a text doesn’t contain this formal feature and is not popular enough to survive, Moretti concludes that it is the absence of this form that explains why it was not popular.
When a story includes such clues and is popular, this confirms his presupposition, and he concludes this is why it has survived.
And in the case of the Adventures of Sherlock Holmes, when a popular author like Doyle does not include such clues in his stories, Moretti does not view this as a check on his original assumption or on the objectivity of his data collection. On the contrary, as Ascari says, Moretti concludes that this is a stumble in need of explanation.
Ascari, in contrast, wonders if Moretti has only found the “appearance of a problem” based on his “distant analysis” of data (formal features) that have been selected based on the biases of the aesthetic taste of early 20th century readers of detective stories.
Perhaps Doyle didn’t “stumble”, but purposely did not include functional clues because he was trying to do something else. Ascari worries that Moretti’s approach unjustifiably rejects this possibility.
Ascari says something to this effect on p. 7:
“Moretti’s view of detective fiction uncritically rests on the normative and restrictive attitude that golden age theorists advertised as objective, adopting a self-validating positivistic stance. Van Dine and Knox, however, like Richard Austin Freeman, Dorothy Sayers, and others, were first and foremost creative writers whose theoretical and historical constructs reflect their own practice as authors of clue puzzles. The rules to which they subscribed are no longer regarded as timelessly valid but rather as culturally situated, for in the course of the twentieth century, detective fiction deeply changed in terms of both creative writing and theoretical/historical constructs.” (Ascari, p. 7)
More Concrete Examples of Distant Reading Applications:
Google Ngram Viewer
Another simple example of distant reading is the Google Ngram Viewer.
In this application you can explore the Google Books corpus “from a distance”, namely by looking at how the frequency of words and phrases changes across the corpus over time.
See https://books.google.com/ngrams/info for more info and tips on advanced use.
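To get a feel for what a frequency trend is, here is a minimal sketch of the underlying idea. It is not Google’s implementation and does not touch Google’s data: the four-sentence corpus is invented, and `relative_frequency` is a name I made up. It simply asks, for each year, what share of all the words printed that year is the word we searched for.

```python
from collections import Counter

# Hypothetical toy corpus of (publication year, text) pairs.
corpus = [
    (1850, "the detective found a clue in the garden"),
    (1850, "the garden was quiet and the house was dark"),
    (1900, "the detective read the clue and the clue was false"),
    (1900, "a detective and a doctor shared the house"),
]


def relative_frequency(corpus, term):
    """Share of all word tokens in each year that match the term."""
    hits = Counter()    # occurrences of the term, per year
    totals = Counter()  # all word tokens, per year
    for year, text in corpus:
        words = text.lower().split()
        totals[year] += len(words)
        hits[year] += words.count(term.lower())
    return {year: hits[year] / totals[year] for year in sorted(totals)}


for term in ("clue", "garden"):
    print(term, relative_frequency(corpus, term))
```

Dividing by the total number of words per year matters. Without it, a year with more books would look like a peak for almost every word.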
Try out the application yourself. Try searching different words or phrases. Try comparing those words or phrases against other words or phrases.
What kinds of searches did you try?
What kinds of results did you see?
Do you see any trends that were interesting or potentially interesting? Tell us why.
Topic Modelling
While Moretti digs into a specific problem aided by distant reading, he doesn’t give us a very clear picture of the specific technical methods often used in “distant reading”.
While there are many methods of computer assisted analysis, “Topic Modelling” is a very common one.
Here are a few blog posts that describe the process and its application in the humanities:
- the-lda-buffet-is-now-open-or-latent-dirichlet-allocation-for-english-majors/
- topic-modeling-made-just-simple-enough/
Here’s one quick description of what it does:
“If a computer were shown many academic articles, it might discover that articles containing the word reading frequently include others like interpretation, criticism, discourse. Without foreknowledge of these words’ meanings, it could statistically learn that there is a useful relationship between them. In turn, the computer would be able to identify articles in which this cluster of words seems to be prominent, corresponding to humanist methods. This process is popularly referred to as topic modeling, since it attempts to capture a list of many topics (that is, statistical word clusters) that would describe a given set of texts.” topic-modeling-what-humanists-actually-do-it-guest-post-teddy-roland-university (link no longer live)
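The intuition in this description can be sketched in a few lines of Python. To be clear, this is not a topic model: real topic modelling (for example LDA, as in the posts above) fits a statistical model, whereas the sketch below only counts which words turn up in the same documents. The five “articles” are invented keyword lists, and `cooccurrence` and `companions` are hypothetical names.

```python
from collections import Counter
from itertools import combinations

# Hypothetical "articles", already reduced to a few keywords each.
articles = [
    "reading interpretation criticism discourse",
    "reading criticism discourse narrative",
    "interpretation reading criticism",
    "protein enzyme cell",
    "enzyme cell membrane protein",
]


def cooccurrence(docs):
    """Count how many documents each unordered word pair appears in together."""
    pairs = Counter()
    for doc in docs:
        vocab = sorted(set(doc.split()))  # each word counted once per document
        pairs.update(combinations(vocab, 2))
    return pairs


def companions(pairs, word, top=3):
    """Words that most often share a document with the given word."""
    scores = Counter()
    for (a, b), n in pairs.items():
        if word == a:
            scores[b] += n
        elif word == b:
            scores[a] += n
    return scores.most_common(top)


pairs = cooccurrence(articles)
print(companions(pairs, "reading"))
print(companions(pairs, "protein"))
```

The program knows nothing about what “reading” or “protein” mean, yet the two groups of words separate cleanly, because the words never share a document across groups.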
During our class time, we will look at some live examples together.