Big corpus

UK publishers produce over 180,000 books each year. (About one third are in digital formats.) That’s a lot of words, even before counting the output of other countries, the words generated online (self-published or unpublished), and journal, magazine and newspaper articles.

These large text corpuses are more than big data, but can be treated as such — counted, mined, probed, analysed, compared, correlated and turned into tables, graphs and network diagrams, without the need for anyone to understand any of it.

More precisely, scholars can use computer programs to transform literary content into different formats in order to understand it better, or at least differently. That’s distant reading, as opposed to close reading. The scholar stands back and surveys a whole corpus (collection) of works, or combinations of corpuses. It’s less about singular texts and more about whole collections (e.g. the complete works of William Shakespeare, all nineteenth-century English novels, or the Hansard Reports).

Literary theory

Kathryn Schulz in the New York Times (2011) is suitably skeptical about this kind of study. Franco Moretti of the Stanford Literary Lab hopes to find the “unified theory of plot and style” (229), as if gathering data from the natural world. Schulz makes the obvious point that literary data is created “by design,” and not subject to the independent, distant readings science claims to make of natural phenomena. So dispassionate analysis of texts can only get us so far before we have to commit to the meaning of what it is we are reading, or don’t have time to read.

As a trial I ran my last seven blog posts through the free-to-use Voyant Tools (voyant-tools.org), which analyses corpuses of texts. Here’s some of what it came up with.

The postings are ordered, so I guess there’s some sense here to the idea of a trend. I look forward to discovering more, but I’m reluctant to commit whole manuscripts to an online text analysis tool. At present I don’t think automated text analysis provides a substitute for reading, or vicarious reading through other readers’ interpretations.
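For anyone who shares that reluctance to upload manuscripts, the idea of a trend across ordered posts can be sketched locally. The following is only a minimal illustration, not how Voyant Tools works: the function names, the simple tokeniser and the sample posts are all my own assumptions. It reports what share of the words in each post a chosen term accounts for, in the order the posts are given.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def relative_frequency(term, text):
    """Share of tokens in `text` equal to `term` (0.0 for an empty text)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return Counter(tokens)[term] / len(tokens)


def trend(term, documents):
    """Relative frequency of `term` in each document, in the order given."""
    return [relative_frequency(term, doc) for doc in documents]


# Hypothetical stand-ins for blog posts, listed in publication order.
posts = [
    "The city is a text. The text is a city.",
    "Reading a text at a distance.",
    "No mention of that word here.",
]

for number, share in enumerate(trend("text", posts), start=1):
    print(f"post {number}: {share:.3f}")
```

A falling or rising sequence of numbers is all a "trend" amounts to here; deciding whether it means anything still requires reading the posts.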

For the interpretive scholar any text operates at a distance anyway. See posts tagged hermeneutics.

Bibliography

Moretti, Franco. 2013. Distant Reading. London: Verso.
Schulz, Kathryn. 2011. What is distant reading? The New York Times, June 24, online.

Notes

Voyant Tools is a web-based reading and analysis environment for digital texts: http://voyant-tools.org
UK Book Industry Statistics (2014) are from the Publishers Association.
I’m grateful to Fabio Ciotti for alerting us to the idea of distant reading at the recent Digital Humanities Autumn School at Trier.

Discover more from Reflections on Technology, Media & Culture
