
Big corpus

UK publishers produce over 180,000 books each year. (About one third are in digital formats.) That’s a lot of words, even before we count the output of other countries, all the other words generated online (self-published or unpublished), and journal, magazine and newspaper articles.

These large text corpuses are more than big data, but they can be treated as such: counted, mined, probed, analysed, compared, correlated and turned into tables, graphs and network diagrams, without anyone needing to understand any of it.

More precisely, scholars can use computer programs to transform literary content into different formats in order to understand it better, or at least differently. That’s distant reading, as opposed to close reading. The scholar stands back as if from afar and reviews a whole corpus (collection) of works, and combinations of corpuses. It’s less about single texts and more about whole collections (e.g. the complete works of William Shakespeare, all nineteenth-century English novels, or the Hansard Reports).
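The counting described above is simple to reproduce. What follows is a minimal Python sketch, assuming a corpus held as a dictionary of plain-text documents: it lowercases each text, splits it into word tokens with a regular expression, and tallies the words across the whole collection into a frequency table. The function names, the tokenizing pattern and the two sample documents are hypothetical placeholders, not any particular tool’s method.

```python
import re
from collections import Counter

# Word tokens: runs of letters, optionally joined by a straight or curly apostrophe.
WORD = re.compile(r"[a-z]+(?:['’][a-z]+)*")


def tokenize(text: str) -> list[str]:
    """Lowercase a text and split it into word tokens."""
    return WORD.findall(text.lower())


def corpus_frequencies(corpus: dict[str, str]) -> Counter:
    """Tally every word token across all documents in the corpus."""
    counts: Counter = Counter()
    for text in corpus.values():
        counts.update(tokenize(text))
    return counts


# Hypothetical two-document corpus, standing in for a real collection.
corpus = {
    "doc1": "The scholar stands back and reviews the whole corpus.",
    "doc2": "Close reading attends to the single text; distant reading counts.",
}

freqs = corpus_frequencies(corpus)

# Print a small frequency table: word, tab, count.
for word, n in freqs.most_common(5):
    print(f"{word}\t{n}")
```

Nothing here requires the program to understand the words; it only matches patterns of letters, which is exactly the sense in which a corpus can be treated as data.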

Literary theory

Kathryn Schulz in the New York Times (2011) is suitably skeptical about this kind of study. Franco Moretti of the Stanford Literary Lab hopes to find the “unified theory of plot and style” (229), as if gathering data from the natural world. Schulz makes the obvious point that literary data is created “by design,” and is not subject to the independent, distant readings science claims to make of natural phenomena. So dispassionate analysis of texts can take us only so far before we have to commit to the meaning of what we are reading, or of what we don’t have time to read.

As a trial, I ran my last seven blog posts through voyant-tools.org, a free tool for analysing text corpuses. Here’s some of what it came up with.

[Two screenshots of Voyant Tools output for the seven posts]

The posts are ordered, so I guess the idea of a trend makes some sense here. I look forward to discovering more, but I’m reluctant to commit whole manuscripts to an online text-analysis tool. At present I don’t think automated text analysis substitutes for reading, or for vicarious reading through other readers’ interpretations.
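Tools such as Voyant chart how often a term occurs in each document of a corpus, taken in sequence, which is what makes a trend visible. For anyone who would rather not upload a whole manuscript, a rough local analogue takes only a few lines of Python. This is a minimal sketch, not Voyant’s own method: the function `term_trend`, the per-1,000-words scale and the sample posts are all illustrative assumptions.

```python
import re

# Simple tokenizer: lowercase runs of letters, allowing inner apostrophes.
WORD = re.compile(r"[a-z]+(?:['’][a-z]+)*")


def term_trend(posts: list[str], term: str, per: int = 1000) -> list[float]:
    """Relative frequency of `term` in each post, keeping the posts' order.

    Counts are scaled to occurrences per `per` words (1,000 by default) so
    that posts of different lengths can be compared on one line graph.
    """
    target = term.lower()
    trend = []
    for text in posts:
        words = WORD.findall(text.lower())
        hits = words.count(target)
        # An empty post contributes 0.0 rather than dividing by zero.
        trend.append(per * hits / len(words) if words else 0.0)
    return trend


# Hypothetical posts, listed in publication order (not the real blog texts).
posts = [
    "Reading at a distance is still reading.",
    "Corpus tools count words; they do not read them.",
    "A corpus of posts, a corpus of books, a corpus of everything.",
]

trend = term_trend(posts, "corpus")
for i, value in enumerate(trend, start=1):
    print(f"post {i}: {value:.1f} per 1,000 words")
```

The numbers only say that a word recurs more often in later posts; whether that amounts to a trend in the argument still depends on reading them.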

For the interpretive scholar any text operates at a distance anyway. See posts tagged hermeneutics.

Bibliography

  • Moretti, Franco. 2013. Distant Reading. London: Verso.
  • Schulz, Kathryn. 2011. What is distant reading? The New York Times, 24 June. Online.

About Richard Coyne

The cultural, social and spatial implications of computers and pervasive digital media spark my interest ... enjoy architecture, writing, designing, philosophy, coding and media mashups.


