
IMPACT Final Conference – Keynote: OCR and the transformation of the Humanities

25 October, 2011
Gregory Crane from Tufts University


Gregory Crane (Tufts University) opened day 2 with a presentation on the significance of OCR for the Humanities. In particular, Crane identified three basic changes:

1. The transformation of the scale of questions in terms of breadth and depth;

2. The rise of student researchers and citizen scholars: they play a critical role because professionals alone can no longer handle the vast amount of data now available;

3. The globalisation of cultural heritage: working with cultural heritage has to be, as the word suggests, a global effort, since the expertise of Europe and North America is no longer enough.


Gregory Crane gives Keynote - OCR & the transformation of the humanities

Crane then described dynamic variorum editions as one of the greatest challenges for OCR. How do we create self-organising collections? Crane stressed that, even with all the crowd-sourcing possible, we still need to process data as automatically as possible. The ‘New’ Variorum Shakespeare series, now 140 years old, is a good example of this. After all, the winning system is the most useful one, not the smartest!

A Classicist by training, Crane then turned to the Graeco-Roman world and illustrated the problems that ancient languages such as Latin and Ancient Greek pose for OCR technology. What do we do with more than 2,000 years of Latin? What do we do with dirty OCR? However poor its quality, Crane explained, OCR helped Tufts determine how many of the 25,000 books selected as Latin were actually in Latin. Unsurprisingly, the OCR analysis revealed that many of them were in fact Greek. Crane’s next statement was self-explanatory: “OCR often tells us more than metadata can”. Classical Greek, Crane continued, causes numerous problems for OCR, as we often encounter polysemy, ambiguity, and changes in terminology. So how do we deal with a cultural heritage language? The key, Crane claimed, is to use multiple open-source OCR engines in order to produce better results.
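Crane’s point that OCR output can reveal more than metadata lends itself to a small illustration. The sketch below is not the method Tufts used, which the talk did not detail; it assumes the OCR engine is able to emit Greek characters, and the function-word list and thresholds are hypothetical values chosen for illustration. A book catalogued as Latin is flagged as Greek when most of its letters are in Greek script, and confirmed as Latin when common Latin function words make up a reasonable share of its tokens. Dirty output that passes neither test (for example, Greek pages run through a Latin-only engine) is labelled “unknown” for a human to check.

```python
import re
import unicodedata

# Hypothetical list of frequent Latin function words; a real system would
# use a much larger, corpus-derived list.
LATIN_FUNCTION_WORDS = {
    "et", "in", "est", "non", "ad", "cum", "quod", "ut", "sed", "qui",
    "quae", "si", "de", "ex", "nec", "enim", "autem", "esse", "sunt", "atque",
}


def greek_share(text: str) -> float:
    """Fraction of alphabetic characters whose Unicode name marks them as Greek."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    greek = sum(1 for ch in letters if unicodedata.name(ch, "").startswith("GREEK"))
    return greek / len(letters)


def latin_function_word_ratio(text: str) -> float:
    """Fraction of ASCII-letter tokens that are common Latin function words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for t in tokens if t in LATIN_FUNCTION_WORDS)
    return hits / len(tokens)


def classify_ocr_text(text: str, greek_min: float = 0.5, latin_min: float = 0.08) -> str:
    """Label OCR output as 'greek', 'latin' or 'unknown' (thresholds are illustrative)."""
    if greek_share(text) >= greek_min:
        return "greek"
    if latin_function_word_ratio(text) >= latin_min:
        return "latin"
    return "unknown"


for page in ["Gallia est omnis divisa in partes tres", "Μῆνιν ἄειδε θεὰ"]:
    print(classify_ocr_text(page))  # prints "latin", then "greek"
```

Counting function words rather than checking a dictionary keeps the test cheap and tolerant of OCR errors in content words, which is in the spirit of Crane’s remark that the most useful system wins, not the smartest.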

Finally, Crane explained that we are not just producing OCR data; we are changing connections around the world, enabling a transformation of the humanities and of the way in which the world as a whole relates to its cultural heritage.

View the presentation here:

and the video here:
