
IMPACT Final Conference – Digitisation challenges & IMPACT Achievements so far

24 October, 2011

No one could say that the key objectives of Hildelies Balk-Pennington de Jongh and the IMPACT project lack ambition! Europe shares the vision that all cultural heritage should be available in digital form within this decade. To help make that happen, IMPACT has spent the last four years working to overcome the many challenges involved in digitising historical material.

With 26 partners across Europe and headed by the National Library of the Netherlands, the IMPACT project was focused on significantly improving mass digitisation of historical printed text by:
– Innovating OCR software and language technology
– Sharing expertise and building capacity across Europe
– Providing facilities for future research and development

The benefits of their work so far are evident: an improved ABBYY FineReader, several tools ready for testing in a production environment (exceeding the original project expectations), further tools for future development, and a Centre of Competence ready to be launched. Language resources include historical lexica for no fewer than nine languages: Dutch, English, German, Czech, Bulgarian, Polish, French, Slovene and Spanish.

Hildelies then gave an overview of the results, showing how they make text digitisation better, faster and cheaper. Key improvements over the state of the art include:
– Page splitting accuracy has risen from 73% to 94%
– Line segmentation accuracy has improved from 19% to 98%
– Recognition of historical fonts is 25% better in FineReader 10 (FR10) than in FineReader 9 (FR9)
– Post-correction with the Error Profiler is 2.7 times faster than without it
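One way to appreciate these accuracy figures is to look at the error rates behind them. The short Python sketch below does that arithmetic for the two accuracy percentages quoted in the talk. It assumes that "error" simply means 100 minus the reported accuracy; the talk did not say exactly how the percentages were measured, so treat this as an illustration rather than an official IMPACT metric.

```python
# Before/after accuracy figures (percent) as quoted in the talk.
results = {
    "page splitting": (73, 94),
    "line segmentation": (19, 98),
}

def error_reduction(before, after):
    """Return (error_before, error_after, factor).

    Assumption: error rate = 100 - accuracy (percentage points).
    factor = how many times fewer errors remain after the improvement.
    """
    err_before = 100 - before
    err_after = 100 - after
    return err_before, err_after, err_before / err_after

for task, (before, after) in results.items():
    eb, ea, factor = error_reduction(before, after)
    print(f"{task}: error {eb}% -> {ea}% ({factor:.1f}x fewer errors)")
```

Under that assumption, page splitting errors fall from 27% to 6% (4.5 times fewer), and line segmentation errors fall from 81% to 2% (40.5 times fewer).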

Hildelies ended with a nice example of the benefits for end users (researchers in the humanities and the wider public), who are only interested in finding the words they search for. Preliminary results of OCR combined with a dictionary on difficult 17th-century Dutch material already indicate a 15% increase in words found. For 1 million words, that means 150,000 more words will be found!
Starting from the ambitious objectives set out at the beginning, Hildelies showed that the benefits for researchers in the humanities, academics, the public, and amateur classicists like me are now a reality.

View the slides here:

and the video here:
