Working in CONCERT – public participation in mass digitisation

7 May, 2010

Next on stage is Niall Anderson from the British Library, talking about public participation in mass digitisation, and “why we think that’s a good idea”.

A 2008 Conference of European Libraries survey estimated that there were some 8 million digitised text-based items in existence, proof that we live in an era of effective mass digitisation. However, British Library research has suggested that some 20% of all digital text produced is effectively unreadable due to poor capture, OCR deficiencies, and difficulties native to the source material. If the point of mass digitisation is to enable wider public use of previously inaccessible material, then this shortfall in readable text must be addressed. Niall outlines the history and future of collaborative correction in mass digitisation by demonstrating IMPACT’s CONCERT tool, which aims to make more digital documents available through crowdsourcing and structured correction of text.

CONCERT (Cooperative Engine for the Correction of Extracted Text) works in three steps: a character session, a word session and a page-level session. The character session presents the user with a list of characters that the OCR has recognised as the same letter. The user can then mark individual characters as “suspicious”. In the next step, these characters are presented in word context, where the user can again decide whether the characters were recognised correctly. In the final step, characters and words that are still marked as suspicious are shown at page level.
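
To make the three-step flow easier to picture, here is a minimal Python sketch of how such a review pipeline could be modelled. It is not CONCERT’s actual code or data model: the `Glyph` class, its fields, and the three function names are hypothetical, invented here purely for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Glyph:
    """One character occurrence as output by the OCR (hypothetical model)."""
    page: int
    word_index: int
    recognised_as: str
    suspicious: bool = False


def character_session(glyphs):
    """Step 1: group glyphs that the OCR recognised as the same letter."""
    groups = defaultdict(list)
    for glyph in glyphs:
        groups[glyph.recognised_as].append(glyph)
    return dict(groups)


def word_session(glyphs):
    """Step 2: list the (page, word) positions that contain suspicious glyphs."""
    return sorted({(g.page, g.word_index) for g in glyphs if g.suspicious})


def page_session(glyphs):
    """Step 3: list the pages that still contain suspicious glyphs."""
    return sorted({g.page for g in glyphs if g.suspicious})


# Four made-up glyphs spread over two pages.
glyphs = [
    Glyph(page=1, word_index=0, recognised_as="e"),
    Glyph(page=1, word_index=0, recognised_as="c"),
    Glyph(page=1, word_index=1, recognised_as="e"),
    Glyph(page=2, word_index=0, recognised_as="e"),
]

# Character session: the user looks at all "e" glyphs and flags two of them.
groups = character_session(glyphs)
groups["e"][1].suspicious = True
groups["e"][2].suspicious = True

# Word session: the flagged glyphs are reviewed in their word context.
words_to_review = word_session(glyphs)

# The user decides the glyph on page 1 was recognised correctly after all.
glyphs[2].suspicious = False

# Page session: only what is still flagged reaches page-level review.
pages_to_review = page_session(glyphs)
```

In this toy run, two words go forward to the word session and only one page is left for page-level review, which mirrors the narrowing effect of the three steps described above.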

Mark-Oliver Fischer, BSB
