This blog has now been frozen. Comments have been disabled and we do not intend to publish further posts. We have published the following statistics for future reference. They are intended to inform others about the lifecycle of the blog and assist people wishing to reuse resources by identifying the authors of articles etc.
Active Dates: From 10 December 2009 to 31 December 2011
Number of posts: 117
Number of comments: 16
Akismet statistics: 1,750 spam comments caught and an overall accuracy rate of 100%.
Details of contributors: The IMPACT project (used as a generic log-in for IMPACT staff), impacteib, mariekeguy, Nora Daly, Greta Franzini, simonaitken
Categories used: admin, Bratislava (May 2010), British Library, conference, Demo Day, Deutsch (German), English, Final Conference 2011, hackday, Munich (March 2010), Munich (October 2011), myGrid – Taverna Hackathon, Nederlands (Dutch), Rouen (March 2011), taverna, The Hague (Feb 2011)
Details of blog theme: Vigilance with 4 Widgets
Details of type and version of software used: This blog was run on the free hosted version of WordPress at www.wordpress.com
Blog licence: All items on the blog are copyright of the IMPACT project and, unless otherwise stated, have been released under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License
It took us a bit longer than expected, but all videos of our dual event “Turning Historical Documents into Digital Full Texts” (11 – 12 October 2011) are now online and embedded into the relevant blog posts.
As before, you’ll find everything about the first day, the “IMPACT Demo Day”, here on the IMPACT blog. For everything about the second day, “Experiences from Digitisation Practice”, please visit the blog of the Munich DigitiZation Center.
Have fun watching!
Mark-Oliver Fischer (BSB)
In the afternoon, after everyone had worked through the 3 group tasks in the practical session: ‘Workflow Development in Digitisation’, we returned to hear from the Taverna Manager – Shoaib Sufi.
Shoaib gave an interesting talk about where he sees Taverna going in the next few years and the further development of Taverna 3, including some of the projects that they hope to work with.
Clemens Neudecker then led a session on using Taverna Server as a portal, using IMPACT workflows to demonstrate the functionality.
This was followed by Rob Haines from myGrid who gave more examples of Taverna Server Interfaces.
This practical session started with the attendees introducing themselves and splitting up into 3 groups, so that each could work on a different set of tasks based on a Case Study:
A collection holder wants to reduce storage costs for her/his collections that
are currently available as TIFF master files. She/he heard that JPEG2000 is
a good candidate for storing digital master files, and she/he heard about
the efficiency of image compression when using lossy compression.
She/he knows that lossy JPEG2000 compression can be “visually lossless”, so that
the loss is not visible to the eye, but she/he is still concerned about the
impact the JPEG2000 compression could have on OCR.
We suggest a Taverna workflow that creates an executable processing pipeline
for studying the results.
The workflow should have 1 TIFF image as input and a list of increasing
compression parameters which are used when encoding the image. The image
should then be decompressed before applying the OCR. Finally, the impact
of the compression on the OCR should be measured by comparing the original
OCR output to the OCR output of the compressed images.
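The final comparison step of the case study could be sketched roughly as follows. This is only a minimal illustration in Python, not the beanshell script used in the workshop: it assumes the OCR results are already available as plain strings, and the function names (`levenshtein`, `similarity`, `compare_ocr_outputs`) and the choice of edit distance as the measure are assumptions made for this example.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the edit distance between a and b (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(reference: str, candidate: str) -> float:
    """Return 1.0 for identical texts, falling towards 0.0 as they diverge."""
    longest = max(len(reference), len(candidate))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(reference, candidate) / longest


def compare_ocr_outputs(reference_text: str, texts_by_rate: dict) -> dict:
    """Map each (hypothetical) compression parameter to its similarity score.

    reference_text is the OCR output of the original TIFF; texts_by_rate maps
    a compression parameter to the OCR output of the decompressed image.
    """
    return {rate: similarity(reference_text, text)
            for rate, text in sorted(texts_by_rate.items())}
```

Calling `compare_ocr_outputs` with the OCR text of the original image and one OCR text per compression parameter gives a score per parameter, which makes it easy to see at which point the compression starts to hurt recognition.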
The Three Groups:
- Use the toolwrapper for providing access to a JPEG2000 encoding/decoding tool
- Use Taverna for creating the workflow
- Use a Taverna beanshell for creating the text comparison:
  - commons-lang-2.4.jar (/home/<youruser>/.taverna-home/lib/commons-lang-2.4.jar)
The selection of groups showed a definite preference for the more ‘user’ based tasks rather than ‘developer’ tasks, with 12 working in Group 1, 6 in Group 2 and only 3 in Group 3. However, quite a few attendees seemed happy to be involved in more than one group, or to work in one but support users in another.
The general feeling is that this bodes well for tomorrow, which has a more ‘practical’ timetable.
Full details of this workshop are available through the workshop wiki at:
The day started with an introduction to IMPACT from Clemens Neudecker:
and then an introduction to Taverna from Katy Wolstencroft:
These are also embedded within the blogs on this site.
This post contains direct links to all posts made at the Final Conference. Please do feel free to add comments or thoughts below the posts.
Monday 24 October 2011
BLOCK 1: OPERATIONAL CONTEXT
- Welcome and Opening of the Conference by Hildelies Balk-Pennington de Jongh (IMPACT Project Director, KB National Library of the Netherlands)
- Strategic Digital Overview by Richard Boulderstone (Director, e-Strategy and Information Systems, The British Library)
- Digitisation challenges & IMPACT achievements so far by Hildelies Balk-Pennington de Jongh (KB)
- Case study: Scanning parameters by Apostolos Antonacopoulos (USAL)
- Applied IMPACT: Does the new FineReader Engine and Dutch lexicon increase OCR accuracy and production efficiency? A case study by KB and CCS by Claus Gravenhorst (CCS)
BLOCK 2: FRAMEWORK AND EVALUATION
- Experiences in mass digitisation: examining OCR quality by Paul Fogel (University of California, California Digital Library)
- The IMPACT Interoperability Framework – Workflows for OCR and beyond by Clemens Neudecker (KB)
- IMPACT Evaluation tools, ground truth and datasets by Stefan Pletschacher (University of Salford)
BLOCK 3: TOOLS FOR IMPROVED TEXT RECOGNITION
- ABBYY FineReader: IMPACT improvements by Michael Fuchs (ABBYY Europe)
- IBM Adaptive OCR engine and CONCERT Cooperative Correction by Asaf Tzadok (IBM Haifa Research Lab)
- Crowdsourcing in the Digitalkoot project by Majlis Bremer-Laamanen (National Library of Finland)
- The Functional Extension Parser: A Document Understanding Platform by Günter Mühlberger (University of Innsbruck) (this presentation was dropped from the schedule because the programme was running late, and was given during the Research Parallel Sessions on the second day)
- Postcorrection in IMPACT by Ulrich Reffle (CIS group, University of Munich)
Tuesday 25 October 2011
- Keynote: OCR and the transformation of the Humanities by Gregory Crane (Professor & Chair, Department of Classics, Tufts University)
BLOCK 4: LANGUAGE TOOLS AND RESOURCES
- Overview of language work in IMPACT by Katrien Depuydt (INL)
- Evaluation of lexicon supported OCR and Information Retrieval by Jesse de Does (INL)
- CLARIN and IMPACT: Crossing Paths by Steven Krauwer (CLARIN coordinator, University of Utrecht)
BLOCK 5: IMPACT CENTRE OF COMPETENCE
- The EC Digital Agenda and official launch of the IMPACT Centre of Competence by Khalil Rouhana (European Commission – Director for digital content and cognitive systems in DG Information Society and Media)
- Introduction to the IMPACT Centre of Competence by Aly Conteh (BL) and Hildelies Balk-Pennington de Jongh (KB)
- Research Session: Presentation and discussion of state of the art research tools for document analysis and OCR, hosted by Apostolos Antonacopoulos (University of Salford).
- Language Session: Presentation and demonstration of the IMPACT language tools & resources in further detail, hosted by Katrien Depuydt (INL)
- Digitisation Tips Session: Meet the expert: questions & answers on digitisation issues, hosted by Aly Conteh (The British Library)