The OCR process – Clemens Neudecker & Niall Anderson

12 July, 2011

IMPACT work with ABBYY

IMPACT has been working with both ABBYY and IBM on OCR. Clemens began by outlining all the steps in the OCR process and the work done to extend the FineReader Engine so that it performs better on historical text.

Clemens Neudecker talks at the IMPACT BL Demo Day

FineReader 10 now includes many improvements and modifications based on input from the IMPACT project.

Clemens explained the advantages of adaptive binarisation and of improved segmentation, both developed for use within the ABBYY FineReader Engine.

Adaptive binarisation can now automatically apply different thresholds to different areas of a page, rather than a single threshold for the whole page, to improve results.
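
FineReader's own binarisation method is not described here, but the general idea of a locally varying threshold can be illustrated with a minimal local-mean thresholding sketch. It assumes a grayscale page held as a list of rows (0 = black, 255 = white); the function name `adaptive_binarise` and its `window` and `offset` parameters are hypothetical. Each pixel is compared with the mean of its own neighbourhood, computed quickly from an integral image, instead of with one global threshold.

```python
from typing import List

Image = List[List[int]]  # grayscale rows: 0 = black, 255 = white


def adaptive_binarise(img: Image, window: int = 15, offset: int = 10) -> Image:
    """Hypothetical local-mean thresholding: a pixel becomes ink (0) if it is
    darker than the mean of its window by more than `offset`."""
    h, w = len(img), len(img[0])
    # Integral image with a zero border: ii[y][x] = sum of img[0:y][0:x]
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    r = window // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        y0, y1 = max(0, y - r), min(h, y + r + 1)
        for x in range(w):
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            # Sum of the window in O(1) from four integral-image lookups
            total = ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]
            mean = total / ((y1 - y0) * (x1 - x0))
            out[y][x] = 0 if img[y][x] < mean - offset else 255
    return out


# Usage: left half light (200), right half dark (90), one ink pixel in each.
page = [[200 if x < 20 else 90 for x in range(40)] for _ in range(20)]
page[10][5] = 120  # faint ink on the light half
page[10][30] = 20  # ink on the dark half
binary = adaptive_binarise(page)
```

A single global threshold such as 128 would turn the whole dark half of this test page black; the local threshold keeps the background white away from the boundary between the two halves while still picking out the ink in both. The window size is the main trade-off: it should be wider than a character stroke, but small enough to follow changes in page tone.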

Improved segmentation tools can now break blocks down more accurately into regions, then into words, and finally into individual glyphs.
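
How FineReader implements this is not described here. A common textbook way to illustrate the line, word and glyph hierarchy is projection profiles: rows containing no ink separate text lines, and within a line, wide runs of empty columns separate words while any empty column separates glyphs. The sketch below assumes a clean, deskewed binary image (1 = ink) treated as a single block, with glyphs that do not touch; `runs`, `segment` and `word_gap` are hypothetical names.

```python
from typing import List, Optional, Tuple

Binary = List[List[int]]  # 1 = ink, 0 = background
Span = Tuple[int, int]    # half-open interval [start, end)


def runs(profile: List[int], min_gap: int = 1) -> List[Span]:
    """Spans where the profile is non-zero; spans separated by fewer than
    min_gap empty positions are merged into one."""
    spans: List[Span] = []
    start: Optional[int] = None
    for i, v in enumerate(profile + [0]):  # trailing 0 closes a final run
        if v > 0 and start is None:
            start = i
        elif v == 0 and start is not None:
            spans.append((start, i))
            start = None
    merged: List[Span] = []
    for s, e in spans:
        if merged and s - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return merged


def segment(page: Binary, word_gap: int = 3):
    """Split a page into lines, each line into words, each word into glyphs."""
    result = []
    width = len(page[0])
    for top, bottom in runs([sum(row) for row in page]):  # lines: ink rows
        rows = page[top:bottom]
        cols = [sum(r[x] for r in rows) for x in range(width)]
        words = []
        for left, right in runs(cols, min_gap=word_gap):  # words: wide gaps
            glyphs = runs(cols[left:right])  # glyphs: any blank column
            words.append([(left + s, left + e) for s, e in glyphs])
        result.append(((top, bottom), words))
    return result


rows = [
    "#.#...##..",
    "#.#...##..",
    "..........",
    "##..#.....",
]
page = [[1 if c == "#" else 0 for c in r] for r in rows]
layout = segment(page)
# layout == [((0, 2), [[(0, 1), (2, 3)], [(6, 8)]]),
#            ((3, 4), [[(0, 2), (4, 5)]])]
```

Each line is returned as its row span plus a list of words, and each word as a list of glyph column spans. Touching or broken characters, which are common in historical print, would defeat this simple column-gap rule, so the sketch only illustrates the hierarchy, not how a production engine handles such cases.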

Examples showed the segmentation tool in action and how FineReader Engine 10 improves on the results of the pre-IMPACT engine.

These improvements are also reflected in better recognition results in FineReader 10 than in earlier versions.

Work is being completed on an interface that will allow external dictionaries to be integrated into FineReader Engine. The ultimate aim is for the engine to work with any language from any time period.

Work is also continuing on a new ALTO export format, which has been supported since FRE10R2.
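
ALTO is an XML schema maintained by the Library of Congress for describing the layout and text content of OCRed pages. To show the kind of structure such an export contains, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It assumes the ALTO version 2 namespace URI and emits only a small subset of elements (`Layout`, `Page`, `PrintSpace`, `TextBlock`, `TextLine`, `String`, `SP`). It is not FineReader's exporter, and its output is not guaranteed to validate against the schema, which may require further elements and attributes (for example a `Description` section and coordinates on blocks and lines). The function `to_alto` and its input format are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Assumption: ALTO version 2 namespace; adjust for the version you target.
ALTO_NS = "http://www.loc.gov/standards/alto/ns-v2#"
ET.register_namespace("", ALTO_NS)  # serialise ALTO as the default namespace

# Hypothetical input: one list of words per line, each word given as
# (text, hpos, vpos, width, height) in pixels.
Word = Tuple[str, int, int, int, int]


def q(tag: str) -> str:
    """Qualify a tag name with the ALTO namespace."""
    return f"{{{ALTO_NS}}}{tag}"


def to_alto(lines: List[List[Word]], page_w: int, page_h: int) -> str:
    root = ET.Element(q("alto"))
    layout = ET.SubElement(root, q("Layout"))
    page = ET.SubElement(layout, q("Page"), ID="P1", PHYSICAL_IMG_NR="1",
                         WIDTH=str(page_w), HEIGHT=str(page_h))
    space = ET.SubElement(page, q("PrintSpace"))
    block = ET.SubElement(space, q("TextBlock"), ID="TB1")
    for i, words in enumerate(lines, start=1):
        line = ET.SubElement(block, q("TextLine"), ID=f"TL{i}")
        for j, (text, x, y, w, h) in enumerate(words):
            if j > 0:
                ET.SubElement(line, q("SP"))  # explicit space between words
            ET.SubElement(line, q("String"), CONTENT=text,
                          HPOS=str(x), VPOS=str(y),
                          WIDTH=str(w), HEIGHT=str(h))
    return ET.tostring(root, encoding="unicode")


alto_xml = to_alto([[("Historical", 10, 12, 120, 30), ("text", 140, 12, 50, 30)]],
                   page_w=800, page_h=1200)
```

Because each `String` carries its own coordinates, an ALTO file lets a viewer link recognised words back to their position on the page image, for example to highlight search hits.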

Clemens’s presentation can be viewed here:

IBM CONCERT tool

Niall then gave some background to the IBM adaptive OCR process and CONCERT.

First, he gave a quick overview of other ‘collaborative correction’ projects that use feedback from users to improve OCR results, such as the Australian Newspaper project.

Niall Anderson talks at the IMPACT BL Demo Day

Niall then demonstrated the CONCERT tools, both live online and by showing the screencast available here:

http://fue.onb.ac.at/impact/gwsw/vid/EE1_showcase.html

Finally, Niall discussed alternatives to OCR and post-correction, using techniques such as word spotting.

Niall’s slides are available here:
