Tools for Document Image Analysis

7 May, 2010

Ioannis Pratikakis of the National Center for Scientific Research (NCSR) “Demokritos” now provides a live demo of various image pre-processing tools.  By combining familiar binarisation algorithms, an operator can get a good visual idea of which type of binarisation (or combination) will produce the best OCR results.  The same applies to segmentation algorithms, geometric defect correction, border removal and noise reduction.  The NCSR tool also provides numerical accuracy metrics for determining the best method for the source material in hand.
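To make the idea of comparing binarisation methods by a numerical metric concrete, here is a minimal sketch in Python. It is not the NCSR tool: the choice of Otsu's method and fixed global thresholds, the F-measure as the accuracy metric, the function names and the tiny sample "image" are all assumptions made for illustration.

```python
def otsu_threshold(pixels):
    """Return the grey level (0-255) that maximises between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_dark, sum_dark = 0, 0
    for t in range(256):
        w_dark += hist[t]
        sum_dark += t * hist[t]
        w_light = total - w_dark
        if w_dark == 0:
            continue
        if w_light == 0:
            break
        mean_dark = sum_dark / w_dark
        mean_light = (sum_all - sum_dark) / w_light
        var = w_dark * w_light * (mean_dark - mean_light) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarise(pixels, threshold):
    """Mark pixels at or below the threshold as ink (1), the rest as paper (0)."""
    return [1 if p <= threshold else 0 for p in pixels]


def f_measure(predicted, truth):
    """Harmonic mean of precision and recall over the ink pixels."""
    tp = sum(1 for p, t in zip(predicted, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(predicted, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(predicted, truth) if p == 0 and t == 1)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def rank_methods(pixels, truth, methods):
    """Score each named thresholding method against ground truth, best first."""
    scores = []
    for name, pick_threshold in methods.items():
        predicted = binarise(pixels, pick_threshold(pixels))
        scores.append((name, f_measure(predicted, truth)))
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Hypothetical greyscale samples; 120 stands for a faded stroke of ink.
pixels = [20, 30, 25, 200, 210, 190, 120, 220]
truth = [1, 1, 1, 0, 0, 0, 1, 0]  # hand-labelled ink mask (made up)

methods = {
    "fixed_100": lambda px: 100,
    "fixed_150": lambda px: 150,
    "otsu": otsu_threshold,
}

for name, score in rank_methods(pixels, truth, methods):
    print(f"{name}: F-measure {score:.3f}")
```

On this made-up sample the faded stroke is only recovered by the higher fixed threshold, so that method ranks first. On real scans the ranking would depend on the source material, which is the point of having a metric to consult.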

Ioannis now presents the “Word Spotting” module.  This works by selecting a character in a volume, defining what it is, and building a database of character results for the whole volume.  Someone searching for a word in that volume would not search a text or XML file (as usually happens with OCR).  Instead, they would search a purpose-built graphical version of the volume itself, one whose output is based not on OCR but on the unique formatting and condition of that volume.  As Ioannis explains, this approach could really suit books on which typical OCR technologies would struggle or fail.
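The general shape of such a workflow can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the module Ioannis presented: it swaps in plain pixel-by-pixel template matching, represents a page as rows of "#" (ink) and "." (paper), and assumes fixed-width glyphs. The names `spot`, `build_index` and `find_word` are hypothetical.

```python
def match_score(page, glyph, top, left):
    """Fraction of pixels that agree between the glyph and a page window."""
    height, width = len(glyph), len(glyph[0])
    same = sum(
        page[top + r][left + c] == glyph[r][c]
        for r in range(height)
        for c in range(width)
    )
    return same / (height * width)


def spot(page, glyph, min_score=0.9):
    """Slide the glyph over the page and return (top, left) of good matches."""
    height, width = len(glyph), len(glyph[0])
    hits = []
    for top in range(len(page) - height + 1):
        for left in range(len(page[0]) - width + 1):
            if match_score(page, glyph, top, left) >= min_score:
                hits.append((top, left))
    return hits


def build_index(page, labelled_glyphs, min_score=0.9):
    """Map each operator-supplied label to the positions where its glyph occurs."""
    return {
        label: spot(page, glyph, min_score)
        for label, glyph in labelled_glyphs.items()
    }


def find_word(index, word, advance):
    """Return start positions where the word's glyphs sit side by side.

    `advance` is the assumed horizontal distance between glyph origins.
    """
    starts = []
    for top, left in index.get(word[0], []):
        if all(
            (top, left + i * advance) in index.get(ch, [])
            for i, ch in enumerate(word)
        ):
            starts.append((top, left))
    return starts


# A made-up 3-row "page" reading O I O, with 3-pixel glyphs and a 1-pixel gap.
page = [
    "###.###.###",
    "#.#..#..#.#",
    "###.###.###",
]

# Glyphs the operator has selected on the page and labelled by hand.
labelled_glyphs = {
    "O": ["###", "#.#", "###"],
    "I": ["###", ".#.", "###"],
}

index = build_index(page, labelled_glyphs)
print(index)
print(find_word(index, "OI", advance=4))
```

The search runs against the positions of matched shapes, not against recognised text, so it depends on how the characters look in this particular volume. A real system would need to cope with noise, variable spacing and degraded glyphs, which this sketch ignores.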

Video to come very soon!

Niall Anderson, BL + Mark-Oliver Fischer, BSB
