
The Functional Extension Parser – a rule-based system for flexible structural analysis

7 May, 2010

Lukas Gander of the Universitäts- und Landesbibliothek Tirol (University and Regional Library Tyrol) outlines the concept behind the Functional Extension Parser: using an OCR engine’s output to build a structural map of a page or volume. OCR engines capture much more than plain text; their output also records, for instance, the type and position of each piece of text. The Functional Extension Parser (FEP) will spot if, say, numerical values appear repeatedly at the bottom of pages and tag them as page numbers. The same applies to tables of contents, chapter headings, indices and formulae. The FEP does this by applying rules designed to model a human’s intuitive understanding of book structure.
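A rule like the page-number example can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the FEP’s actual rule set: the `Word` record, the `tag_page_numbers` function, the 10% bottom band and the 60% repetition threshold are all hypothetical. A numeric token in the bottom band of a page becomes a candidate, and candidates are kept only if enough pages agree on a consistent offset between the printed number and the page’s position in the scanned sequence.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical OCR word record: text plus bounding box in page units,
    # with y measured downwards from the top edge of the page.
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def tag_page_numbers(pages, page_height, bottom_fraction=0.1, min_share=0.6):
    """Hypothetical rule: return {page_index: page_number} or {} if the rule fails.

    pages is a list of pages, each a list of Word records.
    """
    band_top = page_height * (1 - bottom_fraction)
    candidates = {}
    for index, words in enumerate(pages):
        for word in words:
            # Only purely numeric tokens inside the bottom band qualify.
            if word.text.isdigit() and word.y0 >= band_top:
                candidates[index] = word
                break  # take the first numeric token found in the band
    if not pages or not candidates:
        return {}
    # "Appears repeatedly": the printed number minus the scan position
    # should be the same offset on most pages.
    offsets = Counter(int(w.text) - i for i, w in candidates.items())
    offset, count = offsets.most_common(1)[0]
    if count < min_share * len(pages):
        return {}
    return {i: int(w.text) for i, w in candidates.items()
            if int(w.text) - i == offset}
```

Checking the offset, rather than the mere presence of a number, is one way to express the "appears repeatedly" condition: a stray year or figure number at the foot of a single page does not share the offset of the real page numbers and is left untagged.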

One of the key potential benefits of the FEP will be in e-publishing: because the structural information it gathers includes the print space and margins of each page, a printed edition could easily be produced from the digital version.
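As a rough sketch of how print space and margins might be derived from OCR position data, the snippet below takes the union of all text bounding boxes on a page as the print space and measures the margins as the distances from that box to the page edges. The `Box` type and the `print_space_and_margins` function are hypothetical, and this simple bounding-box approach is an assumption rather than a description of how the FEP computes these values.

```python
from dataclasses import dataclass


@dataclass
class Box:
    # Hypothetical bounding box in page units, y measured from the top edge.
    x0: float
    y0: float
    x1: float
    y1: float


def print_space_and_margins(boxes, page_width, page_height):
    """Return (print_space, margins) for one page's text boxes (assumed approach)."""
    if not boxes:
        raise ValueError("page has no text boxes")
    # Print space: smallest rectangle enclosing every text box on the page.
    space = Box(min(b.x0 for b in boxes), min(b.y0 for b in boxes),
                max(b.x1 for b in boxes), max(b.y1 for b in boxes))
    # Margins: gaps between the print space and each page edge.
    margins = {
        "left": space.x0,
        "top": space.y0,
        "right": page_width - space.x1,
        "bottom": page_height - space.y1,
    }
    return space, margins
```

A fuller version would likely exclude running heads and page numbers (for example, those tagged by a rule like the page-number one) before measuring, so that the margins describe the main text block.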

Lukas Gander didn’t want his presentation to be filmed, so there is no video of this talk.

Niall Anderson, BL + Mark-Oliver Fischer, BSB
