INPUT / IMPORT BIBLIODATA
Scanned documents and digital images are imported, and existing bibliographic metadata is added. Through communication with the library and catalogue system, the process is monitored continuously by digital tracking, so the physical location and status of an item can be retrieved at any time.
DIGITAL IMAGE ENHANCEMENT
Scanned pages are deskewed and despeckled where necessary, as well as cropped or resized. Scanned double pages are automatically split.
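As an illustration of the double-page split, here is a minimal Python sketch. It is not the docWORKS algorithm, which the text does not describe. It assumes a grayscale scan held as rows of pixel values (0 = black, 255 = white) and assumes the binding shadow forms the darkest column near the horizontal center. The names `find_gutter` and `split_double_page` are hypothetical.

```python
from typing import List, Tuple

# Assumption: an image is a list of rows of grayscale pixels (0 = black, 255 = white).
Image = List[List[int]]


def find_gutter(image: Image, search_fraction: float = 0.2) -> int:
    """Return the index of the darkest column inside a band around the center.

    Assumption: the book's binding casts a shadow, so the gutter is the
    column with the lowest total brightness near the middle of the scan.
    """
    width = len(image[0])
    center = width // 2
    margin = max(1, int(width * search_fraction / 2))
    start = max(0, center - margin)
    stop = min(width, center + margin + 1)
    # Sum every candidate column; the lowest sum is taken as the gutter.
    return min(range(start, stop), key=lambda x: sum(row[x] for row in image))


def split_double_page(image: Image) -> Tuple[Image, Image]:
    """Split a double-page scan into left and right pages, dropping the gutter column."""
    gutter = find_gutter(image)
    left = [row[:gutter] for row in image]
    right = [row[gutter + 1:] for row in image]
    return left, right


# Synthetic 4 x 10 scan: white pages with a dark gutter at column 5.
scan = [[255] * 10 for _ in range(4)]
for row in scan:
    row[5] = 40

left_page, right_page = split_double_page(scan)
```

A production system would work on real image data and would likely combine this with deskewing, since a skewed gutter does not fall on a single column.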
LAYOUT ANALYSIS
Page zones such as headings, text blocks, charts, advertisements and illustrations are separated, highlighted and classified.
TEXT RECOGNITION
The zones defined during layout analysis are converted into digital full text. A wide range of modern and historical fonts, languages and dictionaries is supported, giving a recognition accuracy of up to 99%.
In addition, docWORKS uses historical and specialist dictionaries and algorithms to improve recognition of mixed-font and “Fraktur” (Gothic print) texts.
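The text does not say how the recognition figure is measured. One common definition is character-level accuracy: one minus the edit distance between the recognized text and a ground-truth transcription, divided by the length of the ground truth. The Python sketch below shows that definition only as an assumption about what such a percentage could mean. The function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def character_accuracy(reference: str, recognized: str) -> float:
    """Share of reference characters recognized correctly, clamped at 0."""
    if not reference:
        return 1.0 if not recognized else 0.0
    errors = edit_distance(reference, recognized)
    return max(0.0, 1.0 - errors / len(reference))


# One wrong character in a 100-character reference.
reference_text = "a" * 100
recognized_text = "a" * 99 + "b"
accuracy = character_accuracy(reference_text, recognized_text)
```

Under this definition, a 99% result corresponds to roughly one character error per hundred characters of ground truth.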
STRUCTURE ANALYSIS
Intelligent Structure Recognition (ISR) automatically marks:
- lead-in, main body, final section
- chapter, subchapter, section, article etc.
- captions for images or tables/charts, author, footnote
METADATA
Physical and logical structures are converted into XML data formats. The imported bibliographic metadata is carried over as well.
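To make the conversion of physical structure into XML more concrete, the following Python sketch builds a simplified fragment modelled on ALTO element and attribute names (`Layout`, `Page`, `PrintSpace`, `TextBlock`, `TextLine`, `String`, `CONTENT`, `HPOS`, `VPOS`). It is an illustration only: it omits the ALTO namespace and header elements, is not validated against the schema, and does not reflect how docWORKS generates its output. The function `build_alto_like` and the input layout are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Assumption: a word is a dict with its text and bounding box in pixels.
Word = Dict[str, object]


def build_alto_like(page_id: str, width: int, height: int,
                    blocks: List[List[List[Word]]]) -> ET.Element:
    """Build a simplified ALTO-style tree: blocks -> lines -> words."""
    alto = ET.Element("alto")
    layout = ET.SubElement(alto, "Layout")
    page = ET.SubElement(layout, "Page", ID=page_id,
                         WIDTH=str(width), HEIGHT=str(height))
    print_space = ET.SubElement(page, "PrintSpace")
    for block_no, block in enumerate(blocks, start=1):
        text_block = ET.SubElement(print_space, "TextBlock",
                                   ID=f"{page_id}_TB{block_no}")
        for line in block:
            text_line = ET.SubElement(text_block, "TextLine")
            for word in line:
                ET.SubElement(
                    text_line, "String",
                    CONTENT=str(word["text"]),
                    HPOS=str(word["x"]), VPOS=str(word["y"]),
                    WIDTH=str(word["w"]), HEIGHT=str(word["h"]),
                )
    return alto


# One text block with one line of two words.
sample_blocks = [[[
    {"text": "Erste", "x": 100, "y": 200, "w": 80, "h": 30},
    {"text": "Seite", "x": 190, "y": 200, "w": 75, "h": 30},
]]]
root = build_alto_like("P1", 2480, 3508, sample_blocks)
xml_text = ET.tostring(root, encoding="unicode")
```

In a METS/ALTO delivery, files like this typically describe individual pages, while a METS document ties the pages, images and logical structure together.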
CORRECTION/QUALITY CONTROL
Interactive quality assurance, which service providers can also perform online, ensures that specifications are met to the highest standard.
EXPORT
Export follows library standards.
Migratable, OS-independent XML data such as METS/ALTO, image formats such as TIFF, JPEG and JPEG2000, and PDF variants such as plain PDF, structured PDF and PDF/A can all be exported.