Glossary of Imaging Terms


A B C D E F I L M O P R S T V W X

A
Accuracy percent: Measures the proportion of characters correctly interpreted by a recognition engine. Can be misleading, as the engine reports only the characters that it fails to identify or the errors that are caught through post-processing; characters it misreads with confidence are not counted (see also substitutions).

Anchor Points: Refers to crosses or other marks placed in the corners of documents to allow them to be consistently lined up within a computer system's memory. This enables accurate finding of data and lining up of templates.

Audit Trail: A printed report identifying where in the scanning process each document is located.

Autofeeder: A device which is either integral to or added on to a paper scanner to accept a stack of paper and automatically feed pages. Autofeeders vary in their ability to accept differing thicknesses, sizes and qualities of paper. As paper transitions from thick to thin, double feeds can occur.

B
Barcode: Consists of a series of thin and thick black lines that, when placed in defined patterns, represent numeric or alphabetic characters. Different symbologies define the patterns. Barcodes can be one dimensional, like the ones found on retail packages, or two dimensional (known as 2D). 2D barcodes, which consist of a matrix of black and white blocks, can contain large amounts of information. The most popular is PDF-417, developed by Symbol Technologies.

Barcode Recognition: Interpreting a barcode from its scanned image.

Bitonal: A term used to mean black and white images with no grayscale. Traditionally the main way to capture and store images of documents in document management systems.

Batching: Collecting multiple pages together and separating them with batch separators. Batches either are fixed quantities of single pages, which can be counted to identify double feeds (see autofeeder), or consist of multiple levels, often based on three levels of index. Recently there has been some interest in using color-coded bars scanned with a color scanner to identify batches.

Batch Control Sheets: Coded pages, usually with barcodes or OCR’able characters, that automatically separate pages within a batch or separate batches.
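
The two batching ideas above, separator sheets and counting pages in fixed-size batches, can be sketched as follows. This is only an illustration: the "SEP" marker, the function names and the expected batch size are assumptions made for the example, not part of any scanner's software.

```python
def split_batches(pages, separator="SEP"):
    """Split a stream of scanned pages into batches at each separator sheet."""
    batches = []
    for page in pages:
        # A separator (or the very first page) opens a new batch.
        if page == separator or not batches:
            batches.append([])
        if page != separator:
            batches[-1].append(page)
    return batches


def suspect_batches(batches, expected_size):
    """Return the indexes of batches whose page count differs from the fixed size.

    A short count can point to a double feed; it does not prove one.
    """
    return [i for i, batch in enumerate(batches) if len(batch) != expected_size]


stream = ["SEP", "p1", "p2", "p3", "SEP", "p4", "p5"]
batches = split_batches(stream)
print(batches)
print(suspect_batches(batches, expected_size=3))
```

Here the second batch holds two pages instead of three, so it is flagged for a manual check.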

Book Scanning: Requires either specialized scanners or for the spine to be cut off. Flatbed scanners damage the spine and provide a fuzzy image at the edges.

C
Check Digit: A digit calculated from the contents of a field by a mathematical formula and added onto the field. When the field is captured, the check digit can be recalculated to verify that the data was converted correctly.
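
As an illustration, the sketch below uses the Luhn formula, one widely used check digit scheme. The glossary does not name a particular formula, so the choice of Luhn and the function names are assumptions made for the example.

```python
def luhn_check_digit(payload: str) -> str:
    """Return the Luhn check digit for a string of decimal digits."""
    total = 0
    # Walk right to left; the rightmost payload digit is doubled first.
    for position, char in enumerate(reversed(payload)):
        digit = int(char)
        if position % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return str((10 - total % 10) % 10)


def is_valid(field: str) -> bool:
    """Check a captured field whose last character is its check digit."""
    if not field.isdigit() or len(field) < 2:
        return False
    return luhn_check_digit(field[:-1]) == field[-1]


print(luhn_check_digit("7992739871"))  # 3
print(is_valid("79927398713"))         # True
print(is_valid("79927388713"))         # False: one digit was misread
```

A single misread digit changes the recalculated check digit, so the captured field fails the test and can be routed for repair.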

Collection of Mail: A service offered by some outsource vendors where mail is received on behalf of a customer in a PO box which is routed direct to the outsource vendor.

Confidence factors: Values used by recognition engines to express how likely an answer is to be accurate.

D
Data Color: Refers to the color of the data that must be extracted and converted. Carbonless paper can often produce a very faint image.

Data Prep: A term covering one or all of the following manual actions: the opening of envelopes, unfolding of paper, removal of staples, repair of tears.

Double Feed: The feeding of two sheets of paper at once. Sometimes on roller-based scanners this can occur so cleanly that it cannot be detected.

DPI: Dots per Inch. A measurement of the resolution of the scanned image. Normally 200 dpi is adequate to represent a mainly textual document. Much OCR works better at 300 dpi, but this does NOT mean that it works even better at even higher resolutions -- depending on the algorithms, it can work less well.
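
To make the resolution figures concrete, the sketch below works out the pixel dimensions and the uncompressed bitonal size of a page. The 8.5 x 11 inch page, the one-bit-per-pixel storage and the byte-padded rows are assumptions made for the example; real files are normally compressed and much smaller.

```python
def scanned_size(width_in: float, height_in: float, dpi: int) -> tuple:
    """Return (width in pixels, height in pixels, uncompressed bitonal bytes)."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    # One bit per pixel, with each row padded up to a whole byte.
    row_bytes = (width_px + 7) // 8
    return width_px, height_px, row_bytes * height_px


for dpi in (200, 300):
    print(dpi, scanned_size(8.5, 11, dpi))
# 200 (1700, 2200, 468600)
# 300 (2550, 3300, 1052700)
```

Going from 200 to 300 dpi multiplies the number of dots by 2.25, which is why resolution is chosen with storage and throughput in mind as well as recognition quality.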

Drop-Out Ink: Inks that are not visible to the light spectrum of the scanner. Can either be pastels, particularly in the yellow/green range, or specific color inks that match the color of the light source. New color scanners often include the ability to remove, or drop out, specific colors. Users want to drop out background colors in order to capture the foreground information so as to apply OCR or some other recognition to it.
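
A rough sketch of color drop-out in software, assuming a red form printed on white paper and pixels held as (red, green, blue) values from 0 to 255. The function names, the threshold of 128 and the sample colors are assumptions made for the example; this is not how any particular scanner implements the feature.

```python
def to_bitonal_luminance(pixels, threshold=128):
    """Bitonal conversion from overall brightness (1 = black, 0 = white)."""
    return [[1 if 0.299 * r + 0.587 * g + 0.114 * b < threshold else 0
             for (r, g, b) in row] for row in pixels]


def to_bitonal_red_dropout(pixels, threshold=128):
    """Bitonal conversion from the red channel only.

    Red ink is bright in the red channel, so it falls on the white side of
    the threshold and drops out, much as it would under a red light source.
    """
    return [[1 if r < threshold else 0 for (r, g, b) in row] for row in pixels]


# White paper, red form line, black ink, blue ink.
row = [(255, 255, 255), (220, 40, 40), (20, 20, 20), (30, 30, 200)]
print(to_bitonal_luminance([row]))    # [[0, 1, 1, 1]]
print(to_bitonal_red_dropout([row]))  # [[0, 0, 1, 1]]
```

With the plain brightness conversion the red form line survives as black; with the red channel alone it disappears, leaving only the black and blue data.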

Duplex Scanning: The ability to scan both sides of a piece of paper in one pass.

E
Edit Checks: Refers to the validation of types of fields. For example, a field can be numeric only, alphabetic only or a specific pattern.
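
The three kinds of check mentioned above can be sketched with regular expressions. The check names and the US ZIP code pattern are hypothetical examples chosen for illustration.

```python
import re

# Hypothetical edit checks: numeric only, alphabetic only, and one specific pattern.
EDIT_CHECKS = {
    "numeric": re.compile(r"[0-9]+"),
    "alphabetic": re.compile(r"[A-Za-z]+"),
    "us_zip": re.compile(r"[0-9]{5}(-[0-9]{4})?"),
}


def passes_edit_check(value: str, check: str) -> bool:
    """Return True if the whole field value matches the named check."""
    return EDIT_CHECKS[check].fullmatch(value) is not None


print(passes_edit_check("12345", "numeric"))      # True
print(passes_edit_check("12O45", "numeric"))      # False: letter O, not zero
print(passes_edit_check("02134-1234", "us_zip"))  # True
```

An edit check of this kind catches a letter read in place of a digit, but not a wrong digit read in place of the right one.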

Endorser: Usually a programmable ink-jet printer that prints on documents as they are scanned. The printed mark helps to ensure that all the pages are scanned and also provides a method to find specific pages.

F
False Positives: A term used in OCR to denote those characters which the conversion engine thought were wrong but were in fact correct. False positives tend to rise if the engine's accuracy requirements are set too high (see also substitutions).

Fire Damage: Causes charring of paper and can cause degradation or destruction of the image. The image can sometimes be reconstructed electronically. It can also make paper very brittle, which means that a straight-through scanner should be used to create the images.

Fire Protection: In paper-intensive environments, dry extinguishers should be installed at the outsource vendor's premises. Standard sprinklers cause paper damage (see water damage).

Flatbed Scanners: Scanners that contain an autofeeder and a piece of glass where the paper can be placed and scanned. Can be useful for certain non-standard papers, but are slow and not good for production scanning (see transport).

Form Colors: Normally refers to the overall color of the form, which can have an impact on image quality. For example, a black or blue image placed on a dark pink or red background will not provide adequate contrast on a black and white scanner. Form colors can also refer to the color of the background form (see drop-out ink), or to the color of the data image (see data color).

Form Redesign: Refers to the ability to improve the automated processing of the form through redesigning. Should be carried out in conjunction with the service provider.

I
ICR: Literally Intelligent Character Recognition. Initially used as a term to differentiate Kurzweil’s OCR from other vendors’ products. Has recently come to mean hand print recognition. Usually related to neural net technologies; can also be used to identify marks such as check-off boxes or stylized pattern fonts such as OCR-A, OCR-B or MICR.

IDR: A term used to denote intelligent document recognition. Usually relies on full-text OCR of a document, the results of which are then used to analyze the content of the document and extract relevant fields of information.

L
Levels of Index: (see also batching). Documents may be filed by ‘cabinet’, ‘file’ and ‘folder’. This represents a 3-level index.

M
Mainframe: Often needed to provide validation tables, which may be downloaded. The service provider must be able to provide data and images in a readable format on acceptable media.

Microfilming: refers to the ability to capture images on microfilm concurrently with digital media. Can be useful for human readable archival data.

Missed Scan: see double feed.

O
OCR: Optical Character Recognition. A method of using pattern recognition of images of characters to create computer-readable data. Some OCR software works better than others on certain types of data.

Off-Shore: The ability to send images or paper for manually intensive key entry to low-cost locations. Historically these were located in the Caribbean, as it was easy to fly documents there and the time zone is the same as for the East Coast. Now, with the advent of low-cost communications, off-shore service bureaus are springing up in India, Sri Lanka, China, the Philippines, Mauritius, Zimbabwe and other English-speaking locations.

OMR: Optical Mark Recognition. Sometimes called mark sense. Conversion of check-off marks to meaningful data. Simple and accurate way to capture survey type information automatically from people.

Overhead Scanners: Similar to planetary microfilm cameras, these scan a page placed on a platen (see also book scanning).

P
Paper Size: varies from business card size to 11x17 in business documents.

Paper Weight: varies from 9lb onionskin to 120lb cardstock.

Post Office Boxes: can be used to speed the input and collection of data (see collection of mail).

R
Reflectance: Refers to how much the ink and background paper reflect the light within the scanner. Affects the quality of image.

Repair: refers to the manual keyboard correction of characters wrongly converted by OCR or ICR.

S
Scanning Paper: the conversion of a page to a digital representation. Normally a page is broken into 200x200 or 300x300 dots per inch (dpi).

Schema: the defined layout for a specific business document using XML syntax.

Set-Up: the process of creating a new job.

Skew: The angling of the paper as it is scanned, which can cause failure of OCR. Some scanners will angle small paper badly.

Substitutions: Traditionally the most expensive errors to correct. Consist of those characters that a recognition engine is convinced it got right but that are in fact wrong. High levels of 'accuracy' reported by an OCR engine can mean that there are many substitutions. The alternative is to set tolerances very high -- then the engine will often report low accuracy -- but there may be many correctly interpreted characters which are labelled wrong (see also false positives).
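
The trade-off described above can be sketched by applying two different confidence thresholds to the same set of recognition results. The sample data is invented purely for illustration, and the function name and thresholds are assumptions made for the example.

```python
def tally(results, threshold):
    """Count outcomes when characters below the threshold are flagged as wrong.

    Each result is (character read, true character, confidence from 0 to 1).
    """
    accepted = [r for r in results if r[2] >= threshold]
    rejected = [r for r in results if r[2] < threshold]
    return {
        # What the engine itself would report: the share it did not flag.
        "reported_accuracy": len(accepted) / len(results),
        # Accepted but actually wrong.
        "substitutions": sum(1 for read, truth, _ in accepted if read != truth),
        # Flagged as wrong but actually correct.
        "false_positives": sum(1 for read, truth, _ in rejected if read == truth),
    }


# Invented sample data for illustration only.
results = [
    ("A", "A", 0.99), ("B", "B", 0.95), ("8", "B", 0.90), ("C", "C", 0.70),
    ("O", "0", 0.60), ("D", "D", 0.55), ("E", "E", 0.97), ("1", "l", 0.40),
    ("F", "F", 0.85), ("G", "G", 0.65),
]

print(tally(results, 0.50))
# {'reported_accuracy': 0.9, 'substitutions': 2, 'false_positives': 0}
print(tally(results, 0.92))
# {'reported_accuracy': 0.3, 'substitutions': 0, 'false_positives': 4}
```

At the low threshold the engine reports 90% accuracy while hiding two substitutions; at the high threshold the substitutions are gone, but four correct characters are sent for repair.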

T
Transport: the method by which the paper is moved past the digitizing scanner. Affects speed of throughput and types of paper.

Tumble Printed: Refers to those double-sided papers that get turned over from top to bottom. Requires the duplex scanner to rotate the image of the reverse side by 180 degrees.

V
Validation: performed against totals or against downloaded tables to ensure accuracy of data (see ‘mainframe’ and also ‘edit checks’).

Verification: the only proven way to ensure 100% data accuracy as opposed to 99.x%. Requires the rekeying of all data by a separate party.

Voting: A method of improving recognition through the use of multiple recognition engines that vote on the result. Voting can be internal or external -- internal voting tends to be preferable as the engines have reference to the internal confidence factors.
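
A minimal sketch of external voting, in which only the final output of each engine is compared. It assumes that every engine returns a string of the same length for the field and that a strict majority is needed to accept a character; the function name and the "?" reject marker are assumptions made for the example. Internal voting would also weigh each engine's confidence factors, which this sketch does not attempt.

```python
from collections import Counter


def vote(readings, reject="?"):
    """Character-by-character majority vote over equal-length engine outputs."""
    result = []
    for candidates in zip(*readings):
        winner, count = Counter(candidates).most_common(1)[0]
        # Accept only a strict majority; otherwise flag the position for repair.
        result.append(winner if count > len(candidates) / 2 else reject)
    return "".join(result)


print(vote(["INV0ICE 1234", "INVOICE 1284", "INVOICE I234"]))  # INVOICE 1234
print(vote(["A", "B", "C"]))                                   # ?
```

Each engine makes a different mistake in the first example, so the majority recovers the correct field; where no majority exists the character is rejected rather than guessed.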

W
Water Damage: Causes images, particularly if handwritten, to bleed. The image can be recovered with sophisticated image processing.

X
XML: eXtensible Markup Language. Provides content and structure for B2B-based forms by allowing fields and structures to be tagged and layout to be enforced.
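
As an illustration of tagged fields, the sketch below reads a small XML form with Python's standard library and checks that the expected fields are present. The invoice document, its tag names and the list of required fields are invented for the example, and the simple presence check stands in for real schema validation, which it does not perform.

```python
import xml.etree.ElementTree as ET

# Invented example form; the tag names are assumptions made for illustration.
FORM = """<invoice>
  <number>10452</number>
  <date>2008-03-14</date>
  <total currency="USD">129.50</total>
</invoice>"""

REQUIRED_FIELDS = ("number", "date", "total")


def extract_fields(xml_text):
    """Return the tagged fields of a form and the names of any missing ones."""
    root = ET.fromstring(xml_text)
    fields = {child.tag: (child.text or "").strip() for child in root}
    missing = [name for name in REQUIRED_FIELDS if not fields.get(name)]
    return fields, missing


fields, missing = extract_fields(FORM)
print(fields)   # {'number': '10452', 'date': '2008-03-14', 'total': '129.50'}
print(missing)  # []
```

Because each value is tagged with its field name, the receiving system does not need to know where on the page the value appeared.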

XSL: eXtensible Stylesheet Language. Defines the styles associated with XML files.

XSLT: eXtensible Stylesheet Language Transformations. Allows XML-formatted documents to be automatically translated and reformatted.

Harvey Spencer Associates™ and this web site HarveySpencer.com and all information contained are
copyright protected by Harvey Spencer Associates ©2008 - All rights reserved.
