Paper Title:
Offline OCR System for Machine-Printed Turkish Using Template Matching
  Abstract

One of the most important applications of Pattern Recognition (PR) today is Optical Character Recognition (OCR), a system that converts scanned images of printed or handwritten text into a machine-readable, editable format such as a text document. The main motivation behind this study is to build an OCR system for offline machine-printed Turkish characters that converts an image file into a readable and editable format. The system starts with a preprocessing step that converts the image file into a binary format with less noise, ready for recognition. Preprocessing includes digitization, binarization, thresholding, and noise removal. Next, the horizontal projection method is used for line detection and word allocation, and an 8-connected neighbors scheme is used to extract characters as a set of connected components. Then, template matching is used to match the segmented characters against the template set stored in the OCR database in order to recognize the text. Unlike other approaches, template matching takes less time and does not require training samples, but it cannot recognize some letters with similar shapes or combined letters. For this reason, the system combines template matching with the size feature of the segmented characters to achieve accurate results. Finally, upon successful recognition, the recognized patterns are displayed in Notepad as readable and editable text. The Turkish machine-printed database consists of a list of 630 names of cities in Turkey written in the Arial font at different sizes: in uppercase, in lowercase, and with the first character of each word capitalized. The results show that the accuracy of the proposed OCR system ranges from 96% to 100%.
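
The abstract names two techniques that are easy to illustrate: horizontal projection for line detection, and template matching combined with a size feature. The Python sketch below is only an illustration under stated assumptions, not the authors' implementation. The function names, the pixel-agreement score, the relative-height size feature, and the `size_weight` parameter are all hypothetical choices made for this example; the 8-connected component extraction step is omitted.

```python
import numpy as np

def find_text_lines(binary):
    """Horizontal projection: return (top, bottom) row ranges of text lines.

    binary is a 2-D array with 1 = ink and 0 = background;
    bottom is exclusive.
    """
    has_ink = binary.sum(axis=1) > 0  # projection profile, thresholded at zero
    lines, start = [], None
    for row, ink in enumerate(has_ink):
        if ink and start is None:
            start = row                      # a text line begins
        elif not ink and start is not None:
            lines.append((start, row))       # a text line ends
            start = None
    if start is not None:
        lines.append((start, len(has_ink)))  # line touches the bottom edge
    return lines

def resize_nearest(glyph, shape):
    """Nearest-neighbor resize so a glyph can be compared with a template."""
    rows = np.arange(shape[0]) * glyph.shape[0] // shape[0]
    cols = np.arange(shape[1]) * glyph.shape[1] // shape[1]
    return glyph[np.ix_(rows, cols)]

def classify(glyph, rel_height, templates, size_weight=0.5):
    """Pick the best label by shape agreement minus a size penalty.

    templates maps label -> (template_array, expected_rel_height), where
    rel_height is glyph height divided by line height (an assumed size
    feature used here to separate look-alike letters such as 'O' and 'o').
    """
    best_label, best_score = None, -np.inf
    for label, (tmpl, expected) in templates.items():
        scaled = resize_nearest(glyph, tmpl.shape)
        shape_score = np.mean(scaled == tmpl)  # fraction of agreeing pixels
        score = shape_score - size_weight * abs(rel_height - expected)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Usage: two text lines on a tiny page, and one shape with two sizes.
page = np.zeros((7, 5), dtype=int)
page[1:3, 1:4] = 1
page[5, 2] = 1
ring = np.array([[1, 1, 1], [1, 0, 1], [1, 1, 1]])
templates = {"O": (ring, 1.0), "o": (ring, 0.6)}
print(find_text_lines(page))
print(classify(ring, 0.6, templates))
```

In this sketch the shape score alone cannot separate the two templates because they are pixel-identical after scaling, so the size term decides the label. This mirrors the limitation the abstract describes for letters with similar shapes.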

  Info
Periodical: Advanced Materials Research (Volumes 341-342)
Edited by: Liu Guiping
Pages: 565-569
DOI: 10.4028/www.scientific.net/AMR.341-342.565
Citation: A. Dena Rafaa, J. Nordin, "Offline OCR System for Machine-Printed Turkish Using Template Matching", Advanced Materials Research, Vols. 341-342, pp. 565-569, 2012
Online since: September 2011