A Novel Text Extraction Method from Pure Text Images Using Morphological Operations

Abstract:

This paper presents a new method for effective text extraction using mathematical morphology. First, the document is segmented into several parts based on its layout. Each part is then dilated into large connected regions, and the largest skeleton of these regions is extracted to serve as a structuring element (SE). Finally, a proposed region-concatenated operation with the SE is applied, and its result can serve as the input to a subsequent OCR system. In experiments, the proposed method is robust to noise, text orientation, font style and size, language, and layout.
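The dilation step can be illustrated with a minimal sketch. The abstract does not specify the structuring element used for dilation, the skeletonization algorithm, or the region-concatenated operation, so none of those are reproduced here. The code only shows how dilating a binary image merges neighbouring characters into one large connected region and how the largest region can then be selected. The 3x3 square element, the 8-connectivity, the toy image, and the function names `dilate` and `largest_region` are all assumptions made for illustration, not the authors' implementation.

```python
from collections import deque


def dilate(img, radius=1):
    """Binary dilation with an assumed (2*radius+1) x (2*radius+1) square element."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if img[y][x]:
                # Switch on every in-bounds pixel covered by the square element.
                for dy in range(-radius, radius + 1):
                    for dx in range(-radius, radius + 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w:
                            out[ny][nx] = 1
    return out


def largest_region(img):
    """Return the set of (row, col) pixels in the largest 8-connected region."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    best = set()
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                # Breadth-first flood fill from an unvisited foreground pixel.
                region = set()
                queue = deque([(y, x)])
                seen[y][x] = True
                while queue:
                    cy, cx = queue.popleft()
                    region.add((cy, cx))
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and img[ny][nx] and not seen[ny][nx]):
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                if len(region) > len(best):
                    best = region
    return best


# Toy binary "page": two short strokes separated by a one-pixel gap on row 1,
# plus one isolated noise pixel in the bottom-right corner.
page = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
]

before = largest_region(page)          # the strokes are still separate regions
after = largest_region(dilate(page))   # dilation bridges the gap between them
print(len(before), len(after))
```

In this toy case the two strokes merge into a single region after dilation, while the distant noise pixel stays in its own smaller region. In the method described above, a skeleton of the largest region would then be extracted and used as the SE; that step is left out because the abstract gives no detail on it.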

Info:

Periodical:

Advanced Materials Research (Volumes 989-994)

Pages:

3768-3772

Online since:

July 2014

Copyright:

© 2014 Trans Tech Publications Ltd. All Rights Reserved
