A Novel Text Extraction Method from Pure Text Images Using Morphological Operations

Xuan Qi Chen; Biao He; Guo Cheng Wang; Yao Xin Li

doi:10.4028/www.scientific.net/AMR.989-994.3768

Abstract:

This paper presents a new method for effective text extraction using mathematical morphology. First, the document is segmented into several parts based on its layout. Each part is then dilated into large connected regions, and the largest skeleton of these regions is extracted to serve as a structuring element (SE). Finally, a proposed region-concatenated operation is applied with the SE, and its result can serve as the input to a subsequent OCR system. Experiments indicate that the proposed method is robust to noise, text orientation, font style and size, language, and layout.
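
The dilation and skeleton steps described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the abstract does not specify the dilation kernel, the skeletonization algorithm, or the region-concatenated operation. The sketch assumes a 3x3 square kernel, takes the skeleton of the largest connected region, and uses the classical morphological (Lantuejoul) skeleton. The function names `dilate`, `erode`, `largest_region`, and `skeleton` are hypothetical, and the final region-concatenated operation is left out because it is not defined here.

```python
from collections import deque

# 3x3 square neighbourhood (an assumption; the abstract does not give a kernel).
SQUARE = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)]


def dilate(img, offsets=SQUARE):
    """Binary dilation: every set pixel sets all in-bounds pixels under the kernel."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if img[r][c]:
                for dr, dc in offsets:
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w:
                        out[rr][cc] = 1
    return out


def erode(img, offsets=SQUARE):
    """Binary erosion: a pixel survives only if the whole kernel fits in the foreground."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            out[r][c] = int(all(
                0 <= r + dr < h and 0 <= c + dc < w and img[r + dr][c + dc]
                for dr, dc in offsets))
    return out


def largest_region(img):
    """Return a mask containing only the largest 8-connected foreground region."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    best = []
    for r in range(h):
        for c in range(w):
            if img[r][c] and not seen[r][c]:
                region, queue = [], deque([(r, c)])
                seen[r][c] = True
                while queue:  # breadth-first flood fill
                    y, x = queue.popleft()
                    region.append((y, x))
                    for dr, dc in SQUARE:
                        yy, xx = y + dr, x + dc
                        if (0 <= yy < h and 0 <= xx < w
                                and img[yy][xx] and not seen[yy][xx]):
                            seen[yy][xx] = True
                            queue.append((yy, xx))
                if len(region) > len(best):
                    best = region
    mask = [[0] * w for _ in range(h)]
    for y, x in best:
        mask[y][x] = 1
    return mask


def skeleton(img):
    """Morphological skeleton: union over k of (A eroded k times) minus its opening."""
    h, w = len(img), len(img[0])
    skel = [[0] * w for _ in range(h)]
    current = img
    while any(any(row) for row in current):
        opened = dilate(erode(current))
        for r in range(h):
            for c in range(w):
                if current[r][c] and not opened[r][c]:
                    skel[r][c] = 1
        current = erode(current)
    return skel


# Toy page: three one-pixel "characters" on one line plus an isolated noise pixel.
page = [[0] * 12 for _ in range(7)]
for col in (1, 3, 5, 10):
    page[2][col] = 1

region = largest_region(dilate(page))  # the characters merge into one text-line blob
skel = skeleton(region)                # candidate SE under the assumptions above
```

On this toy input the three characters fuse into a single region after dilation, while the noise pixel forms a smaller region and is discarded. The resulting skeleton is a short horizontal segment along the text line, which hints at how a skeleton-derived SE could encode text orientation.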

Info:

Periodical:

Advanced Materials Research (Volumes 989-994)

Pages:

3768-3772

DOI:

https://doi.org/10.4028/www.scientific.net/AMR.989-994.3768

Online since:

July 2014

Authors:

Xuan Qi Chen*, Biao He, Guo Cheng Wang, Yao Xin Li

Keywords:

Document Image Process, Mathematical Morphology, Skeleton, Text Extraction

* - Corresponding Author
