Paper Title:
An Improved Method for Mathematical Formula Extraction in Printed English and Chinese Documents
  Abstract

Accurately locating mathematical formulas in scientific documents is a prerequisite for recognizing them. Most existing formula extraction methods target documents in a single language and do not adapt well to documents in other languages. This paper describes an improved method that extracts formulas from both Chinese and English documents. First, run-number features are used to identify the document's language; then, according to the differences between Chinese and English documents, the corresponding features and parameters are chosen for formula extraction. The experimental results show that this method improves the robustness of formula extraction.
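
The abstract does not define the run-number feature or the decision rule, so the following is only a minimal sketch under stated assumptions. It assumes a binarized image given as rows of 0/1 pixels, takes the run number of a scan line to be the count of consecutive foreground-pixel runs on that line, and assumes that stroke-dense Chinese characters yield more runs per scan line than Latin letters in a character-sized block. The function names and the threshold value are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def run_number(line: Sequence[int]) -> int:
    """Count runs of consecutive foreground (1) pixels in one scan line."""
    runs = 0
    previous = 0
    for pixel in line:
        # A new run starts wherever background (0) changes to foreground (1).
        if pixel == 1 and previous == 0:
            runs += 1
        previous = pixel
    return runs


def mean_run_number(image: List[List[int]]) -> float:
    """Average horizontal run number over the rows that contain any ink."""
    counts = [run_number(row) for row in image]
    inked = [c for c in counts if c > 0]
    return sum(inked) / len(inked) if inked else 0.0


def guess_language(image: List[List[int]], threshold: float = 3.0) -> str:
    """Hypothetical rule: a high mean run number suggests Chinese text.

    The threshold is an illustrative assumption; in practice it would be
    tuned on labeled character blocks of comparable size.
    """
    return "chinese" if mean_run_number(image) > threshold else "english"
```

A rule of this kind only makes sense on blocks of comparable width, because the run number of a full text line grows with the number of characters on it.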

  Info
Periodical
Applied Mechanics and Materials (Volumes 20-23)
Edited by
Qi Luo
Pages
1174-1179
DOI
10.4028/www.scientific.net/AMM.20-23.1174
Citation
X. D. Tian, X. Liang, "An Improved Method for Mathematical Formula Extraction in Printed English and Chinese Documents", Applied Mechanics and Materials, Vols. 20-23, pp. 1174-1179, 2010
Online since
January 2010

