An Improved Method for Mathematical Formula Extraction in Printed English and Chinese Documents
Accurately locating mathematical formulas in scientific documents is the basis of their recognition. Most existing formula extraction methods target documents in a single language and do not adapt well to documents in other languages. This paper describes an improved method that extracts formulas from both Chinese and English documents. First, run-number features are used to identify the document's language; then, according to the differences between Chinese and English documents, the corresponding features and parameters are chosen for formula extraction. The experimental results show that this method improves the robustness of formula extraction.
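The abstract does not define the run-number feature in detail. As a rough illustration only, the sketch below assumes a "run" is a maximal sequence of consecutive foreground pixels in one horizontal scan line of a binarized text image, and that denser Chinese glyphs tend to yield more runs per scan line than Latin letters. The function names, the averaging step, and the `threshold` parameter are hypothetical and are not taken from the paper.

```python
from typing import List


def count_runs(row: List[int]) -> int:
    """Count maximal runs of consecutive foreground (1) pixels in one scan line."""
    runs = 0
    previous = 0
    for pixel in row:
        # A new run starts wherever a foreground pixel follows a background pixel.
        if pixel == 1 and previous == 0:
            runs += 1
        previous = pixel
    return runs


def mean_run_number(image: List[List[int]]) -> float:
    """Average run count over scan lines that contain at least one foreground pixel."""
    counts = [count_runs(row) for row in image]
    non_empty = [c for c in counts if c > 0]
    return sum(non_empty) / len(non_empty) if non_empty else 0.0


def guess_language(image: List[List[int]], threshold: float) -> str:
    """Hypothetical decision rule: a higher mean run count is taken to suggest Chinese.

    The threshold is an assumed tuning parameter, not a value from the paper.
    """
    return "chinese" if mean_run_number(image) > threshold else "english"
```

In practice such a threshold would have to be calibrated on labeled samples, and the run count would likely need normalizing by line width or character size; the sketch omits both.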
X. D. Tian and X. Liang, "An Improved Method for Mathematical Formula Extraction in Printed English and Chinese Documents," Applied Mechanics and Materials, Vols. 20-23, pp. 1174-1179, 2010.