Automatic Term Recognition Using Hybrid Method Based on Rewriting and Statistic

Abstract:

Machine-aided human translation (MAHT) of patent abstracts is an important step toward the deep processing of patent data, in which terms have significant application value. This paper investigates automatic term recognition (ATR) and proposes a new hybrid method, based on two-phase analysis and statistics, to generate English candidate terms. Segments that contain stop words are not simply discarded; instead, in phase two they are processed by a rewriting method that uses beginning patterns, ending patterns, and inner patterns. In addition, generalized statistical measures, namely generalized mutual information (MI), the log-likelihood ratio (LLR), and C-value, are used to evaluate the candidates: low-scoring candidate terms are filtered out under each measure, and the intersection of the remaining sets is taken. Experiments on randomly extracted patent abstract texts show the feasibility of the method.
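
As an illustrative sketch only, and not the authors' implementation, the filter-and-intersect step can be pictured for two-word candidates using the standard bigram forms of pointwise MI and Dunning's LLR. The generalized variants of these measures, the C-value, and the rewriting patterns are not specified in this abstract, so they are left out here. The function names, thresholds, and toy token list are assumptions made for the example.

```python
import math
from collections import Counter


def llr(k11, k12, k21, k22):
    """Dunning's log-likelihood ratio (G^2) for a 2x2 contingency table."""
    n = k11 + k12 + k21 + k22
    rows = (k11 + k12, k21 + k22)
    cols = (k11 + k21, k12 + k22)
    cells = ((k11, 0, 0), (k12, 0, 1), (k21, 1, 0), (k22, 1, 1))
    total = 0.0
    for k, i, j in cells:
        if k > 0:  # empty cells contribute nothing to the sum
            total += k * math.log(k * n / (rows[i] * cols[j]))
    return 2.0 * total


def candidate_terms(tokens, min_pmi, min_llr):
    """Keep bigrams that pass BOTH thresholds (intersection of the two sets).

    Hypothetical helper: thresholds and the bigram-only scope are assumptions.
    """
    bigrams = Counter(zip(tokens, tokens[1:]))
    left, right = Counter(), Counter()
    for (w1, w2), c in bigrams.items():
        left[w1] += c   # times w1 occurs as the first word of a bigram
        right[w2] += c  # times w2 occurs as the second word of a bigram
    n = sum(bigrams.values())

    by_pmi, by_llr = set(), set()
    for (w1, w2), k11 in bigrams.items():
        k12 = left[w1] - k11
        k21 = right[w2] - k11
        k22 = n - k11 - k12 - k21
        pmi = math.log2(k11 * n / (left[w1] * right[w2]))
        if pmi >= min_pmi:
            by_pmi.add((w1, w2))
        if llr(k11, k12, k21, k22) >= min_llr:
            by_llr.add((w1, w2))
    return by_pmi & by_llr


# Toy input, invented for illustration.
tokens = "heat exchanger a heat exchanger b heat exchanger c d e f".split()
print(candidate_terms(tokens, min_pmi=1.0, min_llr=10.0))
```

On this toy input, pointwise MI alone ranks one-off pairs such as ("c", "d") above the repeated pair, while LLR rewards the repeated pair; intersecting the two filtered sets keeps only candidates that both measures support.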

Info:

Periodical:

Advanced Materials Research (Volumes 1049-1050)

Pages:

1544-1549

Online since:

October 2014

Copyright:

© 2014 Trans Tech Publications Ltd. All Rights Reserved
