Automatic Term Recognition Using Hybrid Method Based on Rewriting and Statistic

Abstract:

Machine-aided human translation (MAHT) of patent abstracts is an important step toward the deep processing of patent data, in which terms have significant application value. This paper investigates automatic term recognition (ATR) and proposes a new hybrid method, based on two-phase analysis and statistics, to generate English candidate terms. Segments containing stop words are not simply discarded; instead, in the second phase they are processed by a rewriting method that uses beginning patterns, ending patterns, and inner patterns. In addition, generalized statistical measures, namely generalized mutual information (MI), the log-likelihood ratio (LLR), and C-value, are used to evaluate the candidates: low-scoring candidate terms are filtered out, and the intersection of the resulting candidate sets is taken. Experiments on randomly selected patent abstracts demonstrate the feasibility of the method.
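
The abstract names MI, LLR, and C-value but does not give their generalized forms, so the following is only a rough sketch of the filtering idea using the standard textbook definitions: pointwise MI and Dunning's LLR for a word pair, and the C-value of Frantzi and Ananiadou for nested candidates. All function names, the count-based inputs, and the thresholds are assumptions made for illustration and are not taken from the paper.

```python
import math


def pmi(c12, c1, c2, n):
    """Pointwise mutual information (base 2) of a word pair.

    c12: co-occurrence count; c1, c2: counts of each word; n: corpus size.
    """
    return math.log2(c12 * n / (c1 * c2))


def llr(c12, c1, c2, n):
    """Log-likelihood ratio (G^2) over the 2x2 contingency table of a pair."""
    observed = [[c12, c1 - c12], [c2 - c12, n - c1 - c2 + c12]]
    rows = [sum(r) for r in observed]
    cols = [observed[0][j] + observed[1][j] for j in range(2)]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o > 0:  # a zero cell contributes nothing to the sum
                g2 += o * math.log(o * n / (rows[i] * cols[j]))
    return 2.0 * g2


def contains(longer, shorter):
    """True if `shorter` occurs as a contiguous run of words in `longer`."""
    k = len(shorter)
    return any(longer[i:i + k] == shorter for i in range(len(longer) - k + 1))


def c_value(term, freq):
    """C-value of `term` (a tuple of words) given candidate frequencies.

    Frequency is discounted by the mean frequency of the longer
    candidates that contain the term.
    """
    nested_in = [f for t, f in freq.items()
                 if len(t) > len(term) and contains(t, term)]
    weight = math.log2(len(term))
    if not nested_in:
        return weight * freq[term]
    return weight * (freq[term] - sum(nested_in) / len(nested_in))


def intersect_filters(scores, thresholds):
    """Keep only the terms that reach the threshold of every measure.

    scores: {measure: {term: score}}; thresholds: {measure: minimum score}.
    """
    kept = [{t for t, s in by_term.items() if s >= thresholds[m]}
            for m, by_term in scores.items()]
    return set.intersection(*kept)


# Toy counts, invented for this example.
freq = {
    ("term", "recognition"): 5,
    ("automatic", "term", "recognition"): 3,
    ("patent", "abstract"): 4,
}
print(c_value(("term", "recognition"), freq))
```

Taking the intersection rather than the union favors precision: a candidate survives only if every measure scores it above its threshold, which matches the abstract's description of filtering low-scoring candidates and intersecting the results.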

Info:

Periodical:

Advanced Materials Research (Volumes 1049-1050)

Pages:

1544-1549

DOI:

https://doi.org/10.4028/www.scientific.net/AMR.1049-1050.1544

Online since:

October 2014

Authors:

Wen Xiong*

Keywords:

Automatic Term Recognition (ATR), Information Retrieval (IR), Machine Aided Human Translation (MAHT), Term Extraction, Text Mining (TM)

Copyright:

© 2014 Trans Tech Publications Ltd. All Rights Reserved

* - Corresponding Author
