New Words Identification Based on Ensemble Methods

Chen Zhang; Yu Quan Chen

doi:10.4028/www.scientific.net/AMM.602-605.1626



Abstract:

To identify new words in large Chinese corpora efficiently, this paper proposes an algorithm based on ensemble methods. First, the text is segmented with a Trie and a segment tree is built. Next, four sub-models of the ensemble (word-pattern extraction, frequency filtering, independent word probability (IWP), and a naive Bayes model) are trained independently. Finally, a multi-layer model integrates the results of the sub-models. Experiments show that the algorithm is fast and produces precise, high-coverage results.
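The abstract names the pipeline stages but gives no formulas, so the following is only a minimal Python sketch of how such a pipeline might fit together, under several assumptions that are not taken from the paper. Dictionary segmentation is assumed to be forward maximum matching over a character Trie. Candidate new words are assumed to be runs of consecutive single-character fragments left over after segmentation. IWP follows a common definition from the new-word literature: IWP(c) is the number of times character c appears as a one-character word divided by the number of times c appears at all, and IWP(w) is the product of IWP(c) over the characters of w, so a low value suggests the characters rarely stand alone. Only two sub-models (frequency filtering and IWP) are shown; the word-pattern and naive Bayes sub-models and the segment tree are omitted, and the paper's multi-layer integration is replaced by a plain AND vote. All function names, thresholds, and the toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List

END = ""  # end-of-word key; a real character can never be the empty string


class Trie:
    """Minimal character Trie over a dictionary of known words."""

    def __init__(self, words: Iterable[str] = ()) -> None:
        self.root: Dict[str, dict] = {}
        for w in words:
            node = self.root
            for ch in w:
                node = node.setdefault(ch, {})
            node[END] = {}

    def longest_match(self, text: str, start: int) -> int:
        """Length of the longest known word beginning at text[start] (0 if none)."""
        node, best = self.root, 0
        for i in range(start, len(text)):
            node = node.get(text[i])
            if node is None:
                break
            if END in node:
                best = i - start + 1
        return best


def segment(trie: Trie, text: str) -> List[str]:
    """Forward maximum matching; uncovered characters become one-character tokens."""
    tokens, i = [], 0
    while i < len(text):
        n = trie.longest_match(text, i) or 1
        tokens.append(text[i:i + n])
        i += n
    return tokens


def candidates(tokens: List[str], max_len: int = 4) -> List[str]:
    """Maximal runs of 2..max_len consecutive one-character tokens."""
    out, run = [], []
    for tok in tokens + [""]:  # the empty sentinel flushes the last run
        if len(tok) == 1:
            run.append(tok)
            continue
        if 2 <= len(run) <= max_len:
            out.append("".join(run))
        run = []
    return out


def iwp_table(segmented: Iterable[List[str]]) -> Dict[str, float]:
    """IWP(c) = (# times c is a one-character word) / (# times c occurs at all)."""
    alone: Counter = Counter()
    total: Counter = Counter()
    for sent in segmented:
        for tok in sent:
            total.update(tok)  # counts every character of the token
            if len(tok) == 1:
                alone[tok] += 1
    return {c: alone[c] / total[c] for c in total}


def iwp(word: str, table: Dict[str, float], default: float = 0.5) -> float:
    """IWP(w) = product of IWP(c); a low value means the characters rarely stand alone."""
    p = 1.0
    for c in word:
        p *= table.get(c, default)  # unseen characters get a neutral default
    return p


def identify(texts: Iterable[str], trie: Trie, table: Dict[str, float],
             min_freq: int = 2, max_iwp: float = 0.1) -> List[str]:
    """Keep a candidate only if both sub-models accept it (a plain AND vote)."""
    freq = Counter(w for t in texts for w in candidates(segment(trie, t)))
    return sorted(w for w, f in freq.items()
                  if f >= min_freq and iwp(w, table) <= max_iwp)


if __name__ == "__main__":
    trie = Trie(["我们", "喜欢"])
    reference = [["区域", "地区", "区"], ["一块", "块"], ["链接", "锁链", "链"], ["很", "好"]]
    table = iwp_table(reference)
    texts = ["我们喜欢区块链", "区块链我们喜欢", "我们很好", "很好"]
    print(identify(texts, trie, table))  # ['区块链']
```

On this toy data, "区块链" passes both filters, while "很好" passes the frequency filter but is rejected because 很 and 好 usually occur as standalone words (high IWP). Replacing the AND vote with a learned combiner over the sub-models' scores would be closer in spirit to the multi-layer integration the abstract describes.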


Info:

Periodical: Applied Mechanics and Materials (Volumes 602-605)
Pages: 1626-1629
DOI: https://doi.org/10.4028/www.scientific.net/AMM.602-605.1626

Online since: August 2014
Authors: Chen Zhang*, Yu Quan Chen
Keywords: Ensemble Methods, IWP, New Words Identification, Trie, Word Pattern

* - Corresponding Author