Extraction Research about Parallelization of Named Entity Based on Hadoop Platform

Abstract:

As the era of big data approaches, data is becoming increasingly important. Given such massive volumes of data, quickly identifying the content in a field that users are interested in, and extracting it, is an urgent problem. To identify the content of interest, we use the NLPIR Chinese word segmentation framework to segment text into words and recognize named entities from their part-of-speech tags. For extraction, we use Hadoop, a parallel cluster platform built on the MapReduce framework for big data: the Hadoop Distributed File System (HDFS) provides efficient data access, and Map and Reduce tasks extract the named entity information. The job extracts the required information from the interactive encyclopedia and stores it in the knowledge base, thereby implementing parallelized extraction of named entity information on the Hadoop platform.
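The map and reduce steps described in the abstract can be illustrated with a minimal local sketch. It does not use the Hadoop API and is not the authors' implementation. It assumes the segmenter emits space-separated `word/tag` tokens and that the tags `nr`, `ns`, and `nt` mark person, place, and organization names (an ICTCLAS-style convention taken as an assumption here). The function names `map_entities`, `reduce_counts`, and `run_job` are hypothetical.

```python
from collections import defaultdict

# Assumed ICTCLAS/NLPIR-style tags: nr = person, ns = place, nt = organization.
ENTITY_TAGS = {"nr": "PERSON", "ns": "PLACE", "nt": "ORGANIZATION"}


def map_entities(line):
    """Map step: emit ((entity_type, word), 1) for each tagged named entity."""
    for token in line.split():
        word, sep, tag = token.rpartition("/")
        # Skip tokens without a tag and tokens whose tag is not an entity tag.
        if sep and word and tag in ENTITY_TAGS:
            yield (ENTITY_TAGS[tag], word), 1


def reduce_counts(key, values):
    """Reduce step: sum the occurrence counts collected for one entity."""
    return key, sum(values)


def run_job(lines):
    """Local stand-in for the shuffle phase: group map output by key, then reduce."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in map_entities(line):
            grouped[key].append(value)
    return dict(reduce_counts(k, v) for k, v in sorted(grouped.items()))


# Hypothetical pre-segmented input; real input would come from the segmenter.
sample = [
    "张三/nr 在/p 北京/ns 工作/v",
    "张三/nr 访问/v 北京大学/nt",
]
result = run_job(sample)
```

On a real cluster the grouping done in `run_job` would be handled by the framework's shuffle phase, and the input lines would be read from HDFS rather than from an in-memory list.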

Info:

Pages:

2309-2312

Online since:

September 2013

Copyright:

© 2013 Trans Tech Publications Ltd. All Rights Reserved

[1] H. P. Zhang, H. K. Yu, D. Y. Xiong, et al. HHMM-based Chinese Lexical Analyzer ICTCLAS. Second SIGHAN Workshop affiliated with the 41st ACL, Sapporo, Japan, July 2003, 1-2.

DOI: 10.3115/1119250.1119280

[2] HDFS Architecture: http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html.

[3] T. White. Hadoop: The Definitive Guide[M]. O'Reilly Media Inc., 2013, 223-224.

[4] C. Lam. Hadoop in Action[M]. Manning Publications Co., 2011.
