Extraction Research about Parallelization of Named Entity Based on Hadoop Platform

Quan Shi; Zhen Dong Yang; Lu Xu

doi:10.4028/www.scientific.net/AMM.397-400.2309

Abstract:

As the era of big data approaches, data is becoming increasingly important. Faced with such massive volumes of data, quickly identifying the content in a given field that users are interested in, and then extracting it, is an urgent problem. To identify that content, we use the NLPIR Chinese word segmentation framework to segment the text and recognize named entities from the part-of-speech tags. For extraction, we use Hadoop, a parallel cluster platform built on the MapReduce framework for big data: the Hadoop Distributed File System (HDFS) provides efficient data access, and Map and Reduce tasks extract the named entity information. The job extracts the required information from the interactive encyclopedia and stores it in a knowledge base, thereby implementing parallel extraction of named entity information on the Hadoop platform.
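The pipeline described in the abstract can be illustrated with a minimal local sketch. It is not the authors' implementation: it assumes the text has already been segmented and tagged into space-separated `word/tag` tokens, and that the tags `nr`, `ns`, and `nt` mark person, place, and organization names as in the ICTCLAS-style tag set. The function names `map_entities`, `reduce_entities`, and `run_job` are hypothetical, and the map and reduce phases run in a single process rather than on a Hadoop cluster.

```python
from collections import defaultdict

# Assumption: ICTCLAS-style tags, where nr = person, ns = place, nt = organization.
# Subtypes of these tags are not handled in this sketch.
ENTITY_TAGS = {"nr": "PERSON", "ns": "PLACE", "nt": "ORGANIZATION"}


def map_entities(line):
    """Map step: emit ((entity, type), 1) for every token tagged as a named entity."""
    for token in line.split():
        # Split on the last "/" so that words containing "/" keep their text.
        word, sep, tag = token.rpartition("/")
        if sep and tag in ENTITY_TAGS:
            yield (word, ENTITY_TAGS[tag]), 1


def reduce_entities(pairs):
    """Reduce step: sum the counts emitted for each (entity, type) key."""
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)


def run_job(lines):
    """Run the map phase over all lines, then reduce, in one local process."""
    pairs = []
    for line in lines:
        pairs.extend(map_entities(line))
    return reduce_entities(pairs)


if __name__ == "__main__":
    # Hypothetical pre-tagged input; a real job would read tagged text from HDFS.
    sample = ["鲁迅/nr 出生/v 于/p 绍兴/ns", "鲁迅/nr 是/v 作家/n"]
    for (entity, kind), count in run_job(sample).items():
        print(entity, kind, count)
```

On a real cluster the same two functions would correspond to the Map and Reduce tasks, with the framework handling the grouping of keys between them and the reducer output feeding the knowledge base.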

Info:

Periodical:

Applied Mechanics and Materials (Volumes 397-400)

Pages:

2309-2312

DOI:

https://doi.org/10.4028/www.scientific.net/AMM.397-400.2309

Online since:

September 2013

Authors:

Quan Shi*, Zhen Dong Yang, Lu Xu

Keywords:

Chinese Word Segmentation, Hadoop, Named Entity Extraction, Named Entity Recognition

* - Corresponding Author

References

[1] H. P. Zhang, H. K. Yu, D. Y. Xiong, et al. HHMM-based Chinese Lexical Analyzer ICTCLAS. Second SIGHAN Workshop affiliated with the 41st ACL, Sapporo, Japan, July 2003, 1-2. DOI: 10.3115/1119250.1119280

[2] HDFS Architecture: http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html

[3] T. White. Hadoop: The Definitive Guide [M]. O'Reilly Media Inc., 2013, 223-224.

[4] C. Lam. Hadoop in Action [M]. Manning Publications Co., 2011.