p.2287
p.2291
p.2296
p.2301
p.2309
p.2313
p.2318
p.2322
p.2326
Extraction Research about Parallelization of Named Entity Based on Hadoop Platform
Abstract:
With the era of big data approaching, data becomes more and more important. Faced with such massive amounts of data space, how to quickly identify the contents of a field that the users are interest in and extract them out, is an urgent problem to be solved. To identify the content that users are interested in, we can use NLPIR Chinese word segmentation framework for speech segmentation, and identify named entity according to part of speech tagging. For extraction, using Hadoop, parallel cluster platform based on a big data MapReduce framework, using the Hadoop Distributed File System (HDFS) for efficient data access and starting Map and Reduce tasks to extract the information of named entity. This task extracts the required information from the interactive encyclopedia and then stores them in the knowledge base. It implements the task of extracting the information data of parallelization of named entity based on Hadoop platform.
Info:
Periodical:
Pages:
2309-2312
Citation:
Online since:
September 2013
Authors:
Price:
Сopyright:
© 2013 Trans Tech Publications Ltd. All Rights Reserved
Share:
Citation: