Research on Entity Resolution Algorithm Based on Domain Ontology Using MapReduce

Abstract:

The DO-Swoosh algorithm maps the input data to a domain ontology and counts the number of records that fall under each leaf node. It then defines the distance between nodes according to the hierarchical relationships reflected in the domain ontology, and proposes a node merging algorithm based on two principles: data balance and nearest merge. Finally, entity resolution is performed within each group produced by the node merging step. DO-Swoosh retains good generality and, with the aid of MapReduce, achieves better performance and scalability.
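The grouping steps described in the abstract can be illustrated with a small single-process sketch. It is not the authors' implementation: the abstract does not give the distance formula or the merging rule, so the tree-path distance, the greedy "smallest group joins its nearest neighbour" rule, the `capacity` limit, the toy ontology, and the `category` record field are all assumptions made for illustration. In a MapReduce setting, each resulting group would presumably be handed to one reducer for entity resolution; that part is not shown.

```python
from collections import Counter

# Hypothetical toy ontology: child -> parent (the root has parent None).
PARENT = {
    "Product": None,
    "Electronics": "Product",
    "Books": "Product",
    "Phone": "Electronics",
    "Laptop": "Electronics",
    "Novel": "Books",
    "Textbook": "Books",
}

def path_to_root(node):
    """Return the nodes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path

def node_distance(a, b):
    """Assumed distance: number of edges on the tree path between a and b."""
    up_a = path_to_root(a)
    up_b = path_to_root(b)
    depth_in_a = {n: i for i, n in enumerate(up_a)}
    for j, n in enumerate(up_b):
        if n in depth_in_a:  # first shared ancestor
            return depth_in_a[n] + j
    raise ValueError("nodes are not in the same tree")

def group_distance(g1, g2):
    """Distance between two groups of leaves: their closest pair of leaves."""
    return min(node_distance(a, b) for a in g1 for b in g2)

def merge_leaves(counts, capacity):
    """Greedy merge (assumed rule): the smallest group that can still be
    merged joins its nearest neighbour, as long as the merged size stays
    within `capacity`. Returns a list of (leaves, size) pairs."""
    groups = [((leaf,), n) for leaf, n in sorted(counts.items())]
    while True:
        groups.sort(key=lambda g: g[1])  # data balance: smallest first
        merged = False
        for i, (leaves_i, size_i) in enumerate(groups):
            candidates = [
                (group_distance(leaves_i, leaves_j), size_j, j)
                for j, (leaves_j, size_j) in enumerate(groups)
                if j != i and size_i + size_j <= capacity
            ]
            if candidates:
                _, _, j = min(candidates)  # nearest merge
                leaves_j, size_j = groups[j]
                new = (tuple(sorted(leaves_i + leaves_j)), size_i + size_j)
                groups = [g for k, g in enumerate(groups) if k not in (i, j)]
                groups.append(new)
                merged = True
                break
        if not merged:
            return groups

# Hypothetical input: each record already carries the ontology leaf it maps to.
records = (
    [{"category": "Phone"}] * 5
    + [{"category": "Laptop"}]
    + [{"category": "Novel"}] * 2
    + [{"category": "Textbook"}]
)
counts = Counter(r["category"] for r in records)  # records per leaf node
groups = merge_leaves(counts, capacity=5)
```

With this toy input the small leaves are folded together while the already full `Phone` leaf stays on its own, so the groups end up with similar sizes. A real deployment would pick `capacity` from the cluster size and would likely need a cheaper distance computation for large ontologies.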

Info:

Pages:

1669-1674

Online since:

August 2013

Copyright:

© 2013 Trans Tech Publications Ltd. All Rights Reserved
