Feedback Clustering Algorithm for Detecting Approximately Duplicate Records

Dong Mei Hu; Xiao Feng Zhou; Xiao Feng Chen

doi:10.4028/www.scientific.net/AMM.602-605.2138

Abstract:

Detecting and merging approximately duplicate records is not a new problem in data cleansing, and most duplicate-detection methods are based on the "sort-merge" idea. Clustering methods have also been applied to data cleaning, but as the number of records grows, the resulting clusters contain a large number of non-duplicate records. In response to this shortcoming, this paper presents a data cleansing method based on a clustering feedback pattern: the results of comparing records within clusters are fed back into the clustering process so that recall and precision improve.
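
The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea of feeding comparison results back into clustering. Everything in it is an assumption made for illustration, not the authors' method: the character-level similarity from Python's `difflib`, the greedy single-pass clustering, the hypothetical parameters `threshold`, `verify`, `step` and `max_rounds`, and the feedback rule itself (tighten the clustering threshold whenever pairwise comparison finds a non-duplicate pair inside a cluster). The last function computes pairwise precision and recall against a set of known duplicate pairs.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1]; a stand-in for a real
    field-matching measure (assumption, not taken from the paper)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def cluster(records, threshold):
    """Greedy single-pass clustering: a record joins the first cluster whose
    representative (its first member) is similar enough, else starts a new one."""
    clusters = []
    for rec in records:
        for members in clusters:
            if similarity(rec, members[0]) >= threshold:
                members.append(rec)
                break
        else:
            clusters.append([rec])
    return clusters


def feedback_cluster(records, threshold=0.6, verify=0.8, step=0.05, max_rounds=10):
    """Cluster, compare records pairwise inside each cluster, and feed the
    result back: if any pair falls below `verify`, raise the clustering
    threshold and cluster again (hypothetical feedback rule)."""
    clusters = cluster(records, threshold)
    for _ in range(max_rounds):
        impure = any(
            similarity(a, b) < verify
            for members in clusters
            for a, b in combinations(members, 2)
        )
        if not impure or threshold >= verify:
            break
        threshold = min(verify, threshold + step)
        clusters = cluster(records, threshold)
    return clusters, threshold


def pairwise_precision_recall(clusters, true_pairs):
    """Precision and recall over record pairs placed in the same cluster."""
    predicted = {frozenset(p) for members in clusters for p in combinations(members, 2)}
    truth = {frozenset(p) for p in true_pairs}
    hits = len(predicted & truth)
    precision = hits / len(predicted) if predicted else 1.0
    recall = hits / len(truth) if truth else 1.0
    return precision, recall
```

Calling `feedback_cluster(records)` on a list of record strings returns the final clusters together with the threshold that produced them; passing those clusters and a list of known duplicate pairs to `pairwise_precision_recall` gives a way to check whether the feedback rounds actually helped on a labeled sample.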

Info:

Periodical: Applied Mechanics and Materials (Volumes 602-605)

Pages: 2138-2141

DOI: https://doi.org/10.4028/www.scientific.net/AMM.602-605.2138

Online since: August 2014

Authors: Dong Mei Hu*, Xiao Feng Zhou, Xiao Feng Chen

Keywords: Approximately Duplicate Records, Cluster, Data Cleansing, Feedback

* - Corresponding Author
