Paper Title: Correlation Based Method to Detect and Remove Redundant Web Document
  Abstract

The growth of the Internet has flooded the World Wide Web with abundant information, much of it replicated. Because duplicate web pages increase indexing space and time complexity, finding and removing them is important for search engines and similar systems, and doing so improves both the accuracy of search results and search speed. Web content mining plays a vital role in addressing these issues. Existing web content mining algorithms focus on assigning weights to structured documents, whereas this work develops a mathematical approach based on linear correlation to detect and remove duplicates in both structured and unstructured web documents. The proposed method computes the linear correlation between two web documents. If the correlation value is 1, the documents are considered exactly redundant and the duplicate should be eliminated; otherwise, they are not redundant.
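The abstract does not state how a web document is converted into numbers before correlating it. The following is a minimal Python sketch assuming a bag-of-words representation: each document becomes a term-frequency vector over the union of both vocabularies, and Pearson's linear correlation coefficient r = sum((x_i - mean_x)(y_i - mean_y)) / sqrt(sum((x_i - mean_x)^2) * sum((y_i - mean_y)^2)) is computed between the two vectors. The tokenization rule and the helper names (`term_vectors`, `pearson`, `is_redundant`, `remove_redundant`) are hypothetical illustrations, not the authors' implementation.

```python
import math
import re
from collections import Counter


def term_vectors(doc_a: str, doc_b: str) -> tuple[list[int], list[int]]:
    """Return aligned term-frequency vectors over the union of both vocabularies.

    Assumption (hypothetical): lowercase alphanumeric tokens, word order ignored.
    """
    counts_a = Counter(re.findall(r"[a-z0-9]+", doc_a.lower()))
    counts_b = Counter(re.findall(r"[a-z0-9]+", doc_b.lower()))
    vocab = sorted(set(counts_a) | set(counts_b))
    # Counter returns 0 for a term that is missing from one document.
    return [counts_a[t] for t in vocab], [counts_b[t] for t in vocab]


def pearson(x: list[int], y: list[int]) -> float:
    """Pearson's linear correlation coefficient r between two equal-length vectors."""
    if x == y:
        # Identical vectors (including empty or constant ones, where r is
        # undefined) are treated as perfectly correlated by convention.
        return 1.0
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # r is undefined when exactly one vector is constant; report no correlation.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def is_redundant(doc_a: str, doc_b: str, tol: float = 1e-9) -> bool:
    """Flag a pair as exactly redundant when r equals 1 (within a rounding tolerance)."""
    x, y = term_vectors(doc_a, doc_b)
    return math.isclose(pearson(x, y), 1.0, abs_tol=tol)


def remove_redundant(docs: list[str]) -> list[str]:
    """Keep the first occurrence of each document and drop later redundant ones."""
    kept: list[str] = []
    for doc in docs:
        if not any(is_redundant(doc, k) for k in kept):
            kept.append(doc)
    return kept


pages = [
    "Web content mining detects duplicate pages.",
    "web content mining DETECTS duplicate pages",
    "Search engines index structured documents.",
]
print(remove_redundant(pages))  # the second page is dropped as a duplicate of the first
```

Under this assumed representation, r = 1 means the term frequencies of one document are an exact positive linear function of the other's, so a page with the same words in a different order, or with every word repeated twice, is also flagged as redundant. A small tolerance replaces exact equality with 1 to absorb floating-point rounding, and the pairwise loop in `remove_redundant` is quadratic in the number of documents, which is acceptable only for a sketch.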

  Info
Periodical: Advanced Materials Research (Volumes 171-172)
Edited by: Zhihua Xu, Gang Shen and Sally Lin
Pages: 543-546
DOI: 10.4028/www.scientific.net/AMR.171-172.543
Citation: G. Poonkuzhali, R. K. Kumar, R. K. Keshav, P. Sudhakar, K. Sarukesi, "Correlation Based Method to Detect and Remove Redundant Web Document", Advanced Materials Research, Vols. 171-172, pp. 543-546, 2011
Online since: December 2010

Authors: Ming Ying Luo
Abstract:This paper presents a candidate set in the case does not produce only scan through a database to extract the top-k strongly correlated item...
1103
Authors: Cheng Yang, Tao Feng
Chapter 3: Functional Manufacturing and Information Technology
Abstract:In order to identify engine status correctly, a novel method of abnormal noise diagnosis of internal combustion engine based on the wavelet...
168
Authors: Zong Zuo Yu, Jing Feng Wang
Chapter 1: Advanced Material Engineering and Dynamic Systems
Abstract:This paper introduces arithmetic to calculate correlation of two long sequences by using fast correlation method, and the design method of...
35
Authors: Zong Tao Li, Yu Guo, Heng Ye
Chapter 4: Data Acquisition, Data Mining and Data Processing
Abstract:Results of autocorrelation analysis algorithm by the LabVIEW are different from the theoretical results. To address the problem, a...
1125
Authors: Weera Kompreyarat, Thanasin Bunnam
Chapter 7: Image, Data and Signal Processing
Abstract:In this paper, we propose a development of Thai Buddha amulet identification using simple local correlation features. By using this...
531