Outlier | Scientific.Net

Outlier Detection Clustering Algorithm Based on Density

Authors: Yong Lin Leng, Hua Shen, Fu Yu Lu

Abstract: K-means is a classic clustering algorithm and is widely applied in various data mining fields. The traditional K-means algorithm selects its initial centroids randomly, so the clustering result is affected by noise points and is not stable. To address this problem, this paper proposed a K-means algorithm based on density-based outlier detection. The algorithm first detected outliers with a density model and avoided selecting them as initial cluster centers. After clustering the non-outlier points, the algorithm assigned each outlier to the corresponding cluster according to its distance to each centroid. The algorithm effectively reduced the influence of outliers on K-means and improved the accuracy of the clustering result. Experimental results demonstrated that the algorithm can effectively improve the accuracy and stability of the clustering.
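
The pipeline described in this abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the density model, so a point's density is assumed here to be the number of other points within a radius `eps`, and a point is treated as an outlier when that count is below `min_pts`. Each outlier is assumed to join the cluster of its nearest centroid. All function names and parameters are hypothetical.

```python
import numpy as np

def neighbour_density(X, eps):
    """Density of each point = number of other points within radius eps (assumed model)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    return (d <= eps).sum(axis=1) - 1  # subtract 1 so a point does not count itself

def nearest_centroid(X, centroids):
    """Index of the closest centroid for every row of X."""
    return np.argmin(np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1), axis=1)

def kmeans(X, k, rng, n_iter=100):
    """Plain Lloyd iterations; initial centroids are drawn at random from X."""
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = nearest_centroid(X, centroids)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids

def density_kmeans(X, k, eps, min_pts, seed=0):
    rng = np.random.default_rng(seed)
    is_outlier = neighbour_density(X, eps) < min_pts
    # Cluster only the non-outliers, so no outlier can become an initial centre.
    centroids = kmeans(X[~is_outlier], k, rng)
    # Afterwards, every point (outliers included) joins its nearest centroid's cluster.
    labels = nearest_centroid(X, centroids)
    return labels, centroids, is_outlier

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11],
              [50, 50]], dtype=float)
labels, centroids, is_outlier = density_kmeans(X, k=2, eps=1.5, min_pts=2)
```

Here the isolated point `[50, 50]` has no neighbours within `eps`, so it is excluded from initialisation and only assigned to a cluster after the centroids have settled.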

Wind Data Anomaly Detection and Interpolation of Missing Data

Authors: Mao Yang, Jun Cheng Dong

Abstract: The accuracy and completeness of wind farm data are of great significance in wind power research. Because data are distorted or lost while a wind farm gathers and transmits them, the accuracy and integrity of the data are greatly reduced, so wind farm data require outlier detection and missing-data imputation. This paper detects outliers with a statistical method based on the 3σ criterion under the normal distribution, and uses nearest-distance interpolation and regression interpolation to replace outliers and fill in missing data. After filling, the accuracy of the data is improved.
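
A minimal sketch of the two steps, assuming a single wind-speed series held in a NumPy array with gaps marked as `NaN`. The 3σ rule flags samples farther than three standard deviations from the mean; flagged samples are turned into gaps and then filled either from the nearest valid sample or from a least-squares line against a reference series. The reference here is simply the sample index, standing in for a correlated channel such as a neighbouring anemometer; all names and data are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def three_sigma_mask(x):
    """True where a sample lies more than 3 standard deviations from the mean (NaNs ignored)."""
    mu, sigma = np.nanmean(x), np.nanstd(x)
    return np.abs(x - mu) > 3 * sigma

def nearest_fill(x):
    """Replace each NaN with the closest valid sample by index (ties go to the earlier one)."""
    x = x.copy()
    valid = np.flatnonzero(~np.isnan(x))
    for i in np.flatnonzero(np.isnan(x)):
        x[i] = x[valid[np.argmin(np.abs(valid - i))]]
    return x

def regression_fill(y, ref):
    """Fill NaNs in y from a least-squares line y ~ a*ref + b fitted on the valid pairs."""
    y = y.copy()
    ok = ~np.isnan(y)
    a, b = np.polyfit(ref[ok], y[ok], deg=1)
    y[~ok] = a * ref[~ok] + b
    return y

speed = np.array([5.0, 5.2, 5.1, 5.3, 5.4, np.nan, 5.6, 5.5, 5.7, 5.8,
                  60.0, 5.9, 6.0, 6.1, 6.0, 6.2, 6.3, 6.2, 6.4, 6.5])
cleaned = speed.copy()
cleaned[three_sigma_mask(speed)] = np.nan   # the 60.0 spike becomes a gap
by_nearest = nearest_fill(cleaned)
by_regression = regression_fill(cleaned, np.arange(len(speed), dtype=float))
```

Because the spike itself inflates both the mean and σ, the rule works only when outliers are rare and the series is long enough: with the population standard deviation, a single outlier among n samples can exceed 3σ only when n ≥ 11.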

Evaluation of Outlier Specific Correction Procedure for Areal Surface Texture

Authors: Mohd Fauzi Ismail, Talib Ria Jaafar, Sharzali Che Mat, Muhammad Arif Abdul Hamid Pahmi

Abstract: Surface texture data measured by optical profilometers such as the confocal microscope often contain outliers, which may disturb the characterization of the surface texture. This paper evaluates an outlier specific correction procedure (OSCP) for areal surface texture data, which can remove outliers without affecting normal data points. Outliers are identified from the median of each point's height relative to its neighbouring data points within a detection window. The application of OSCP to areal topography data measured by a confocal laser scanning microscope is compared with a Gaussian filter and a median filter. The results show that OSCP is better at correcting outliers without affecting normal data points, but there is room for improvement.
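
A rough sketch of the detection rule described above, for a height map `z` stored as a 2-D NumPy array. For each point, its heights relative to the neighbours in a square window are computed and their median is taken; if the magnitude of that median exceeds a threshold, the point is flagged and replaced by the median of its neighbours, while every other point is left untouched. The function name, window size, threshold and replacement rule are illustrative assumptions; the paper's exact OSCP parameters are not given in the abstract.

```python
import numpy as np

def correct_outliers(z, half=1, threshold=0.5):
    """Flag points whose median height relative to their neighbours exceeds
    `threshold` and replace them with the neighbours' median (hypothetical rule)."""
    rows, cols = z.shape
    out = z.copy()
    mask = np.zeros(z.shape, dtype=bool)
    for i in range(rows):
        for j in range(cols):
            # Detection window, clipped at the edges of the map.
            r0, r1 = max(i - half, 0), min(i + half + 1, rows)
            c0, c1 = max(j - half, 0), min(j + half + 1, cols)
            window = z[r0:r1, c0:c1].ravel()
            centre = (i - r0) * (c1 - c0) + (j - c0)   # flat index of (i, j) in the window
            neighbours = np.delete(window, centre)
            rel = np.median(z[i, j] - neighbours)      # median relative height
            if abs(rel) > threshold:
                mask[i, j] = True
                out[i, j] = np.median(neighbours)      # neighbours come from the uncorrected map
    return out, mask

z = np.add.outer(0.1 * np.arange(5), 0.1 * np.arange(5))  # tilted plane
z[2, 2] += 5.0                                            # single spike
corrected, flagged = correct_outliers(z)
```

Because the median ignores a single extreme neighbour, points next to the spike keep a median relative height near zero and are not altered; a plain median filter, by contrast, would modify every point.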

The Online Identification of Single Variable Outliers Based on a Three-Sliding Window-Bayesian Method

Authors: Yun Lian Liu, Tie Bin Wu, Wen Li, Yun Cheng, Tao Yun Zhou

Abstract: A method for the online identification and processing of single-variable outliers was proposed, based on a three-sliding-window Bayesian method. The method exploits the fact that flow rate and temperature in metallurgical production do not change suddenly. Based on this characteristic, the method accurately distinguished outliers from shifts in the normal working point by analyzing changes in the Bayesian posterior probability and conditional probability of the measured data in the three sliding windows.

Improved K-Means Algorithm Based on Outliers Detection in Review Spam Filtering

Authors: Zhe Yuan Ding, Ming Ke He, Ming Ze Gao, Fang Fang Li

Abstract: The K-means algorithm is commonly used for text clustering. The traditional K-means algorithm is sensitive to its initial centers: the clustering result depends excessively on them, and for different inputs the output fluctuates considerably. The proposed K-means algorithm combines a feature dictionary with density-based outlier detection to detect outliers in text data. In the first stage, a density parameter is assigned to every data object using a custom distance function. In the second stage, K-means clusters the data based on the density distribution: K data objects that lie in high-density areas and are farthest from each other are chosen as the initial cluster centers. In the third stage, the outlier detection algorithm identifies the anomalous text sets from the clustering. Experimental results show that the proposed approach can efficiently detect outliers in the data set.
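
The second-stage initialisation can be sketched as follows. This is a minimal illustration, not the authors' code: Euclidean distance stands in for their custom text distance, density is the number of points within `eps`, "high density" is assumed to mean at least the median density, and far-apart centres are picked greedily (farthest-first). The function name and parameters are hypothetical; the returned centres would then seed an ordinary K-means run.

```python
import numpy as np

def pick_initial_centres(X, k, eps):
    """Choose k initial centres that lie in dense regions and are far apart."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # Euclidean stand-in for the custom distance
    dens = (d <= eps).sum(axis=1) - 1                          # neighbours within eps
    cand = np.flatnonzero(dens >= np.median(dens))             # "high density" = at least median (assumption)
    if len(cand) < k:
        raise ValueError("not enough high-density candidates for k centres")
    chosen = [cand[np.argmax(dens[cand])]]                     # start from the densest point
    while len(chosen) < k:
        # Farthest-first: take the candidate whose nearest chosen centre is farthest away.
        min_d = d[np.ix_(cand, chosen)].min(axis=1)
        chosen.append(cand[np.argmax(min_d)])
    return X[chosen]

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11],
              [50, 50]], dtype=float)
centres = pick_initial_centres(X, k=2, eps=1.5)
```

Without the density filter, farthest-first selection would pick the isolated point `[50, 50]` as the second centre; restricting candidates to dense points avoids that.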

Fuzzy Classification Maximum Likelihood Clustering with T-Distributions

Authors: Miin Shen Yang, Chih Ying Lin, Yi Cheng Tian

Abstract: The classification maximum likelihood (CML) procedure is a maximum likelihood mixture approach to clustering. In 1993, Yang first extended CML to a so-called fuzzy CML (FCML) by combining fuzzy c-partitions with the CML function for a normal mixture model. However, the normal distribution is not robust to outliers. In this paper we consider FCML with t-distributions and create a clustering algorithm called FCMLT. Numerical examples and real-data applications with comparisons are given to demonstrate the effectiveness and superiority of the proposed method.
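
To illustrate why t-distributions are more robust to outliers than the normal distribution, here is a minimal sketch of the per-point weight that appears in the standard EM algorithm for fitting a multivariate t-distribution. It is not the FCMLT algorithm itself, whose update equations are not given in the abstract; the function name and data are hypothetical. Points far from the centre (large Mahalanobis distance) receive small weights and therefore pull the mean and covariance estimates less, whereas under a normal model every point effectively has weight 1.

```python
import numpy as np

def t_weights(X, mu, cov, nu):
    """E-step weights u_i = (nu + p) / (nu + delta_i^2) from EM for a
    multivariate t-distribution; delta_i^2 is the squared Mahalanobis distance."""
    p = X.shape[1]
    diff = X - mu
    delta2 = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
    return (nu + p) / (nu + delta2)

X = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])
w = t_weights(X, mu=np.zeros(2), cov=np.eye(2), nu=3.0)
# A normal model corresponds to nu -> infinity, where every weight tends to 1.
```

With ν = 3 the point at distance 10 gets a weight of about 0.05, compared with about 1.67 for the point at the centre.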

Assessing the Harmonic Emission Level Based on Partial Least-Squares Regression with Data Envelopment Analysis

Authors: Xiang Li, Min You Chen, Yong Wei Zheng

Abstract: A novel method for assessing the harmonic emission level is presented, based on partial least-squares (PLS) regression with data envelopment analysis (DEA). Using measurements of the harmonic voltage and current at the point of common coupling (PCC), inefficient data are first removed with DEA, and the regression coefficients are then computed with the PLS algorithm. The customer's harmonic emission level is calculated from these coefficients. The proposed approach removes the effect of outlying data points and yields accurate estimates. Simulation results show that the proposed method is more effective than PLS alone.

Variability Analysis by Statistical Control Process and Functional Data Analysis — Case of Study Applied to Power System Harmonics Assessment

Authors: Joaquín Sancho, Jorge Pastor, Javier Martínez, Miguel Angel García

Abstract: Functional data appear in a multitude of industrial applications and processes. However, such data are still often studied from the conventional standpoint of Statistical Process Control (SPC), which loses the ability to analyze how different aspects evolve over time. This study presents a statistical process control approach based on functional data analysis to identify outliers, or special causes of variability, in the harmonics that appear in power systems and can negatively affect the quality of the electricity supply. The results obtained with the functional approach are compared with those obtained earlier with conventional Statistical Process Control.

RETRACTED: Survey of Clustering and Outlier Detection Techniques in Data Mining: A Research Perspective

Authors: R. Delshi Howsalya Devi, M. Indra Devi

Abstract: Outlier detection is one of the major problems that have been studied in depth within the data mining domain. It is used to detect dissimilar observations within the data under consideration. Detecting outliers helps to recognize system faults, thereby helping administrators take preventive measures before problems arise. In this paper, we present a comprehensive survey of outlier detection. We anticipate that this survey will support a better understanding of the various directions in which experimental work on this topic can proceed.

An Application of IPA to Analyze Numeracy of Statistics and Mathematics of Employment-Required Knowledge of University Students

Authors: Hsiu Lan Ma, Chin I Jen, Szu Hsing Lin, Der Bang Wu

Abstract: This study used Importance-Performance Analysis (IPA) to analyze university students' employment-required knowledge of statistics and mathematics.

Papers by Keyword: Outlier