Authors: M.N. Noor, A.S. Yahaya, N.A. Ramli, Mohd Mustafa Al Bakri Abdullah
Abstract: Hourly PM10 concentrations measured at eight monitoring stations in peninsular Malaysia in 2006 were used to simulate missing data. The gap lengths of the simulated missing values were limited to 12 hours, since actual gaps in the record tend to be short. Two percentages of simulated missing data were generated: 5 % and 15 %. Several single imputation methods (linear interpolation (LI), nearest neighbour interpolation (NN), mean above below (MAB), daily mean (DM), mean 12-hour (12M), mean 6-hour (6M), row mean (RM) and previous year (PY)) were applied to fill in the simulated missing data. In addition, multiple imputation (MI) was conducted for comparison with the single imputation methods. Performance was evaluated using four statistical criteria: mean absolute error, root mean squared error, prediction accuracy and index of agreement. The results show that 6M performs comparably to LI, which suggests that a shorter averaging time gives better prediction. The other single imputation methods predict the missing data well, except for PY. RM and MI perform moderately, with performance improving at the higher fraction of missing data, whereas LR is the worst method for both simulated missing data percentages.
923
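The gap-filling and scoring procedure described in the PM10 abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, MAB is assumed to mean "fill every value in a gap with the mean of the observation just before and just after the gap", and the index of agreement is assumed to be Willmott's form, which may differ from the variant used in the paper.

```python
import math

def find_gaps(series):
    """Return (start, end) pairs such that series[start:end] is a run of None."""
    gaps, i, n = [], 0, len(series)
    while i < n:
        if series[i] is None:
            j = i
            while j < n and series[j] is None:
                j += 1
            gaps.append((i, j))
            i = j
        else:
            i += 1
    return gaps

def impute(series, method="LI"):
    """Fill None gaps with linear interpolation (LI) or mean above below (MAB)."""
    filled = list(series)
    for start, end in find_gaps(series):
        if start == 0 or end == len(series):
            raise ValueError("gap touches the series boundary")
        below, above = series[start - 1], series[end]
        width = end - (start - 1)
        for k in range(start, end):
            if method == "LI":
                # Straight line between the two bounding observations.
                filled[k] = below + (above - below) * (k - (start - 1)) / width
            elif method == "MAB":
                # Assumed definition: mean of the bounding observations.
                filled[k] = (below + above) / 2
            else:
                raise ValueError("unknown method: " + method)
    return filled

def mae(pred, obs):
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)

def rmse(pred, obs):
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def index_of_agreement(pred, obs):
    """Willmott's index of agreement (assumed variant); 1.0 is a perfect match."""
    mean_obs = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for p, o in zip(pred, obs))
    den = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2 for p, o in zip(pred, obs))
    return 1 - num / den

# Toy series (made-up values, not PM10 data): hide three hours, then score the fill.
truth = [10.0, 20.0, 30.0, 40.0, 50.0]
with_gap = [10.0, None, None, None, 50.0]
hidden = slice(1, 4)
li = impute(with_gap, "LI")
mab = impute(with_gap, "MAB")
```

Scores are computed only over the hidden positions, for example `mae(mab[hidden], truth[hidden])`, so that observed values do not inflate the apparent accuracy.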
Abstract: In this paper, we propose a weighted quantile regression method for partially linear models with responses missing at random. The proposed method gives an efficient estimator for the parametric components and attenuates the effect of the missing responses. Simulations are carried out to assess its performance, and the results indicate that the proposed method is workable.
1013
Abstract: As information technology and data collection capabilities improve and the amount of accumulated data grows, missing data problems become increasingly apparent. Traditional clustering methods cannot directly cluster data sets that contain missing data. In this paper, we propose a novel measurement method for missing data based on incomplete information system theory, and design separate similarity measure criteria for discrete and continuous attributes. The experiments use K-means clustering to test the accuracy of the algorithm under different missing data rates and different data set sizes. The results demonstrate that the method can cluster data sets with missing data efficiently and accurately.
1500
Authors: Yu Liu, Feng Rui Chen
Abstract: This study presents a new imputation method for missing precipitation records that fuses spatial and temporal information. Extending the simple kriging model, we propose a nonstationary kriging method which assumes that the mean, or trend, is known and varies over the whole study area. The method obtains the precipitation trend of each station at a given time by analyzing its time series data, and then performs geostatistical analysis on the residuals between the trend and the measured values. Finally, this spatio-temporal information is integrated into a unified imputation model. The method is illustrated using monthly total precipitation data for April from 671 meteorological stations in China, spanning the period 2001-2010. Four other methods (moving average, mean ratio, expectation maximization and ordinary kriging) are used for comparison. The results show that the proposed method has the smallest mean absolute error, mean relative error and root mean square error, and therefore produces the best imputation result. This is because (1) it fully takes into account the spatio-temporal information of precipitation, and (2) it assumes that the mean varies over the whole study area, which is more in line with the actual behaviour of rainfall.
1488
Authors: Zhi Hui Fu, Cui Xin Peng, Bin Li
Abstract: Missing data are often a problem in statistical modeling. How to estimate item parameters with missing data in item response theory (IRT) is an interesting issue. The Bayesian paradigm offers a natural model-based solution to this problem by treating missing values as random variables and estimating their posterior distributions. In this article, we propose a Bayesian procedure, based on a data augmentation scheme using the Gibbs sampler, to estimate the multidimensional two-parameter logistic model with missing responses.
3830
Authors: Wen Fu Wu, Hong Xun Tian, Lin Qi Ma, Juan Su, Song Huai Du
Abstract: Reliability assessment of power system components is key to the safe and reliable operation of the power system. Based on the characteristics of actual operating records of power system components, this paper introduces a practical reliability evaluation method for components with missing fault data. The method is based on the Weibull distribution model. A calculation system for the reliability evaluation of components with missing fault data is also described; it mainly comprises data sample selection, parameter calculation, model checking and failure rate curve drawing. Transformers are taken as an example to verify the effectiveness of the method. In addition, the paper analyses factors that influence transformer reliability, such as voltage grade, manufacturer and product type. In short, the model provides a systematic and effective method for the practical reliability evaluation of electrical components.
1886
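The "failure rate curve drawing" step in the Weibull abstract above could look roughly like the sketch below. The abstract does not state which Weibull parameterization is used, so the two-parameter form with hypothetical `shape` and `scale` arguments is an assumption, and the parameter values are illustrative only, not fitted to any transformer data.

```python
import math

def weibull_reliability(t, shape, scale):
    """R(t) = exp(-(t/scale)^shape): probability of surviving beyond time t."""
    return math.exp(-((t / scale) ** shape))

def weibull_failure_rate(t, shape, scale):
    """h(t) = (shape/scale) * (t/scale)^(shape-1); use t > 0 when shape < 1."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def failure_rate_curve(times, shape, scale):
    """Return (t, h(t)) points that could be passed to a plotting routine."""
    return [(t, weibull_failure_rate(t, shape, scale)) for t in times]

# Illustrative parameters (assumed): shape > 1 gives a failure rate that rises with age.
curve = failure_rate_curve([5.0, 10.0, 20.0], shape=2.0, scale=10.0)
```

With `shape = 1` the failure rate is constant and the model reduces to the exponential distribution; `shape < 1` gives a decreasing rate and `shape > 1` an increasing one.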
Authors: Zhen Dong Li, Meng Meng Li
Abstract: Is EM algorithm parameter estimation under the Rayleigh distribution sensitive to missing data, and if so, to what extent? Using computer simulation, we compare and analyze the results of maximum likelihood estimation and EM algorithm estimation under different missing rates. The results are almost identical when the missing rate is below 0.30, but the efficiency of the EM algorithm gradually deteriorates as the missing rate increases. The results also show that the EM algorithm is sensitive to the sample size and to the choice of initial value.
278
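To make the comparison in the Rayleigh abstract above concrete, here is a minimal sketch under a loudly stated assumption: the abstract does not specify the missingness mechanism, so values are taken to be missing completely at random and are marked with `None`. The function names and the toy data are hypothetical. The sketch uses two standard facts about the Rayleigh distribution with scale σ: E[X²] = 2σ², and the complete-data maximum likelihood estimate is σ² = Σx² / (2n).

```python
def rayleigh_mle(observed):
    """Closed-form MLE of sigma^2 from fully observed values: sum(x^2) / (2n)."""
    return sum(x * x for x in observed) / (2 * len(observed))

def rayleigh_em(data, sigma2_init=1.0, tol=1e-10, max_iter=10000):
    """EM estimate of sigma^2 when some entries of data are None (assumed MCAR).

    Returns (sigma2, iterations_used).
    """
    observed = [x for x in data if x is not None]
    n = len(data)
    m = n - len(observed)  # number of missing values
    s_obs = sum(x * x for x in observed)
    sigma2 = sigma2_init
    for iteration in range(1, max_iter + 1):
        # E-step: each missing x^2 is replaced by its expectation 2 * sigma^2.
        expected_sum_sq = s_obs + m * 2 * sigma2
        # M-step: complete-data MLE applied to the expected sum of squares.
        new_sigma2 = expected_sum_sq / (2 * n)
        if abs(new_sigma2 - sigma2) < tol:
            return new_sigma2, iteration
        sigma2 = new_sigma2
    return sigma2, max_iter

# Toy data (made up): the same three observations with two or four missing values.
low_missing = [1.0, 2.0, None, 3.0, None]
high_missing = [1.0, None, None, None, None, 2.0, 3.0]
est_low, iters_low = rayleigh_em(low_missing)
est_high, iters_high = rayleigh_em(high_missing)
```

Under this assumption the update is σ²ₜ₊₁ = Σx²_obs / (2n) + (m/n) σ²ₜ, so the iteration converges to the observed-data MLE at a rate equal to the missing fraction m/n. A higher missing rate therefore needs more iterations from the same starting value, which is one simple way the slowdown and the dependence on the initial value can appear.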
Authors: Chong Chen, Hua Yu, Ju Yun Wang
Abstract: In the context of learning Bayesian network structure, we propose a new method for filling in missing data based on the KNN algorithm and dynamic Gibbs sampling. It mainly addresses the problem of how to better learn Bayesian network structure from data sets with missing values. Experiments on the Asia network show that the method restores the original data very well, which makes it possible to use Bayesian network structure learning algorithms that require complete data. The method expands the scope and improves the effectiveness of Bayesian network applications.
906
Authors: Ying Zhong Shi, Min Xu, Pei Lin Liu, Ping Li
Abstract: Classical regression modeling methods consider only a single scene, which is a weakness: partially missing information may weaken the generalization ability of a regression system built on such a dataset. A regression system with knowledge transfer learning ability, Knowledge Based ε-Support Vector Regression (KB-ε-SVR for brevity), is proposed based on ε-support vector regression. KB-ε-SVR makes full use of the current data and learns effectively from useful historical knowledge, thereby remedying the lack of information in the current scene. A reinforced current model is obtained by controlling the similarity between the current model and the historical model in the objective function, so the current model can benefit from the historical scene when information is missing or insufficient. Experiments show that KB-ε-SVR has better performance and adaptability than traditional ε-support vector regression methods in scenarios with insufficient data.
472
Authors: Cheng Dong Wei, Fu Wang, Huan Qi Wei
Abstract: We discuss empirical Bayesian estimation and noninformative-prior Bayesian estimation of the exponential parameter in the presence of missing data. By setting different prior distributions, we obtain different Bayesian risks and compare the numerical simulation results using MATLAB.
904