Filling Missing Data Using Interpolation Methods: Study on the Effect of Fitting Distribution

Abstract:

Missing values in statistical survey data are an important issue. Such data often contain missing values because of machine failures, changes in the siting of monitors, routine maintenance and human error. Incomplete data sets usually introduce bias because observed and unobserved data differ, so it is important to ensure that the data analyzed are of high quality. A straightforward way to deal with the problem is to ignore the missing data and discard the incomplete cases from the data set. This approach is generally not valid for time-series prediction, in which the value of a system typically depends on its historical data. A commonly used alternative for treating missing items is imputation. This paper discusses three interpolation methods: linear, quadratic and cubic. A total of 8577 observations of PM10 data for one year were used to compare the three methods when fitting the Gamma distribution. Goodness of fit was assessed using three performance indicators: mean absolute error (MAE), root mean squared error (RMSE) and coefficient of determination (R2). The results show that the linear interpolation method provides a very good fit to the data.
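The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the synthetic series, the gap pattern and the choice of fitting a local polynomial through the nearest observed points are all assumptions made for illustration, and R2 is computed here with the common 1 - SS_res/SS_tot definition, which may differ from the formula used in the paper. The Gamma fitting step is not included.

```python
import numpy as np

def interpolate_missing(y, order=1):
    """Fill NaN gaps with a polynomial of the given order (1 = linear,
    2 = quadratic, 3 = cubic) passed through the order + 1 observed
    points that surround each gap. Hypothetical helper, not from the paper."""
    y = np.asarray(y, dtype=float)
    t = np.arange(len(y))
    observed = ~np.isnan(y)
    t_obs, y_obs = t[observed], y[observed]
    if len(t_obs) < order + 1:
        raise ValueError("need at least order + 1 observed values")
    filled = y.copy()
    for i in t[~observed]:
        # Index of the first observed point after the gap.
        j = np.searchsorted(t_obs, i)
        # Window of order + 1 consecutive observed points around the gap,
        # shifted inward when the gap lies near either end of the series.
        start = min(max(j - (order + 1) // 2, 0), len(t_obs) - (order + 1))
        window = slice(start, start + order + 1)
        # Centre the time axis on the gap so that the constant term of the
        # fitted polynomial is the interpolated value.
        coeffs = np.polyfit(t_obs[window] - i, y_obs[window], deg=order)
        filled[i] = coeffs[-1]
    return filled

def mae(actual, predicted):
    """Mean absolute error."""
    return float(np.mean(np.abs(actual - predicted)))

def rmse(actual, predicted):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))

def r2(actual, predicted):
    """Coefficient of determination (assumed definition: 1 - SS_res / SS_tot)."""
    ss_res = np.sum((actual - predicted) ** 2)
    ss_tot = np.sum((actual - np.mean(actual)) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Synthetic stand-in for a monitoring record (NOT the PM10 data from the paper).
rng = np.random.default_rng(0)
truth = 50 + 10 * np.sin(np.arange(200) / 8) + rng.normal(0, 1, 200)

# Hide every tenth value to simulate missing observations.
mask = np.zeros(200, dtype=bool)
mask[5::10] = True
series = truth.copy()
series[mask] = np.nan

for name, order in [("linear", 1), ("quadratic", 2), ("cubic", 3)]:
    filled = interpolate_missing(series, order)
    print(name,
          "MAE=%.3f" % mae(truth[mask], filled[mask]),
          "RMSE=%.3f" % rmse(truth[mask], filled[mask]),
          "R2=%.3f" % r2(truth[mask], filled[mask]))
```

The indicators are computed only at the positions that were hidden, since the observed positions are left unchanged by the fill. Lower MAE and RMSE and an R2 closer to 1 indicate a better reconstruction.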

Info:

Periodical:

Key Engineering Materials (Volumes 594-595)

Pages:

889-895

Online since:

December 2013

Copyright:

© 2014 Trans Tech Publications Ltd. All Rights Reserved
