Filling the Missing Data of Air Pollutant Concentration Using Single Imputation Methods

Abstract:

Hourly PM10 concentrations measured at eight monitoring stations in Peninsular Malaysia in 2006 were used to simulate missing data. The simulated gaps were limited to 12 hours in length, since gaps in the actual records tend to be short. Two proportions of simulated missing data were generated: 5 % and 15 %. Eight single imputation methods were applied to fill the simulated gaps: linear interpolation (LI), nearest neighbour interpolation (NN), mean above below (MAB), daily mean (DM), mean 12-hour (12M), mean 6-hour (6M), row mean (RM) and previous year (PY). Multiple imputation (MI) was also carried out for comparison with the single imputation methods. Performance was evaluated using four statistical criteria: mean absolute error, root mean squared error, prediction accuracy and index of agreement. The results show that 6M performs comparably to LI, which suggests that a shorter averaging time gives better predictions. The other single imputation methods predict the missing data well, except for PY. RM and MI perform moderately, with performance improving at the higher fraction of missing data, whereas LR is the worst method for both simulated missing data percentages.
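To make the comparison concrete, the following is a minimal Python sketch of three of the single imputation methods named in the abstract (LI, NN and MAB) together with three of the four criteria. It is an illustration under stated assumptions, not the authors' implementation. The method definitions follow common usage: MAB is taken to be the mean of the one observation above and the one below the gap, and the index of agreement is taken in Willmott's form. Prediction accuracy is omitted because the abstract does not define it. All function names and the sample values are hypothetical.

```python
import math


def gap_bounds(series, i):
    """Return the indices of the nearest observed values on each side of i."""
    lo = i
    while lo >= 0 and series[lo] is None:
        lo -= 1
    hi = i
    while hi < len(series) and series[hi] is None:
        hi += 1
    if lo < 0 or hi >= len(series):
        raise ValueError("this sketch assumes every gap has observed neighbours")
    return lo, hi


def impute(series, method):
    """Fill None entries with 'LI', 'NN' or 'MAB' (assumed definitions)."""
    filled = list(series)
    for i, value in enumerate(series):
        if value is not None:
            continue
        lo, hi = gap_bounds(series, i)
        left, right = series[lo], series[hi]
        if method == "LI":
            # Straight line between the observations bounding the gap.
            filled[i] = left + (right - left) * (i - lo) / (hi - lo)
        elif method == "NN":
            # Value of the closer gap end point; ties go to the earlier one.
            filled[i] = left if (i - lo) <= (hi - i) else right
        elif method == "MAB":
            # Mean of the observation above and the one below the gap.
            filled[i] = (left + right) / 2
        else:
            raise ValueError(f"unknown method: {method}")
    return filled


def mae(pred, obs):
    """Mean absolute error."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)


def rmse(pred, obs):
    """Root mean squared error."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))


def index_of_agreement(pred, obs):
    """Willmott's index of agreement (assumed form), between 0 and 1."""
    mean_obs = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for p, o in zip(pred, obs))
    den = sum((abs(p - mean_obs) + abs(o - mean_obs)) ** 2
              for p, o in zip(pred, obs))
    return 1 - num / den


# Hypothetical hourly PM10 values with a simulated 3-hour gap.
observed = [40, 44, 50, 58, 62, 60, 55, 49]
missing = [2, 3, 4]
masked = [None if i in missing else v for i, v in enumerate(observed)]
truth = [observed[i] for i in missing]

for method in ("LI", "NN", "MAB"):
    filled = impute(masked, method)
    pred = [filled[i] for i in missing]
    print(method, round(mae(pred, truth), 3), round(rmse(pred, truth), 3),
          round(index_of_agreement(pred, truth), 3))
```

The criteria are computed only at the positions that were masked, so each score reflects imputation error alone rather than being diluted by the values that were never removed.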

Pages: 923-932

Online since: April 2015

Copyright: © 2015 Trans Tech Publications Ltd. All Rights Reserved
