2.1 Samples
It is important to select samples with a uniform distribution of the constituent(s) to be determined, because the prediction accuracy and the robustness of the calibration model depend on the range of the flour quality parameters [M. Baslar and M. F. Ertugay: TÜBİTAK Vol. 35 (2011), p. 139].
2.2 Chemical analysis
All samples were analyzed in triplicate and the results averaged. Starch content was determined by the hydrochloric acid polarimetric method, which is widely used in China, with an automatic recording polarimeter (WZZ-1S, SPOIF, Shanghai, China), a sample mass of about 2.5 ± 0.01 g, and zinc sulfate and potassium ferrocyanide solutions [4].
2.3 NIRS analysis
Each sample was loaded into two parallel product cups. Flour samples were scanned over the wavelength range 570 to 1100 nm with a spectrometer (Infratec™ 1241, FOSS TECATOR) equipped with an autocap module. Spectra were collected and managed with WinISI II software, version 1.50, which handles not only spectral acquisition but also data treatment and the development of quantitative models. Each product cup was scanned 60 times, and its representative spectrum was obtained by averaging all of these scans. The final spectrum of each sample used for model development was obtained by averaging the representative spectra of its two parallel product cups.
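The two-stage averaging described above (60 scans per cup, then the mean of the two cups) can be sketched as follows. This is a minimal illustration that assumes the raw scans are available as a NumPy array of shape (cups, scans, wavelengths); the array layout, the function name `sample_spectrum` and the 266-point wavelength grid are hypothetical and are not taken from WinISI II.

```python
import numpy as np

def sample_spectrum(scans: np.ndarray) -> np.ndarray:
    """Return the final spectrum of one sample.

    scans: hypothetical array of shape (n_cups, n_scans, n_wavelengths),
    e.g. (2, 60, n_wavelengths) for two parallel cups scanned 60 times each.
    """
    if scans.ndim != 3:
        raise ValueError("expected an array of shape (cups, scans, wavelengths)")
    # Step 1: average the repeated scans of each cup -> one representative spectrum per cup
    cup_spectra = scans.mean(axis=1)
    # Step 2: average the representative spectra of the parallel cups
    return cup_spectra.mean(axis=0)

# Usage with simulated absorbance values (illustration only, hypothetical 266-point grid)
rng = np.random.default_rng(0)
scans = rng.normal(loc=0.5, scale=0.01, size=(2, 60, 266))
print(sample_spectrum(scans).shape)  # (266,)
```

When every cup has the same number of scans, this two-stage mean equals the grand mean of all scans; keeping the two stages explicit only matters if the scan counts per cup differ.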
2.3.1 Outlier elimination
The flour samples were collected randomly, and the set contained some samples whose spectra differed markedly from those of most other samples; these are referred to as outliers. If such outliers are used to develop calibration models, the adaptability, accuracy and predictive ability of the models are adversely affected [V. M. Fernandez-Cabanás, A. G. Varo, J. G. Olmo and E. D. Pedro: Chemometrics and Intelligent Laboratory Systems Vol. 87 (2007), p. 104].
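The text does not spell out how spectral outliers are flagged, although the results refer to PCA loadings combined with an H-statistic. A common reading of the H statistic in WinISI-style software is the squared Mahalanobis distance of a spectrum from the mean in principal-component score space, divided by the number of components, with values above 3.0 treated as outliers. The sketch below implements that reading only as an assumption: the function names, the default of 5 components and the 3.0 limit are illustrative and are not settings reported by the authors.

```python
import numpy as np

def h_statistic(spectra: np.ndarray, n_components: int = 5) -> np.ndarray:
    """Assumed H statistic: squared Mahalanobis distance of each sample from the
    mean in PCA score space, divided by the number of components."""
    centered = spectra - spectra.mean(axis=0)
    # PCA via SVD: the scores of the first n_components components are U * S
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    # PCA scores are uncorrelated, so the covariance matrix is diagonal
    score_var = scores.var(axis=0, ddof=1)
    d2 = np.sum(scores ** 2 / score_var, axis=1)
    return d2 / n_components

def flag_outliers(spectra: np.ndarray, n_components: int = 5, limit: float = 3.0) -> np.ndarray:
    """Boolean mask of samples whose H value exceeds the (assumed) limit."""
    return h_statistic(spectra, n_components) > limit

# Usage with simulated spectra: 40 samples x 50 wavelengths, sample 7 shifted upward
rng = np.random.default_rng(0)
spectra = 0.1 * rng.normal(size=(40, 50))
spectra[7] += 10.0
print(np.flatnonzero(flag_outliers(spectra)))
```

With this scaling the mean H over the sample set is (n - 1)/n, i.e. close to 1, which is why a fixed cut-off such as 3.0 can be used regardless of the number of components.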
2.3.2 Pretreatment
NIR instruments usually produce a large amount of spectral data that carries useful analytical information. However, besides information on chemical composition, the acquired spectra also contain background information, noise and information on the physical properties of the samples [N. Shetty and R. Gislum: Field Crops Research Vol. 120 (2010), p. 31].
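The text does not state which pretreatment was applied, so the sketch below shows two widely used ones, chosen only for illustration: standard normal variate (SNV), which removes additive and multiplicative scatter effects spectrum by spectrum, and a simple gap first derivative, which removes a constant baseline offset. The function names and the gap of 4 points are hypothetical choices, not the authors' settings.

```python
import numpy as np

def snv(spectra: np.ndarray) -> np.ndarray:
    """Standard normal variate: centre and scale each spectrum (row) by its own mean and SD."""
    mean = spectra.mean(axis=1, keepdims=True)
    sd = spectra.std(axis=1, ddof=1, keepdims=True)
    return (spectra - mean) / sd

def gap_derivative(spectra: np.ndarray, gap: int = 4) -> np.ndarray:
    """First derivative by a simple gap difference: x[j + gap] - x[j - gap].

    The result is shorter than the input by 2 * gap points at the spectrum edges.
    """
    if gap < 1:
        raise ValueError("gap must be at least 1")
    return spectra[:, 2 * gap:] - spectra[:, :-2 * gap]

# Usage with simulated spectra (illustration only)
rng = np.random.default_rng(1)
raw = 0.5 + 0.05 * rng.normal(size=(3, 266))
treated = gap_derivative(snv(raw), gap=4)
print(treated.shape)  # (3, 258)
```

SNV is applied first here so that the derivative acts on scatter-corrected spectra; the order and the gap size are design choices that would normally be compared by cross-validation.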
2.3.3 Spectral range
Different components of flour contain molecular groups of different types and amounts. Because the spectrum is made up of the overlapping absorption bands of these molecular groups, the information on each component of flour is concentrated in different spectral regions [V. T. Edward: Analytical Chemistry Vol. 15 (1994), p. 795A; X. L. Chu, H. F. Yuan and W. Z. Lu: Progress in Chemistry Vol. 16 (2004), p. 528]. To locate the region in which starch information is concentrated, we divided the 570 to 1100 nm range into 6 sub-regions at intervals of about 88 nm and developed a model for each sub-region. Lower values of SEC (standard error of calibration) and SECV (standard error of cross-validation) and higher values of R²c and 1-VR (one minus the variance ratio) indicate a model with better predictive ability, and the corresponding sub-region is taken as the region where starch information is concentrated.
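The partition of 570 to 1100 nm into six equal sub-regions (530/6 ≈ 88.3 nm each) and the comparison criteria can be sketched as below. The formulas follow common NIR chemometrics usage but are assumptions, since the text does not define them: SEC uses n - p - 1 degrees of freedom for a model with p factors, SECV is the root mean square of the cross-validation residuals, and 1-VR is one minus the ratio of the cross-validation residual sum of squares to the total sum of squares. All function names are hypothetical.

```python
import numpy as np

def spectral_regions(start=570.0, stop=1100.0, n_regions=6):
    """Split [start, stop] nm into equal-width, contiguous sub-regions."""
    edges = np.linspace(start, stop, n_regions + 1)
    return list(zip(edges[:-1], edges[1:]))

def sec(y, y_fit, n_factors):
    """Standard error of calibration (assumed n - p - 1 degrees of freedom)."""
    y, y_fit = np.asarray(y, float), np.asarray(y_fit, float)
    return np.sqrt(np.sum((y - y_fit) ** 2) / (len(y) - n_factors - 1))

def r2(y, y_fit):
    """Coefficient of determination, 1 - SS_residual / SS_total."""
    y, y_fit = np.asarray(y, float), np.asarray(y_fit, float)
    return 1.0 - np.sum((y - y_fit) ** 2) / np.sum((y - y.mean()) ** 2)

def secv(y, y_cv):
    """Standard error of cross-validation: RMS of cross-validation residuals."""
    y, y_cv = np.asarray(y, float), np.asarray(y_cv, float)
    return np.sqrt(np.mean((y - y_cv) ** 2))

def one_minus_vr(y, y_cv):
    """1-VR: fraction of variance explained in cross-validation (assumed definition)."""
    return r2(y, y_cv)

# Usage: list the sub-region boundaries
for lo, hi in spectral_regions():
    print(f"{lo:.1f}-{hi:.1f} nm")  # first lines: 570.0-658.3 nm, 658.3-746.7 nm
```

For each sub-region a model would be fitted on the wavelengths inside it, and the sub-region with the lowest SEC and SECV and the highest R²c and 1-VR retained.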
2.3.4 Modeling methods
Commonly used modeling methods include principal component regression (PCR, which is based on principal component analysis, PCA), partial least squares (PLS), modified partial least squares (MPLS), artificial neural networks (ANN) and local calibration (LC). PCR and PLS are the most commonly used multivariate calibration methods; each builds a model that relates a response variable (Y) to a set of predictor variables (X). However, PCR suffers from some significant limitations, the most important being overfitting when there are large numbers of highly correlated variables (significantly more than the number of samples), as is often the case with hyperspectral reflectance measurements. PLS can overcome this limitation and performs slightly better than PCR because it does not include latent variables that are of little importance for describing the variance of the quality parameters. Neither PCR nor PLS is always the best option when a nonlinear model is required. MPLS is often more stable and accurate than the standard PLS algorithm. In MPLS, the NIR residuals at each wavelength, obtained after each factor is calculated, are standardized (divided by the standard deviation of the residuals at that wavelength) before the next factor is calculated. When developing MPLS equations, cross-validation is recommended to select the optimal number of factors and avoid overfitting [D. C. Pérez-Marín, A. Garrido-Varo, J. E. Guerrero-Ginel and A. Gómez-Cabrera: Animal Feed Science and Technology Vol. 116 (2004), p. 333].
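The residual standardization step described for MPLS, together with cross-validated selection of the number of factors, can be illustrated with a small NIPALS-style PLS1 sketch. This is not the WinISI II implementation: it is a simplified reading of the description above, in which each new weight vector is computed from the X residuals divided by their per-wavelength standard deviations, while deflation is carried out on the unscaled residuals so that the usual regression-vector formula still applies. Setting `modified=False` gives ordinary PLS1. The function names and the interleaved fold layout are hypothetical.

```python
import numpy as np

def pls1_fit(X, y, n_factors, modified=False):
    """NIPALS PLS1; with modified=True the residuals are standardized per wavelength
    before each factor is computed (a simplified MPLS-style variant)."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xr, yr = X - x_mean, y - y_mean           # residuals, deflated factor by factor
    W, P, q = [], [], []
    for _ in range(n_factors):
        sd = Xr.std(axis=0) if modified else np.ones(Xr.shape[1])
        sd = np.where(sd > 0, sd, 1.0)        # guard against constant wavelengths
        v = (Xr / sd).T @ yr                  # weight computed on standardized residuals
        v /= np.linalg.norm(v)
        w = v / sd                            # same weight expressed for unscaled residuals
        t = Xr @ w                            # scores
        tt = t @ t
        p = Xr.T @ t / tt                     # X loadings
        q_a = (yr @ t) / tt                   # y loading
        Xr = Xr - np.outer(t, p)              # deflate X
        yr = yr - q_a * t                     # deflate y
        W.append(w); P.append(p); q.append(q_a)
    W, P, q = np.column_stack(W), np.column_stack(P), np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)    # regression vector for centered X
    return x_mean, y_mean, coef

def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return (np.asarray(X, float) - x_mean) @ coef + y_mean

def secv_curve(X, y, max_factors, n_folds=5, modified=True):
    """SECV for 1..max_factors factors using n_folds-fold cross-validation."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    folds = np.arange(len(y)) % n_folds       # interleaved fold assignment
    out = []
    for a in range(1, max_factors + 1):
        y_cv = np.empty_like(y)
        for k in range(n_folds):
            test = folds == k
            model = pls1_fit(X[~test], y[~test], a, modified)
            y_cv[test] = pls1_predict(model, X[test])
        out.append(np.sqrt(np.mean((y - y_cv) ** 2)))
    return np.array(out)
```

A typical use is `curve = secv_curve(X, y, max_factors=15)` followed by `best = int(np.argmin(curve)) + 1`; in practice the smallest number of factors whose SECV is close to the minimum is often preferred, to limit overfitting.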
2.3.5 Validation procedure
External validation was carried out by applying the corresponding models to the validation sample set to predict starch content. In assessing calibration performance, the main criteria were the standard error of prediction (SEP), bias, slope and the coefficient of determination in validation (R²v). The values predicted by the calibration model for direct NIR measurement were compared with the reference values, and with the predicted values of the parameters described above, using a paired-samples t-test. During prediction, the bias-adjustment function of the WinISI II software can be used to correct the bias of the calibration model, which can improve the predictive ability of the model to some degree.
3 Results and discussion
Descriptive statistics for all sample sets are shown in Table 1, and a wide range of starch content was observed. The standard deviations and means indicate that the sets have an even constituent distribution, suggesting that the calibration set will weight the calibration model equally across the entire concentration range, with minimal residuals at the extremes and relatively equal weighting at the centre. The NIR spectra of the whole sample set are shown in Fig. 1; all spectra follow a similar trend.
Table 1 Descriptive statistics of starch content for the calibration and validation sets
| Sample set | N | Range (%) | Mean (%) | SD (%) |
|---|---|---|---|---|
| Whole sample set | 131 | 58.315-73.145 | 68.151 | 2.849 |
| Structured sample set | 101 | 58.315-73.145 | 68.136 | 2.887 |
| Validation sample set | 29 | 61.280-72.745 | 68.206 | 2.764 |
N: number of samples; SD: standard deviation.
Fig. 1 NIR spectra of all flour samples
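The validation statistics listed in Section 2.3.5 (SEP, bias, slope, R²v and the paired-samples t-test) and the bias adjustment applied during prediction can be sketched as follows. The definitions are common NIR conventions assumed here rather than taken from the text: bias is the mean of predicted minus reference values, SEP is the bias-corrected standard deviation of the prediction errors, slope is that of reference values regressed on predicted values, and the paired t statistic equals bias / (SEP / √n). The function names are hypothetical and do not correspond to WinISI II functions.

```python
import numpy as np

def validation_stats(reference, predicted):
    """External-validation statistics (assumed NIR conventions, see lead-in)."""
    ref = np.asarray(reference, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    n = len(ref)
    d = pred - ref                                          # prediction errors
    bias = d.mean()
    sep = np.sqrt(np.sum((d - bias) ** 2) / (n - 1))        # bias-corrected SEP
    slope = np.cov(ref, pred)[0, 1] / np.var(pred, ddof=1)  # reference regressed on predicted
    r2_v = np.corrcoef(ref, pred)[0, 1] ** 2                # coefficient of determination in validation
    t_paired = bias / (sep / np.sqrt(n))                    # paired-samples t statistic, n - 1 df
    return {"n": n, "bias": bias, "SEP": sep, "slope": slope, "R2v": r2_v, "t": t_paired}

def bias_adjust(predicted, bias):
    """Subtract a bias estimated on the validation set from new predictions."""
    return np.asarray(predicted, dtype=float) - bias

# Usage with made-up starch values (%), for illustration only
reference = [65.2, 67.8, 69.1, 70.4, 72.0]
predicted = [65.9, 68.1, 69.8, 70.6, 72.9]
stats = validation_stats(reference, predicted)
adjusted = bias_adjust(predicted, stats["bias"])
```

Because SEP as defined here is already bias-corrected, bias adjustment removes the systematic offset but leaves SEP unchanged. The t statistic is compared with the two-sided critical value for n - 1 degrees of freedom, for example from a t table or `scipy.stats.t.ppf`.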
3.1 Elimination of outliers
Different outliers were eliminated depending on the loading type and elimination method used, and the models developed from the remaining samples have different characteristics, as shown in Table 2. We conclude that the models developed with the combination of PCA loadings and the H-statistic have higher predictive ability, in terms of SEC, R², SECV and 1-VR, than the other models, so this combination was selected as the best condition for eliminating outliers.
Table 2 Internal validation of quantitative NIRS models built with different outlier elimination methods
| Loading type | Calculation method | N | SEC | R² | SECV | 1-VR | RPD |
|---|---|---|---|---|---|---|---|
| PCA | R-statistic | 115 | 1.0676 | 0.7699 | 1.2103 | 0.7124 | 1.8388 |
| PCA | H-statistic | 123 | 1.0524 | 0.8616 | 1.2395 | 0.8086 | 2.2825 |
| PL1 | R-statistic | 115 | 1.0676 | 0.7699 | 1.2103 | 0.7124 | 1.8388 |
| PL1 | H-statistic | 126 | 1.2495 | 0.8047 | 1.3849 | 0.7597 | 2.0416 |
RPD: ratio of performance to deviation.
Because the particular outliers removed can influence the final model, we then determined the number of outliers to eliminate under the best condition. The number of outliers was varied from 1 to 20, giving 20 calibration models with different calibration statistics, which are shown in Fig. 2. The calibration statistics are best when 4 outliers are eliminated. Fig. 2 also shows that the predictive ability of the models changes as a damped oscillation and tends to decrease as the number of eliminated outliers increases, which is consistent with the theory described in Section 2.3.1.
Fig. 2 Calibration statistics of the optimal NIRS models based on the structured sample set with 0-20 outlier elimination passes
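The scan over the number of eliminated outliers shown in Fig. 2 can be sketched as a loop that removes the k samples with the largest outlier score, refits, and records a cross-validation error. To keep the example short and self-contained, the calibration here is a principal component regression (PCR) evaluated by exact leave-one-out cross-validation; this is a stand-in for the MPLS models actually used, not the authors' procedure. The scores `h` (for instance H-statistic values such as those behind Table 2) are ranked once on the full set, which is a simplification of repeated elimination passes; the function names, the number of components and the choice of SECV as the criterion are assumptions.

```python
import numpy as np

def loo_secv_pcr(X, y, n_components):
    """Exact leave-one-out SECV of a principal component regression (PCR) model."""
    n = len(y)
    errors = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i
        x_mean = X[train].mean(axis=0)
        Xc = X[train] - x_mean
        _, _, vt = np.linalg.svd(Xc, full_matrices=False)
        V = vt[:n_components].T                         # PCA loadings from the training part only
        A = np.column_stack([np.ones(n - 1), Xc @ V])   # intercept + training scores
        beta, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        y_hat = beta[0] + ((X[i] - x_mean) @ V) @ beta[1:]
        errors[i] = y[i] - y_hat
    return np.sqrt(np.mean(errors ** 2))

def scan_outlier_counts(X, y, h, max_removed=20, n_components=5):
    """Remove the k samples with the largest score h (k = 0..max_removed); return (k, SECV) pairs."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    order = np.argsort(np.asarray(h))[::-1]             # most extreme samples first
    results = []
    for k in range(max_removed + 1):
        keep = np.sort(order[k:])
        results.append((k, loo_secv_pcr(X[keep], y[keep], n_components)))
    return results
```

The number of outliers to eliminate can then be read off as `best_k, best_secv = min(scan_outlier_counts(X, y, h), key=lambda r: r[1])`. Because removing samples always shrinks the data set, a small improvement in SECV at a large k should be treated with caution.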