It is well known that artificial neural networks have been widely used in many fields [Martin T. Hagan, Howard B. Demuth, Mark Beale. Neural Network Design. Beijing: China Machine Press, 2002]. The BP algorithm is trained by steepest-descent-like methods and therefore cannot find global minima in many applications. Many techniques have been introduced to improve the performance of steepest-descent-like algorithms. These studies mainly focus on: improving the learning speed [S. Ergezinger, E. Tomsen. An accelerated learning algorithm for multilayer perceptrons: optimization layer by layer. IEEE Trans. Neural Netw., 1995, 6(1): 31-42; N. Ampazis, S. J. Perantonis. Two highly efficient second-order algorithms for training feedforward networks. IEEE Trans. Neural Netw., 2002, 13(5): 1064-1073]; changing the network architecture with growing and pruning algorithms [G. B. Huang, P. Saratchandran, N. Sundararajan. A generalized growing and pruning RBF (GGAP-RBF) neural network for function approximation. IEEE Trans. Neural Netw., 2005, 16(1): 57-67; M. Bortman, M. Aladjem. A growing and pruning method for radial basis function networks. IEEE Trans. Neural Netw., 2009, 20(6): 1039-1045]; optimizing the initial weights or other quantities [L. Behera, S. Kumar, A. Patnaik. On adaptive learning rate that guarantees convergence in feedforward networks. IEEE Trans. Neural Netw., 2006, 17(5): 1116-1125]; convergence of online training procedures [Z. B. Xu, R. Zhang, W. F. Jing. When does online BP training converge? IEEE Trans. Neural Netw., 2009, 20(10): 1529-1539]; introducing additional parameters; deterministic convergence of online gradient methods [W. Wu, G. R. Feng, Z. X. Li, Y. S. Xu. Deterministic convergence of an online gradient method for BP neural networks. IEEE Trans. Neural Netw., 2005, 16(5): 533-540]; tuning network parameters; training with genetic or evolutionary methods [A. Khashman. A modified backpropagation learning algorithm with added emotional coefficients. IEEE Trans. Neural Netw., 2008, 19(11): 1896-1909]; hybrid neural networks; and so on. Unfortunately, the drawbacks arising from steepest-descent-like algorithms, such as local minima, slow convergence and the limited scale of solvable problems, cannot be overcome completely, so the statistical sensitivity defined in (1) can hardly be used to obtain correct results. In recent years, a new class of algorithms for training feedforward neural networks has been proposed by Zhang [D. Y. Zhang. New Theories and Methods on Neural Networks. Beijing: Tsinghua University Press, 2006 (in Chinese)], in which the weights are changed from constants to cubic spline functions (weight functions) whose arguments are the input patterns. The algorithms proposed in [13] not only simplify the architecture of neural networks, but also overcome drawbacks of earlier algorithms, such as local minima, slow convergence and the difficulty of reaching the global optimum. Distinct from [13], a new algorithm for training neural networks using B-spline weight functions is introduced in this paper, and the mathematical expression for its statistical sensitivity is derived, where the definition of statistical sensitivity differs from (1). Finally, simulation examples verify the correctness of the theoretical sensitivity formula derived in this paper.

Fundamentals of Training Neural Networks by B-spline Weight Functions

Fig. 1 shows a model of a B-spline weight function neural network, in which each connection carries a B-spline weight function. Suppose there are a number of input patterns, and the theoretical weight function represents the connection with the i-th input node, i.e. the i-th component of the m-dimensional input vectors.
(4)
where the interpolation nodes can be expressed as
(5)
Fig. 1 The architecture of a neural network using B-spline weight functions

The approximate weight function and the theoretical weight function take the same values at the interpolation points, but some error remains away from the interpolation points. With the notation introduced above, we have
(6)
where the i-th B-spline basis function of degree k is defined by
(7)
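Equations (6) and (7) themselves are not reproduced above, but degree-k B-spline basis functions are conventionally evaluated with the Cox-de Boor recursion. The following minimal Python sketch illustrates that recursion; the knot vector and the evaluation point are purely illustrative and are not taken from the paper.

import numpy as np

def bspline_basis(i, k, t, x):
    # Value of the i-th B-spline basis function of degree k on the knot
    # vector t at the point x, computed by the Cox-de Boor recursion.
    if k == 0:
        return 1.0 if t[i] <= x < t[i + 1] else 0.0
    left = 0.0
    if t[i + k] != t[i]:
        left = (x - t[i]) / (t[i + k] - t[i]) * bspline_basis(i, k - 1, t, x)
    right = 0.0
    if t[i + k + 1] != t[i + 1]:
        right = (t[i + k + 1] - x) / (t[i + k + 1] - t[i + 1]) * bspline_basis(i + 1, k - 1, t, x)
    return left + right

# Example: a cubic (k = 3) basis function on a uniform knot vector over [0, 1].
knots = np.linspace(0.0, 1.0, 12)
print(bspline_basis(4, 3, knots, 0.5))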
Any spline function can be expressed as
(8)
where each coefficient is a constant. On the given interval the spline space has a fixed dimension, and the interpolant has the corresponding number of interpolation nodes; let the interpolation points be
(9)
The k-th degree spline interpolant satisfies the following conditions
(10)
where the prescribed node values are the output values, and a spline satisfying (10) is called the k-th degree spline interpolation function. If the input patterns are taken as the spline interpolation nodes, then from (10) and (8) we have
(11)
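The linear system (11) is likewise not reproduced, so the sketch below only shows how such a collocation system is typically handled in practice: SciPy's make_interp_spline assembles and solves the B-spline interpolation equations for the coefficients. The interpolation nodes and the sampled weight-function values are hypothetical stand-ins, not the paper's data.

import numpy as np
from scipy.interpolate import make_interp_spline

# Hypothetical interpolation nodes and sampled weight-function values;
# in the paper these would come from the training patterns and condition (10).
nodes = np.linspace(0.0, 1.0, 9)
values = np.sin(2.0 * np.pi * nodes)      # stand-in for the target weight function

# Cubic (k = 3) spline interpolation: internally this builds and solves
# a collocation system analogous to (11) for the B-spline coefficients.
spl = make_interp_spline(nodes, values, k=3)

print("B-spline coefficients:", spl.c)                                # solved coefficient vector
print("Residual at the nodes:", np.max(np.abs(spl(nodes) - values)))

An equivalent explicit route is to assemble the matrix of basis values at the nodes and call a dense solver, but make_interp_spline already exploits the banded structure of that matrix.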
Solving the linear system (11) yields the coefficients of the spline interpolant.

Statistical Sensitivity Analysis of B-spline Weight Function Neural Networks

Different from (1), the definition of statistical sensitivity adopted in this paper can be described as
(12)
where the two quantities involved are the standard deviation of the disturbances on the input patterns and the value of the sensitivity, respectively. When noise is embedded in the inputs, the theoretical error of a B-spline weight function neural network can be expressed as
(13)
where the norm in (13) may be any suitable norm. Generally speaking, the theoretical noise error of a neural network consists of a model error and an approximation noise error. We first analyze the model error. Many norms can be adopted to measure errors; here we use the Chebyshev norm. For a continuous function, the Chebyshev norm is the maximum of its absolute value, that is
(14)
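As a concrete illustration of measuring the model error in the Chebyshev norm, the sketch below interpolates a hypothetical theoretical weight function with a cubic spline and takes the maximum absolute difference on a fine grid; the function, the interval and the number of nodes are assumptions for illustration only.

import numpy as np
from scipy.interpolate import make_interp_spline

def chebyshev_norm(values):
    # Discrete analogue of the Chebyshev (maximum) norm on a sampled grid.
    return np.max(np.abs(values))

w_true = lambda x: np.exp(-x) * np.cos(3.0 * x)         # hypothetical theoretical weight function
nodes = np.linspace(0.0, 2.0, 11)                       # interpolation nodes
w_hat = make_interp_spline(nodes, w_true(nodes), k=3)   # approximate weight function

grid = np.linspace(0.0, 2.0, 2001)                      # fine grid for evaluating the norm
print("Chebyshev-norm model error:", chebyshev_norm(w_true(grid) - w_hat(grid)))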
If the functions involved are continuous on the interval, then they attain their maximum values on that interval, and we have
(15)
where p is the index of the subinterval. Suppose the input without perturbation and the input perturbation are given; then the approximation error will be
(16)
From (16) and (6), we can get
(17)
Let the dimensions of the input and the output be m and n, respectively, and denote the approximate output and the target output accordingly; the output perturbation is then
(18)
If the function is continuous, then
(21)
Therefore, the sensitivity of the B-spline weight function neural network, as defined in (12), can be expressed as follows.
(22)
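Because (12) and (22) are not reproduced here, the sketch below only illustrates one common empirical counterpart of statistical sensitivity: perturb the inputs with zero-mean noise of standard deviation sigma and relate the root-mean-square output deviation to sigma. The toy network (a single output formed as a sum of per-input spline weight functions), the node set and all names are assumptions rather than the paper's exact formulation.

import numpy as np
from scipy.interpolate import make_interp_spline

rng = np.random.default_rng(0)

# Toy network: one output formed as a sum of per-input spline weight functions.
nodes = np.linspace(0.0, 1.0, 9)
weight_funcs = [make_interp_spline(nodes, np.sin((i + 1) * np.pi * nodes), k=3)
                for i in range(3)]

def net(x):
    # x has shape (..., 3); returns the scalar network output.
    return sum(w(x[..., i]) for i, w in enumerate(weight_funcs))

def statistical_sensitivity(x, sigma, trials=2000):
    # Monte Carlo ratio of the rms output deviation to the input noise level sigma
    # (an assumed empirical counterpart of the definition in (12)).
    y0 = net(x)
    noise = rng.normal(0.0, sigma, size=(trials,) + x.shape)
    dy = net(x + noise) - y0
    return np.sqrt(np.mean(dy ** 2)) / sigma

x = np.array([0.2, 0.5, 0.8])
for sigma in (1e-3, 1e-2, 1e-1):
    print(sigma, statistical_sensitivity(x, sigma))

For small sigma this ratio is expected to stay close to the theoretical sensitivity value, which is the behaviour reported for Fig. 2 below.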
Simulations

To show the results proposed in this paper, an example is given below. The architecture of the network is 3-4, and the learning curve [13] is
(23)
The output patterns are obtained by
(24)
The B-spline weight functions can then be obtained from the training patterns, and the statistical sensitivity of the B-spline weight function neural network can be calculated by (22). For the trained network, given the output without disturbance and the disturbed output, the relative error is computed by the following formula.
(25)
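Formula (25) is not reproduced; a common choice for the relative output error, used below purely as an assumption, is the norm of the output deviation divided by the norm of the undisturbed output.

import numpy as np

def relative_error(y_clean, y_disturbed):
    # Assumed form of (25): ||disturbed - clean|| / ||clean|| in the Euclidean norm.
    return np.linalg.norm(y_disturbed - y_clean) / np.linalg.norm(y_clean)

y_clean = np.array([0.8, -0.3, 1.2, 0.5])                       # hypothetical undisturbed outputs
y_disturbed = y_clean + np.array([0.01, -0.02, 0.015, 0.0])     # hypothetical disturbed outputs
print(relative_error(y_clean, y_disturbed))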
When the input perturbation is very small, the theoretical noise sensitivity matches the approximate noise sensitivity, as shown in Fig. 2. Fig. 3 illustrates that the output relative error is accordingly very small (close to zero), so we can conclude that the network has generalization ability. As can be seen from Fig. 2 and Fig. 3, when the input perturbations increase beyond a certain extent, the sensitivity changes noticeably: it may decrease, may increase, or may increase and then decrease, while the output relative error increases rapidly at the same time. The network then no longer remains stable; it becomes very sensitive to the input perturbations, and the output deviates from the target values.

Fig. 2 Sensitivity of B-spline weight functions
Fig. 3 Relative errors

Conclusions

Although some results of statistical sensitivity analysis have been achieved with earlier algorithms, further analysis of the method proposed in this paper is necessary for improved insight into its effectiveness. The sensitivity formula of B-spline weight function neural networks is derived, and the correctness of the theoretical sensitivity formula is verified by simulations. It can be seen that the theoretical value of the sensitivity is determined by several factors: it depends not only on the weight functions, but also on the input patterns, and it can be checked by measuring the sensitivity of the trained network.

Acknowledgment

This work was supported by the Project Funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions (yx002001). I thank Y. Dong, my graduate student, for the simulation examples given in this paper.