Estimation of Uncertainty in the Lateral Vibration Attenuation of a Beam with Piezo-Elastic Supports by Neural Networks

Quantification of uncertainty in technical systems is often based on surrogate models of corresponding simulation models. Usually, the underlying simulation model does not describe reality perfectly, and consequently the surrogate model will be imperfect. In this article we propose an improved surrogate model of the vibration attenuation of a beam with shunted piezoelectric transducers. To this end, experimentally observed and simulated variations in the vibration attenuation are combined in the model estimation process using multilayer feedforward neural networks. Based on this improved surrogate model, we construct a density estimate of the maximal amplitude in the vibration attenuation. The density estimate is used to analyze the uncertainty in the vibration attenuation resulting from manufacturing variations.


Introduction
In this article we introduce a new method for uncertainty quantification in complex technical systems. Our main goal is to estimate the uncertainty in the maximal vibration amplitude of the beam system with piezo-elastic supports shown in Figure 1.

Fig. 1: A CAD model of the lateral vibration attenuation system with piezo-elastic supports and a sectional view of one of the piezo-elastic supports, cf. [3].

This system consists of a beam with circular cross-section embedded in two piezo-elastic supports A and B, where support A is used for lateral beam vibration excitation and support B is used for lateral beam vibration attenuation, as proposed in [3]. The two piezo-elastic supports A and B are located at the beam's ends. Each consists of one elastic membrane-like spring element made of spring steel, two piezoelectric stack transducers arranged orthogonally to each other and mechanically prestressed with disc springs, as well as a relatively stiff axial extension made of hardened steel that connects the piezoelectric transducers with the beam. For vibration attenuation in support B, optimally tuned electrical shunt circuits are connected to the piezoelectric transducers.
If we produce and assemble the piezo-elastic support several times, the system properties will differ due to manufacturing and assembly variations. Consequently, the outcome of an experiment with such a system will vary. For example, building such systems ten times and measuring the maximal attenuated vibration amplitude in an experiment with each of the built systems, we obtained the following ten values:

y_1 = 14.50, y_2 = 14.17, y_3 = 14.37, y_4 = 14.16, y_5 = 14.28, y_6 = 13.51, y_7 = 14.73, y_8 = 13.21, y_9 = 13.05, y_10 = 16.26.   (1)

We assume in the sequel that y_1, ..., y_10 are independent realizations of a real-valued random variable Y, and in order to get information about the distribution of Y we try to estimate the density g : R → R of Y with respect to the Lebesgue measure (which we assume to exist). The classical statistical approach is to assume that Y is, e.g., normally distributed, to estimate its mean and its variance by maximum likelihood, and to use the density of the corresponding normal distribution as an estimate of the density of Y. For the data in (1) this results in the black solid curve in Figure 2. However, the maximal vibration amplitudes represent extreme values of the lateral beam vibration transfer behavior. According to [2], the distribution of extreme values is non-symmetric about the most likely value. Thus, this approach does not seem promising.
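The parametric approach can be sketched as follows (a minimal Python sketch of the maximum likelihood fit of a normal density to the data in (1); the choice of Python is ours, the paper does not prescribe a particular tool for this step):

```python
import math

# The ten measured maximal vibration amplitudes from (1)
y = [14.50, 14.17, 14.37, 14.16, 14.28, 13.51, 14.73, 13.21, 13.05, 16.26]

n = len(y)
mu = sum(y) / n                          # ML estimate of the mean
var = sum((v - mu) ** 2 for v in y) / n  # ML estimate of the variance (divisor n, not n - 1)

def normal_density(x, mu, var):
    """Density of the fitted normal distribution, used as estimate of g."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)
```

Evaluating `normal_density` on a grid then yields the black solid curve of Figure 2.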

Uncertainty in Mechanical Engineering III

The standard approach in modern statistics would be to use a nonparametric estimate of the density of Y, e.g., the classical kernel density estimate of [8] and [7],

ĝ_n(x) = 1/(n · h_n) · Σ_{i=1}^{n} K((x − Y_i)/h_n),   (2)

applied to random variables Y_1, ..., Y_n which are independent and identically distributed as Y. Here K : R → R (the so-called kernel, which is assumed to be a density) and h_n > 0 (the so-called bandwidth) are parameters of the estimate. Computing this kernel density estimate with the routine ksdensity() in MATLAB, e.g., results in the gray dashed curve in Figure 2.
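A minimal sketch of such a kernel density estimate with the Gaussian kernel (the paper uses MATLAB's ksdensity(); the fixed bandwidth below is our own illustrative choice, not the one ksdensity() would select automatically):

```python
import math

def kernel_density_estimate(data, h):
    """Rosenblatt-Parzen kernel density estimate with Gaussian kernel K
    and bandwidth h; returns the estimated density as a function of x."""
    n = len(data)
    def K(u):
        return math.exp(-u * u / 2) / math.sqrt(2 * math.pi)
    def g_hat(x):
        return sum(K((x - yi) / h) for yi in data) / (n * h)
    return g_hat

# The ten amplitudes from (1); bandwidth chosen ad hoc for illustration
y = [14.50, 14.17, 14.37, 14.16, 14.28, 13.51, 14.73, 13.21, 13.05, 16.26]
g = kernel_density_estimate(y, h=0.5)
```

By construction the estimate integrates to one, since the kernel is itself a density.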
The obvious drawback of the first approach is that the error of this parametric estimate might be rather large in case the true density of Y is not the density of a normal distribution, in particular if it cannot be approximated well by any such density. However, due to the small sample size in this example, it is not clear that the second approach, i.e., the nonparametric density estimate, yields a better estimate than the parametric one. So in general neither of these two approaches will lead to satisfying results.
Unfortunately, due to cost and time limitations, it is not really possible to increase the sample size of ten in the data (1) in such a way that a nonparametric estimate seems promising, since experiments with the above system (in particular the construction and replacement of the membrane-like spring elements) are extremely time consuming. What we do instead in the sequel is to use knowledge outside of the data (e.g., knowledge from engineering science about attenuation systems) in order to improve our estimate.
Often this is done in the framework of Bayesian statistics, where some kind of a priori distribution describing the system under consideration is assumed to be given, and under the assumption that this is indeed true, estimates are constructed which achieve good results even for very small sample sizes. However, this is an example of the saying "We buy information with assumptions" [1], which of course might lead to wrong information in the case of wrong assumptions. And since it is not obvious how to transform the knowledge available in engineering science into assumptions about an a priori distribution, we will not use this approach.
Instead, we will use the following knowledge from engineering science in order to construct an improved estimate: It is known that five parameters of the membrane in the attenuation system vary during the construction of the attenuation system and influence the maximal vibration amplitude: the lateral stiffness in direction of y (k_lat,y) and in direction of z (k_lat,z), the rotatory stiffness in direction of y (k_rot,y) and in direction of z (k_rot,z), and the height of the membrane (h_x). For given values of these five parameters it is possible to compute the corresponding maximal vibration amplitude with a physical model of the attenuation system. In order to generate values of these five parameters we need to determine their distributions. Therefore we measured the corresponding parameters for the ten built systems; the results are given in Table 1. The four stiffness properties as well as the height are assumed to be normally distributed, and in addition we assume that they are independent. Consequently, we use the 10 data points for each of the five parameters to estimate a normal distribution parametrically by maximum likelihood for each of these parameters, cf. Figure 3. With the physical model and the now known distribution of the 5-dimensional random vector X, we can compute the outcome Y of an experiment with our technical system by a computer program m : R^5 → R. We assume that the computer program is a good approximation of the values we get for the maximal vibration amplitude in experiments.
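This Monte Carlo step can be sketched as follows. The (mean, standard deviation) pairs below are placeholders, not the values fitted from Table 1, and `m` stands in for the physical computer model, which is of course not a simple sum:

```python
import random

# ML estimates (mean, std) of the five membrane parameters would be computed
# from the ten measured systems in Table 1; the numbers below are placeholders.
param_fits = {
    "k_lat_y": (1.0, 0.05),
    "k_lat_z": (1.0, 0.05),
    "k_rot_y": (1.0, 0.05),
    "k_rot_z": (1.0, 0.05),
    "h_x":     (1.0, 0.02),
}

def draw_X(rng):
    """Draw one realization of the 5-dimensional input vector X,
    assuming independent normally distributed components."""
    return [rng.gauss(mu, sd) for mu, sd in param_fits.values()]

def m(x):
    """Placeholder for the physical computer model of the attenuation system."""
    return sum(x)

rng = random.Random(0)
sample = [m(draw_X(rng)) for _ in range(100)]  # simulated outcomes m(X_i)
```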
In this stochastic model of our attenuation system we can generate independent data X_{n+1}, ..., X_{n+L_n}, compute m(X_{n+1}), ..., m(X_{n+L_n}) and define a kernel density estimate by

ĝ_{L_n}(x) = 1/(L_n · h_{L_n}) · Σ_{i=1}^{L_n} K((x − m(X_{n+i}))/h_{L_n}).

However, the evaluation of the computer program for our technical system will often be rather time consuming, and consequently L_n (although much larger than n) might not be really large. One possibility to circumvent this problem is to define an estimate of g on the basis of the data

(X_{n+1}, m(X_{n+1})), ..., (X_{n+L_n}, m(X_{n+L_n})),   (3)

namely to construct a surrogate estimate m_{L_n} of m from (3), to generate additional independent data X_{n+L_n+1}, ..., X_{n+L_n+N_n}, and to define the surrogate density estimate

ĝ(x) = 1/(N_n · h_{N_n}) · Σ_{i=1}^{N_n} K((x − m_{L_n}(X_{n+L_n+i}))/h_{N_n}).   (5)

Computing such a surrogate density estimate results in the yellow line in Figure 2. Alternatively, one can also ignore the simulation model completely and instead use the data

(X_1, Y_1), ..., (X_n, Y_n), X_{n+L_n+1}, ..., X_{n+L_n+N_n}   (6)

in order to construct an estimate m_n of m*(x) = E{Y | X = x} and to define the corresponding surrogate density estimate

ĝ(x) = 1/(N_n · h_{N_n}) · Σ_{i=1}^{N_n} K((x − m_n(X_{n+L_n+i}))/h_{N_n}).   (8)

The main question which we want to investigate by simulations in this paper is whether there exist situations in which suitably defined estimates based on the complete data

(X_1, Y_1), ..., (X_n, Y_n), (X_{n+1}, m(X_{n+1})), ..., (X_{n+L_n}, m(X_{n+L_n})), X_{n+L_n+1}, ..., X_{n+L_n+N_n}

(where n, L_n, N_n ∈ N) achieve simultaneously better rates of convergence than the estimates (2), (5) and (8).

A Method for Estimating Uncertainty by Improving a Surrogate Model with Real Data
In the following section we introduce a method which combines experimental and computer model data to generate an estimate of the probability density function g.

Definition of a class of neural networks

In order to construct such an estimate, we proceed as follows: Let σ : R → R be the so-called logistic squasher

σ(x) = 1/(1 + exp(−x))   (x ∈ R).
For M ∈ N, d ∈ N and d* ∈ {0, ..., d}, we denote by F_{M,d,d*} the set of all functions f : R^d → R that satisfy

f(x) = Σ_{k=1}^{M} α_k · σ( Σ_{j=1}^{d*} β_{k,j} · σ( Σ_{v=1}^{d} γ_{k,j,v} · x_v + γ_{k,j,0} ) + β_{k,0} ) + α_0   (x ∈ R^d)

for some real coefficients α_k, β_{k,j} and γ_{k,j,v}. We will use the following recursively defined classes of neural networks (with parameters K, M, d, d* ∈ N): For l = 0, we define our space of hierarchical neural networks by

H^(0) = F_{M,d,d*}.

For l > 0, we define recursively

H^(l) = { h : R^d → R : h(x) = Σ_{k=1}^{K} g_k(f_{k,1}(x), ..., f_{k,d*}(x)) for some g_k ∈ F_{M,d*,d*} and f_{k,j} ∈ H^(l−1) }.

These classes of neural networks were introduced in Kohler and Krzyżak (2017) as an approximation of so-called hierarchical interaction models. We do not explain the advantages of hierarchical interaction models at this point, but we want to remark that they are a realistic assumption in connection with complex technical systems which are built in a modular way.
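For concreteness, a single function from the innermost class can be evaluated as follows (a sketch; the parameter layout with α, β, γ follows the class definition above as we have reconstructed it from Kohler and Krzyżak (2017), and the variable names are ours):

```python
import math

def sigma(x):
    """Logistic squasher."""
    return 1.0 / (1.0 + math.exp(-x))

def network(x, alpha, beta, gamma):
    """Evaluate one network with two hidden layers.
    alpha: (M+1,) outer weights, alpha[0] is the constant term
    beta:  (M, d*+1) middle-layer weights, beta[k][0] is the constant term
    gamma: (M, d*, d+1) inner weights, gamma[k][j][0] is the constant term
    """
    M = len(beta)
    out = alpha[0]
    for k in range(M):
        inner_sum = beta[k][0]
        for j in range(len(gamma[k])):
            z = gamma[k][j][0] + sum(g * xv for g, xv in zip(gamma[k][j][1:], x))
            inner_sum += beta[k][j + 1] * sigma(z)
        out += alpha[k + 1] * sigma(inner_sum)
    return out
```

With all weights set to zero except the constant outer term, the network reduces to that constant, which gives a quick sanity check of the indexing.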
Definition of the estimate

Given the data from Table 1 and a computer model of the technical system, we want to estimate the density g of Y.
We start by estimating the distribution of our independent variables X by maximum likelihood, where we assume that the parametric family of the underlying distribution is known. With the estimated distribution of X we generate samples of sizes L_n and N_{2,n}. We use the first sample to generate the data set (3); the second one is needed for the density estimation later on.
Next we define a surrogate estimate

m_{L_n}(·) = m_{L_n}(·, (X_{n+1}, m(X_{n+1})), ..., (X_{n+L_n}, m(X_{n+L_n})))

of the function m. For this we use a least squares estimate defined by

m_{L_n}(·) = arg min_{f ∈ H^(l)} (1/L_n) · Σ_{i=1}^{L_n} |f(X_{n+i}) − m(X_{n+i})|²,

where K_1, M_{1,n}, d* ∈ N are parameters of the estimate. For simplicity we assume here and in the sequel that the minimum above indeed exists, and to compute an approximation of the minimum we use the Levenberg-Marquardt algorithm, cf. [5] or [6]. In the implementation of the surrogate estimate we use the MATLAB routine lsqnonlin(), which implements this algorithm. To choose suitable parameters K_1, M_{1,n} and d*, we use the splitting-of-the-sample technique, where we compute the minimum on ⌈(2/3) · L_n⌉ training data and evaluate the L2 error on the remaining L_n − ⌈(2/3) · L_n⌉ test data. The parameter combination with the smallest L2 risk is selected, where we choose the parameters from the sets l ∈ {0, 1, 2}, K_1 ∈ {1, 2}, d* ∈ {1, ..., d} and M_{1,n} ∈ {1, ..., 5, 6, 16, ..., 46}.
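The splitting-of-the-sample selection can be sketched generically as follows (the function name and the callables `fit` and `l2_risk` are our own abstractions; in the paper the fit is the Levenberg-Marquardt minimization via lsqnonlin()):

```python
import math

def split_sample_select(data, fit, l2_risk, param_grid, train_frac=2.0 / 3.0):
    """Choose estimate parameters by splitting of the sample:
    fit on the first ceil(train_frac * len(data)) points, then pick the
    parameter combination with the smallest empirical L2 risk on the rest."""
    n_train = math.ceil(train_frac * len(data))
    train, test = data[:n_train], data[n_train:]
    best_params, best_risk = None, float("inf")
    for params in param_grid:
        model = fit(train, params)
        risk = l2_risk(model, test)
        if risk < best_risk:
            best_params, best_risk = params, risk
    return best_params, best_risk
```

In the paper the grid would range over the listed sets for l, K_1, d* and M_{1,n}.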
Next we define an estimate of m* − m_{L_n} on the basis of the residuals

ε̂_i = Y_i − m_{L_n}(X_i)   (i = 1, ..., n).

Therefore we define m̂_{ε,n}(·) as the minimizer over our class of neural networks of a weighted least squares criterion fitted to these residuals, for some weight w^(n) ∈ [0, 1] and parameters K_2, M_{2,n}, d* ∈ N. Again we use the Levenberg-Marquardt algorithm to compute an approximation of the solution of this minimization problem.
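The weighted criterion can be sketched as follows. The precise form of the criterion is not fully spelled out above; in particular the second term, which shrinks the residual network toward zero on additional inputs with weight 1 − w, is our assumption:

```python
def weighted_residual_criterion(f, residual_data, extra_X, w):
    """Weighted empirical criterion for the residual network.
    residual_data: pairs (X_i, eps_i) of real inputs and residuals;
    extra_X: additional inputs (assumption: f is shrunk toward zero there);
    w: weight in [0, 1] balancing the two terms."""
    fit_term = sum((f(x) - eps) ** 2 for x, eps in residual_data) / len(residual_data)
    shrink_term = sum(f(x) ** 2 for x in extra_X) / len(extra_X)
    return w * fit_term + (1 - w) * shrink_term
```

Minimizing this criterion over the network class (e.g. with Levenberg-Marquardt) would then yield the residual estimate.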
Finally, we estimate the density of Y by a kernel density estimate applied to the values of the improved surrogate model m_{L_n} + m̂_{ε,n} on the second generated sample, which we denote by X̃_1, ..., X̃_{N_{2,n}}. Therefore we choose a kernel K : R → R and a bandwidth h_{N_{2,n}} > 0 and set

ĝ(x) = 1/(N_{2,n} · h_{N_{2,n}}) · Σ_{i=1}^{N_{2,n}} K( (x − (m_{L_n} + m̂_{ε,n})(X̃_i)) / h_{N_{2,n}} ),

where we use the MATLAB routine ksdensity().

Application to Simulated and Real Data
To assess the performance of our newly proposed estimate, we compare it to other classical approaches in a simulated environment. Finally, we apply the new method to the data from our lateral vibration attenuation system. We compare our estimate (est. 4) with three other density estimates. The first one (est. 1) is a standard kernel density estimate applied to a sample of size n of Y, cf. (2). Estimates 2 and 3 are surrogate density estimates where the kernel density estimate of MATLAB is applied to a sample of size N_{2,n} of the surrogate model. For the second estimate (est. 2), the surrogate model is a neural network trained on L_n realizations of (X, m(X)), cf. (5). For the third estimate (est. 3), the surrogate model is a neural network trained on n realizations of (X, Y), cf. (8).
In all simulations we neglect the parametric density estimation of the input variable X and assume that a perfect approximation of its density is available. We choose X as a uniformly distributed random vector on [0, 1]^d. Furthermore, we choose a random error ε which is independent of X and uniformly distributed on [−1, 1]. The dependent variable Y, which is the outcome of an experiment with the technical system, is defined by

Y = m*(X) + σ* · λ* · ε

for some m* : R^d → R, a noise factor σ* ∈ {0.05, 0.2} and λ* > 0 selected as the empirical interquartile range of m*(X). We define the computer program which simulates the technical system by a functional shift of m*, i.e., by adding to m* a suitably scaled shift function v_m.
We consider three different models. In each model we use the sample sizes n = 10, L_n = 200, N_{1,n} = 200 and N_{2,n} = 10^4. The functions used as m* are listed below.
As mentioned before, the parameter λ* is chosen as the empirical interquartile range of m*(X), calculated on 10^5 realizations of X. The resulting values are λ*_1 = 9.11, λ*_2 = 5.68, λ*_3 = 13.97 and λ*_4 = 1.64. We set

v_m(x) = x_1 + tan(x_2) + x_3^3 + log(x_4 + 0.1) + 3 · x_5 + x_6,

thus λ_m = 8.70. The density of Y is the convolution of the density of m*(X) and a (scaled) uniform density. We do not try to compute its exact form; instead we compute it approximately by a kernel density estimate (as implemented in the MATLAB routine ksdensity()) applied to a sample of size 10^6. In order to evaluate the performance of our density estimates, the result is treated as if it were the real density.
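The generation of Y in these simulation models can be sketched as follows (function and variable names are ours; `m_star` stands for the respective model function):

```python
import random
import statistics

def interquartile_range(values):
    """Empirical interquartile range (75% quantile minus 25% quantile)."""
    qs = statistics.quantiles(values, n=4)
    return qs[2] - qs[0]

def make_Y(m_star, d, sigma_star, rng, n_iqr=10**5):
    """Return a generator for Y = m*(X) + sigma* . lambda* . eps, with X
    uniform on [0, 1]^d, eps uniform on [-1, 1], and lambda* the empirical
    interquartile range of m*(X) computed from n_iqr realizations."""
    lam = interquartile_range(
        [m_star([rng.random() for _ in range(d)]) for _ in range(n_iqr)])
    def draw():
        X = [rng.random() for _ in range(d)]
        eps = rng.uniform(-1.0, 1.0)
        return m_star(X) + sigma_star * lam * eps
    return draw, lam
```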
The estimates are compared by their L1 error. We approximate the corresponding integral by a Riemann sum defined on an equidistant partition consisting of 10^4 subintervals. Since we need to take the randomness of the L1 error into account, we repeat each simulation 50 times and report in Table 2 and Table 3 the median (and, in brackets, the interquartile range) of the 50 L1 errors. Our newly proposed estimate outperforms the other three estimates in 14 of 18 cases. In all cases in which σ_m is sufficiently small, our estimate yields a smaller L1 error than estimates 1 and 3, where the biggest difference occurs in model four, in which its error is eight times smaller. In all simulations except one it is able to reduce the L1 error compared to the surrogate estimate on computer model data (est. 2). Due to the complexity of the model functions m* and the small sample size of 10, estimate 3 yields the worst results in every simulation.
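The L1 error approximation described above can be sketched as follows (names are ours; `g_hat` and `g_true` are the estimated and reference densities on an interval [a, b]):

```python
def l1_error(g_hat, g_true, a, b, num=10**4):
    """Approximate the L1 distance between two densities on [a, b]
    by a Riemann sum over an equidistant partition of num subintervals."""
    width = (b - a) / num
    total = 0.0
    for i in range(num):
        x = a + (i + 0.5) * width  # midpoint of the i-th subinterval
        total += abs(g_hat(x) - g_true(x)) * width
    return total
```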

We apply the four different estimates to the lateral vibration attenuation system data and illustrate the results in Figure 4. The number of experimental data points is equal to 10. To improve the stability of our estimate we increase the sample sizes L_n and N_{1,n} to 500. Obviously, the results of our newly proposed estimate are not satisfying. If one observes the positions of the Y values, the estimated density of estimate 4 should be shifted to the right in comparison to the estimated density of estimate 2. We suspect that this is due to the fact that the input variables are dependent. In the sequel we therefore assume that X is multivariate normally distributed. We estimate its distribution with the mlest() routine of the mvnmle package of the statistical software R and obtain estimates of the expectation vector and the covariance matrix. We use the expectation vector and the covariance matrix to generate a data set of size 10^4 of X values. These values are used to evaluate the surrogate models of the estimates 2 to 4. We illustrate the results in Figure 5.
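With complete data, the maximum likelihood estimates of a multivariate normal distribution are the sample mean vector and the covariance matrix with divisor n, which can be sketched as follows (the paper uses mlest() from the R package mvnmle, which additionally handles missing observations):

```python
def ml_mvn_fit(data):
    """ML estimates of mean vector and covariance matrix (divisor n)
    for complete multivariate data, given as a list of equal-length rows."""
    n, d = len(data), len(data[0])
    mu = [sum(row[j] for row in data) / n for j in range(d)]
    cov = [[sum((row[j] - mu[j]) * (row[k] - mu[k]) for row in data) / n
            for k in range(d)] for j in range(d)]
    return mu, cov
```

Sampling the 10^4 new X values from the fitted normal would then use, e.g., a Cholesky factor of the estimated covariance matrix.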
These new results for the estimates 2 and 4 are much more plausible if one considers the positions of the Y values. The results for estimate 3 are not included in the figure, because they were shifted toward negative values. But since we only have 10 real data points, it is unclear how reliable the estimates 1 and 3 are.