Resilience in Mechanical Engineering - A Concept for Controlling Uncertainty during Design, Production and Usage Phase of Load-Carrying Structures

Resilience as a concept has found its way into different disciplines to describe the ability of an individual or system to withstand and adapt to changes in its environment. In this paper, we provide an overview of the concept in different communities and extend it to the area of mechanical engineering. Furthermore, we present metrics to measure resilience in technical systems and illustrate them by applying them to load-carrying structures. By giving application examples from the Collaborative Research Centre (CRC) 805, we show how the concept of resilience can be used to control uncertainty during different stages of product life.


Introduction
Load-carrying structures in mechanical engineering have traditionally been developed for a given design point (e.g. a given load the structure should be able to sustain). In recent decades, however, prompted by numerous catastrophes and product recalls, researchers have realized that uncertainty is part of any application of a product and cannot be disregarded [1]. Designing structures which fulfill their purpose in a whole neighborhood of the design point, the so-called uncertainty set [2], already mitigates some of the uncertainty. The next logical step is to design resilient structures that can even cope with failures of components or other effects disregarded during the design phase.
Moreover, resilience is a concept that can be applied to control uncertainty not only during design but also during operation, by integrating the four resilience functions monitoring, responding, learning and anticipating [3]. In this context, resilience can be regarded as a paradigm shift: instead of designing systems and processes that are robust with respect to specific single assumptions made during the design phase (asking "What if…?"), the goal is to build systems and design processes that perform "No matter what!".
The aims of this paper are the following: (i) Give a literature overview of resilience in neighboring fields and show how the concept of resilience can be extended to mechanical engineering. (ii) Provide a first approach for the quantification of resilience of technical systems.

Resilience of Infrastructures and Networks.
Regarding infrastructures and networks, the term resilience has been used when investigating the influence of disruptions and extreme events. Todini [11] introduces a so-called resilience index to measure the capability of water distribution networks to cope with internal losses due to component failures and proposes a heuristic to increase its value. In computer science, the concept of resilience is often used to describe the vulnerability of networks like the internet to attacks on specific nodes or edges [12]. Many of the approaches in the network community use graph-theoretic concepts to measure the resilience of networks, like the size or width of the largest connected component after a failure [13,14].
Resilience in the Field of Mathematical Optimization. In some contexts, mathematical optimization models can be used to describe and increase resilience. General tools to tackle the underlying uncertainty within mathematical optimization problems are stochastic optimization and robust optimization. The former models uncertain parameters as random variables and optimizes or constrains the expected value or value at risk of some function of these random variables, see, e.g., [15]. Robust optimization [2], on the other hand, defines an uncertainty set of possible realizations of the uncertain parameters and seeks robust solutions, i.e. solutions that are feasible for every realization. By generalizing these uncertainty sets to more than just uncertain parameters, these approaches can also be used to increase the resilience of the optimal systems. Optimization techniques have been used to increase resilience in the areas of infrastructure networks [16,17,18,19], process engineering [20], industrial symbioses [21] and optical networks [22]. In these cases, resilience is usually defined as maintaining a predefined minimal performance after failures.
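As a compact illustration of the two paradigms, the generic problem forms can be written side by side. The symbols $x$ (design decision), $X$ (feasible designs), $\xi$ (random parameter), $u$ (uncertain parameter), $\mathcal{U}$ (uncertainty set), $f$ (objective) and $g$ (constraint function) are notation assumed here for illustration and are not taken from the cited works.

```latex
\begin{align*}
\text{stochastic:} \quad & \min_{x \in X} \ \mathbb{E}_{\xi}\bigl[ f(x, \xi) \bigr] \\
\text{robust:} \quad & \min_{x \in X} \ \max_{u \in \mathcal{U}} f(x, u)
  \quad \text{s.t.} \quad g(x, u) \le 0 \quad \forall\, u \in \mathcal{U}
\end{align*}
```

In the robust form, enlarging $\mathcal{U}$ so that it also contains failure scenarios is one way to read the generalization mentioned above.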

Uncertainty in Mechanical Engineering III

Extending the Concept of Resilience to Mechanical Engineering
Since the definitions described above either focus significantly on human factors or ignore the possibility of recovery, an understanding specifically tailored to mechanical engineering is needed. It should, on the one hand, be applicable to autonomous systems without human interference, but on the other hand, should also allow for possible repairs and a subsequent recovery. The corresponding definition, developed within CRC 805, is the following:
Resilient Technical System. A resilient technical system guarantees a predetermined minimum of functional performance even in the event of disturbances or failure of system components, and a subsequent possibility of recovering at least the setpoint function. Resilience can be increased by adjusting the system state via monitoring, responding, learning and/or anticipating, as well as by systematically designing the system topology.
Given this understanding, resilient systems are characterized not only by their ability to withstand disturbances, but also to be "safe-to-fail" [23] and to be able to recover. One way to achieve these properties is to equip technical systems with the four resilient core functions monitoring, responding, learning and/or anticipating [3]. In product development, general design principles for resilient systems can be derived, and mathematical optimization can be applied to systematically build resilient topologies.

Resilience Metrics
Once an understanding of the concept of resilience is established, the next logical step is to engineer the resilience of technical systems. The key to achieving or improving resilient system properties is their assessment and quantification. Different concepts of measuring resilience have been proposed in the literature, both qualitative [24] and quantitative [25]. However, many of the proposed concepts are described in a general manner and are not applied to actual technical systems.
In this chapter, we show a compilation of resilience metrics that can be used to measure and thus engineer the resilience of load-carrying structures. Some were developed within CRC 805, others are based on results from the literature as indicated in the text. The different metrics are illustrated with the help of diagrams. In these diagrams, the functional performance of a technical system is plotted versus the influencing factors. For reasons of clarity, we chose a univariate depiction. In general, there may be more than one factor influencing the system's performance.
Performance Range. The Performance Range of a technical system describes the range of values of the influencing factors for which the system is able to achieve a predefined required minimum functional performance $f_{\min}$. Mathematically, this can be expressed by the so-called "superlevel set" of a function $f: X \to \mathbb{R}$ that maps the influencing factors $x \in X$ to the functional performance:
$$\{\, x \in X : f(x) \ge f_{\min} \,\}.$$
The performance range thus corresponds to the subset of influencing factors for which the functional performance reaches at least $f_{\min}$. For illustration, compare Fig. 1, which visualizes the performance range for a univariate function.
As a metric based on this, we introduce the Radius of Performance. This measure describes the minimum distance between the design point and a realization of an influencing variable for which $f_{\min}$ can no longer be reached, cf. Fig. 1. Further metrics based on the performance range are possible (e.g. the area above $f_{\min}$, or the performance range times a weighting factor that gets smaller with growing distance from the design point) and can be defined depending on the specific application.
Margin. According to [9, p. 23], the Margin of a technical system describes how closely or how precarious the system is currently operating relative to one or another kind of performance boundary. We quantify the Margin by measuring the distance between the functional performance at the design point and the required minimum of functional performance, cf. Fig. 2. It can thus be calculated by
$$M = f(x_d) - f_{\min},$$
where $f(x_d)$ is the functional performance at the design point $x_d$, and $f_{\min}$ is the required minimum of functional performance.
Gracefulness. The Gracefulness describes the behavior of a technical system at the boundary of its performance range (sometimes also referred to as "graceful degradation" [9, p. 23]). Mathematically, it is defined by the directional derivative of the functional performance curve in the direction of a given influencing factor or a vector of multiple influencing factors, cf. Fig. 2. In case of non-differentiability, the limit from the direction of the design point can be used, if it exists.
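The static metrics introduced so far can be evaluated numerically for a univariate performance curve. The following is a minimal sketch under stated assumptions: the curve `f`, the grid `xs`, the design point `x_design` and the threshold `f_min` are hypothetical, all function names are chosen here for illustration, and the Radius of Performance is only resolved up to the grid spacing.

```python
from typing import Callable, List


def performance_range(f: Callable[[float], float], xs: List[float], f_min: float) -> List[float]:
    """Grid points belonging to the superlevel set {x : f(x) >= f_min}."""
    return [x for x in xs if f(x) >= f_min]


def radius_of_performance(f: Callable[[float], float], xs: List[float],
                          x_design: float, f_min: float) -> float:
    """Smallest distance from the design point to a grid point where f < f_min."""
    distances = [abs(x - x_design) for x in xs if f(x) < f_min]
    return min(distances) if distances else float("inf")


def margin(f: Callable[[float], float], x_design: float, f_min: float) -> float:
    """Distance between the performance at the design point and f_min."""
    return f(x_design) - f_min


def gracefulness(f: Callable[[float], float], x_boundary: float,
                 direction: float, h: float = 1e-6) -> float:
    """One-sided slope of f at x_boundary along `direction` (+1.0 or -1.0).

    The finite-difference step is taken on the design-point side of the
    boundary, mirroring the limit "from the direction of the design point".
    """
    return (f(x_boundary) - f(x_boundary - h * direction)) / h


# Hypothetical performance curve with its optimum at the design point x = 2.
def f(x: float) -> float:
    return 1.0 - (x - 2.0) ** 2


xs = [i / 100 for i in range(401)]  # grid on [0, 4]
x_design, f_min = 2.0, 0.75

in_range = performance_range(f, xs, f_min)
print(min(in_range), max(in_range))                    # 1.5 2.5
print(radius_of_performance(f, xs, x_design, f_min))   # grid-dependent, close to 0.5
print(margin(f, x_design, f_min))                      # 0.25
print(gracefulness(f, 2.5, +1.0))                      # slope at the upper boundary
```

A steeply negative Gracefulness value indicates an abrupt loss of performance just beyond the boundary, whereas a value close to zero indicates graceful degradation.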
Buffering Capacity. The term Buffering Capacity [9, p. 23] refers to the extent of tolerable disturbances that the system can bear. In the case of load-carrying structures, we define the Buffering Capacity as a measure of the amount of structural change after which the fulfillment of a predetermined required minimum of functional performance $f_{\min}$ can still be guaranteed. Depending on the context, the Buffering Capacity can attain continuous or integer values. In the case of integer values, it describes the maximum number of components that can fail while the required minimum of functional performance is still maintained. It is important to note that this definition refers to the worst-case functional performance, i.e. the lowest functional performance attained over the uncertainty set $U$, cf. Fig. 3. The smallest possible uncertainty set includes only the design point of the system. Fig. 4 shows an example of a system with a Buffering Capacity of 3. Note that the concept of the integer-valued Buffering Capacity is linked to the measure "(m,k)-survivability" in resilient network design [26].
Rapidity. The Rapidity measures the system's capacity to restore functionality in a timely way, containing losses and avoiding disruptions [6]. It can be computed as the difference $t_{\mathrm{post}} - t_{\mathrm{pre}}$ between the post-impact and pre-impact times, cf. Fig. 5.
Figure 5. Metrics for measuring the system's ability to recover: Rapidity and "Resilience Triangle". Cf. [5]
Figure 6. Alternative measure for the system's ability to restore its functionality based on the improper integral and discounted future gains or losses.
"Resilience Triangle". Another measure for the restoration of the system's functionality is the so-called Resilience Triangle [6]. The idea of this metric is to measure the total losses (in utility or revenue) until the functionality is restored. It was originally proposed in [6] to measure this total loss by approximating the difference between the actual performance curve and the original pre-impact performance using a triangle, cf. Fig. 5. The mathematically more rigorous way to measure it, however, is to take the integral over the loss function between the pre- and post-impact times, see, e.g., [10]. One of the well-known shortcomings of this measure is that it fails if the pre- and post-impact steady states differ, cf. Fig. 6. One possibility to overcome this would be to use an idea from finance: discount the future gains or losses using some sort of "interest rate" and then take the improper integral over all future times, so that all future gains or losses are taken into account.
Comparison between Static and Time-Dependent Metrics. Fig. 7 shows the correspondence between the static and time-dependent plots. Two versions of the same system are shown: in black, the original system, which can recover to its original steady state after the disturbance; in gray, the same system after suffering irreversible damage at time $t_{\mathrm{pre}}$, which therefore has a reduced steady state that is reached after $t_{\mathrm{post}}$.

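The time-dependent recovery metrics described above can be sketched numerically as follows. This is an illustration under stated assumptions, not the authors' implementation: the performance curves are hypothetical, exponential discounting with a constant `rate` is one possible choice of "interest rate", and the unbounded time horizon is truncated at a finite value.

```python
import math
from typing import Sequence


def trapezoid(ts: Sequence[float], ys: Sequence[float]) -> float:
    """Composite trapezoidal rule for samples ys taken at times ts."""
    return sum(0.5 * (ys[i] + ys[i + 1]) * (ts[i + 1] - ts[i]) for i in range(len(ts) - 1))


def rapidity(t_pre: float, t_post: float) -> float:
    """Time needed to restore functionality."""
    return t_post - t_pre


def triangle_loss(f_pre: float, f_drop: float, t_pre: float, t_post: float) -> float:
    """Area of the triangle spanned by the initial drop and the recovery time."""
    return 0.5 * (f_pre - f_drop) * (t_post - t_pre)


def discounted_loss(ts: Sequence[float], losses: Sequence[float],
                    rate: float, t_ref: float) -> float:
    """Integral of loss(t) * exp(-rate * (t - t_ref)) over the sampled horizon.

    Exponential discounting is an assumption made for this sketch.
    """
    weighted = [loss * math.exp(-rate * (t - t_ref)) for t, loss in zip(ts, losses)]
    return trapezoid(ts, weighted)


# Case 1 (hypothetical): performance 100 drops to 60 at t_pre = 2 and
# recovers linearly to 100 at t_post = 6.
t_pre, t_post, f_pre, f_drop = 2.0, 6.0, 100.0, 60.0
ts = [t_pre + (t_post - t_pre) * i / 400 for i in range(401)]
loss = [(f_pre - f_drop) * (1.0 - (t - t_pre) / (t_post - t_pre)) for t in ts]
integral_loss = trapezoid(ts, loss)

print(rapidity(t_pre, t_post))                       # 4.0
print(triangle_loss(f_pre, f_drop, t_pre, t_post))   # 80.0
print(integral_loss)                                 # matches the triangle for linear recovery

# Case 2 (hypothetical): permanent loss of 10 units after irreversible damage.
# The undiscounted integral grows without bound; the discounted one stays finite.
horizon = [t_pre + 0.01 * i for i in range(20001)]   # truncated at 200 time units
permanent = [10.0] * len(horizon)
disc = discounted_loss(horizon, permanent, rate=0.1, t_ref=t_pre)
print(disc)                                          # close to 10 / 0.1
```

For a linear recovery the triangle and the integral coincide. For other recovery shapes they differ, which is why the integral is the more rigorous measure.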
Application Examples
In this chapter, we show how the concept of resilience can be applied to control uncertainty during the design, production, and usage phase of load-carrying structures. The proposed resilience measures are illustrated based on example systems from CRC 805.

Resilient Design Principles. For developing resilient system topologies, general design principles can be derived. In this chapter, two design principles are illustrated using the example of a joint brake, cf. Fig. 8. This mechanism is used, e.g., in a dental treatment unit. In general, the system consists of a vertically moving object of mass $m$ which can be held in different vertical positions. The mass leads to the torque $M_m$ on the joint. A spring and a brake in the joint keep the object in place by balancing $M_m$ with the sum of the spring's and the brake's torques ($M_S + M_B$).
This design is robust since it ensures that the holding torque is mostly independent of considered environmental influences like temperature, wear or component tolerances. However, as soon as the system needs to operate under unexpected conditions, like overload, strong disturbances or failure of components, it is no longer reliable and the mass may fall.
Resilient system design helps to control uncertainty associated with such unexpected usage scenarios. By applying the principle of self-reinforcement, a minimum of functional performance $f_{\min}$ can be guaranteed in case of failures: if the main spring (spring 1 in Fig. 8) breaks, the brake's torque may not be sufficient to hold the mass. Therefore, additional springs (springs 2) are added. These additional springs cause a movement of the main spring's end stop, which then pushes the lever of the brake into a self-reinforced state, so that the mass is prevented from falling down. Using the principle of self-reinforcement, one is able to realize functional redundancy and a higher Buffering Capacity at minimal additional cost.
Another way to make the system more resilient is to increase the Margin by using the principle of bi-stability. In case of failure of the main spring and overload, the self-reinforced brake will be damaged if it does not switch into a different, stable state. In this example, the self-reinforced state can be left via spring 3, which switches the system into a different state with a lower brake torque. A recovery of the system, i.e., an adaptation of the system to the new conditions, could be realized by replacing the broken spring 1 [27].
Resilient Truss Topology Design. The control of uncertainty, and the concept of resilience in particular, can be integrated not only into design processes guided by experienced engineers but also into software-based design processes, such as those using mathematical optimization. Examples are the resilient design of fluid systems [28] or truss structures [29].
The optimization of truss structures has been a typical application for robust optimization [2]. Most of the classical papers, however, only treat uncertain forces and disregard potential failures of components, which may still happen due to wear. Complete failures of bars have only been investigated recently: Kanno [30], for example, optimized trusses under the constraint that after failure of at most $k$ bars, the displacements of the nodes should still be smaller than some given bounds, similar to our definition of the Buffering Capacity.
In the following, we also want to minimize the volume of a truss under the constraint that its Buffering Capacity should be at least $k$, but in contrast to [30] we use the semidefinite model [31] to ensure stability even after worst-case bar failures for all forces in an ellipsoidal uncertainty set described by a matrix $Q$. Given a ground structure consisting of a set of nodes $V$ (some of which are fixed to the surroundings), a set of potential bars $e \in E$ with constant lengths $\ell_e$, and a given upper bound on the compliance $C_{\max}$, we can state the model for optimizing the cross-sectional areas $a$ as
$$\min_{a \ge 0} \ \sum_{e \in E} \ell_e a_e \quad \text{s.t.} \quad \begin{pmatrix} 2 C_{\max} I & Q^{\top} \\ Q & A(a) \end{pmatrix} \succeq 0, \tag{6}$$
where $A(a) = \sum_{e \in E} a_e A_e$ is the stiffness matrix of the truss and $A_e$ are the bar stiffness matrices. To design a truss with Buffering Capacity $k$, we want the SDP-constraint in (6) to hold after the failure of any set $F$ of at most $k$ bars:
$$\min_{a \ge 0} \ \sum_{e \in E} \ell_e a_e \quad \text{s.t.} \quad \begin{pmatrix} 2 C_{\max} I & Q^{\top} \\ Q & \sum_{e \in E \setminus F} a_e A_e \end{pmatrix} \succeq 0 \quad \forall\, F \subseteq E,\ |F| \le k. \tag{7}$$
Note that problem (7) consists of an exponential number of SDP-constraints, which can make its solution computationally challenging. For a more in-depth discussion of this approach, see [29].
Fig. 9 shows the results of optimizing a crane structure for Buffering Capacity zero, one and two; the problems were solved using MOSEK 8.1 [32]. In Fig. 10, the maximum forces these structures can sustain for different angles are plotted both for the original state and after the worst-case failure of a single bar. It can be observed that the structures optimized for Buffering Capacity one and two also have a larger Performance Range and Margin than the regular robust optimum. Moreover, these structures can indeed still sustain the original force after failure of any single bar.
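To make the combinatorial structure of the Buffering Capacity constraint concrete, the following sketch checks a given design by brute force. It is not the optimization itself and not the implementation used with MOSEK. It assumes the standard robust-compliance form of the semidefinite constraint, i.e. the block matrix [[2*c_max*I, Q^T], [Q, A(a)]] must be positive semidefinite, and tests this for every failure set of at most k bars. The one-degree-of-freedom toy system, all numerical values and all function names are hypothetical.

```python
from itertools import combinations
from math import comb, sqrt
from typing import List, Sequence, Set

Matrix = List[List[float]]


def is_psd(m: Matrix, eps: float = 1e-9) -> bool:
    """Attempt a Cholesky factorization of m + eps*I; success means m is PSD up to eps."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = m[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                s += eps
                if s <= 0.0:
                    return False
                low[i][i] = sqrt(s)
            else:
                low[i][j] = s / low[j][j]
    return True


def stiffness(areas: Sequence[float], bar_mats: Sequence[Matrix], failed: Set[int]) -> Matrix:
    """A(a) = sum of a_e * A_e over all bars that have not failed."""
    n = len(bar_mats[0])
    total = [[0.0] * n for _ in range(n)]
    for e, (a_e, mat) in enumerate(zip(areas, bar_mats)):
        if e in failed:
            continue
        for i in range(n):
            for j in range(n):
                total[i][j] += a_e * mat[i][j]
    return total


def lmi_matrix(c_max: float, q: Matrix, k_mat: Matrix) -> Matrix:
    """Assemble the block matrix [[2*c_max*I, Q^T], [Q, K]] (assumed constraint form)."""
    n, m = len(k_mat), len(q[0])
    block = [[0.0] * (m + n) for _ in range(m + n)]
    for i in range(m):
        block[i][i] = 2.0 * c_max
    for i in range(n):
        for j in range(m):
            block[m + i][j] = q[i][j]
            block[j][m + i] = q[i][j]
        for j in range(n):
            block[m + i][m + j] = k_mat[i][j]
    return block


def has_buffering_capacity(areas, bar_mats, q, c_max, k: int) -> bool:
    """True if the constraint holds after failure of every set of at most k bars."""
    for r in range(k + 1):
        for failed in combinations(range(len(areas)), r):
            k_mat = stiffness(areas, bar_mats, set(failed))
            if not is_psd(lmi_matrix(c_max, q, k_mat)):
                return False
    return True


def n_constraints(n_bars: int, k: int) -> int:
    """Number of failure scenarios, i.e. SDP-constraints, for a given k."""
    return sum(comb(n_bars, r) for r in range(k + 1))


# Toy system: three parallel bars acting on a single degree of freedom.
areas = [1.0, 1.0, 1.0]
bar_mats = [[[1.0]], [[1.0]], [[1.0]]]
q = [[1.0]]      # uncertainty set reduced to unit loads in both directions
c_max = 0.3

print(has_buffering_capacity(areas, bar_mats, q, c_max, 1))  # True
print(has_buffering_capacity(areas, bar_mats, q, c_max, 2))  # False
print(n_constraints(20, 2))                                   # 211
```

Even for a small ground structure with 20 potential bars, a Buffering Capacity of two already requires 211 semidefinite constraints, which illustrates why the full problem becomes computationally challenging.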

Resilience in Production: Sensor-Integrated Tapping Tool. Another technology investigated within CRC 805 is a sensor-integrated tapping tool. Tapping is a two-stage process at the end of the value-adding chain. Therefore, there is a high demand for process stability, as tool breakage may lead to rejection of the part or time-consuming rework. With a sensor-integrated tool holder, the tool load can be monitored. So far, this technique has been used to monitor the occurring cutting forces and tool displacement within a reaming process [33]. Currently, the sensor-integrated tool holder is being adapted to the tapping process in order to detect process faults like axial offset between pre-drilled hole and tap, or synchronization errors. Once detected, these errors can be compensated by adapting the machine control. A challenge is the limited time available to change parameters. One way of coping with this is to gain a deeper understanding of the process, e.g. to understand why disturbances occur and how they can be anticipated. In the case of the investigated tapping process, learning is achieved by iteratively building a model of the process. By comparing measurement and model data, indicators can be found which allow imminent tool breakage to be anticipated and action to be taken before critical faults occur. Possible actions are switching to another tool or adjusting the process parameters in advance.

Resilience in Usage: Fluid Dynamic Vibration Absorber.
Vehicles are load-carrying systems for which controlling uncertainty during the usage phase is of great significance for safety. A technology developed within CRC 805 to increase resilience is the Fluid Dynamic Vibration Absorber (FDVA). In general, a vibration absorber in a vehicle suspension can be used to divert the vibration energy from the wheel to the structural expansion and thus improve driving safety. Following Mitschke [34], driving safety is quantified by the standard deviation of the wheel force $F_W$, the wheel load fluctuation $\sigma_F = \sqrt{\mathrm{Var}(F_W)}$.
The FDVA is a dynamic vibration absorber with a hydraulic transmission of inertia. Instead of a solid mass, hydraulic oil is moved by a piston through a duct. By changing the duct's cross section, the inertia is adjusted and thus the natural frequency as well [35]. The wheel flutter can be measured by the wheel transfer function $V = \hat{z}_w / \hat{z}_0$, where $\hat{z}_w$ and $\hat{z}_0$ are the amplitudes of the wheel travel and the road excitation, respectively. To eliminate wheel flutter, the FDVA yields a wheel transfer function close to 1 at its design point. Fig. 11 shows the wheel transfer function for a standard suspension with and without FDVA for different excitation frequencies.
However, if the wheel mass changes, for example due to a change from summer to winter tires, the system may get out of tune and the FDVA might even worsen its behavior, cf. Fig. 12. We define the performance of the system as the difference between the wheel load fluctuation without FDVA, $\sigma_{F,0}$, and with FDVA, $\sigma_F$, i.e. as $\sigma_{F,0} - \sigma_F$. The black line in Fig. 13 shows the performance of the system with an FDVA that was designed for a wheel mass of 40 kg. With this FDVA, the system's performance is improved for wheel masses above around 36 kg, but becomes negative, i.e. driving safety is worsened, below.
If we define the minimum of functional performance $f_{\min}$ as an improvement of driving safety, i.e. $\sigma_{F,0} - \sigma_F \ge 0$, we can assess the performance range of the FDVA optimized for a wheel mass of 40 kg: its limit is given by 36 kg, where the performance reaches zero. However, the actual performance range of the FDVA is bigger: if the system recognizes a changed wheel mass that does not correspond to the design point of 40 kg, the FDVA can be adapted to these new conditions by changing the cross section of the duct, thus changing its Performance Range and Gracefulness. The light grey line and the dark grey line display the performance for increased and decreased cross section, respectively. The system with the increased cross section yields a larger performance range than the original one, since an increase in performance can already be observed at lower wheel masses. However, from a wheel mass of about 38 kg upwards, its performance is lower than that of the original FDVA as long as the adaptation is not undone. In contrast, the reduction of the cross-sectional area leads to a deterioration in performance for lower wheel masses, but proves to be better for higher wheel masses.
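The effect of adaptation on the performance range can be illustrated with a small sketch. The three linear performance curves below are hypothetical stand-ins, not data from Fig. 13. They are only chosen to reproduce the qualitative behavior described above (zero crossing of the original design at 36 kg, crossover with the increased cross section at 38 kg). Selecting the best duct setting for each wheel mass is likewise an idealizing assumption about the adaptation strategy.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical performance curves: improvement of the wheel load fluctuation
# as a function of wheel mass in kg (illustrative only, not measured data).
settings: Dict[str, Callable[[float], float]] = {
    "original": lambda m: 1.0 * (m - 36.0),
    "increased": lambda m: 0.5 * (m - 34.0),   # increased duct cross section
    "decreased": lambda m: 1.5 * (m - 38.0),   # decreased duct cross section
}


def adaptive_performance(m: float) -> float:
    """Performance if the best duct setting is selected for wheel mass m."""
    return max(curve(m) for curve in settings.values())


def best_setting(m: float) -> str:
    """Name of the duct setting with the highest performance at wheel mass m."""
    return max(settings, key=lambda name: settings[name](m))


def lower_limit(perf: Callable[[float], float], masses: List[float],
                f_min: float = 0.0) -> Optional[float]:
    """Smallest grid mass for which the performance reaches at least f_min."""
    feasible = [m for m in masses if perf(m) >= f_min]
    return min(feasible) if feasible else None


masses = [30.0 + 0.5 * i for i in range(41)]   # 30 kg to 50 kg

print(lower_limit(settings["original"], masses))   # 36.0
print(lower_limit(adaptive_performance, masses))   # 34.0
print(best_setting(35.0), best_setting(40.0), best_setting(45.0))
```

In this toy setting the adaptive system extends the lower limit of the performance range from 36 kg to 34 kg, while the original setting remains the best choice around the design point of 40 kg.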

Resilient Process Chain.
While the examples above illustrate how uncertainty can be governed by resilience during the different phases of the product life, the concept of resilient process chains need not be limited to the production process itself. Usually, the production and usage phases of a product are considered separately, so that information is lost. By generating feedback from the production phase into the usage phase and vice versa, additional information can be used to control uncertainty throughout the product's life. This concept is currently being investigated using the example of hydraulic actuators. These actuators are part of an active air spring developed in CRC 805 [36]. The characteristics of the spring can be actively adjusted by changing the load-carrying areas of its rolling pistons via two hydraulic diaphragm actuators [37]. For the control of uncertainty during the production phase, a holistic approach describing the process chain drilling/reaming was developed [38,39], which allows the quality of bores to be predicted. To investigate the possibilities of a life-spanning resilient process chain, the actuators are intentionally furnished with typical uncertainties in geometric parameters that could occur during mass production. In an experimental setup, the influence of these parameters on different quality properties (e.g. efficiency, operating time to failure) is investigated. Correlating the experimental results with the quality parameters predicted during production yields two results: firstly, the actuator properties during usage can be predicted; secondly, data retrieved during usage can be used to improve the quality of the production models and of the production itself.

Summary
In this paper, we showed how to extend the general concept of resilience to mechanical engineering. A definition of resilience specifically tailored to technical systems has been developed to take into account both the restoration process, and the possibility of autonomous systems. Furthermore, a catalogue of resilience metrics to measure and compare the resilience of different system designs was presented. Both the definition and the metrics were illustrated on different applications from the design, production and usage phase of load-carrying structures.