A Method for Benchmarking of FEM Packages for Multi-Stage Sheet Metal Forming Simulations

Computer simulation plays a crucial role in the design of sheet metal stamping processes, because it allows the process output to be predicted before try-out die sets are manufactured. Several commercial software packages are available for sheet forming simulation, but their accuracy can vary, depending on their formulation and on the selection of the pre-processing parameters. Software benchmarking can be used to select the most appropriate package for a given application. Calibration, i.e. the inverse determination of the correct set of pre-processing parameters, can be used to improve the prediction accuracy. The scientific literature on numerical simulation of sheet metal forming processes presents some examples of software calibration and very few examples of benchmarking. The literature generally neglects a critical issue: the inherent variability of real forming processes. In this work, the experimental results of two similar multi-stage deep drawing processes are presented and compared to the simulation output of two software packages that are widely used in industry. Statistical methods for benchmarking and calibration are proposed. The paper demonstrates how benchmarking can be misleading if process variability is not considered.


Introduction
Computer simulation plays a crucial role in the design of sheet metal stamping processes, because it allows the process output to be predicted before try-out die sets are manufactured, in order to increase productivity, compress time-to-market and improve product quality [1]. Most companies in the automotive industry perform sheet stamping simulations on a regular basis [2], and the Finite Element Method (FEM) is the dominant technology in this field. Many software packages are commercially available for sheet metal forming simulation, both general purpose (e.g. Ls-Dyna and Abaqus/Explicit) and specially designed (e.g. AutoForm, Pam-Stamp, Optris, Indeed, Stampack). The accuracy of FEM packages can vary, depending on the selection of the pre-processing parameters and on their mathematical formulation [3]. For this reason, benchmarking of alternative software solutions has been the focus of a few studies in the scientific literature. As an example, the springback predictions of Optris and Ls-Dyna were compared as early as 2000 in an SAE technical paper [4]. The Numisheet conference regularly organizes benchmark comparisons to verify the state of the art in the prediction of complex phenomena such as blank draw-in and springback [5], forming limits [6], mechanics of incremental forming [7], etc. Other benchmark geometries have been proposed by Roberts et al. [8]. Most examples in the literature focus on the geometrical definition of the benchmark and on the mechanical response variable of interest, but little attention is generally given to the mathematical or statistical treatment of the results.
Recently, Pimentel et al. [9] proposed a comprehensive study based on the Numisheet 2008 Benchmark #2, to compare three commercial packages. The authors conclude that the accuracy of the FEA tools is roughly the same. Amaral et al. [10] used the Numisheet 2016 springback benchmark to discuss some critical numerical issues in the prediction of springback in sheet metal forming.
In the Numisheet benchmarks and in the papers cited above, the results are not compared according to a quantitative method and, moreover, they are not compared with the variability of the process (process capability).
[Figure: Processes for the production of the two components. Although the processes are very similar, Part B is produced through a higher number of forming stages.]
In this paper, we propose a method for benchmarking different codes, which uses a statistical test in order to take the process capability into account, and we demonstrate how benchmarking can lead to misleading results if the real process capability is not considered. The paper compares the results of two codes (AutoForm and Pam-Stamp) that are widely used in industry. However, the purpose of the paper is not to perform the benchmark itself, but to propose a methodology for benchmarking. The results cannot be taken as a general comparison of the two software packages for any application: they are valid only for the present study.
In the following Section, the two multi-stage deep drawing experimental test cases are described. Then, the numerical setups of the two FEM models, developed with two different commercial FEM codes, are described. In the last Section, the benchmarking of the two codes is performed with statistical tests, both for the draw-in and for the thickness.

Experimental Test Cases
The test cases are two stamping processes of stainless steel AISI 304 exhaust components for automotive applications. The components, shown in Figure 1, are characterized by the same initial sheet thickness (t0 = 1.75 mm), similar dimensions and shape, and the same level of required tolerances. The main geometrical difference is a more severe asymmetry in the shape of the second component, called part B.
The stamping process for both parts starts from a steel strip (width w0 = 240 mm) and uses progressive dies. The whole cycle includes several stages for trimming, forming, punching, flanging and calibration, but only the forming operations are simulated. The first part of the production cycles is shown in Fig. 2. The most relevant responses for these parts are the thinning of the part, especially in the collar, and the draw-in. Both responses play an important role in the design stage of the die set. When thinning exceeds a limit prescribed by the customer, the part is defective. When excessive draw-in occurs during forming, the outer profile might cross the external trim line, and the part cannot match its designed shape after the flanging operation. Both risks must be reduced by means of FEM simulations, which should therefore be accurate with respect to the draw-in and thickness prediction.
On each of the 28 samples, 7 measurements of draw-in and 7 measurements of thickness have been taken, at the locations (or sites) shown in Fig. 3. The draw-in is not measured conventionally, i.e. as a displacement of the flange contour; instead, the distances at corner locations on the parts (also indicated in Fig. 3) have been taken. This unusual method of measuring the draw-in has been chosen because it significantly reduces the measurement error, i.e. it reduces the measured process variability.

FEM Simulation Setups
FEM simulations have been performed using AutoForm plus R6 by AutoForm Engineering GmbH and Pam-Stamp 2015.1 by ESI Group.
Positioning and constraint of the sheet were set using physical pilots, and the blankholder force was set as variable, considering the stiffness of the gas springs that apply their load onto the blankholder. The benchmarking simulations were run using the default input parameters suggested by the two codes. The material in both cases was modeled as elastic-plastic, with no dependence on strain rate and no kinematic hardening. Since AISI 304 is a very common material, both software codes provide a default set of material parameters within their built-in databases. The default associated flow rules and hardening laws have not been changed. The only modification to the default values concerns the Lankford anisotropy coefficients, because a preliminary sensitivity study has shown that the thickness and draw-in results are strongly influenced by these parameters and that the default values were not correct. Therefore, the sheet metal has been tested according to the ASTM E517 procedure and the experimental Lankford coefficients have been used. The values of the main parameters can be found in Table 1.
The simulations were performed on the same desktop computer, with significantly different CPU times. Each run required about 60-80 minutes with Pam-Stamp and about 5-8 minutes with AutoForm.
In post-processing, the thickness has been measured by creating an auxiliary geometry containing the measurement points with CAD software and then importing it into the simulation packages, in order to have precise references for the measurement points. The draw-in results are instead measured by exporting the boundaries of the deformed parts to CAD software and then measuring the distances in the CAD environment.
where t is the thickness, d is the draw-in and the subscripts indicate:
s = AutoForm, Pam-Stamp (software packages)
j = 1, 2, 3, 4 (forming stages)
k = A, B, C, D, E, F, G (measurement sites)
m = 1, 2, 3, 4 (experimental replicates)
The errors can also be measured as absolute percentage values |%Δt|, |%Δd| or as absolute differences |Δt|, |Δd| (equation 2). A total of 168 values (2 software packages x 3 stages x 7 sites x 4 experimental replicates) are therefore available for part A and 224 for part B, which has one more forming stage.
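As an illustration of the error measures and of the size of the data set, the following minimal sketch builds the index set (s, j, k, m) for both parts and defines the three error types. The formula used for the signed percentage error, 100 x (simulated - experimental) / experimental, is an assumption: it is consistent with the convention used in the results, where a negative error means that the simulation underestimates, but the exact definition should be taken from equation 1. All function names are hypothetical.

```python
from itertools import product

def signed_pct_error(sim, exp):
    """Signed percentage error (assumed form); negative = simulation underestimates."""
    return 100.0 * (sim - exp) / exp

def abs_pct_error(sim, exp):
    """Absolute percentage error |%delta|."""
    return abs(signed_pct_error(sim, exp))

def abs_difference(sim, exp):
    """Absolute difference |delta|, in the units of the measurement."""
    return abs(sim - exp)

PACKAGES = ("AutoForm", "Pam-Stamp")          # index s
SITES = "ABCDEFG"                             # index k
REPLICATES = (1, 2, 3, 4)                     # index m
STAGES = {"A": (1, 2, 3), "B": (1, 2, 3, 4)}  # index j, per part

def error_index(part):
    """All (s, j, k, m) index tuples available for one part."""
    return list(product(PACKAGES, STAGES[part], SITES, REPLICATES))

n_a = len(error_index("A"))
n_b = len(error_index("B"))
print(n_a, n_b, n_a + n_b)  # 168 224 392
```

The total of 392 values is the size of the pooled data set used for the overall comparisons of each response.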

Benchmarking Methodology and Results
A benchmarking method is proposed here, based on the statistical analysis of the numerical-experimental errors, using the ANOVA (Analysis of Variance) technique, which performs multiple tests of hypothesis with Fisher's F statistic. The statistical software package Minitab has been used for computations and graphs. This method can be useful whenever, as in the present case, multiple measurements, multiple experimental replicates and multiple forming stages are available. The software package can therefore be statistically tested as one of the factors of the ANOVA.

Achievements and Trends in Material Forming

Benchmark on the draw-in prediction.
As a first, most general comparison, an overall test has been conducted using all the 392 values of the error on the draw-in. The ANOVA table on the signed error %Δd is reported as Table 2. It clearly indicates that there is a difference between the two software packages, as shown by a p-value that rounds to zero. The corresponding boxplot of the data, grouped by software type, is given in Figure 4. The mean error is -0.71% for AutoForm and -0.023% for Pam-Stamp. In conclusion, the percentage prediction error is extremely small for both software codes, but AutoForm underestimates, on average, the material draw-in.
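The overall comparison with the software package as the only factor corresponds to a one-way ANOVA. The following sketch, which is not the Minitab procedure used in this work, computes the F statistic from the between-group and within-group sums of squares. The error values are invented for illustration only and are not the measured data.

```python
from statistics import fmean

def one_way_anova(groups):
    """Return (F, df_between, df_within) for a dict {level: [values]}."""
    all_values = [x for vals in groups.values() for x in vals]
    grand = fmean(all_values)
    # Variation of the group means around the grand mean
    ss_between = sum(len(v) * (fmean(v) - grand) ** 2 for v in groups.values())
    # Variation of the observations around their own group mean
    ss_within = sum((x - fmean(v)) ** 2 for v in groups.values() for x in v)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical signed percentage errors, grouped by software package
errors = {
    "AutoForm": [-0.9, -0.6, -0.8, -0.5, -0.7, -0.7],
    "Pam-Stamp": [-0.2, 0.1, 0.0, -0.1, 0.2, 0.0],
}
f_stat, df1, df2 = one_way_anova(errors)
print(round(f_stat, 2), df1, df2)  # 73.5 1 10
```

With 1 and 10 degrees of freedom, the 5% critical value of the F distribution is about 4.96, so in this invented example the software factor would be declared significant. A p-value can be obtained from the F distribution, for instance with `scipy.stats.f.sf(f_stat, df1, df2)` if SciPy is available.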
The same kind of analysis can be run again using the absolute errors |%Δd| instead of the signed errors. The ANOVA has been run after applying a so-called Box-Cox transformation to the response variable. This transformation is used to improve the normality of the residuals of the analysis when they are not normally distributed. Table 3 presents the results. Here again the software package is statistically significant, with an average absolute error equal to 0.74% for AutoForm and to 0.55% for Pam-Stamp. From an engineering point of view, the predictive capability of the two packages is not very different in terms of absolute percentage error.
A deeper analysis can be done by adding further factors to the ANOVA and by running separate benchmarks for the two parts A and B. The additional factors are the forming stage and the measurement location. Two different ANOVAs have been performed using the absolute difference |Δd| (see equation 2) as the response variable, for parts A and B respectively. The ANOVAs for the draw-in are reported in Table 4. The advantage of the ANOVA is that it performs a simultaneous benchmark test over all sites and stages. The analysis shows that all factors and all first and second order interactions are statistically significant, since all the p-values round to zero. Therefore, one software package predicts the draw-in significantly better than the other. However, the presence of interactions means that the difference between the two software packages is not uniform over all forming stages and measurement sites.
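The Box-Cox transformation mentioned above can be sketched as follows. The code uses the standard textbook definition, (y^λ - 1)/λ for λ ≠ 0 and ln(y) for λ = 0, and selects λ on a grid by maximizing the normal profile log-likelihood. How λ was actually selected in this work is not described here, so the grid search is an assumption, and the data are invented for illustration.

```python
import math
from statistics import fmean

def box_cox(values, lam):
    """Box-Cox transform of strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive data")
    if abs(lam) < 1e-12:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]

def profile_log_likelihood(values, lam):
    """Normal profile log-likelihood of lambda, up to an additive constant."""
    z = box_cox(values, lam)
    mean_z = fmean(z)
    var_z = sum((x - mean_z) ** 2 for x in z) / len(z)
    jacobian = (lam - 1.0) * sum(math.log(v) for v in values)
    return -0.5 * len(z) * math.log(var_z) + jacobian

def best_lambda(values, grid):
    """Grid value of lambda with the highest profile log-likelihood."""
    return max(grid, key=lambda lam: profile_log_likelihood(values, lam))

# Hypothetical right-skewed absolute errors (log-symmetric by construction)
abs_errors = [math.exp(x) for x in (0.0, 0.5, 1.0, 1.5, 2.0)]
grid = [i / 10 for i in range(-20, 21)]  # lambda from -2.0 to 2.0
lam_hat = best_lambda(abs_errors, grid)
transformed = box_cox(abs_errors, lam_hat)
```

The square root transformation is, up to a shift and a scale factor, the special case λ = 0.5 of the same family.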

In combination with an interaction plot (Figure 5), the ANOVA table makes it possible to perform an effective benchmark. For part A, Figure 5 shows that Pam-Stamp outperforms AutoForm in most measurement sites and stages, except sites E and F. For part B, Figure 5 shows that Pam-Stamp outperforms AutoForm in stages 1 and 4, and in sites A to D.
In this case, all factors and interactions are statistically significant. Therefore, a non-statistical comparison of the two packages, based on a visual inspection of the errors Δd_sjkm or on a comparison of average errors, would have led to similar conclusions.
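The quantities shown in an interaction plot are the mean responses within each combination of factor levels. A minimal sketch of this computation for the software package and the measurement site is given below. The records are invented, and the helper names are hypothetical.

```python
from collections import defaultdict
from statistics import fmean

def cell_means(records):
    """Average the response within each (package, site) cell."""
    cells = defaultdict(list)
    for package, site, value in records:
        cells[(package, site)].append(value)
    return {key: fmean(vals) for key, vals in cells.items()}

def better_package_by_site(means):
    """For each site, name the package with the smaller mean absolute error."""
    sites = sorted({site for _, site in means})
    packages = sorted({pkg for pkg, _ in means})
    return {site: min(packages, key=lambda p: means[(p, site)]) for site in sites}

# Hypothetical absolute errors: (package, site, value)
records = [
    ("AutoForm", "A", 0.9), ("AutoForm", "A", 1.1),
    ("Pam-Stamp", "A", 0.4), ("Pam-Stamp", "A", 0.6),
    ("AutoForm", "E", 0.2), ("AutoForm", "E", 0.4),
    ("Pam-Stamp", "E", 0.5), ("Pam-Stamp", "E", 0.7),
]
means = cell_means(records)
winners = better_package_by_site(means)
print(winners)  # {'A': 'Pam-Stamp', 'E': 'AutoForm'}
```

When the lines of the two packages cross between sites, as in this invented example, the ranking of the packages depends on the site, which is what a significant interaction term indicates. The cell means alone do not say whether a difference exceeds the experimental scatter; that is the role of the F tests.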
From a technological point of view, it must be noted that the errors in location A for part A and in locations B and D for part B have the largest values for both codes, but they are larger for AutoForm.

Benchmark on the thickness prediction.
A similar approach has been followed for the comparison on thickness. As a first general comparison, an overall test has been conducted using all the available data. The ANOVA table on all 392 values of %Δt (Table 5) clearly indicates that there is a difference between the two software packages, as shown by a p-value that rounds to zero. The corresponding boxplot of the data, grouped by software type, is given in Figure 6. The mean error is -1.81% for AutoForm and +0.70% for Pam-Stamp. In conclusion, the percentage prediction error is very small for both software codes, but AutoForm underestimates the thickness, on average, while Pam-Stamp overestimates it, although with a mean error closer to zero.
The same analysis can be run using the absolute errors |%Δt|, rather than the signed percentage error. The ANOVA has been run after applying a square root transformation to the response variable, in order to improve the normality of the residuals. Table 6 presents the results. Here there is neither a statistical nor a practical difference between the two software packages. The average absolute error in thickness is 1.79% for Pam-Stamp and 1.95% for AutoForm, but this difference is not statistically significant.
Model Summary for Transformed Response
S = 0.566189, R-sq = 0.32%, R-sq(adj) = 0.06%, R-sq(pred) = 0.00%
A deeper analysis can be done by adding further factors to the ANOVA and by running separate benchmarks for the two parts A and B. The benchmark analysis in this case is more complicated than for the draw-in estimation.
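The R-sq value reported in the model summary measures the share of the total variation of the transformed response that is explained by the model. For a model with the software package as the only factor, it can be sketched as the ratio between the factor sum of squares and the total sum of squares. The following example applies a square root transformation to invented absolute errors and computes R-sq; the numbers are not the measured data.

```python
import math
from statistics import fmean

def r_squared_pct(groups):
    """R-sq in percent for a one-way model: 100 * SS_factor / SS_total."""
    values = [x for v in groups.values() for x in v]
    grand = fmean(values)
    ss_total = sum((x - grand) ** 2 for x in values)
    ss_factor = sum(len(v) * (fmean(v) - grand) ** 2 for v in groups.values())
    return 100.0 * ss_factor / ss_total

# Hypothetical absolute percentage errors on thickness, by software package
raw = {
    "AutoForm": [1.2, 2.6, 0.9, 3.1],
    "Pam-Stamp": [1.4, 2.3, 1.0, 2.9],
}
# Square root transformation of the response, applied before the analysis
transformed = {pkg: [math.sqrt(x) for x in vals] for pkg, vals in raw.items()}
r_sq = r_squared_pct(transformed)
```

An R-sq close to zero, as in Table 6, means that nearly all of the variation in the absolute thickness error comes from sources other than the choice of software package.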
For part A, the main factor "stage" is not statistically significant, i.e. no package has an overall better performance. However, there is a significant interaction with the measurement site. As the interaction plot in Figure 7 shows, Pam-Stamp has a significant error on thickness at site F, while AutoForm has a larger error at site A. The errors at the other sites and across the three forming stages are different, but these differences are not statistically significant, i.e. they are not larger than the natural scatter of the experimental data. In this case, a non-statistical comparison of the two packages, based on a visual inspection of the errors Δt_sjkm or on a comparison of average or maximum errors, would have led to misleading conclusions.
For part B, the main factor "stage" is statistically significant, together with its first order interactions, and this is well explained by the bottom part of Figure 7. The figure shows that Pam-Stamp generally outperforms AutoForm at all locations, except for measurement site B.
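The situation described above, a main factor that is not significant together with a significant interaction, can be reproduced with a small balanced two-way ANOVA. The sketch below is a simplified illustration with only two factors (software package and measurement site) and invented data. It is not the multi-factor Minitab analysis used in this work.

```python
from statistics import fmean

def two_way_anova(data):
    """Balanced two-way ANOVA with interaction.

    data maps (a_level, b_level) -> list of replicates (equal length, >= 2).
    Returns {source: (sum_of_squares, degrees_of_freedom, F or None)}.
    """
    a_levels = sorted({a for a, _ in data})
    b_levels = sorted({b for _, b in data})
    n = len(next(iter(data.values())))
    complete = len(data) == len(a_levels) * len(b_levels)
    if not complete or any(len(v) != n for v in data.values()):
        raise ValueError("design must be complete and balanced")
    grand = fmean([x for v in data.values() for x in v])
    a_mean = {a: fmean([x for b in b_levels for x in data[(a, b)]]) for a in a_levels}
    b_mean = {b: fmean([x for a in a_levels for x in data[(a, b)]]) for b in b_levels}
    cell = {k: fmean(v) for k, v in data.items()}
    ss = {
        "A": n * len(b_levels) * sum((a_mean[a] - grand) ** 2 for a in a_levels),
        "B": n * len(a_levels) * sum((b_mean[b] - grand) ** 2 for b in b_levels),
        "AxB": n * sum((cell[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
                       for a in a_levels for b in b_levels),
    }
    # Residual scatter of the replicates around their own cell mean
    ss_err = sum((x - cell[k]) ** 2 for k, v in data.items() for x in v)
    df = {"A": len(a_levels) - 1, "B": len(b_levels) - 1}
    df["AxB"] = df["A"] * df["B"]
    df_err = len(a_levels) * len(b_levels) * (n - 1)
    ms_err = ss_err / df_err
    table = {s: (ss[s], df[s], (ss[s] / df[s]) / ms_err) for s in ss}
    table["Error"] = (ss_err, df_err, None)
    return table

# Hypothetical absolute thickness errors: (package, site) -> replicates
data = {
    ("AutoForm", "A"): [0.9, 1.1], ("Pam-Stamp", "A"): [0.4, 0.6],
    ("AutoForm", "F"): [0.2, 0.4], ("Pam-Stamp", "F"): [0.5, 0.7],
}
table = two_way_anova(data)
for source, (ss_value, dof, f_value) in table.items():
    print(source, round(ss_value, 3), dof, f_value if f_value is None else round(f_value, 2))
```

In this invented example the F values are 1.0 for the package, 9.0 for the site and 16.0 for the interaction, against a 5% critical value of about 7.71 for 1 and 4 degrees of freedom. The package is therefore not significant as a main factor, while its interaction with the site is: each package is better at one site and worse at the other, and a comparison of overall averages would hide this.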