#### Abstract

An automatic framework for tuning plastic constitutive models is proposed. It is based on multistart global optimisation method, where the objective function is provided by the results of multiple elastoplastic finite element analyses, executed concurrently. Wrapper scripts were developed for fully automatic preprocessing, including model and mesh generation, analysis, and postprocessing. The framework is applied to an isotropic power hardening plasticity using real load/displacement data from multiple steel buckling tests. M. J. D. Powell’s BOBYQA constrained optimisation package was used for local optimisation. It is shown that using the real data presents multiple problems to the optimisation process because the objective function can be discontinuous, yet relatively flat around multiple local minima, with similar values of the objective function for different local minima. As a consequence the estimate of the global minimum is sensitive to the amount of experimental data and experimental noise. The framework includes the verification step, where the estimate of the global minimum is verified on a different geometry and loading. A tensile test was used for verification in this work. The speed of the method critically depends on the ability to effectively parallelise the finite element solver. Three levels of parallelisation were exploited in this work. The ultimate limitation was the availability of the finite element commercial solver license tokens.

#### 1. Introduction

Experimentally validated constitutive models are one of the most important components of nonlinear finite element (FE) analyses, especially if the goal is accurate and robust system performance predictions to a range of different loading scenarios. However, obtaining robust and optimally calibrated constitutive models from historic, experimental tests can require the solution of a large number of nonlinear FE analyses which may be prohibitively computationally expensive. A complex nonlinear inverse problem needs to be solved. This inverse problem can be recast as a nonlinear multidimensional optimisation of objective function . Note that may well contain problems associated with many local optima and discontinuities. In this paper we present a generic and rigorous framework for exploring this problem automatically with parallel distributed computation.

As a test case we seek to obtain the nonlinear constitutive model of steel reinforcing bars from a limited set of real, historic bar buckling tests of different geometries [1–5]. The general context of the buckling of vertical reinforcement is that it is the most common type of observed failure mechanism in concrete columns subject to earthquake loading [6–8]. This example highlights a number of difficulties with adopting an ad hoc approach rather than a rigorous and generic one based on nonlinear optimisers, such as line search or trust region methods [9, 10]. These complicating factors include (i) low gradients of , (ii) many local optima of , and (iii) a loading arrangement and boundary conditions which can result in discontinuities of .

In this paper the parameters of an isotropic power hardening plastic model are tuned using elastoplastic buckling tests of reinforcing bars. The only experimental data used in this work were the load-displacement traces from five buckling tests. All rods had the same diameter, mm, and different lengths, . The bars are identified by their ratios, 5, 8, 10, 15, and 20 [1, 2, 5].

Various inverse techniques for identification of material parameters have been investigated in recent years [11]. Although point measurement approaches (strain gauges) can deliver useful identifications, even at elevated temperatures [12], most new methods grew from rapid advances in full field experimental methods, in particular the digital image correlation (DIC) technique. Accordingly most of the published inverse methods rely on specially designed specimens, which exhibit the widest variability of stress-strain histories and/or stress triaxialities and which supply ample displacement or strain data via DIC measurements [13]. In cases where neither is available, some of the methods, for example, the virtual fields method, cannot be used effectively. Moreover, some methods are restricted by the assumption of elasticity, for example, the constitutive equation gap method, the equilibrium gap method [11], or the elastodynamics experimental approach [14, 15]. Some other inverse methods are specific to optimisation of elastoplastic properties during the manufacturing process design stage [16].

All engineering inverse methods are a subset of the simulation-based optimisation research area [17, 18]. In common with other simulation-based optimisation problems [19, 20], this work highlights two major problems: (i) the evaluation of objective functions is costly and (ii) the form of the objective function is unknown. Effective parallelisation and full automation are the key to successful simulation-based global optimisation. Multistart optimisation [21–23] is a good choice for parallel distributed computers. The choice of starting points, for the global optimiser, is taken at random or at predetermined intervals. Therefore the choice of the subsequent starting points does not depend on the local minima, , calculated from previous starting points. In such cases no interprocess communication is involved, and multiple local optimisers run completely independently. The main computational advantage is that the time of investigating many starting points is the same as for a single point. In principle, the extent of parallelisation is limited by the size of the computer.

In practice, other factors limit parallel scaling, for example, licensing restrictions on running multiple concurrent instances of commercial FE software. In this work multiple local searches are used to find the estimate of the global optimum independently. This is in contrast to the most popular hybrid global optimisation methods, where local search is used only to refine in the regions suggested by the initial stage of the global search [24–26]. While a hybrid scheme can reduce the required number of starting points and thus the number of evaluations of , it necessarily introduces latency because the choice of new starting points is made after has been evaluated from some previous starting points. This latency would limit the extent of parallel scaling and thus increase the time required to find the estimate of the global optimum.

Full automation of the simulation-based optimisation process, including automatic FE mesh generation, simulation and data extraction, and hierarchical parallelisation are the main novelties of this work. Also, this might be the first time that BOBYQA was applied to a real world industrial problem.

#### 2. The Method

Many parameters affect the results of an FE simulation of buckling of steel reinforcing bars, the elastic tensor, the compliance of the testing machine, (see Figure 1), initial imperfection or misalignment, the type of FE boundary conditions, and the plastic constitutive model. Not all parameters affect the results in equal measure. Although the compliance of the testing machine affects the displacements at all stages in the simulation, it is assumed that the plastic hardening model will dominate the onset of yield and postbuckling behaviour.

Assuming isotropic elasticity, the elastic behaviour is dominated by Young’s modulus. Accordingly it is possible to fit the elastic and the plastic parameters separately. Critically the elastic properties and the test machine compliance are subject to* linear *optimisation and hence are much easier to fit. This subproblem is not addressed here, but the results are GPa and kN/mm. Poisson’s ratio has negligible effect on the elastic part of buckling response; hence the most often quoted value for steel, of 0.33, was adopted for it.

Isotropic power hardening is assumed here for the plastic constitutive model:where is the von Mises flow stress, is the equivalent strain, is the initial yield stress, is the hardening modulus, and is the hardening exponent.

The FE simulation of buckling is conducted in two stages. In the first stage the first principal mode is calculated via the solution of the eigenvalue/eigenvector problem. In the second stage the geometry of the model is perturbed by the first principal mode multiplied by a small factor. A schematic of the FE model is shown in Figure 1. All side surface nodes in region A of the mesh are attached to a spring of stiffness along the axis of the cylinder. This was done to simulate experimental machine compliance, to make sure the FE predicted load/displacement traces can be compared against the experimental data. All side surface nodes in region B had a prescribed axial displacement . These boundary conditions were chosen to simulate the grip constraints of the experiment. The maximum initial imperfection amplitude for each model is given in Table 1.

Although it is possible and desirable to include the imperfection factor in the list of parameters to optimise, this was not done in this work for the following reasons. Higher values of imperfection are required in the shorter FE models to induce the buckling mode of deformation, and lower values are required in the longer FE models to reach the peak load observed in experiments. It is not clear whether different values of initial imperfections are consistent with the experiments. Allowing for different imperfection in each of the five FE models would add a further five parameters to optimise and make global optimisation run times prohibitively high. More details on run times are given in Sections 3 and 4.

FE analysis of buckling calculates load/displacement data. The objective function, , was constructed aswhere is the vector of plastic properties: (the factor of is used to ensure good scaling); is the number of load/displacement data points calculated by th FE model; and are the FE and the experimental loads for th displacement data point from th model and th experiment; models are used in total. Double scaling, over the number of the data points in each FE model, , and over the number of the FE models, , ensures that the objective function is not sensitive to changing these parameters.

The mechanical meaning of , as defined by (2), is the square of the load residual per point.

Each experiment provided roughly 4,000 load/displacement data points. Implicit FE solver required to load steps for the complete deformation paths in buckling models. Therefore, only about 1% of available experimental data was used in . Linear interpolation of experimental data was used if no exact experimental displacement matching the FE value was available. Linear interpolation was accepted because of relatively very high sampling rate in the experiments.

In this work equal importance is given to all FE data points. However, it is possible, in future, to add weights to (2), for example, to give more importance to data at the onset of buckling, or to postbuckling behaviour [27].

The problem is cast in the standard constrained local minimisation form:where and and , are respectively, the lower and the upper bounds for .

This minimisation problem is a nonlinear least squares problem where is supplied by the postprocessing of the FE analysis. No prior knowledge of is available. In particular, analytical derivatives of are not available. However, in this work is expected to be at least piecewise smooth, with derivatives of approximated providing useful information.

BOBYQA [28] package was used in this work for local optimisation. BOBYQA stands for bound optimisation by quadratic approximation. It does not require derivatives of . The method is based on iterative fitting of a quadratic surface over trial points. The choice of the trial points is based on the trust region and on the conjugate gradient method [28]. BOBYQA was chosen for this work because it offers a good overall performance for a range of optimisation problems [29, 30], with particular strengths in refining a near optimal solution and solving problems with large numbers of variables, for example, over 30 [31], it is easy to use [32], it is written in standard portable Fortran, and it is distributed under GPLv2 license, allowing user modification.

The Abaqus 6.14 FE package was used in this work. First-order hexahedra elements were used throughout. Parametric, script driven mesh generation is important to ensure a fully automatic FE solution pipeline, including pre- and postprocessing. First a unit square is meshed with a regular square grid. Then the unit square is extruded in the third direction for a required length to generate 1st-order hexahedra. Finally each unit square is mapped conformally onto a circle of arbitrary radius to create a circular cylinder (Figure 2). This approach is preferred in this work because it is fully automatic for arbitrary and , which makes it optimal for simulating experiments with different ratios. Conformal mapping ensures no mesh distortion for any mesh density. The Fortran code implementing conformal mapping was adopted from the Matlab code by Fong [33, 34]. All our code used for this work is freely available online at http://optpack.sf.net under 2-clause BSD license. Figure 3 shows the FE meshes used in the buckling analysis. Prior to the application of conformal mapping all finite elements are cubes with edge lengths of 1 mm. In the final mesh most elements are of similar size, except for four clusters of small elements, about 0.1 mm to 0.5 mm; see Figures 2 and 3.

#### 3. Local Optimisation and Parallelisation

The complete process for evaluating from each starting point on a Linux cluster is as follows. Five FE models are generated with to match the experiments. Eigenvalue/vector analyses are performed and the first principal modes (deformed shapes) are saved to file. Note that the eigenvalue/vector analysis does not depend on the parameters which are being optimised. Hence these analyses need to be run only once. The portable batch system (PBS) is used to set up the parallel environment, reserve the computational nodes for the job, and launch the main optimisation executable. This executable program calls subroutine BOBYQA. BOBYQA sets trial values for and calls subroutine CALFUN. CALFUN prepares the Abaqus user hardening subroutine UHARD [35] from and launches a shell script that concurrently runs five Abaqus FE models of buckling. Each Abaqus job is executed in parallel using MPI. The shell script waits for all FE jobs to complete and extracts the load/displacement data from each model. The load/displacement data is passed back to CALFUN, which calculates and passes it back to BOBYQA. BOBYQA algorithm then updates the quadratic approximated surface and the size of the trust region and calculates the direction and length of the next step and the next trial . This process is continued to convergence. It is illustrated in Figure 4.

The behaviour of , (2), is not known. It might have multiple local minima, change very rapidly or very slowly in one or more of its arguments, or even be discontinuous. Indeed, it was observed that, at least in the most slender model, , with very small initial imperfection, 0.14 mm (see Table 1), higher buckling modes can be induced for particular (see Figure 5). These models, of course, predict much higher loads than the models which buckle in the first mode. Although this phenomenon is interesting in itself, its study is left for the future. For the purposes of this work the important consequence is that is likely to be discontinuous near such .

These factors present a number of complications to a numerical optimisation technique. First, the shape of the approximated function might change dramatically with each new function evaluation. This will lead to a rapid change of the trust region size and the search direction. This might result in poor convergence or inability to converge. Second, the process might converge to different local minima depending on the starting point. This problem exists even for functions with known derivatives; however it is much more acute when derivatives are not known.

The key input parameters to BOBYQA are the starting point, , the starting and the final trust region radii, and , , and the bounds and ; see (3).

BOBYQA constructs the starting approximated surface by taking points with distance from . As shown in Table 2, has a significant effect on the required number of evaluations of and on . In this work values between and were tried.

BOBYQA terminates the search when the trust region radius is reduced to . Powell suggests that the local minimum is within from its last estimate of [28]. In this work the accuracy of 5 MPa is considered adequate for and for . Hence was used. Note that before passing to BOBYQA was scaled by a factor of to make all components of of the same order. Therefore the accuracy of is , which is considered adequate. The relatively small difference between and , in the order of 10^{2}**–10^{3}, leads to very rapid convergence, which is desirable because each evaluation of is expensive.

An important factor in this work that tight bounds on each parameter were available from prior work on this steel [3–5], where these parameters were estimated using ad hoc trial and error procedures, common in practical engineering. The following bound vectors were deemed reasonable: and , meaning that each variable was constrained as , , and .

The influence of varying the starting point, , on the rate of convergence and on was also investigated. The results are shown in Table 2.

The results show that is shallow around the local minima; see Figure 6. The first evaluations of are done on points with each component of this vector perturbed by or on the bounds and if is outside of the bounds. These evaluations are used to construct the starting approximation to [28]. Therefore no convergence is expected in the first 7 iterations. From iteration 8 onward convergence is very rapid and little improvement is achieved beyond iteration 15 or in some cases as early as iteration 11; see Figure 7. This means that quadratic approximation used in BOBYQA is well suited for around the local minima. This fact and the use of relatively high ensure that few evaluations of are required to find .

The local minima are sensitive to and ; see Table 2 and Figure 7. If is close to , then using smaller can produce a better optimum, as evidenced by cases . If no good guess for is available, high is advisable. Taking too low might result in premature convergence, as in case . Cases , , and show that is sufficient for converging to a good optimum, even when is far from .

Two levels of parallelisation were exploited for local search: five FE simulations were run concurrently and independently and Each FE calculation was done using multiple MPI processes. There are likely further fine grained parallelism options in the FE solver, to do with matrix partitioning, and so forth. However, these were not available to us for this work because the Abaqus is closed proprietary software.

The University of Bristol BlueCrystal cluster was used in this work. Each node has two 8-core 2.6 GHz Sandy Bridge processors with 4 GB per core. It was found that the Abaqus FE solver does not scale well for the problems analysed in this work. Even the largest model, , did not scale to the full node, 16 cores. The best performance for this model was achieved with 12 to 14 cores, with the scaling factor of 5 to 6, and with 4–8 cores for models with lower , with the scaling factor of 2 to 3. Consequently the best local search performance was achieved with three computational nodes. The first node was used to analyse models with on 4 cores and with on 12 cores. The second node was used to analyse models with and , each on 8 cores. The model with was analysed on the third node with 14 cores. Each evaluation of took on average 2.6 min or 23 evaluations per hour. According to Table 2 this means that each starting point can be evaluated in 1 to 1.5 hours.

#### 4. Global Optimisation and Parallelism

In addition to the two levels of parallelism exploited for local search, in global multistart search the third level was exploited too: multiple starting points were evaluated concurrently and independently.

In principle any and all parallelism options can be enabled; however for optimal performance, these have to be chosen wisely, given hardware, software, and licensing limitations. The PBS system limits the total execution time and the number of nodes available to a job. The Abaqus FE requires extra license tokens for each core used for parallel execution. The total number of license tokens available to us is limited.

These factors lead to several parallelisation strategies. At one extreme is the approach in which each FE job is executed serially, thus allowing for a maximum number of concurrent FE jobs to be run. In this case relatively many starting points can be evaluated concurrently, but slowly. At the other extreme is execution of each FE job with maximum parallelisation, using all available Abaqus license tokens and all available PBS nodes. In this mode starting points are evaluated serially, but fast. As mentioned in the previous section, the Abaqus scaling is limited. Hence, after some experimentation, it was concluded that optimum FE parallel performance is achieved when 3 nodes are used to evaluate each starting point. Four starting points, chosen at random, were evaluated concurrently, requiring 12 nodes (196 cores) and 2,232 Abaqus license tokens. Availability of the Abaqus license tokens was the ultimate limitation. With this level of concurrency 3 starting points were evaluated per hour. A total of 50 points have been evaluated and Table 3 shows five lowest minima found.

#### 5. Validation

Figure 8 compares the experimental load/displacement traces with those predicted by the FE with the lowest minimum plastic properties, line of Table 3. To judge whether is a good fit or not, one would have to compare the residual between the model and experiments with that between repeated experiments. However, repeated experimental data was not available for this work.

Instead is validated on a different experiment. In this work a tensile test was used. Figure 9 shows a good match between the experimental load/displacement curve and those predicted by the FE with the fitted plastic properties from Table 3. However, none of the five minima describe the last stages of the deformation well (see inset in Figure 9), where necking is accompanied by ductile damage. This is the limitation of the 3-parameter plastic hardening model adopted in this work.

Note that the validation step forms an integral part of the autotuning method. This means that some distinct data must be deliberately left for validation and not included into the formulation of , (2).

#### 6. Global Optimisation with No Validation

While the validation gives the user some measure of the goodness of fit, it requires that some experimental data is left out of the optimisation. As a result the optimisation is inevitably done over a smaller range of stress-strain histories, thus reducing the range of applicability of the tuned plastic constitutive model.

One can use all available experimental data for optimisation, at the expense of losing the measure of goodness of fit. However, the fitted plastic model will have a wider applicability.

If the tensile test data is included into , (2), then different local minima are obtained. The five lowest minima obtained from 25 starting points are shown in Table 3. Figure 10 shows the experimental and the FE load/displacement traces at the best discovered estimate of the global optimum.

It is interesting to note that minima 1–3 and 5 found using both the buckling and the tensile data in are distinctly different from the lowest minima found using only the buckling data. However, the 4th lowest minima are quite similar in both cases. It is hard to draw definitive conclusions from these observations because relatively a small number of starting points, 50 and 25, respectively, have been tried. Given enough time, it is possible that even lower minima might be discovered in both cases.

#### 7. Discussion and Further Work

The main discovery of this work is that no single clear estimate of the global optimum has emerged. The five lowest local minima discovered when using only the buckling data or when using both the buckling and the tensile data have very different with very similar . Although it is logical to accept with the lowest as the best estimate of the global optimum, the confidence of this choice would be low.

Section 6 shows that adding more data into the optimisation results in only a very small improvement in the prominence of the estimate of the global optimum. However, the best discovered estimate of the global optimum with the tensile data included is quite different from that obtained with just the buckling data. In other words, the discovered optimum might be sensitive to the noise in experimental data.

One possible explanation of this is that diversity of stress/strain histories in the data cannot be fully represented by the 3-parameter isotropic power hardening model. The mechanical explanation of this is that the isotropic power hardening might not adequately represent the mechanical behaviour of structural steel, particularly at large strain, where some ductile damage mechanism might be activated. It is therefore suggested to explore in future a combined isotropic/kinematic hardening model, as the model regions close to the constraints, and those with the maximum off-axis deflection, experience reverse plasticity, first compression, followed by tension at large buckling deformations. In addition it is suggested to explore fitting of an anisotropic hardening model, such as Hill’s orthotropic model, which previously was fitted successfully with the virtual fields method [13]. Depending on the manufacturing process, reinforcing bars might have texture and hence be significantly anisotropic.

It would be interesting to explore whether the addition of the initial imperfection to the optimisation parameters would result in a better estimate for the global optimum. This work requires a substantial increase in the number of optimisation parameters, 5 in the present case. Therefore it is deemed feasible only when better scalability of the FE solver has been achieved.

Scalable and scriptable FE solver is required to reduce calculation time of . If each evaluation of can be done in less than a minute, then it will become feasible to add more variables to , such as the initial imperfection. When can be evaluated in just a few seconds, then optimisation of more complex plastic and/or damage models will become feasible, for example, combined kinematic/isotropic hardening or the Gurson-Tvergaard-Needleman continuous damage model [36]. The use of a damage model will also help better describe the necking behaviour in the tensile tests, something that a 3-parameter hardening model cannot do, as described in Section 5. It is likely that this will require FE codes scalable up to tens or hundreds of thousands of cores. One possible candidate is ParaFEM, which showed excellent scaling on HECToR and ARCHER, the UK national supercomputers, that is, Tier-1 systems [37]. With exascale era widely expected to start by about 2020 [38], this work is a small step towards enabling exascale computing for practical mechanical or civil engineering applications.

#### 8. Conclusions

Multistart global optimisation, with BOBYQA used for local search, has proved successful for autotuning a 3-parameter isotropic power hardening model, where the objective function was a scaled sum of the squared differences between the experimental and the FE load/displacement data for elastoplastic buckling of steel rods. Thus the optimisation was a nonlinear least squares problem.

Convergence of BOBYQA is sensitive to the starting trust region radius, . For small BOBYQA can converge prematurely, because the objective function is shallow around the minima. The use of a large initial trust region radius is thus recommended.

Higher mode buckling was predicted by FE for specific combinations of the model parameters, resulting in very high gradients or even discontinuities of the objective function.

The quality of the best discovered global optimum was assessed using a separate, post-optimisation validation step.

However, poor prominence of the best discovered global optimum raises further questions, which the authors intend to address in future work: How do the type and the amount of experimental data affect the design of the objective function? If some knowledge of the influence of the model parameters on the stress/strain response is available, how can it be included into the objective function? What is the type of the desired new experimental data, given a prior estimation of the global optimum?

#### Disclosure

Mohammad M. Kashani is now at Faculty of Engineering and Environment, University of Southampton, Southampton SO17 1BJ, UK.

#### Competing Interests

The authors declare that there is no conflict of interest regarding the publication of this paper.

#### Acknowledgments

This work was carried out using the computational facilities of the Advanced Computing Research Centre, The University of Bristol: https://www.bris.ac.uk/acrc/. Professor M. J. D. Powell died in 2015. In readme for BOBYQA package he writes: “I hope that the time and effort I have spent on developing the package will be helpful to much research and to many applications.” It has been and it will.