The parallelization of 2D/3D software SAPTIS is discussed for nonlinear analysis of complex structures. A comparative study is made on different parallel solvers. The numerical models are presented, including hydration models, water cooling models, modulus models, creep model, and autogenous deformation models. A finite element simulation is made for the whole process of excavation and pouring of dams using these models. The numerical results show a good agreement with the measured ones. To achieve a better computing efficiency, four parallel solvers utilizing parallelization techniques are employed: (1) a parallel preconditioned conjugate gradient (PCG) solver based on OpenMP, (2) a parallel preconditioned Krylov subspace solver based on MPI, (3) a parallel sparse equation solver based on OpenMP, and (4) a parallel GPU equation solver. The parallel solvers run either in a shared memory environment OpenMP or in a distributed memory environment MPI. A comparative study on these parallel solvers is made, and the results show that the parallelization makes SAPTIS more efficient, powerful, and adaptable.

1. Introduction

Complex structures are largely employed in engineering practice in a variety of situations and applications, for example, water resources and hydropower engineering, mining engineering, and traffic engineering. An analysis of such structures is not possible by empirical methods, and moreover in situ experimental studies are costly. Recent advances in numerical techniques have provided the finite element method (FEM) for analysis of much more complex systems in a much more realistic way. The FEM can model the complex behavior of concrete without limitations caused by complexity.

Many authors have studied the nonlinear response of concrete structures using the FEM focusing on the three-dimensional elastoplastic problem [112]. Most authors consider the effect of creep [48], while others consider cracking [912], with regard to nonlinear analysis of structures. Generally, the models are extremely computation intensive. Many engineers complain about the high computational costs. As a result, some nonlinear analysis codes have to be given up. The situation has changed since the arrival of a variety of high performance computers and the advances in parallel computing techniques, that is, parallel algorithms and parallel platforms. Parallel algorithms on different platforms, that is, algorithms utilizing OpenMP and MPI, are studied by [1316]. Recently, the GPU high performance computing has been popular. A parallel GPU equation solver is introduced into SAPTIS as well. In these respects, we should be able to model more complex effects like hydration, water cooling, cracking, creep, autogenous deformation, and so forth.

The aim of this work is the prediction of concrete behaviors in different situations as well as the improvement of the runtime in the nonlinear analysis program structure analysis program for temperature and induced stress SAPTIS. A composition of models is presented in terms of the former, and a comparative study on different parallel solvers is made for the sake of the latter. The paper does not address the issue in mathematics itself, but focuses on the application of current algorithms. The goal is to develop practical and efficient parallel strategies for nonlinear analysis of complex structures.

In this study, we will arrange the contents as follows. In the first section, a composition of models and its basic formulations are introduced. Then, the parallel strategies as well as the three parallel solvers are presented. Furthermore, the performance of the newly developed parallel code is tested on different computers. Finally, conclusions regarding the parallelized SAPTIS are reported.

2. Models for Nonlinear Analysis of Concrete Structures

The nonlinear analysis of the model takes into consideration the effects of hydration, water cooling, modulus changes, creep, and autogenous deformation. An appropriate model for these effects as well as the FEM equations is used for building the nonlinear system.

2.1. Thermal Simulation
2.1.1. Hydration Models

Hydration of concrete brings heat. Hereby it can be modeled using five models in SAPTIS, namely, exponential model, hyperbolic model, complex model, hydration degree model, and table model [17]. We put a superscript “+” as heat increasing, and the first four models can be explicitly formulated as follows.

Exponential Model 1:

Exponential Model 2:

Hyperbolic Model:

Complex Model:

Hydration Degree Model:

Table Model. Calculate the hydration heats and increments by splines based on experimental data.

In (1) ~ (5), is the adiabatic temperature rise regarding hydration at age , are heat parameters, , , , , and are parameters regarding heating rate, is an equivalent age, , , and are activation parameters.

2.1.2. Water Cooling Models

There are two water cooling models in SAPTIS: one is a fine model and the other is an equivalent model. Derivations and formulations of the two models are described in detail in [18]. The cooling pipes take away heat, which can be written as: where () is the adiabatic temperature drop caused by the cooling system, is the temperature of water at entry, is the initial temperature of concrete, which will be given in (8), and is the thermal distribution function.

In the fine model, a local grid refinement of the cooling pipes is needed. In every single time step, thermal distribution along the cooling pipes is obtained by calculating the heat exchange between the cooling pipes and the concrete, and the thermal field of concrete is therefore obtained. The fine model can accurately simulate the thermal distribution around the cooling pipes, but it needs a grid refinement, which inevitably enlarge the calculation scale.

The equivalent model does not take into consideration the location of pipes. An average concept is exploited by inputting the pipe spacing, the flux, the temperature, and the time of watering and calculating a mean effect of water cooling. The equivalent model can reduce calculation time without local grid refinements and ensure an average precise, but it does not consider the thermal distribution along the cooling pipes and it cannot obtain the thermal gradient near them.

2.1.3. Basic Thermal Model

The equilibrium equation of heat transfer can be written as: The initial condition can be written as: The boundary conditions are In (7) ~ (9), represents age, is thermal conductivity [kJ/(m·h·°C)], stands for density [kg/m3], is specific heat capacity [kJ/(kg·°C)], is thermal diffusivity and [m2/h], stands for total adiabatic temperature incremental and   [°C]; is temperature [°C], is the initial temperature of concrete [°C], is the fixed boundary temperature [°C], represents heat flux [kJ/(m2·h)], denotes ambient temperature in natural convection conditions and adiabatic temperature of boundary layer in forced convection conditions [°C] and is heat transfer coefficient in surface [kJ/(m2·h·°C)].

2.2. Creep Simulation

SAPTIS can simulate thermal distribution as well as stress field in concrete casting. Apart from linear, nonlinear, and elastoplastic simulation, creep and hardening of concrete are also required.

2.2.1. Modulus Models

Modulus and strength of concrete increase to peaks with age due to hydration effect after pouring. The increasing rate of modulus and strength relates to temperatures of concrete and environment. To simulate such effect, SAPTIS adopts several modulus models [19].

Exponential Model:

Hyperbolic Model:

Complex Model:

Aging Degree Model: Table Model. Calculate the modulus by interpolation based on experimental data.

In (10) ~ (13), is modulus at age and , , , , and are parameters regarding hardening rate.

In practical modeling, choose one model from (10) ~ (13), and the results show that any of the models can well simulate the process of modulus growth of concrete.

2.2.2. Creep Model

Zhu [19] proposed a creep formula for loading at different ages after a rigorous derivation: where represent creep of concrete, stands for age, denotes loading time, and , , , , , , , , , and are regression coefficients. An increment method is employed to calculate the creep influence on deformation and stress. Modulus and creep of concrete in several engineerings are shown in Tables 1 and 2.

2.2.3. Autogenous Deformation and MgO Linear Expansion Models

Input the ages of concrete and corresponding strains, and autogenous deformations can be obtained from spline interpolation.

Autogenous deformations exist in MgO concrete and can somehow offset the shrinkage due to thermal decrease, which is beneficial for preventing dam cracking. MgO concrete has been applied in a few projects, and good performance has been achieved [20, 21]. We propose two formulas to calculate autogenous deformations of MgO concrete from practice.

Dynamic Formula:

Residual Formula: In (15) and (16), , , and are parameters related to final expansion in [] and , , , , , , and are coefficients.

2.3. Finite Element Formulation
2.3.1. Element Types

SAPTIS is 2D/3D software for general analysis. In 2D space, elements of triangle with 3–6 nodes and quadrilateral with 4–9 nodes are employed. In 3D space, elements consist of tetrahedra with 4–11 nodes, wedges with 6–16 nodes, and hexahedra with 8–21 nodes. The shape functions of these elements can be found in and elaborated upon by [20, 21]. Besides, a one-dimensional (1d) element used for links and prestressed anchors is defined in the software.

Two definitions for this 1d element are emphasized: one is an explicit definition, in which the element is a real bar connected by its two nodes, and it has its own stiff matrix (Figures 1(a), 1(b), and 1(c)). The other is an implicit definition and it treats the element as a virtual bar, which does not have a geometric entity in representation, but its stiff matrix is nominally added to its neighboring element(s) (Figures 1(d), 1(e), and 1(f)). In both definitions, there are three connection types in terms of hexahedral neighbors, that is, edge connection (Figures 1(a) and 1(d)), face diagonal connection (Figures 1(b) and 1(e)), and body diagonal connection (Figures 1(c) and 1(f)).

2.3.2. Finite Element Formulation of Heat Transfer

By using the variation formulation, the equilibrium equation of heat transfer in (7) can be transformed into the following matrix form: where [] is the matrix for specific heat capacity, [] is the matrix for heat conductivity, and is the total heat flux vector for internal hydration heat and heat convection.

The matrices of [] and [] and vector shown in (17) are where [] is the matrix for shape function, is the finite domain, and is the boundary of .

2.3.3. Finite Element Formulation of Stress Analysis

An incremental method is employed to analyze the deformation and stress of concrete. The sum of elastic and creep strain in concrete is proportional to a sustained load under ordinary service stress level, while the thermal strain is proportional to the temperature rise. The autogenous deformation in MgO concrete can be worked out by the formulas we propose. The incremental method has been elaborated upon in [12, 22], and we only present its finite element equation as follows: where is the force increment and is the strain matrix. is the stress increment caused by elastic, creep, thermal, and autogenous strains, which can be written as where is the instant stress matrix, is elastic strain increment, relates to creep strain increment, stands for thermal strain increment, and represents autogenous strain increment.

Put (20) into (19), and we can get an equilibrium form where is the stiff matrix of element, is displacement increment of element, and , , and are nodal load increments caused, respectively, by creep, temperature, and autogenous deformation. The matrix and vectors can be summarized as After the assembly of all element stiffness matrices, a general finite element equation in its global form can be expressed as where [] is global stiffness matrix, is global displacement vector, is global increment vector induced by external force, and , , and are global increment vectors caused by creep, temperature, and autogenous deformation, respectively.

3. Parallelization Strategy

Since the 1980s, a lot of work has been done in parallel and distributed computing for structural analysis. Parallel processing benefits such analysis a lot by using two different strategies [23].(1)The analysis problem may be subdivided by geometrically dividing the idealization into a number of subdomains: explicit decomposition approach, also called substructure approach.(2)Alternatively the system of equations for the whole structure may be assembled and solved in parallel without recourse to a physical partitioning of the problem: implicit domain decomposition (IDD) approach, or global approach.

It should be noted that such strategies with domain decomposition techniques are not general solution procedures and are specialized for particular applications. Furthermore, what can be parallelized are [24] (a) input problem characteristics, (b) assembly, (c) boundary conditions, (d) solution of algebraic equations, and (e) postprocessing. The fourth item is the most important to parallelize. Additionally, Gummadi and Palazotto [25] indicated that in a typical linear static FEA, the most consuming operation is the solution of linear equations. We recognize from our own experience that the time used for solving equations can reach 70%~90%. An optimization in linear equation solvers is required to largely improve computing efficiency.

3.1. Linear Equation Solvers

Linear system of large equations is solved by two kinds of solvers, that is, the direct solvers (e.g., Gauss) and the iterative ones (e.g., PCG).

The direct solvers are applicable with accuracy assurance for any nonsingular linear system with an appropriate scale and density. When solving a large sparse system with many 0’s, a direct solver is particularly consuming. Additionally it cannot ensure the accuracy in terms of rounding errors. The parallelization for direct solvers is only suitable for tridiagonal matrices, block tridiagonal matrices and banded matrices. For a general sparse linear system, the sparse equation solvers are popular with multifrontal algorithms and supernode technology.

The iterative solvers overcome the shortcomings of the direct ones and retain the sparsity of coefficient matrix. With small storage and computation, they have more obvious advantages especially for large, sparse, asymmetric, and seriously ill-conditioned matrices. At present, the preconditioned conjugate gradient (PCG) solver on behalf of the Krylov subspace solvers is among the most popular iterative solvers in engineering and sciences. Classical iterative solvers (e.g., Jacobi, GS, SOR, and SSOR) with their improved ones are used for preconditions in most situations.

In SAPTIS, four linear equation solvers are adopted: (1)the preconditioned (i.e., SSOR) conjugate gradient solver, (2)the preconditioned (e.g., CG, CGS, BiCGSTAB, etc.) Krylov subspace solver, (3)the sparse equation solver,(4)the GPU solver.

3.2. Key Technique for Parallel Programming

A parallel procedure is the implement of a parallel algorithm on a parallel computer or cluster. The parallel algorithm is designed for a parallel computing model, which is abstracted from different parallel systems or architectures of computer models. There are three programming models for parallel computing at present: shared memory programming model; message passing programming model; hybrid programming model, that is the hybrid of shared memory and message passing programming models. They correspond to three parallel architectures: shared memory system, for example, Symmetric Multiprocessor (SMP); distributed memory system, for example, clusters and (Massive Parallel Processing MPP); distributed/shared memory system, for example, multicore clusters and SMP clusters.

To implement a parallel procedure, four major problems are faced with. The first to be concerned is what parts of the algorithm can be parallelized. These parts should be decomposed and executed by different processes, and a judgment for decomposition is whether there exists data competition. The second problem is the strategy for decomposition. There are two strategies: one is task strategy, another is data strategy. In task strategy, the procedure is decomposed into different tasks, whose dependence on whom should be noted. Then the tasks are scheduled to avoid mutual interference. Such parallelization strategy belongs to a coarse one and demands high independence of specific problems. In data strategy, the data space of procedure is decomposed into different areas. Each processor takes in charge of its own area. Obviously, this strategy is well suited for parallelization of finite element analysis. The third to be concerned is the programming model. This can be determined by the parallel architecture. The last one is the programming method. In scientific computing, data parallelization strategy is generally employed. As a result, the looping or (Single Program Multiple Data SPMD) programming is adopted. In looping programming, loops without competition are distributed into different processors. In SPMD programming, all processors execute the same procedure, but they use different data, which are transferred and shared among the processors via communication.

In SAPTIS, these problems are solved one by one as follows.(i)The solution of algebraic equations should be urgent to parallelize.(ii)For FEA parallelization, the data strategy should be exploited.(iii)The message passing programming model should be employed if based on MPI, and the multithread programming model should be employed if based on OpenMP.(iv)Data parallelization strategy determines that the looping or SPMD programming method should be used.

3.3. Implementation

The implementation is based on reference books [26] and work made by [1316, 2729].

3.3.1. Parallel Preconditioned Conjugate Gradient Solver Based on OpenMP

The parallel preconditioned conjugate gradient (PCG) solver uses symmetric successive over relaxation (SSOR) technique and runs on the OpenMP-based platform.

The bottlenecks of performance for the PCG solver are the matrix and vector operations like and , which costs over 80% of the computation time. Therefore, the parallelization strategy for the PCG solver based on OpenMP is to parallelize the loops containing such operations without data competition by controlling statements of OpenMP. After such programming process, parallelization of matrix-vector multiplication and inner product operation is achieved, and the data formats and compile options are optimized simultaneously.

The PCG solver is parallelized in an algorithm level, which requires frequent communication on computers of distributed memory systems. Thus, such parallel algorithm is more suitable for computers of shared memory systems, and an OpenMP-based platform is employed.

The parallelization procedures for the PCG solver can be summarized as follows:(i)first, the matrix and vector operations are parallelized, including and , where the vector is generated by specific algorithm;(ii)then, the inner product operation is parallelized;(iii)furthermore, the vectors are updated;(iv)finally, the preconditions are calculated (if necessary).

3.3.2. Parallel Preconditioned Krylov Subspace Solver Based on MPI

The Krylov subspace is defined by means of the square matrix , the initial vector , and the positive scalar and . The original system is reduced by using the orthogonal transformation , obtaining the tridiagonal system . By the integration of the solution of the tridiagonal system and the generation of the matrices and , let us obtain a complete algorithm to solve the original system, where we obtain a better solution in each step [29].

The parallel preconditioned Krylov subspace (PKS) solver uses iterative techniques, for example, CG, CGS, BiCGSTAB, GMRES, TFQMR, and so forth, and runs on the MPI-based platform.

For the parallelization of the PKS solver, a domain decomposition method is employed. The “divide and conquer” strategy is used, in which the finite analysis domain is divided into several subdomains and each subdomain is in the charge of one or several processors. All the processors solve the whole problem independently and interactively, and synergistically and simultaneously.

Based on the distributed memory environment of parallel computing, the PKS solver is paralleled by using the SPMD programming method and the message passing programming model. Since the “divide and conquer” strategy is used, the solution procedures of FEA equations consist of assembly of global matrix and parallel solving of all local equations. The SPMD programming model implies that the source codes of every process are nearly the same, in spite of the differences of data in different areas.

The parallelization procedures for the PKS solver can be summarized as follows:(i)the global FEA equations are assembled;(ii)the “divide and conquer” strategy is employed to divide the FEA domain into sub-domains;(iii)local equations are formed based on sub-domains;(iv)iterative techniques, for example, CG, CGS, BiCGSTAB, GMRES, TFQMR, and so forth, are paralleled and used to solve all the local equations.

3.3.3. Parallel Sparse Equation Solver Based on OpenMP

The parallel sparse equation (SE) solver uses a direct solving method and runs on the OpenMP-based platform.

Algorithms based on iterative methods are not always suitable for specific structural analysis, for such algorithms don not always work since the matrix may not be well conditioned during the iterative process. The direct method has fewer problems in achieving solution convergence. In terms of these reasons, a sparse equation solving method is adopted as well in SAPTIS. This is a numerically stable parallel algorithm for solving ill-conditioned linear systems of equations.

The sparse methods are close to direct methods in essence, but they have also big differences. The rearrangements of matrices, incomplete decompositions, and multifront techniques have made the sparse methods more efficient than the direct ones.

The sparse equation solver is parallelized in CSR matrix format, which supports several types of matrices including real/imaginary matrices and symmetric/asymmetric matrices.

The parallelization procedures for the sparse equation solver are similar to those for the PCG solver. In this case, we will omit the discussions.

3.3.4. Parallel GPU Equation Solver

Recently, the high performance computing is quite popular, that is, the GPU computing. The graphics processing unit (GPU) has become an integral part of today’s mainstream computing systems. Over the past decade, there has been a marked increase in the performance and capabilities of GPUs. The advent of GPU computing technology greatly improves the computational performance of numerical simulations and the speedups reach tens to hundreds. Besides, a GPU is small, portable, and cheap. It costs little power.

The parallel GPU equation solver uses an iterative method. The strategies used for GPU implementation can be summarized as follows:(1)replace the original CPU-based calculation functions with GPU kernels. A kernel is a function that runs on the GPU device;(2)integrate as many GPU kernels into a large one as possible. This is to ensure the efficiency of GPU codes, for data transfer among different kernels is time-consuming.

Based on the strategies above, the functions shown in Algorithm 1 are introduced into SAPTIS. These functions as well as their library files have enabled SAPTIS to carry out GPU high performance computing.

extern “C” int VeksInit(UINT32 nElems, UINT32 nGlobalNodes, UINT32 aiElemConn[ ] );
// Funcition:       Initialize the solver
extern “C” int VeksSetK(double afK[ ] );
// Funcition:       Set element stiffness matrices
extern “C” int VeksSetDemF(double afF[ ] );
// Funcition:       Set element loads
extern “C” int VeksSetU(double afU[ ] );
// Funcition:       Set nodal displacements
extern “C” int VeksSetF(double afF[ ] );
// Funcition:       Set nodal loads
extern “C” int VeksFixU(char abU[ ] , double afU[ ] );
// Funcition:       Set nodal displacements
extern “C” int VeksGetU(double afU[ ] );
// Funcition:       Get nodal displacements
extern “C” int VeksSetC(double afC[ ] );
// Funcition:       Set element conductivity matrices
extern “C” int VeksSetDemQ(double afQ[ ] );
// Funcition:       Set element thermal loads
extern “C” int VeksSetQ(double afQ[ ]);
// Funcition:       Set nodal thermal loads
extern “C” int VeksFixT(char abT[ ], double afT[ ]);
// Funcition:       Set nodal thermal constraints
extern “C” int VeksGetT(double afT[ ]);
// Funcition:       Get nodal temperatures
extern “C” __int64 VeksSolve(__int64 nMaxIter, double fTimeStep, double fUnbRatio, double fEps, unsigned int nPrint);
// Funcition:       Solve equations
// The procedures for GPU analysis
#include “Eks.h”
int nRet = VeksInit(nElems, nGlobalNodes, aiElemConn);
VeksFixU(abU, afU);
VeksSetC(afC); VeksSetDemQ(afDemQ);
VeksFixT(abT, afT);
VeksSolve(nSteps, fTimeStep, fUnbRatio, fEps, nStipePrint);

The analysis procedures by using the GPU parallel solver are summarized as follows:(1)include the “Eks.h” file into the source files;(2)link the library files Geks.Lib;(3)call the function VeksInit to initialize the GPU solver;(4)call the function VeksSetK to transfer the element stiffness matrices into the GPU solver;(5)call the function VeksSetC to transfer the element conductivity matrices into the GPU solver;(6)call the function VeksSetDemF to transfer the loads vector into the GPU solver;(7)call the function VeksFixU to set the displacement constraints;(8)call the function VeksSetDemQ to transfer the thermal loads vector into the GPU solver;(9)call the function VeksSolve to solve the equations;(10)call the functions VeksGetU and VeksGetT to get the displacements and temperatures, respectively.

4. Numerical Examples

4.1. Thermal Field Verification
4.1.1. Cylinder Water Cooling Pipe

Take a piece of circular area, whose radius (see Figure 3). The fine and equivalent models of FEA are used for comparison and verification. Comparative simulations between a plastic and a metal pipe are performed. The parameters for the simulations are presented in Table 3.

The comparisons between the results of the fine and equivalent models are illustrated in Figure 4. The temperature in the fine model is a real value, while the temperature in the equivalent model is and can only be a mean value of the whole pipe domain.

The figure shows that the temperature of the metal cooling pipe is lower than that of a plastic one, and the temperature difference between them reaches 1.5°C on day 15. Thus, the metal pipe takes priority over the plastic one in concrete cooling. Meanwhile, the numerical result shows a great agreement with the analytical solution. The temperatures obtained by these two models are nearly the same, which indicates that the equivalent model can ensure its accuracy when simulating a real engineering problem.

4.1.2. Simulation of a Dam

A hyperbolic arch dam in Southwest China is 285.5 m high. The annual average air temperature in the area is 19.7°C; the highest monthly average air temperature is 27.1°C; the lowest monthly average air temperature is 10.6°C. The annual average ground temperature is 21.4°C.

A simulation for the dam is performed to grasp the real temperature status and thus to develop a measure for preventing cracking. The simulation takes into account the whole process of excavation and pouring of dams and the varieties of influencing factors for dam temperature. The model with its boundary conditions is illustrated in Figure 2.

The simulation results are shown in Figure 5. The temperature of Elevation 357 m is presented in Figure 5(a) while the temperature of Elevation 408 m is presented in Figure 5(b). The comparisons between the numerical results and the measured ones are made, and we know from the figure that these results show a great agreement with each other. Additionally, the results show the real temperature status. These all indicate that the models presented are reasonable and accurate.

4.2. Parallel Computing Examples
4.2.1. Parallel PCG Solver Test on OpenMP

(1)  Serial Codes Test. A serial codes test is performed on a ThinkCentre M Q45t computer. Its parameters are shown in Table 4. This computer can solve a problem with more than 5 million degrees of freedom.

The result of the test is shown in Figure 6. The PCG solver runs on a single PC very efficiently with little memory, and it can solve big problems. Figure 6 shows the different times solving problems with different DOFs by using the serial SSOR-PCG solver. The result indicates that the sovler takes only 100 seconds when solving a problem with 2 million DOFs and takes 500 seconds when solving a problem with 5 million DOFs. Obviously, it is much faster than many solvers.

(2)  Parallel Codes Test. A parallel codes test of the PCG solver runs on a Sun Fire 6800 computer. Table 5 shows the test platform, and Table 6 gives the test results. We can easily know from the results that the speedups increase significantly with the number of threads. The acceleration of speedups drops a little, but the speedups themselves are almost over 80%.

4.2.2. Parallel PKS Solver Test on MPI

Test of the parallel PKS solver is performed on platform shown in Table 7. Methods are shown in Table 8, and the test results are presented in Table 9.

We can easily obtain that the speedups are significant and the parallel solver is quite efficient. This parallel solver can run either on a cross-node cluster or on a shared memory system such as the multicore CPU.

4.2.3. Parallel SE Solver Test on OpenMP

The test platform for parallel SE solver is presented in Table 10, and the results are shown in Table 11. For a 930000-order slightly ill-conditioned matrix, the solving time is quite impressive, and it is only 70 seconds when the solver runs on 16 threads. Since a sparse direct method is employed, the time is mainly spent on incomplete LU decomposition. Only about 20G memory is consumed, which is much less than the Gaussian elimination, but more than iterative methods whose memeroy consumption can be only 3-4 G. We can get from the speedups that the parallel sparse equation solver is very efficient and scalable, and it is well suitable for equations of large linear system.

4.2.4. Parallel GPU Solver Test

An engineering test is performed, using the model in Figure 7.

The boundary conditions are similar to the ones shown in Figure 2. The results are shown in Figure 8.

In this simulation, we test the speedups, which will be shown in Table 12.

5. Conclusions

First, numerical models adopted by the software are presented including (i)hydration models,(ii)water cooling models,(iii)modulus models,(iv)creep model,(v)autogenous deformation models taking into account the properties of MgO concrete.

A thermal example for verifying the thermal models is presented. A good agreement is achieved between the numerical results and the analytical ones. A finite element simulation for the whole process of excavation and pouring of dams using these models is made, and the numerical results show a good agreement with the measured ones.

Then, several parallel solvers are introduced with their parallel strategies, consisting of (1)the preconditioned (i.e., SSOR) conjugate gradient solver, (2)the preconditioned (e.g., CG, CGS, BiCGSTAB, etc.) Krylov subspace solver, (3)the sparse equation solver.

The parallelization procedures for the PCG solver can be summarized as follows:(i)first, the matrix and vector operations are parallelized, including AP and , where the vector P is generated by specific algorithm;(ii)then, the inner product operation is parallelized;(iii)furthermore, the vectors are updated;(iv)finally, the preconditions are calculated (if necessary).

The parallelization procedures for the PKS solver can be summarized as follows:(i)the global FEA equations are assembled;(ii)the “divide and conquer” strategy is employed to divide the FEA domain into sub-domains;(iii)local equations are formed based on sub-domains;(iv)iterative techniques, for example, CG, CGS, BiCGSTAB, GMRES, TFQMR, and so forth, are paralleled and used to solve all the local equations.

The sparse equation solver is parallelized in CSR matrix format, which supports several types of matrices including real/imaginary matrices and symmetric/asymmetric matrices. The parallelization procedures for the sparse equation solver are similar to those for the PCG solver.

Last, a comparative study on these parallel solvers is performed. The results show that(i)the speedups are quite significant;(ii)the serial and parallel solvers are both very efficient;(iii)the serial and parallel solvers can both deal with very large problems with more than 5 million degrees of freedom;(iv)the parallelization makes SAPTIS more powerful and adaptable.

A GPU-based parallel solver has been developed, and the GPU parallelization has made SAPTIS much more efficient.

Conflict of Interests

The authors declare that they do not have any conflict of interests with the content of the paper.


The authors would like to acknowledge the financial support of the National Natural Science Foundation of China (51209235), the National Basic Research Program (973 Programs: Grant nos. 2010CB731500, 2013CB036406, and 2013CB035904), the National “Twelfth Five-Year” Plan for Science and Technology Support (2013BAB06B02), the Governmental Public Industry Research Special Funds for Projects of MWR (201201050), and the IWHR Special Research of China (Volume 1118, Volume 1208, Volume 1169, Volume 1268, Volume 1309, Volume 1353, and Volume 1361).