Abstract

We discuss the multidimensional parallel computation of pseudo arc-length moving mesh schemes, which can be used to capture the strong discontinuities of multidimensional detonations. Different from traditional Eulerian numerical schemes, the parallelization of pseudo arc-length moving mesh schemes must handle diagonal processor communications and mesh point communications, which are illustrated by schematic diagrams and key pseudocodes. Finally, numerical examples are given to show that the pseudo arc-length moving mesh schemes are second-order convergent and can successfully capture the strong discontinuity of the detonation wave. In addition, our parallel methods are shown to be effective, and the computational time is significantly reduced.

1. Introduction

Detonation is a type of combustion involving a supersonic exothermic front that accelerates through a medium and eventually drives a shock front propagating directly ahead of it. Detonations occur in conventional solid and liquid explosives [1] as well as in reactive gases [2]. Due to the strong discontinuity, one of the most important challenges in detonation simulation is to capture the precursor shock wave. To capture the strong discontinuity in detonation waves, mesh adaptation is an indispensable tool for the efficient numerical solution of this type of problem. The moving mesh method is one such mesh adaptation method: it relocates mesh points while maintaining the total number of mesh points and the mesh connectivity [3-5]. In particular, Tang et al. proposed a moving mesh method consisting of two parts: physical PDE time evolution and mesh redistribution [6-8]. The physical PDE time evolution and the mesh redistribution alternate, and conservative interpolation is used to transfer solutions from the old mesh to the new mesh. The method is shown to work well for general hyperbolic conservation laws. Later, Ning et al. improved this method and proposed the pseudo arc-length moving mesh schemes, which can deal with multidimensional chemical reaction detonation problems [9].

In this paper, we discuss the parallel computation of pseudo arc-length moving mesh schemes for multidimensional detonation. There is existing work on parallel schemes for Eulerian numerical schemes. Knepley and Karpeev developed a new programming framework, called Sieve, to support parallel numerical partial differential equation (PDE) algorithms operating over distributed meshes [10]. Morris et al. demonstrated how to build a parallel application that encapsulates the Message Passing Interface (MPI) without requiring the user to make direct calls to MPI except for startup and shutdown [11]. Mubarak et al. presented a parallel algorithm for creating and deleting data copies, referred to as ghost copies, which localize neighborhood data for computation while minimizing interprocess communication [12]. Wang et al. used a parallel scheme to reduce the cost of an adaptive mesh refinement WENO scheme for multidimensional detonation [13]. However, there has been no discussion of parallel computation for moving mesh schemes, which is the concern of this paper. Different from traditional Eulerian schemes, the data communications of moving mesh schemes between processors are more complex, since they include both physical values and mesh points; moreover, the processor communications include adjacent processors and diagonal processors. Here, we adopt the software architecture of MPI. The most important advantages of this model are twofold: achievable performance and portability. Performance is a direct result of available optimized MPI libraries and full user control in the program development cycle. Portability arises from the standard API and the existence of MPI libraries on a wide range of machines. In general, an MPI program runs on distributed memory machines. We will consider three kinds of processor partitions to present our parallel schemes. This article is organized as follows. Section 2 introduces the chemical and physical model. Section 3 presents the numerical scheme. Section 4 is devoted to the parallel computation. Section 5 conducts several numerical experiments to demonstrate our schemes. The paper ends with a conclusion and discussion in Section 6.

2. Governing Equations

Instead of using many real elementary reactions, a two-step model is utilized as the test model. The two-step reaction model treats a complicated chemical reaction as an induction reaction followed by an exothermic reaction, described by progress parameters $\alpha$ and $\beta$, respectively. Both parameters are unity at first; then $\alpha$ decreases to zero, after which $\beta$ decreases until an equilibrium state is reached. The rates $\omega_\alpha$ and $\omega_\beta$ of the two reactions are given in [14], where $\rho$ is the mass density, $p$ the pressure, $T$ the temperature, $R$ the gas constant, $q$ the heat release parameter, $k_1$ and $k_2$ the constants of the reaction rates, and $E_1$ and $E_2$ the activation energies.

In deriving the fundamental equations, the gas is assumed to be perfect, nonviscous, and non-heat-conducting. In Cartesian coordinates, the governing equations for the gaseous detonation problem, including the above chemical reaction, are

    $\dfrac{\partial U}{\partial t} + \nabla \cdot F(U) = S(U)$,  (2)

where $U$ is the vector of conserved variables, $F(U)$ is the flux function, and $S(U)$ is the chemical reaction source term. For the one-dimensional space,

    $U = (\rho,\; \rho u,\; E,\; \rho\alpha,\; \rho\beta)^T$, $\quad F = (\rho u,\; \rho u^2 + p,\; u(E+p),\; \rho u\alpha,\; \rho u\beta)^T$, $\quad S = (0,\; 0,\; 0,\; \rho\omega_\alpha,\; \rho\omega_\beta)^T$.

For the two-dimensional space, $U = (\rho, \rho u, \rho v, E, \rho\alpha, \rho\beta)^T$ and $F = (F_1, F_2)$ with the analogous convective fluxes in the $x$ and $y$ directions. In the case of three-dimensional space, $U = (\rho, \rho u, \rho v, \rho w, E, \rho\alpha, \rho\beta)^T$ and $F = (F_1, F_2, F_3)$, where $u$, $v$, and $w$ are the Cartesian components of the fluid velocity in the $x$, $y$, and $z$ directions, respectively. The total energy density is defined as

    $E = \dfrac{p}{\gamma - 1} + \dfrac{1}{2}\rho\left(u^2 + v^2 + w^2\right) + \rho\beta q$.

Here $\gamma$ is the specific heat ratio.
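As a small concrete aid, the relation between the conserved variables and the pressure can be sketched in Fortran as follows (a minimal sketch under the energy definition above; the function name and argument order are ours, not from the original code):

    ! Sketch: recover pressure from the conserved variables, assuming the
    ! total energy definition E = p/(gamma-1) + rho*|u|^2/2 + rho*beta*q.
    function pressure(rho, ru, rv, rw, E, rbeta, gamma, q) result(p)
      implicit none
      double precision, intent(in) :: rho, ru, rv, rw, E, rbeta, gamma, q
      double precision :: p
      p = (gamma - 1.d0) * (E - 0.5d0*(ru**2 + rv**2 + rw**2)/rho - rbeta*q)
    end function pressure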

3. Numerical Method

Firstly, we present the framework of the pseudo arc-length moving mesh schemes for the gaseous detonation problem (2). Our adaptive scheme is formed by two independent parts: the evolution of the governing equations and the iterative mesh redistribution.

3.1. Time Evolution of Governing Equations

The physical domain for the computation is $\Omega_p$. Given a partition $\{A_j\}$ of the mesh domain and a partition $\{t_n\}$, $t_{n+1} = t_n + \Delta t_n$, of the time interval, the average of $U$ on the cell $A_j$ is

    $\bar U_j = \dfrac{1}{|A_j|} \displaystyle\int_{A_j} U \, dV$.

Here, the sign $|\cdot|$ denotes the length of the region in one dimension, the area of the region in two dimensions, and the volume of the region in three dimensions. Integrating (2) over $A_j$, we have

    $\dfrac{d}{dt}\displaystyle\int_{A_j} U \, dV + \displaystyle\int_{A_j} \nabla \cdot F(U) \, dV = \displaystyle\int_{A_j} S(U) \, dV$.

Applying the Green-Gauss theorem and rewriting, we have

    $\dfrac{d\bar U_j}{dt} = -\dfrac{1}{|A_j|} \displaystyle\oint_{\partial A_j} F(U) \cdot n \, ds + \bar S_j$,

where $n$ is the outward unit normal vector of the boundary external surface $\partial A_j$. The Lax-Friedrichs flux is defined by

    $\hat F(U^-, U^+; n) = \dfrac{1}{2}\left[ F(U^-)\cdot n + F(U^+)\cdot n - \sigma \left( U^+ - U^- \right) \right]$,

where $\sigma$ is taken as the maximal characteristic speed. It satisfies the conservation and consistency

    $\hat F(U^-, U^+; n) = -\hat F(U^+, U^-; -n)$, $\qquad \hat F(U, U; n) = F(U)\cdot n$.

Let $\{\Gamma_{j,k}\}$ be a partition of the boundary external surface $\partial A_j$ and $n_{j,k}$ be the outward unit normal vector of the surface $\Gamma_{j,k}$. Then we have a semidiscrete scheme of (2),

    $\dfrac{d\bar U_j}{dt} = -\dfrac{1}{|A_j|} \displaystyle\sum_k |\Gamma_{j,k}| \, \hat F\left(U^-_{j,k}, U^+_{j,k}; n_{j,k}\right) + \bar S_j$,  (12)

where $U^-_{j,k}$ ($U^+_{j,k}$) is the internal (external) approximate value at $\Gamma_{j,k}$.
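For one conserved component, the Lax-Friedrichs flux reduces to a three-line Fortran function; the following is a minimal sketch with illustrative names, assuming the maximal characteristic speed has been precomputed:

    ! Sketch of the Lax-Friedrichs numerical flux for one conserved
    ! component: F_hat = 0.5*(F(U^-) + F(U^+) - sigma*(U^+ - U^-)),
    ! where sigma is the maximal characteristic speed, e.g. max(|u|+c).
    function lf_flux(um, up, fm, fp, sigma) result(fhat)
      implicit none
      double precision, intent(in) :: um, up    ! internal/external states
      double precision, intent(in) :: fm, fp    ! physical fluxes F(um), F(up)
      double precision, intent(in) :: sigma     ! maximal characteristic speed
      double precision :: fhat
      fhat = 0.5d0*(fm + fp - sigma*(up - um))
    end function lf_flux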

For the one-dimensional space, $A_j = (x_{j-1/2}, x_{j+1/2})$, where $x_{j-1/2}$ ($x_{j+1/2}$) is the left (right) point of the line cell $A_j$. Thus, (12) becomes

    $\dfrac{d\bar U_j}{dt} = -\dfrac{1}{\Delta x_j}\left[ \hat F\left(U^-_{j+1/2}, U^+_{j+1/2}\right) - \hat F\left(U^-_{j-1/2}, U^+_{j-1/2}\right) \right] + \bar S_j$, $\quad \Delta x_j = x_{j+1/2} - x_{j-1/2}$.

By the initial reconstruction technique [15] used to reset $U^-_{j+1/2}$, $U^+_{j-1/2}$ on the edges, we can obtain a second-order accurate spatial discretization:

    $U^-_{j+1/2} = \bar U_j + \dfrac{1}{2}\varphi\left(\bar U_{j+1} - \bar U_j,\; \bar U_j - \bar U_{j-1}\right)$, $\qquad U^+_{j-1/2} = \bar U_j - \dfrac{1}{2}\varphi\left(\bar U_{j+1} - \bar U_j,\; \bar U_j - \bar U_{j-1}\right)$,

where $\varphi$ is a nonlinear limiter function which is used to suppress possible spurious oscillations. In our computations, we use van Leer's limiter [15]

    $\varphi(a, b) = \dfrac{\left(\operatorname{sign}(a) + \operatorname{sign}(b)\right) |a| \, |b|}{|a| + |b|}$.

For the two-dimensional space, $A_j$ is a quadrilateral cell, and the diagram for its edges $\Gamma_{j,k}$, $k = 1, \ldots, 4$, is shown in Figure 1.

The values $U^-$, $U^+$ at the edges $\Gamma_{j,k}$, $k = 1, \ldots, 4$, can be defined by the same limited linear reconstruction from the neighboring cell averages. For the three-dimensional hexahedron element, the situation is more complex. Due to the movement of the mesh points, the hexahedron element can change into a general polyhedron element. To calculate the element volume and the boundary external surface area, each face of the hexahedron element is divided into two pieces and treated separately. The control element is illustrated in Figure 2. The boundary surfaces $\Gamma_{j,k}$ are defined in Table 1.

The second-order values $U^-$, $U^+$ at the faces $\Gamma_{j,k}$ can be obtained by the same limited reconstruction. Time discretization can be achieved by the strong stability preserving high order Runge-Kutta time discretization [16, 17]. The semidiscrete scheme (12) can be written as

    $\dfrac{dU}{dt} = L(U)$.  (18)

Here, the second-order TVD Runge-Kutta method for (18) in the time discretization is

    $U^{(1)} = U^n + \Delta t \, L(U^n)$, $\qquad U^{n+1} = \dfrac{1}{2} U^n + \dfrac{1}{2}\left( U^{(1)} + \Delta t \, L(U^{(1)}) \right)$.  (19)
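The two-stage update of (19) maps directly to code; the following Fortran sketch advances one time step, assuming a user-supplied subroutine that evaluates $L(U)$ (names are ours):

    ! Sketch of the second-order TVD (SSP) Runge-Kutta step for dU/dt = L(U):
    !   U1      = Un + dt*L(Un)
    !   U_{n+1} = (Un + U1 + dt*L(U1)) / 2
    subroutine ssp_rk2(u, n, dt, rhs)
      implicit none
      integer, intent(in) :: n
      double precision, intent(inout) :: u(n)
      double precision, intent(in) :: dt
      external :: rhs                      ! computes L(U); interface assumed
      double precision :: u1(n), lu(n)
      call rhs(u, lu, n)                   ! L(Un)
      u1 = u + dt*lu
      call rhs(u1, lu, n)                  ! L(U1)
      u = 0.5d0*(u + u1 + dt*lu)
    end subroutine ssp_rk2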

3.2. Pseudo Arc-Length Moving Mesh Scheme

Let $x = (x_1, \ldots, x_d)$ and $\xi = (\xi_1, \ldots, \xi_d)$ denote the physical and computational coordinates, respectively. Here, $d$ denotes the number of spatial dimensions. A one-to-one coordinate transformation from the computational domain $\Omega_c$ to the physical domain $\Omega_p$ is denoted by

    $x = x(\xi)$, $\quad \xi \in \Omega_c$.  (20)

In the variational approach, the mesh map between the domains $\Omega_c$ and $\Omega_p$ is usually provided by the minimizer of the functional

    $E(\xi) = \dfrac{1}{2} \displaystyle\sum_{k=1}^{d} \int_{\Omega_p} (\nabla \xi_k)^T G_k^{-1} \nabla \xi_k \, dx$,  (21)

where $G_k$, $k = 1, \ldots, d$, are given symmetric positive definite matrices called monitor functions, depending on the underlying solution to be adapted. The Euler-Lagrange equations of the functional (21) have the form

    $\nabla \cdot \left( G_k^{-1} \nabla \xi_k \right) = 0$, $\quad k = 1, \ldots, d$.  (22)

In practice, $\Omega_p$ may have a very complex geometry, and as a result it is not realistic to solve (22) directly. An alternative is to consider a functional posed on the computational domain,

    $\tilde E(x) = \dfrac{1}{2} \displaystyle\sum_{k=1}^{d} \int_{\Omega_c} (\nabla_\xi x_k)^T G_k \nabla_\xi x_k \, d\xi$,  (23)

where $\nabla_\xi = (\partial_{\xi_1}, \ldots, \partial_{\xi_d})^T$. In addition, one common choice of monitor functions is $G_k = \omega I$, where $I$ is the identity matrix and $\omega$ is a positive weight function. In order to have more accuracy near the nonsmooth parts of the solution, we introduce the monitor function of pseudo arc-length type [18, 19]

    $\omega = \sqrt{\theta_1 + \theta_2 \, \|\nabla U\|^2}$,  (24)

where $\theta_1$ and $\theta_2$ are some nonnegative constants. Thus, the corresponding Euler-Lagrange equations of (23) are

    $\nabla_\xi \cdot \left( \omega \nabla_\xi x_k \right) = 0$, $\quad k = 1, \ldots, d$.  (25)

In our computations, we use a Gauss-Seidel type iteration to solve the mesh equation (25): each interior mesh point is updated in turn from the current positions of its neighbors, with coefficients assembled from the weight function $\omega$; one sweep of this iteration is denoted by scheme (26). In addition, let us consider the solution update on the new grid $\{\tilde A_j\}$ from the old grid $\{A_j\}$. Here, the solution update should preserve the conservative property in the sense that

    $\displaystyle\sum_j |\tilde A_j| \, \tilde U_j = \displaystyle\sum_j |A_j| \, U_j$.  (28)

Let $\Delta\Gamma_{j,k}$ denote the region scanned by the boundary $\Gamma_{j,k}$ after one iterative step of (26). Then we have the following conservative interpolation scheme:

    $|\tilde A_j| \, \tilde U_j = |A_j| \, U_j - \displaystyle\sum_k \int_{\Delta\Gamma_{j,k}} U \, dV$,  (29)

where the integrand is evaluated from the reconstructed left and right states $U^-$ and $U^+$ on the boundary (see Section 3.1). Here we have omitted the superscripts of $U$ for simplicity. Since only the cell averages of $U$ over $\{A_j\}$ are known, we should give an approximation of the integral instead of its exact value. It can be approximately calculated as

    $\displaystyle\int_{\Delta\Gamma_{j,k}} U \, dV \approx \Delta V_{j,k} \, \hat U_{j,k}$,

where $\Delta V_{j,k}$ is the signed measure of $\Delta\Gamma_{j,k}$ and $\hat U_{j,k}$ is the reconstructed value on $\Gamma_{j,k}$. Next, we show how to calculate $\Delta V_{j,k}$ in different dimensions.
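To make the conservative transfer concrete, the following Fortran sketch performs the one-dimensional version of (29) on a moved mesh. It is a minimal illustration with first-order upwinding of the swept mass (the scheme above uses the limited reconstructed values instead), and all names are ours:

    ! Sketch of one-dimensional conservative solution transfer to a moved
    ! mesh: the mass swept by each interface, d*U_upwind, is exchanged
    ! between the two adjacent cells so that sum(|A_j|*U_j) is preserved.
    subroutine remap_1d(x_old, x_new, u, n)
      implicit none
      integer, intent(in) :: n
      double precision, intent(in) :: x_old(0:n), x_new(0:n)
      double precision, intent(inout) :: u(n)
      double precision :: flux(0:n), d
      integer :: j
      flux(0) = 0.d0; flux(n) = 0.d0       ! boundary points do not move here
      do j = 1, n-1
        d = x_new(j) - x_old(j)            ! signed swept length of interface j
        if (d > 0.d0) then                 ! interface moved right:
          flux(j) = d*u(j+1)               ! cell j gains mass from cell j+1
        else                               ! interface moved left:
          flux(j) = d*u(j)                 ! cell j loses mass to cell j+1
        end if
      end do
      do j = 1, n
        u(j) = ((x_old(j)-x_old(j-1))*u(j) + flux(j) - flux(j-1)) &
               / (x_new(j) - x_new(j-1))
      end do
    end subroutine remap_1d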

In one-dimensional space, $\Delta V_{j\pm 1/2}$ is simply the signed displacement of the corresponding cell boundary point, $\Delta V_{j+1/2} = \tilde x_{j+1/2} - x_{j+1/2}$. In two-dimensional space, the changed areas $\Delta V_{j,k}$, $k = 1, \ldots, 4$, are illustrated in Figure 3. As an example, we compute the area swept by an edge with endpoints $x_1$, $x_2$ that move to $\tilde x_1$, $\tilde x_2$: it is the signed area of the quadrilateral with vertices $x_1$, $x_2$, $\tilde x_2$, $\tilde x_1$. Obviously, this signed area is positive if the four points are located in counter-clockwise order and negative if they are located in clockwise order.

In three-dimensional space, the control element is illustrated in Figure 4. Similarly, we compute one $\Delta V_{j,k}$ as an example; the changed region is illustrated in Figure 5. It is composed of three tetrahedrons, so $\Delta V_{j,k}$ is the sum of three signed tetrahedron volumes. Each tetrahedron volume can be calculated by the volume formula

    $V = \dfrac{1}{6} \det \left[\, b - a, \; c - a, \; d - a \,\right]$

for vertices $a$, $b$, $c$, $d$. Obviously, this signed volume is positive if the base vertices are located in counter-clockwise order as seen from the fourth vertex, and negative if they are located in clockwise order.
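As a concrete aid, the two signed measures can be sketched in Fortran as follows (a minimal sketch; function and argument names are ours):

    ! Sketches of the signed measures used above: the shoelace formula for
    ! a quadrilateral (positive for counter-clockwise vertex order) and the
    ! signed tetrahedron volume det[b-a, c-a, d-a]/6.
    function quad_area(x, y) result(a)
      implicit none
      double precision, intent(in) :: x(4), y(4)
      double precision :: a
      a = 0.5d0*( (x(1)*y(2)-x(2)*y(1)) + (x(2)*y(3)-x(3)*y(2)) &
                + (x(3)*y(4)-x(4)*y(3)) + (x(4)*y(1)-x(1)*y(4)) )
    end function quad_area

    function tet_volume(a, b, c, d) result(v)
      implicit none
      double precision, intent(in) :: a(3), b(3), c(3), d(3)
      double precision :: p(3), q(3), r(3), v
      p = b - a; q = c - a; r = d - a
      v = ( p(1)*(q(2)*r(3)-q(3)*r(2)) - p(2)*(q(1)*r(3)-q(3)*r(1)) &
          + p(3)*(q(1)*r(2)-q(2)*r(1)) ) / 6.d0
    end function tet_volume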

The solution procedure can be summarized by the following algorithm.

Algorithm 1.
Step  1. Give a uniform mesh on the computational domain $\Omega_c$. Initialize the physical mesh on the physical domain $\Omega_p$, and compute the initial grid values $U^0_j$. Then, let $n = 0$ and $t_0 = 0$.
Step  2. With scheme (26), move the grid from $x^{[\nu]}$ to $x^{[\nu+1]}$ and update the corresponding grid values from $U^{[\nu]}$ to $U^{[\nu+1]}$ based on scheme (29), for $\nu = 0, 1, \ldots$. Repeat the updating procedure for a fixed number of iterations or until the mesh movement is below a prescribed tolerance. Then, take the resulting mesh and values as $x^n$ and $U^n$ at time level $t_n$.
Step  3. Evolve the underlying PDEs using the finite volume scheme (12) and the time discretization scheme (19) on the new mesh to obtain the numerical approximations $U^{n+1}$ at the time level $t_{n+1}$.
Step  4. If $t_{n+1} < T$, then let $n := n + 1$ and go to Step  2.
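For orientation, the whole procedure of Algorithm 1 can be sketched as the following Fortran driver loop (a minimal sketch; all subroutine names are illustrative, not from the original code):

    ! Sketch of Algorithm 1: alternate mesh redistribution and PDE evolution.
    program pal_mm_driver
      implicit none
      double precision :: t, dt, t_final
      integer :: iter, max_mesh_iters
      ! the mesh x and solution u are held in module storage (omitted here)
      t = 0.d0; t_final = 1.d0; max_mesh_iters = 5
      call init_mesh()                     ! Step 1: uniform/initial meshes
      call init_solution()
      do while (t < t_final)
        do iter = 1, max_mesh_iters        ! Step 2: move mesh, remap U
          call move_mesh_gauss_seidel()    ! scheme (26)
          call remap_solution()            ! scheme (29)
        end do
        call compute_time_step(dt)         ! CFL condition, synchronized
        call ssp_rk2_step(dt)              ! Step 3: schemes (12) and (19)
        t = t + dt                         ! Step 4: advance and repeat
      end do
    end program pal_mm_driver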

Remark 2. In practice, it is common to apply some temporal or spatial smoothing to the weight function to obtain smoother meshes, in order to avoid very singular meshes and/or large approximation errors near the regions where the solution has a large gradient. A low pass filter should be used to smooth the weight function 2~3 times for each iteration: in one-dimensional space,

    $\tilde\omega_j = \dfrac{1}{4}\left(\omega_{j-1} + 2\omega_j + \omega_{j+1}\right)$;

in two-dimensional space,

    $\tilde\omega_{j,k} = \dfrac{1}{4}\omega_{j,k} + \dfrac{1}{8}\left(\omega_{j+1,k} + \omega_{j-1,k} + \omega_{j,k+1} + \omega_{j,k-1}\right) + \dfrac{1}{16}\left(\omega_{j+1,k+1} + \omega_{j+1,k-1} + \omega_{j-1,k+1} + \omega_{j-1,k-1}\right)$;

in three-dimensional space, the tensor-product analogue

    $\tilde\omega_{i,j,k} = \displaystyle\sum_{p,q,r \in \{-1,0,1\}} c_p c_q c_r \, \omega_{i+p,j+q,k+r}$, $\qquad c_0 = \dfrac{1}{2}$, $\; c_{\pm 1} = \dfrac{1}{4}$.
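In code, one pass of the one-dimensional filter is a three-point weighted average; a minimal Fortran sketch (names ours) is

    ! Sketch of the one-dimensional low pass filter, applied 2-3 times:
    ! w_j <- (w_{j-1} + 2*w_j + w_{j+1})/4.
    subroutine smooth_weight_1d(w, n, passes)
      implicit none
      integer, intent(in) :: n, passes
      double precision, intent(inout) :: w(n)
      double precision :: tmp(n)
      integer :: j, k
      do k = 1, passes
        tmp = w
        do j = 2, n-1
          w(j) = 0.25d0*(tmp(j-1) + 2.d0*tmp(j) + tmp(j+1))
        end do
      end do
    end subroutine smooth_weight_1d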

Remark 3. Because discontinuities may initially exist on boundaries or move to boundaries at a later time, the boundary points should also be redistributed in order to improve the quality of the solution near the boundary. For convenience, our attention is restricted to the case in which the physical domain has straight boundaries. Assume that a new set of interior grid points is obtained by solving the moving mesh equation (26). Then the speeds of the internal grid points are given by the displacement per iteration,

    $v_j = \dfrac{\tilde x_j - x_j}{\Delta \tau}$,

where $\tilde x_j$ and $x_j$ are the new and old positions of the internal point $j$ and $\Delta\tau$ is the iteration step. We assume that the points of the boundaries move with the same speed as the tangential component of the speed of the internal points adjacent to those boundary points. For example, when the boundary surface is orthogonal to one coordinate axis, a boundary point keeps its coordinate along that axis and is displaced by the tangential components of the speed of the adjacent internal point; the new boundary points are defined accordingly. The redistribution for the other boundaries can be carried out in a similar way. Numerical experiments show that the above procedure for moving the boundary points is useful in improving the solution resolution.

4. Parallelization Strategy

A perfect parallel strategy would reach the conflicting goals of portability and high performance. Here, we adopt the software architecture of MPI, which is a de facto standard for parallel programming on distributed memory systems [20]. The primary difficulty with the MPI model is its distributed-memory view of programming, which makes programs hard to write and often involves a very long development cycle [21, 22]. Carefully thought-out strategies for partitioning the data and managing the resultant communication are essential for a good MPI program. Secondly, due to the distributed nature of the model, global data may be duplicated for performance reasons, resulting in an increase in the overall memory requirement. Besides, because of machine instabilities and the lack of fault tolerance in the model, it is very challenging to get very large MPI jobs running on large parallel systems (although this is true in general for other programming models). Considering our numerical schemes, the schematic diagram for our parallel computation is given in Figure 6, where the number of blocks corresponds to the number of processors. Next, the critical problems in the schematic diagram will be discussed.

4.1. Partition the Spatial Mesh Domain

In the parallel computation, the whole domain is divided into blocks, and each block is distributed to a different processor, such that each processor is allocated exactly one block. Because the computational complexity is related to the number of mesh points, the whole spatial domain is divided according to mesh points rather than spatial coordinates. In addition, the numbers of mesh points assigned to the processors are kept as equal as possible. The common partitions of the whole spatial domain are given in Figure 7. Figure 7(a) is the line partition, which fits a stripe domain. The surface partition is shown in Figure 7(b), which can be used for surface and body domains. The body partition in Figure 7(c) is used for the three-dimensional spatial domain. Next, we discuss the relationship between the whole domain and the processors, and pseudocodes are given. It is assumed that $N$ is the number of cells in the whole spatial domain. One array stores which processor is assigned to each cell, and another array stores the number of cells on each processor. The new (local) number of the $i$th cell within its processor can then be obtained by a single loop over the cells $1 \sim N$, as sketched below.
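A minimal Fortran sketch of this renumbering loop (array names are ours, not from the original pseudocode):

    ! Sketch: assign each global cell a local number within its processor.
    ! cell_proc(i) stores the processor of cell i; proc_count(p) counts the
    ! cells of processor p; local_id(i) is the new number of cell i.
    subroutine number_cells(cell_proc, local_id, proc_count, N, nprocs)
      implicit none
      integer, intent(in)  :: N, nprocs, cell_proc(N)
      integer, intent(out) :: local_id(N), proc_count(0:nprocs-1)
      integer :: i, p
      proc_count = 0
      do i = 1, N
        p = cell_proc(i)
        proc_count(p) = proc_count(p) + 1
        local_id(i) = proc_count(p)      ! new number of cell i on processor p
      end do
    end subroutine number_cells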

In addition, an array is created to store the relationship between the whole domain and the processors, that is, the inverse map from a (processor, local number) pair back to the global cell index; a sketch is given below.
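A minimal Fortran sketch under the same illustrative names, where max_local is the maximum number of local cells over all processors:

    ! Sketch: build the inverse map from (processor, local number) back to
    ! the global cell index, so values can be gathered over the whole domain.
    subroutine build_global_map(cell_proc, local_id, global_id, N, nprocs, max_local)
      implicit none
      integer, intent(in)  :: N, nprocs, max_local
      integer, intent(in)  :: cell_proc(N), local_id(N)
      integer, intent(out) :: global_id(max_local, 0:nprocs-1)
      integer :: i
      do i = 1, N
        global_id(local_id(i), cell_proc(i)) = i
      end do
    end subroutine build_global_map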

4.2. Reduction and Synchronization

The size of the spatial mesh differs between processors, which makes the time step computed on each processor different. Thus the time step needs to be synchronized during the computation, which is achieved with a global reduction and the minimum operation. Let $\Delta t_{\mathrm{local}}$ be the time step computed on each processor from the CFL condition, and let $\Delta t$ be the time step after synchronization. A typical realization in MPI Fortran is sketched below.
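A minimal sketch of the synchronization, assuming the time steps are double precision scalars (variable names are ours):

    ! MPI Fortran sketch: synchronize the time step across all processors by
    ! taking the global minimum of the locally computed CFL time steps.
    use mpi
    double precision :: dt_local, dt
    integer :: ierr
    call MPI_Allreduce(dt_local, dt, 1, MPI_DOUBLE_PRECISION, MPI_MIN, &
                       MPI_COMM_WORLD, ierr)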

4.3. Processor Communication

For data communication, it is necessary to know the neighbor processors of each processor. For the line partition, the current processor has two neighbors to communicate with; if the current processor is on the boundary, so that a neighbor does not exist, the missing neighbor is marked with a special null value. For two adjacent processors $n_1$ (left) and $n_2$ (right), each side needs the layer of cells adjacent to the shared interface from the other side: processor $n_1$ sends its rightmost cells to $n_2$ and receives the leftmost cells of $n_2$, and vice versa. The exchanged data contain both the physical values and the mesh points of these cells. A sketch of the communication between $n_1$ and $n_2$ is given below.
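A minimal MPI Fortran sketch of this exchange (buffer and variable names are ours; declarations are omitted, and m is the packed buffer length). Using MPI_PROC_NULL for missing neighbors turns the boundary calls into no-ops:

    ! Sketch: exchange ghost-cell layers (physical values and mesh points,
    ! packed into buffers of length m) with the left and right neighbors.
    call MPI_Sendrecv(sendR, m, MPI_DOUBLE_PRECISION, right, 0, &
                      recvL, m, MPI_DOUBLE_PRECISION, left,  0, &
                      MPI_COMM_WORLD, status, ierr)
    call MPI_Sendrecv(sendL, m, MPI_DOUBLE_PRECISION, left,  1, &
                      recvR, m, MPI_DOUBLE_PRECISION, right, 1, &
                      MPI_COMM_WORLD, status, ierr)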

The send buffers hold the cells to be sent, and the receive buffers hold the incoming ghost cells. For the surface partition, there are eight neighbor processors for the current processor, as shown in Figure 8(a) (here, the center block is the current processor). The neighbors sharing a common communicating edge with the current processor can be handled in the same way as in the line partition. The position $(p_x, p_y)$ of a processor in the processor grid can be obtained from its rank, and the diagonal neighbors, which are needed because the moving mesh stencil also couples corner cells, are addressed by combined offsets in both directions. A sketch of locating the neighbors and communicating with a diagonal neighbor is given below.
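A minimal MPI Fortran sketch for a $P_x \times P_y$ processor grid (all names are ours; declarations are omitted, and mc is the corner buffer length):

    ! Sketch: neighbor ranks of processor id in a Px x Py grid; the Fortran
    ! intrinsic merge(a, b, mask) selects a when mask is true, else b.
    px = mod(id, Px);  py = id / Px
    east  = merge(id + 1,  MPI_PROC_NULL, px < Px - 1)
    west  = merge(id - 1,  MPI_PROC_NULL, px > 0)
    north = merge(id + Px, MPI_PROC_NULL, py < Py - 1)
    south = merge(id - Px, MPI_PROC_NULL, py > 0)
    ne = merge(id + Px + 1, MPI_PROC_NULL, px < Px - 1 .and. py < Py - 1)
    sw = merge(id - Px - 1, MPI_PROC_NULL, px > 0 .and. py > 0)
    ! exchange the corner cells with the diagonal (northeast/southwest) pair
    call MPI_Sendrecv(corner_send, mc, MPI_DOUBLE_PRECISION, ne, 2, &
                      corner_recv, mc, MPI_DOUBLE_PRECISION, sw, 2, &
                      MPI_COMM_WORLD, status, ierr)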

When the current processor is on the boundary, as shown in Figure 8(b), part of the ghost cells can be filled by the current processor itself from the physical boundary conditions, while the remaining corner ghost cells are obtained by communicating with the adjacent processors, as in the interior case.

For the body partition, there are 26 neighbor processors around the current processor, as shown in Figure 9(a). We divide the neighbors into three kinds, face neighbors, edge neighbors, and corner neighbors, corresponding to the three kinds of communication shown in Figure 9(b): communication through a common surface, through a common edge, and through a common corner point. One representative of each kind suffices to show the communication of the body partition; the other neighbors are treated symmetrically, as sketched below.
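A minimal Fortran sketch of the three kinds of neighbors for a $P_x \times P_y \times P_z$ processor grid (all names are ours; declarations are omitted):

    ! Sketch: the three kinds of neighbors of rank id in a Px x Py x Pz
    ! processor grid: face (common surface), edge (common edge), and
    ! corner (common point) neighbors.
    px = mod(id, Px)
    py = mod(id / Px, Py)
    pz = id / (Px * Py)
    face_xp    = merge(id + 1,              MPI_PROC_NULL, px < Px - 1)
    edge_xpyp  = merge(id + 1 + Px,         MPI_PROC_NULL, &
                       px < Px - 1 .and. py < Py - 1)
    corner_ppp = merge(id + 1 + Px + Px*Py, MPI_PROC_NULL, &
                       px < Px - 1 .and. py < Py - 1 .and. pz < Pz - 1)
    ! each kind is exchanged with its own MPI_Sendrecv call, exactly as in
    ! the line and surface partitions; only the packed buffer sizes differ.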

Figure 10 shows the communications when the current processor is a boundary processor. Figure 10(a) shows a surface boundary, while Figure 10(b) shows an edge boundary. For Figure 10(a), the communication is the same as the situation in Figure 8, while Figure 10(b) is analogous to the line partition; thus we omit the details.

5. Numerical Examples

In this section, the numerical convergence and the efficiency of our parallel schemes are demonstrated, and a practical problem is considered. In all simulations, the parameters $\gamma$, $q$, $k_1$, $k_2$, $E_1$, and $E_2$ of the two-step model are held fixed. The simulations are run on a computer cluster of four nodes; each node has two Intel E5620 CPUs, and each CPU is quad-core.

5.1. Example  1

Firstly, the numerical accuracy is tested for the one-dimensional Euler equations. A periodic boundary condition is used, and smooth initial data are prescribed so that the problem admits a smooth exact solution. The monitor function for this computation is the pseudo arc-length monitor (24). The errors and convergence orders in the density at the final time are obtained and compared in Table 2. From Table 2, we can see that our scheme is second-order convergent. In addition, Sod's classical shock tube problem is studied to show that our schemes can capture the singularities and avoid nonphysical phenomena. The Riemann initial values are the classical Sod data, $(\rho, u, p) = (1, 0, 1)$ on the left and $(\rho, u, p) = (0.125, 0, 0.1)$ on the right, with reflecting boundary conditions at both ends of the tube. The solutions for the density and the trajectory of the mesh are shown in Figure 11. The monitor function employed for this computation is again of pseudo arc-length type. It is found that the mesh adapts to the solution, and our schemes capture the discontinuous jumps in the numerical simulations. The computational complexity and the errors are compared between the fixed mesh and moving mesh schemes in Table 3. Comparing (F1) and (M1), the moving mesh scheme reaches the accuracy of a much finer fixed mesh while costing about the same CPU time. In addition, different arc-length parameters with the same number of mesh points are simulated and compared against the fixed mesh schemes. Our schemes reach better results with fewer mesh points; in particular, the errors are obviously reduced. Thus we can conclude that our schemes can capture the singularities and avoid nonphysical phenomena.

5.2. Example  2

This is a two-dimensional example. Firstly, the numerical accuracy is tested with periodic boundary conditions and smooth initial data; the source terms are adjusted so that the governing equations (2) fit a prescribed smooth exact solution. Here the monitor function is the entropy-based monitor (42). The errors and convergence orders of the density are shown in Table 4, which implies that our schemes are second-order convergent. Next, we apply our schemes to a two-dimensional explosion problem. The computational domain is illustrated in Figure 12. The boundary condition is reflective; the initial conditions in the unreacted region are the uniform unburnt state, and those in the reacted region are obtained by the model of [23]. The entropy-based monitor function is used in the computation. The parallel cost with different numbers of processors is compared in Tables 5 and 6: Table 5 compares the parallel efficiency of the line partition, while Table 6 is for the surface partition. The speedup $S_p$ and the efficiency $E_p$ of the parallel computations are defined by

    $S_p = \dfrac{T_1}{T_p}$, $\qquad E_p = \dfrac{S_p}{p}$,

where $T_p$ is the computational time with $p$ processors. Both tables illustrate that our parallel schemes are effective and that the computational time is reduced. Figure 13 shows the computational results at an earlier time, and Figure 14 at a later time. The solution figures show that our schemes can capture the shock waves and that the mesh trajectory is adaptive.

5.3. Example  3

The last example is a three-dimensional problem. The numerical accuracy test is performed first: periodic boundary conditions are used with smooth initial data admitting a smooth exact solution, and the monitor function is the entropy-based monitor (42). The errors and convergence orders of the density are shown in Table 7, which implies that our schemes are second-order convergent. The computational domain and boundary conditions for the detonation problem are shown in Figure 15. The initial values for the reacted and unreacted regions are the same as in Example 2, and the same monitor function is used in the computation. Table 8 shows the parallel efficiency of the body partition, which illustrates that the computational time is reduced and that our parallel scheme can deal with three-dimensional problems. Figures 16 and 17 show the adaptive solutions. Figure 16 shows three-dimensional slices of the adaptive mesh together with the solutions at different times. Figure 17(a) is the simulated density of the block in the center of the domain, while Figure 17(b) is a sliced simulation of the density along one coordinate direction at the same time. The spatial cells are distorted to adapt to the shock wave front, which shows that our schemes can capture shock waves.

6. Conclusion and Discussion

In this paper, we have discussed the parallel computation of pseudo arc-length moving mesh schemes. Different from traditional Eulerian numerical schemes, the communications between processors are more complex, since they include adjacent processors and diagonal processors. The parallel schemes for the line, surface, and body partitions are all considered in this work, and the pseudocodes and schematic diagrams are given. Finally, numerical examples illustrate that our parallel schemes are effective and that the pseudo arc-length moving mesh schemes are second-order convergent and can be used to capture the shock wave.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (Grant nos. 11390363 and 11532012).