Abstract

Parareal is a kind of time parallel numerical methods for time-dependent systems. In this paper, we consider a general linear parabolic PDE, use optimal quadratic spline collocation (QSC) method for the space discretization, and proceed with the parareal technique on the time domain. Meanwhile, deferred correction technique is also used to improve the accuracy during the iterations. In fact, the optimal QSC method is a correction of general QSC method. Along the temporal direction we embed the iterations of deferred correction into parareal to construct a hybrid method, parareal deferred correction (PDC) method. The error estimation is presented and the stability is analyzed. To save computational cost, we find out a simple way to balance the two kinds of iterations as much as possible. We also argue that the hybrid algorithm has better system efficiency and costs less running time. Numerical experiments by multicore computers are attached to exhibit the effectiveness of the hybrid algorithm.

1. Introduction

The parareal algorithm was firstly proposed by Lions et al. [1] in 2001 as a parallel-in-time approach for solving time-dependent differential equations, mainly for ordinary differential equations (ODEs) and parabolic equations [24]. The significant advantage of the parareal algorithm is that it does not require the consecutive solving of subproblems along the temporal direction. As a purely parallel algorithm, the parareal algorithm has been applied to many practical problems, such as hyperbolic problems [5], fluid mechanics [6], quantum control [7, 8], and optimized control problem [9, 10]. Recently, some innovative variants of the parareal were proposed by integrating with other efficient methods, such as spectral deferred corrections [11], domain decomposition technique [12], the Krylov subspace method [13], and adaptive technique [14]. We have also proposed a kind of parareal waveform relaxation methods for ODEs [15] and parabolic PDEs [16], as well as parareal deferred correction methods for PDEs [17].

The parareal algorithm was initially derived from multiple shooting, time multigrid, and algebraic deferred correction [18, 19] through borrowing the idea of domain decomposition on the time domain. In particular, it employs two operators on two time decompositions of different step sizes with the name of coarse and fine grids to propagate the values along the time direction. The initial guesses of subproblems defined on the coarse grid are given by the coarse propagator for the first iteration. Then, the values at the interface of two adjacent coarse-grid subproblems are updated using fine propagators in iterations, which is quite similar to the Schwarz methods for elliptic systems. Spectral deferred correction method, a kind of iterative methods for solving ODEs, is originally proposed in [20] and has been combined with parareal in [11]. It makes use of the previously computed solution on the same time subdomain. High order approximations can be achieved while the computation cost is reduced sharply.

In this paper, we apply the idea of correction to the problem of parabolic PDEs. In detail, we employ the optimal QSC method for the spatial variables and PDC method for the temporal variables, which will result in high accuracy approximations with low computational cost. The new method can also be implemented in parallel, just like the same parallel mechanism as the classical parareal method.

The outline of our paper is as follows. In Section 2, we give a brief review about parareal algorithm and deferred correction. In Section 3, we employ the optimal QSC method for the space discretization of parabolic PDE, which results in an ODEs system. Then we present the PDC method and apply the method to the ODEs system. In Section 4, we give a proper error estimation for the PDC method. In Section 5, we analyze the stability of the algorithm and show the relationship between the stability and the propagators employed. In Section 6, we present a simple way to choose the time step of the fine propagator. The speedup and system efficiency of the new algorithm are analyzed in Section 7. Numerical experiments are attached in Section 8.

2. Materials and Methods

2.1. Parareal Algorithm

We briefly review the parareal algorithm on the following nonlinear ODEs:where the nonlinear function is Lipschitz on and the function is to be computed.

The parareal algorithm requires a decomposition of the time domain into time subdomains , , with . For simplicity, we suppose that all time subdomains have the same length . The approximations at these points can be obtained using the following scheme:where and denote, respectively, the coarse and fine approximation at the point with initial value at the point . The coarse propagator is cheap and lowly accurate, and the fine propagator is of high precision but also more expensive. The initialization of the iterative algorithm (2) is usually realized by the propagation of the coarse propagator along consecutively; that is, .

The parareal algorithm has several well-known properties in the following: (a) the initial values for the fine propagator on each time subdomain are known in every iteration, so the fine propagators on different time subdomains can be implemented by different processors. (b) The final result of parareal algorithm will achieve the accuracy of serial implement of the fine propagator on the whole time domain after iterations, where is the number of time subdomains. Note that the number of iterations is usually set to be much less than the number of time subdomains in practice to avoid necessary computations, while achieving a competitive accuracy [13].

2.2. Deferred Correction Method

Deferred correction method is an iterative method for ODEs. We employ the scheme designed in [21] as one of the basic elements for our proposed method, in spite of many other variants of the original deferred corrections. We adopt the scheme in [21] because of its simplicity for implementation with high accuracy and also robustness for different problems.

We also take system (1) as our example to describe the execution details of deferred correction method. We firstly define a grid on with equispaced nodes, where , with time step . An th order method on this grid could bring a numerical solution with error . In addition, we can also generate a unique th-degree polynomial on with this grid using Lagrange interpolation. Then we can define an error function which satisfies the differential equationThis equation is usually regarded as the neighbor problem of original system (1) in the regime of deferred correction methods. Meanwhile, if we use the same th numerical method to solve this neighbor problem, we could get another group of numerical values on the same grid with error . The numerical solution is a better approximation of the original system after one correction since we have [2224]. If we iterate such correction times, the order of accuracy of the iterated deferred correction is . In practice, the degree can not be very large.

3. QSC-PDC Methods for Parabolic PDEs

In this study, we consider the following linear PDE as our basic model to demonstrate our proposed method:where is the spatial domain, is the time domain with , and is the unknown function. We take the time domain decomposition for system (5) and design the generic iteration scheme:The essential efforts of this paper are concentrated on the design of techniques, which could bring execution cost and high accuracy for scheme (6).

3.1. QSC Methods for Spatial Variables

We firstly consider reducing the computational cost in spatial domain. Note that, for every iteration , we have to use the propagators and to solve a linear homogeneous PDE defined on a fixed spatial domain the same as that of the original system for every time subdomain. The discretization of the spatial and temporal variables will result in algebraic equations for subproblems. The approximate order of the discretization procedure is crucial to the scale of the algebraic equation. In this paper, we use the optimal QSC method for the spatial variable, which was originally used for two-point boundary value problem [25], then elliptic PDEs [26], and parabolic PDEs [27, 28].

Similarly, we consider a uniform partition of the spatial domain, with mesh size . We denote as the space of piecewise quadratic polynomials and as the space of quadratic splines. Remember that the dimension of the space is . We let be a set of piecewise polynomial basis functions of the space .

For a fixed , system (5) is a two-point boundary value problem, whose quadratic spline approximate solution can be written as Therefore, we can write the approximation of system (5) as where , , are degrees of freedom (DOFs). Taking the set {, ; , } as the collocation points, we can obtain ODEs systems with respect to the DOFs.

Starting from the quadratic polynomial we choose a set of quadratic B-splines as the basis functions of the space . Following [27], we denote as the space of quadratic splines satisfying homogeneous Dirichlet boundary conditions. The dimension of the space is , and a set of basis functions of such space is {, , , }. Then the quadratic spline approximate solution of system (5) can be written as With the values of the basis functions on collocation points , we can obtain the following ODEs systems on the DOFs of the optimal one-step QSC methods:where , the coefficient matrices areand is the interpolating vector of at all the collocation points. The vector satisfies , with the interpolation of at collocation points.

The discretization error of the spatial variable in the uniform norm is globally and at the nodes of the uniform partition and the collocation points [25]. In contrast, the classical finite difference discretization could only generate approximations with error locally, which demonstrates the great benefits of the optimal QSC.

3.2. PDC Method for the Temporal Variables

The parareal algorithm has a very important property. The approximation at all the endpoints of time subdomains will achieve the accuracy of the -propagator after iterations [8, 18], where is the number of time subdomains. In fact, for any , the approximation at the time point will be updated times and reach the final approximation which is exactly the result by the -propagator at this time point.

In order to carry out the PDC method on ODEs system (12) with respect to DOFs, we discretize each time interval into smaller equispaced grids and denote the PDC approximation at the point in th parareal iteration. For simplicity, we only arrange one correction in each parareal iteration. The details of the hybrid algorithm are described as follows step by step:(i)Use the coarse propagator to obtain rough numerical approximations at all the time interval points , that is, , .(ii)Use the prediction step of the new propagator on each time interval , usually an th-order numerical method, to obtain fine approximations at the grid points on with the initial value at . We suppose that the numerical solution is , and , where .(iii)Use the inductive formulae of parareal algorithm to obtain improved approximations at , ; that is,(iv)For , ( is the number of parareal iterations)(1)Solve the following neighbor problem of the original system:where is the error function on each time interval from the previous step, is the true solution of (12) restricted on , and is an th-order polynomial interpolating . We denote the numerical solution of (15) as .(2)Update the numerical solution , and let .(3)Perform the following iterative formulae to get improved approximations:where .

With these final DOFs , we can obtain the approximations of original system (5) on the grid consisting in spatial collocation points and temporal points.

Remark 1. When solving neighbor problem (15), we can use some simple numerical integrators, such as the backward Euler method. An alternative formulation is to compute the approximate solution in two steps, and the order of accuracy is also preserved [25, 27, 28]. The realization process of such strategy based on backward Euler method is as follows: (i)first compute a second order approximation by (ii)then compute an optimal order approximation by

The two-step version involves two linear systems over every time step, and both the two coefficient matrices of the above two systems are tridiagonal. So the total computational cost will be smaller than that of one-step integrator (15), in which the bandwidth of the coefficient matrix is 4, because of the matrix .

4. The Convergence of the PDC Method

The convergence analysis of the optimal QSC method can be found in the literature [25], and we just need to present the convergence of the PDC algorithm in the temporal domain. Bal [2], Gander, and Hairer [29] gave a convergence analysis on the classical parareal algorithm. Based on their work, we present the convergence analysis on the PDC method.

We also assume that the coarse propagator satisfies error expansionfor small and the Lipschitz conditionwhere means the true solution at point of the original system with the value at the time point .

Within every time subinterval , the difference between the true solution and the approximation by the PDC method can be written as Based on this relationship and also the conditions (20) and (21), we could get

We first look at the last term , which is an upper bound of the error caused by the deferred correction. For simplicity, we assume the solution of the original system to be smooth enough and solve the original equation in the prediction steps and the neighbor equations in the correction steps by the forward Euler integrators.

The prediction step in deferred correction includes a numerical integration with initial value on every time interval . Let be the numerical solution from the forward Euler integrators on uniformly distributed nodes in with time step . We know from [30] that the error satisfies , where is the true solution at with the same initial value at . In fact, the error at is related to the length of , so it is reasonable to hypothesize , where is a constant.

The correction steps in the hybrid algorithm are different from these steps of the pure deferred correction. The initial values of neighbor problems are always for pure deferred correction. Meanwhile, for the correction steps which are embedded in parareal process, the initial values of the neighbor problems are kept being updated in parareal iterations. In other words, the initial value of problem (15) on time subinterval is . However, this formula of the initial value does not affect the iterative improvement property of the deferred correction. For the th parareal iteration, we havewhere . We can see from (24) that the error of deferred correction method equals the error of the corresponding neighbor problem. Similar to the inductive convergence proof of spectral deferred correction in [30], we can obtain that , considering the solution of the original system to be smooth enough.

Based on these results, we have the convergence of the PDC method.

Theorem 2. Let the solution of the original system be smooth enough. If the coarse propagator satisfies condition (20) and condition (21), where , , are continuously differentiable, then the error at time point after PDC iterations is bounded:Obviously, when the time step approaches , such error bound will go to with enough iterations.

Proof. We suppose that a constant exists such that holds for any . Inequality (23) can be bounded by Considering such inequality, we study the recurrence relationwhere , , and . It is easy to see that is an upper bound on , and we can choose , . Multiplying (27) by and summing over , we obtain where .
We denote . This function satisfies the following recurrence relation: Then, we have Solving for , we obtain after induction where . We know that replacing the factor in the denominator by only will increase the coefficients in the power series of . It means the coefficients in the power series of are smaller than the corresponding coefficients in the power series of , whereUsing the binomial series expansion it is easy to find that the coefficient of the term in the expression is For any , we have . So, it follows

Remark 3. In practice, we usually take very small and much smaller. We notice that the first term in (25) decreases to superlinearly as increases, and such decrease is very quick with the help of small . For another hand, the increase of also takes the summation in (25) much smaller despite more terms. Therefore, a couple of iterations are enough to pull the error down below some tolerance.

5. Stability Analysis of the PDC Method

To describe the stability of the PDC method, we apply such numerical method to the equationWe follow the definition of stability in [20] and define the amplification factor for as where denotes the numerical approximation at the point . The numerical method is stable for some given value of , if

For simplicity, we adopt forward and backward Euler methods in the analysis. The time domain is divided into subdomains. The coarse propagator is forced to take large time steps, and implicit Euler is better than explicit Euler for the coarse propagator [31]. Therefore, we choose one-step backward Euler method to be the coarse propagator on every subdomain, while the fine propagator is chosen as steps of Euler method on every subdomain for the original equations and corresponding neighbor problems in the process of deferred correction.

We notice that the approximation is related to the iteration number of the PDC method. We denote by the approximation of system (36) at th subdomain point and on the th iteration, where and . The approximations satisfy Because of the one-step backward Euler method for the coarse propagator, we have We denote as the amplification factor of the coarse propagator and let be the amplification factor of the fine propagator, which is from the deferred correction method. In fact, the first iteration of the PDC method is exactly the classical deferred correction, but with smaller numerical steps on each time subdomain. Therefore, for steps of backward Euler integrators, and for steps of forward Euler integrators. However, the expressions of will be much more complicated when .

For the initial approximation, we have For the first iteration (), we have and, after inductions, we get which is also the amplification factor after one iteration of the PDC method. We will continue to analyze the stability in two different cases, respectively.

Case 1. For the steps of implicit Euler integrators as the fine propagatorsWe denote , and the factor is always smaller than 1 except two areas around its two poles and . If we choose and , the stability regions are shown in Figure 1.
When , we can not get the explicit expressions of the amplification factors. The stability regions are estimated numerically in Figure 2. We see that the size of the instable region expands as the iteration number is increased. Nevertheless, these instable regions always lie in the right half plane, so all the PDC methods with implicit fine propagators at these tested iterations are -stable.

Additionally, we notice that and the PDC method with both implicit propagators is -stable at the first iteration. We take several values for along -axis, and the corresponding amplification factors are in Table 1.

From Table 1, we can conjecture that the PDC methods at different iterations could also be -stable.

Case 2. For the steps of explicit Euler integrators as the fine propagators, the amplification factor is The regions of to make are related to the relationship between and .
If , the factor has two poles, that is, and infinity. We choose and , and the stability regions of the PDC method at different iterations are shown in Figure 3, in which the stable areas are shrunken as the iteration proceeds.
If , the factor has one pole, that is, . We choose and , the stability regions of the PDC method at different iterations are shown in Figure 4. In fact, there are very small unstable areas located at the left half plane in Figures 4(a) and 4(b), so the PDC methods are not -stable in such situations either.
If , the factor is always a positive constant much smaller than 1 because the number of time steps is usually a moderate integer. Therefore, we are reasonable in merging this case into the previous one.
In Figure 4, we observe that there is a significant change about the pattern of the size of the stable region at the third iteration. The stability region before the third iteration is quite similar to the stability region of the backward Euler method, while the stability region after the third iteration has a similar shape of the stability region of the forward Euler method. The change means that the dominant control is transferred between the components of the PDC method. Figure 5 illustrates the relationship between this critical iteration and the number of interpolation points .
We can see a common trend that the stability regions are decreased with more iterations according to Figures 24 in this section. In fact, there are several factors having impact on the size of the stability region, including the coarse propagators, the fine propagators, the way of interpolation at those equispaced nodes, and the numerical differentiation formula of the original ODEs [32, 33]. We believe that the change of the stable regions is the result of all these factors working together accumulatively.

6. The Choice of the Fine Time Step

The PDC method includes two iterative processes, that is, parareal and deferred correction. Their individually different iteration speeds will alternatively influence the performance of the proposed hybrid algorithm. To this end, we study the balance strategy to avoid the waste of resources as well as a slowed-down speed. It is intuitively to tune the parameters of deferred correction since they are embedded in the framework parareal for our proposed PDC method. In fact, there are several ways to improve the accuracy of the final approximations for pure deferred corrections, such as smaller time step, higher order numerical integrators, or more iterations. In this study, we choose to tune the time step of the fine propagators of the PDC method.

Here we take the following system as an example to show how to find an optimized time step for the PDC method:We assume the propagator is one-step backward Euler integrator on with , and the fine propagator is the -step backward Euler integrator for prediction and each correction loop. Therefore, we have where is the fine time step. Let and ; then an upper bound of the error can be obtained from the inductive equationWe rewrite the recurrence relation in form of vectors, which is The fine time step can be regarded as the convergence factor of deferred correction iterations. If th order numerical integrator is employed in deferred correction, this factor will be obviously . The constant , which describes the difference between the true-solution propagator and the coarse propagator, can be regarded as a part of the convergence factor of parareal in contrast to deferred correction. We can choose close enough to to balance the two iterations in the PDC method.

Note that the fine time step is crucial for the final approximation performance of the PDC method. We denote the number of the fine time steps on each time subdomain by , where . Polynomials with degree of are employed in the process of deferred correction, and is the upper bound of the final approximate order of the PDC method. So there should be a moderate integer such that , which means the final result could reach at least the accuracy . However, the can not be very large either because the polynomials of low degree are preferred for interpolations. So there should be a little bigger integer such that to avoid Runge’s phenomenon.

Now, we get a value interval for the fine time step, with . The choice of the optimal time step iswhich means that should be chosen as the nearest point to from the value interval .

7. System Efficiency Analysis

Parareal can be implemented in two parallel mechanisms which are called serial-parallel parareal and pipelined parareal in [34]. For the pipelined parareal, each processor alternatively compute the fine and coarse propagators without waiting except the initial wait. It had been shown that the pipelined fashion could be better in terms of the computation speed. In this section, we study the parallel speedup and efficiency of the PDC method based on pipelined parareal method.

We first investigate the PDC method for ODEs system (12) from optimal one-step QSC method for PDE. Here we suppose backward Euler methods are used in both the coarse and fine propagators. We also assume that the computing costs of the algebraic equations involved in and propagators are the same if they have the same scale and sparse structure. When analyzing the speedup of the classical parareal method, we equalize the computing costs of one time step at each iteration. In this paper we do not give such assumption any more, since interpolations are necessary in correction steps of the PDC method, which is not included in the prediction step. The costs of information transmission and data storage are ignored in this paper.

Similar to the speedup and system efficiency analysis shown in [34], we adopt the following notation:(i) is the length of time domain.(ii) is the length of each time subdomain.(iii) is the time increment for the coarse propagator.(iv) is the time increment for the fine propagator.(v) is the cost of the numerical method at each time step in .(vi) is the number of time steps of the coarse propagator on each time subdomain, and we often choose .(vii) is the cost of the numerical method at each time step in .(viii) is the number of time steps of the fine propagator on each time subdomain.(ix) is the cost of interpolation averaged at each time step in correction steps.(x) is the number of time subdomains on ; that is, .(xi) is the number of PDC iterations.(xii) is the speedup.(xiii) is the system efficiency and equals the ratio .

According to the pipelined parallel mechanism in [34], the initial step of the PDC method is done with the propagator and the cost for steps is . The cost of the first iteration of the PDC method is , while the cost of any other iteration of the PDC method is .

The total cost for the initial and iterations of the PDC method is Let be the number of deferred correction iterations to compute the same problem sequentially on the fine DC nodes with one processor to achieve the same grade accuracy as PDC iterations. Then the cost of the serial DC method is . The parallel speedup of the PDC method is and the system efficiencyIf we further assume and for simplicity and let , the efficiency will be where denotes the ratio between the cost for solving algebraic equation and the cost for interpolations; it is often a small constant according to the numerical experiment later.

It is reported in [34] that the parallel efficiency of the classical pipelined parareal under our simplifying assumptions is which is automatically bounded by . In contrast, the efficiency of PDC method is bounded by , which is apparently better than that of the classical parareal method. However, we have to pay additional attention to the constant . The constant denotes the ratio between the time steps of the coarse and the fine propagators, since we assume . For the classical parareal method, it is required that is much greater than , for classical parareal method is much less than 1, and the resulting efficiency is very close to its upper bound. For the PDC method, the accuracy is improved by iterative corrections, and is not needed to be much greater than ; then for PDC method is not so small. Therefore, the resulting efficiency of PDC method is not so close to its upper bound as the efficiency of the classical parareal method.

In order to show the contribution of QSC method to the whole algorithm, we investigate the PDC method for the following ODEs system generated after the classical finite difference for PDEwhere is the size of spatial grid, the coefficient matrix and is the interpolation vector of at equispaced grid points.

Let and be the cost of the numerical method at each time step in and , respectively. Let be the cost of the interpolation averaged at each time step and define . Obviously, and are directly related to the scale and structure of the algebraic equations involved. As we know, the finite difference method has the approximate order , while the optimal QSC method has the order . To achieve the same level of error, the dimension of system (57) should be much greater than the dimension of system (12). Let and be the dimensions of system (12) and system (57), respectively. From , we obtain .

For the PDC method for system (12), the coefficient matrices of the linear equations have sparse structure with bandwidth 4, and the corresponding time complexity is . For the PDC method for system (57), the coefficient matrices of the linear equations have sparse structure with bandwidth 1, and the corresponding time complexity is . Then we have approximate relationships and , and the approximate system efficiency is which is almost the same as , except for a different constant . Therefore, the application of the QSC method nearly has no effect to increase the system efficiency.

The total cost after iterations of the PDC method for system (57) is where the ratio is a small constant. The cost of PDC method for system (57) is about times of the cost of the PDC method for system (12). Since is a very big integer, the QSC method reduces the total cost greatly. Furthermore, if the optimal two-step QSC method is employed, the reduction factor will be rather than .

8. Numerical Experiments

In this section, we take the following parabolic PDE as an example to present the advantages of the QSC-PDC algorithm:The true solution of system (61) is

To perform the QSC-PDC method, we first employ a uniform partition of the spatial domain with 160 equispaced grid points, and the step size is . Then we take the midpoint of each segment as the collocation points; that is, Define the vector . The resulting collocation equation iswhere the matrices , , and are the same as the matrices in system (12). Here, we only consider the optimal one-step QSC method.

We divide the time domain into 40 subdomains. On each subdomain, we use 1-step backward Euler method for the propagators and use 5-step backward Euler method for the propagators. Polynomials of degree 5 are employed for interpolation in each correction loop. The PDC method is programmed in the pipelined manner, and information is communicated between processors through MPI, and the code is written using C language with GNU Scientific Library. The program is carried out on the Shared Hierarchical Academic Research Computing Network of Canada (https://www.sharcnet.ca/). 41 CPUs are employed in the experiments, including one root CPU and the other 40 CPUs. The root CPU is in charge of sending and receiving data and displaying outputs. The iterative computing of the propagators on the 40 time subdomains are distributed onto the other 40 CPUs.

For simplicity, we only examine the error at a special point . The resulting PDC error and corresponding running time are shown in Table 2. In order to exhibit the advantages of the QSC-PDC method, we perform the classical parareal method for system (64), and the spatial variable is also discretized by optimal QSC. Here, we call the reference method as QSC-parareal method. The time domain is also divided into 40 subdomains, and the coarse propagator is also chosen as 1-step backward Euler method on each subdomain. The fine propagator is chosen as 100-step 2-stage Gauss method on each subdomain, which is of high order and easy to carry out.

The classical parareal method for (64) is also programmed in pipelined manner, and the corresponding error at the same point and running time are shown in Table 3.

To show the advantages of the deferred correction clearly, we plot the iteration error and running time of the two methods together in Figure 6. We can see that the deferred correction technique makes the parareal method achieve better approximations in fixed iterations and costs much less running time.

9. Conclusions

In this paper, we firstly applied the optimal one-step QSC method to solve linear parabolic PDEs, which lead to linear ODEs with respect to the degrees of freedom. We then combined the parareal algorithm with deferred correction method along the temporal direction to deal with the collocation equations. With the fourth order error on the spatial step size, high accuracy could be obtained without too many collocation points. We also supplied an error estimation of the PDC methods, which illustrates good convergence with small time subdomains. We also analyzed the stabilities of the PDC method with different cases of coarse and fine propagators. We presented the tuning strategy for the time step size of the deferred correction method to balance the parareal and the deferred correction iterations of distinguish individual computational speeds.

Compared with the classical parareal method, PDC method has much higher parallel efficiency, and the optimal QSC can reduce the computational cost greatly when achieving a desired accuracy. By the numerical experiments on multicore computers, the advantages of the QSC-PDC algorithm are obvious.

Competing Interests

The authors declare that they have no competing interests.

Acknowledgments

The authors would like to thank Professor Kenneth R. Jackson of University of Toronto for his valuable suggestions. They also thank Shared Hierarchical Academic Research Computing Network of Canada and its staff for supplying multicore computers and for guiding them to use it. This work was supported by the National Natural Science Foundation of China (no. 11401589, no. 11301543, no. 11326246, and no. 11571367), Shandong Provincial Natural Science Foundation (no. ZR2013AL018), and the Fundamental Research Funds for the Central Universities (no. 14CX02148A).