#### Abstract

Four kinds of array of induced polarization (IP) methods (surface, borehole-surface, surface-borehole, and borehole-borehole) are widely used in resource exploration. However, due to the presence of large amounts of the sources, it will take much time to complete the inversion. In the paper, a new parallel algorithm is described which uses message passing interface (MPI) and graphics processing unit (GPU) to accelerate 3D inversion of these four methods. The forward finite differential equation is solved by ILU0 preconditioner and the conjugate gradient (CG) solver. The inverse problem is solved by nonlinear conjugate gradients (NLCG) iteration which is used to calculate one forward and two “pseudo-forward” modelings and update the direction, space, and model in turn. Because each source is independent in forward and “pseudo-forward” modelings, multiprocess modes are opened by calling MPI library. The iterative matrix solver within CULA is called in each process. Some tables and synthetic data examples illustrate that this parallel inversion algorithm is effective. Furthermore, we demonstrate that the joint inversion of surface and borehole data produces resistivity and chargeability results are superior to those obtained from inversions of individual surface data.

#### 1. Introduction

IP methods are important in geophysical electrical surveys. Surface exploration is used for detecting metallic and nonmetallic minerals, using various observational techniques. Borehole-surface, surface-borehole, and borehole-borehole exploration methods are also widely used in mining field contexts and give improved results for the prospecting for ores, oil, and gas.

Many previous resistivity and IP inversion methods have been developed, including 2D least-squares inversion [1], a 3D resistivity inversion using alpha center method [2], many linear and nonlinear IP inversion methods [3, 4], and a 3D IP conjugate inversion method [5].

Since 3D problems are very much larger than 1D and 2D problems, their solution is correspondingly more costly in computer time. Parallel MPI and GPU algorithms are generally used for numerical calculation. MPI is a library that is called by Fortran77, Fortran90, C, and C++ software and serves to communicate between processes. It has two basic patterns: the equal pattern and the principal-subordinate pattern [6].

GPUs are parallel processors with a large number of computation units and are superior to CPUs in both processing capability and memory bandwidth. They are cheaper and consume less power, and they are therefore used for large-scale, high-performance computation work [7, 8]. Programs are written in the familiar C, C++, Fortran, or an ever-expanding list of supported languages [9]. CULA is a set of GPU-accelerated linear algebra libraries utilizing the NVIDIA CUDA parallel computing platform to dramatically improve the speed of advanced mathematical computation. In other words, to use these functions an NVIDIA GPU with CUDA support is required. The library contains CULA Dense and CULA Sparse.

Solution is very slow for forward and “pseudo-forward” equations using the nonlinear conjugate gradients method. Therefore, a set of 3D IP forward and inversion parallel algorithms are needed for surface, borehole-surface, surface-borehole, and borehole-borehole techniques. This paper introduces a parallel algorithm developed by parallel programming to improve 3D NLCG IP inversion. First the theory of the NLCG algorithm for 3D IP inversion is presented. Second, implementation of the MPI and GPU parallel algorithm is briefly introduced. Finally, the efficiency of inversion codes is analyzed, with tables and synthetic data examples.

#### 2. Forward

The electrical potential for arbitrary conductivity in a half-space is given by the differential equation:where is the conductivity of rocks (S/m); denotes the potential (V) in absence of IP effect; is the electric current. The quantity on the right-hand side is in Ampere/meter^{3}, while currents are in Ampere; and is the Dirac delta function. The Neumann boundary condition is applied for the earth-air interface; and a mixed boundary condition is adopted at infinite boundaries. The equation can be written as [10]where is the angle between the radial distances from the source point to the outward normal spatial coordinate on the outer boundary. Then, the forward mapping operator is defined as

If ground is chargeable, then the potential , which is induced by the application of a constant current will be different from . According to Siegel’s formulation [11], with the conductivity replaced by , the effect of the chargeability of ground can be modeled by direct current resistivity forward mapping operator . Then, apparent chargeability can be defined by equationwhere and are the apparent resistivity computed by the potential and , respectively. So the apparent chargeability can be computed by two DC resistivity modelings with conductivity and .

##### 2.1. Finite Difference Method

Usually, there are two ways to discretize the differential equation: the finite difference method and the finite element method. The finite difference method was applied for 3D forward modeling in the present study. The matrix form of (1) can be written aswhere is the coefficient matrix; is the potential vector; and is a vector containing the locations of the positive and negative current sources. There are many numerical methods for solving linear equations; of these, the ILU0 preconditioner and the BICGSTAB iterative method [12] was chosen to solve (5) in serial algorithm. The ILU0 preconditioner is an incomplete LU factorization with zero fill-in [13]:where is the lower triangular and is the upper triangular.

##### 2.2. Forward Test

In order to test the accuracy of the forward algorithm, we compared our result of using the proposed BICGSTAB method with results from the preconditioned conjugate gradient (CGPC) algorithm [14]. The model in the present study consisted of a 10 Ω·m rectangular prism measuring 40 m × 40 m × 50 m embedded in a 100 Ω·m homogeneous half-space. The top of the model was located 30 m below ground surface. Four data acquisition methods were used for recording pole-pole data: surface-surface, surface-borehole, borehole-surface, and borehole-borehole (Figure 1). Their responses are shown in Figure 2, which indicates that the proposed BICGSTAB algorithm produced highly accurate results.

**(a)**

**(b)**

**(c)**

**(d)**

**(a)**

**(b)**

**(c)**

**(d)**

#### 3. Inverse

Based on the result of resistivity result, nonlinear conjugate gradients IP inversion was adopted in this study since it constrains the model and is a nonlinear inversion method [15]. The objective function in the problem is in two parts:where is a vector of the observed data; is a vector of the calculated data obtained by the forward algorithm in the th inversion iteration; is the random error matrix; is the Lagrange multiplier that balances the effect of data misfit and model regularization during the iteration; is the a priori model parameters vector; is the model parameters vector at a given iteration; and is the second-difference Laplacian. The corresponding gradients of the objective function are expressed aswhere is the Jacobian matrix and is the vector containing data residuals. The initial search direction is obtained fromwhere is the precondition factor, written aswhere is a specific scalar. The search direction in the th inversion iteration can be defined:

The updated step size is then obtained from

The gradients and updated step are important parts of the inversion; and are calculated directly as the characteristics of nonlinear conjugate gradients inversion. From (4) and (5), the sensitivity matrix can be derived as

#### 4. MPI and GPU Parallel Programing

The main parallel computations in the NLCG inversion of 3D resistivity data are the forward and pseudo-forward modeling. Forward and pseudo-forward modeling of the same source is mainly done to form the coefficient matrix and the right-hand vector and solve the equation in a similar fashion to (5). Because forward and pseudo-forward modeling is independent for each source, MPI parallel computation is suitable for this calculation.

We developed a parallel procedure based on the 3D NLCG serial procedure by calling Linux system interfaces with Intel Fortran complier and used the principal-subordinate pattern for programming. The main process controls the assigning and sending tasks for every process and of reclaiming and assembling the results from other processes. The subordinate processes calculate the task sent from the main process and send the result back. 3D parallel inversion divides the 3D forward and pseudo-forward modeling tasks into several subordinate processes which are implemented in separate parallel execution nodes. The main process (NO. 0) reads all input parameters and broadcasts command messages to all subordinate processes. The subordinate processes finished calculation and sent result back to main process.

A workstation with a GPU card (NVIDIA Tesla C2075) is very fast for computation. Our parallel code was developed using CUDA (6.0) with the solver of CULA Sparse (S6) library. The BICGSTAB iteration method used in the serial code for solving linear equations was replaced by calling CULA Sparse library that provides iterative solvers for sparse systems. Therefore, the CULA iterative matrix solver is called during forward and pseudo-forward modeling in different processes. Table 1 shows the iteration times and runtimes of six different methods. These methods choose the same ILU0 preconditioner, double precision data type, and compressed sparse row (CSR) matrix storage format. “Iteration times” are the iteration times when achieving the convergence error 10^{−6}. “Overhead time” means the time that the memory transfers to GPU. “Preconditioner time” is the time for the preconditioner generative portion of the iterative solver. “Solver time” represents the time spent on iteration solver. “Total time” is the sum of overhead time, preconditioner time, and solver time. It is easy to know (from Table 1) that the CG method is the fastest one; the GMRES method spends the smallest number of iteration times. In our program, overall consideration, the CSR matrix storage format, ILU0 preconditioner, and CG iteration solver were chosen for solving linear equations.

The details of our parallel programming are shown in Figure 3. The flowchart of NLCG inversion shows that one forward modeling and two pseudo-forward modelings are conducted for each inversion iteration.

#### 5. Numerical Experiments and Discussion

##### 5.1. Examples with Synthetic Data

To reflect the influence of borehole data in the inversion, we carried out four field scenarios for a synthetic simple resistivity model consisting of a 10 Ω·m and 20% chargeability rectangular prism (40 m × 40 m × 70 m) embedded in a 100 Ω·m and 1% chargeability background. The top of the model is 30 m below ground surface (Figure 4). The input apparent chargeability with 5% Gaussian random noise was generated by applying 3D IP forward finite difference modeling algorithm for these four tests. The reference chargeability models were a homogeneous 1% half-space in the inversion. And all the four experiments were run using a fixed Lagrange multiplier used to balance the effect of data misfit and model regularization. When one source starts working, all the receivers will record using pole-dipole.

**(a)**

**(b)**

For the first test, 189 current electrodes and 189 receiver electrodes were distributed equally along nine survey lines on the surface (Figure 5). Data was recorded using left- and right-side pole-dipole (AMN and MNB) in this surface resistivity method. The minimum distance between each consecutive electrode was 10 m and maximum distance was 50 m, giving a total of 33,660 data points for the entire survey.

**(a)**

**(b)**

**(c)**

The vertical sections of resistivity and chargeability inversion result are illustrated in Figure 6 from direction. Both the resistivity and chargeability anomalies are located near the top of the block, not at its center, and the target values of anomalies are not recovered. Furthermore, the boundary of the anomalous body is not delineated very well, especially in the bottom of block.

**(a)**

**(b)**

In the second test, the first test was repeated, but with the original distribution of electrodes and receivers enhanced by 28 pairs of additional receivers placed at similar positions in two boreholes. Thus, for each source, 28 additional independent voltage measurements were taken from two boreholes, combining the surface and surface-borehole methods. The minimum distance between borehole electrodes was 10 m, and maximum distance was 50 m (Figure 7). The total number of data points was 38,952. Figure 8 shows the vertical sections of resistivity and chargeability inversion result. Although the anomaly is not located in the center of block, it is closer than in the first test. And the boundary of the anomalous body is delineated more clearly than the result of first test. Overall, the top of block is recovered better; because of the data from borehole, the anomaly is stretched in the -direction.

**(a)**

**(b)**

**(c)**

**(a)**

**(b)**

The third experiment combined the borehole-surface and borehole-borehole methods. The distributions of receivers were the same as in the second test, and 28 current electrodes were put in similar positions into two boreholes (Figure 9) and total 5,824 sets of voltage data were joined to the inversion calculation. Figure 10 shows improved inversion results compared to those from the first and second tests: the anomalous body appears at the center of the block, and its boundary is clearly delineated; furthermore, the bottoms of anomalies are relatively wider.

**(a)**

**(b)**

**(c)**

**(a)**

**(b)**

In the fourth and final test, all the receivers and current electrodes from test one to test three were used (Figure 11), combining the surface, surface-borehole, borehole-surface, and borehole-borehole methods. The entire experiment yielded 44,776 sets of synthetic data. The inversion captured the correct location of the block in the vertical sections and produced the best result of the four tests. Compared with second and third test, the boundary is more clearly seen in the vertical section (Figure 12). The magnitudes of the resistivity and chargeability values and the position of the anomaly matched the theoretical model reasonably well.

**(a)**

**(b)**

**(c)**

**(a)**

**(b)**

##### 5.2. Computation Efficiency

In order to test the GPU computation efficiency, the runtime of linear equation computation with GPU was compared with CPUs in the same method, processes, sources, and iteration times (see Table 2). Next, we tested the computation efficiency of the parallel algorithm of NLCG. Some four-way tests were carried out in two modes (MPI and MPI + GPU): (i) a parallel algorithm test using a single machine with one process; (ii) a parallel algorithm test using two machines with one process in each; (iii) a parallel algorithm test using three machines with one process in each; and (iv) a parallel algorithm test using four machines with one process in each. All the four tests with the grid size of 67 × 67 × 65 were computed by ten inversion iteration times. Table 3 shows the statistical runtimes for the prism model inversion for the four tests. “Runtime (hours)” is the time spent until all the processes stop. “Percentage of serial algorithm (%)” is the time spent by the parallel algorithm as a percentage of the time spent by the serial algorithm.

In Table 2, the runtime with GPU is about 3.8 times the runtime with CPU. Table 3 shows that the efficiency of the algorithm is greatly enhanced by MPI + GPU parallel computation, especially when using four machines. The runtime using MPI + GPU is about 3.8 times the runtime using MPI alone, with the same number of machines and processes.

#### 6. Conclusion

Based on the analysis of NLCG algorithm for 3D IP inversion with MPI and GPU parallel programming, we developed a parallel NLCG 3D resistivity inversion code for data generated by surface, borehole-surface, surface-borehole, and borehole-borehole IP methods. The inversion results show that data from the borehole improves the quality of the inversion and delineates the boundary of an anomalous body more clearly. The borehole data can constrain the bottom of the anomalous body very well; similarly, the surface data can constrain the top of block. We tested the parallel code using synthetic data and compared it with a serial procedure. This showed that our proposed parallel algorithm not only is effective but also greatly improved the speed of inversion. It also shows that the computing speed of a workstation with a GPU card (NVIDIA Tesla C2075) is faster than one with CPU cores. The tables demonstrate that the CG method of CULA library is suitable for double precision data type and symmetric positive definite coefficient matrix; the equation solved by the iterative method using GPU is faster than that with CPU; the parallel algorithm accelerates inversion in cases when high efficiency is required; furthermore, the use of larger numbers of machines enhances this effect.

#### Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

#### Acknowledgments

The authors would like to thank the support by National Natural Science Foundation of China (no. 41374078) and the Ministry of Land and Resources of Geological Survey Projects (nos. 12120113086100 and 12120113101300).