Abstract
An efficient algorithm is proposed to analyze the electromagnetic scattering problem from a high resolution head model with pixel data format. The algorithm is based on parallel technique and the conjugate gradient (CG) method combined with the fast Fourier transform (FFT). Using the parallel CGFFT method, the proposed algorithm is very efficient and can solve very electrically largescale problems which cannot be solved using the conventional CGFFT method in a personal computer. The accuracy of the proposed algorithm is verified by comparing numerical results with analytical Mieseries solutions for dielectric spheres. Numerical experiments have demonstrated that the proposed method has good performance on parallel efficiency.
1. Introduction
In recent years, there has been an increasing effort to achieve an efficient numerical analysis of largescale electromagnetic problems which usually require much computational time and large computer memory. An efficient numerical method for large and complex bodies is very important for many practical applications. The method of moments (MoM) [1] has become one of the most popular methods to compute the scattering problems in a variety of applications [2–6]. However, MoM requires memory usage and computational load to solve the matrix equation using the LU decomposition or Gaussian elimination, where is the number of unknowns. To reduce the computational time, CGFFT is employed to solve the MoM matrix equation, which is one of the most efficient ways to solve the volume integral equation for dielectric targets and reduces the computational complexity to in each iteration [7–10]. For the most practical EM problems, a regular computer cannot be sufficient for its limited available memory and performance. New developments of parallelprocessing techniques and highperformance computer (HPC) system give the chance of solving large problems that were unattainable in the past. To reach this point, it becomes more and more important that the development and parallelization of fast algorithms with highly parallel performance be able to benefit from large amounts of computational memory and parallel processors of HPC system [11–13].
In the past few decades, the energy absorption in human head exposed to radiofrequency (RF) electromagnetic radiation has brought about an increased concern for the possible consequences of electromagnetic radiation on human health. Many studies have been performed for calculating the power absorbed in a human body exposed to the electromagnetic (EM) field emitted by radiocommunication equipment [14–17]. In this paper, the EM scattering problem from a highresolution 3D anatomically realistic model of the human head was considered. The volume integral equations are applied to describe the problem. MoM is then used to discretize the coupled integral equations, and a CGFFT algorithm has been proposed to solve the resulting discrete linear system. And the parallelization techniques were applied to speed up the FFT calculation, vectorvector product, and matrixvector product during the process of CG iteration. The paper presents a deep review of the proposed parallel implementation of CGFFT algorithm with pulse base function. Different stages of the parallel algorithm were described, and its overall parallel performance was analyzed carefully. With this implementation, we have done a benchmark model test with more than 400 million unknowns and solved a practical EM scattering problem with more than 40 million unknowns using a HPC system which includes 27 nodes. Each node of the cluster has two Intel Xeon E5520 CPU and 12 GB memory and they are connected by 10 Gbps Ethernet high speed network. We have verified the accuracy and efficiency of the algorithm by comparing the numerical results with analytical results for dielectric spheres. Numerical results show that the proposed method has good parallel performance.
2. Theory and Methods
Consider a 3D dielectric object of arbitrary shape that is in homogeneous space which is characterized by relative permittivity ; we set the homogeneous space is free space . The arbitrarily shaped dielectric object with complex permittivity is inscribed by a cuboid . The time dependence of is assumed and suppressed. Under the illumination of the incident electric field, the total electric field inside the dielectric object can be determined through the following volume integral equation: where is the dyadic Green’s function in homogenous space, in which the corresponding elements are given by
The equivalent version for the induced current can be approximately obtained by where , and are the normalized electric current inside the dielectric object and the equivalent incident current, respectively.
A box with the size of is used to bound the considered dielectric target and is discretized into cuboidal cells. Then the volume of each cell is , where and is the division number in the direction. Choosing pulse function as the basis and testing function, we obtain the discrete forms of (4) as in which
We remark that the above formulations (6)(7) actually imply the scattering by small particles with the size of because of the use of pulse basis functions although the dielectric targets may be continuous. We can convert (6) into a linear system of equations where is an system matrix, is a column vector with the coefficients of the unknown currents, and is a column vector associated with the incident fields in the dielectric object. Here is the total number of unknowns. However, the inner products in (6) are all 3D summations of the products of discrete Green’s functions and discrete electric currents, which are quite time and memory consuming. For electriclarge electromagnetic problems, is very large and it is very difficult to solve (8) directly. In order to calculate fast the products of Green’s functions and electric currents, the discrete Green’s functions are extended in a larger computational domain as where ,
The signs of the expanded discrete Green’s functions are directly related to the even and odd nature of the components with respect to the coordinates in different extended subdomains. After defining the extended Green’s functions, the equivalent electric current can be defined in the extended domain by zero padding as
Using the convolution theorem and FFT method, we can obtain the discrete form of the integral equation (6) with FFT method [18, 19]: where are the discrete Fourier transform (DFT) of , respectively. Similarly, the corresponding adjoint operations can also be performed using FFT. As a consequence, we can solve (12) rapidly through the CGFFT algorithm [18]. In order to speed up the FFT calculation, the parallel FFT is used to obtain the FFT and inverse FFT results. In the proposed algorithm, both the FFT transform and the inverse FFT transform are implemented using the FFTW library, which is a subroutine library for computing the discrete Fourier transform in one or more dimensions and supports the distributedmemory implementation based on message passing interface (MPI). For example, to calculate the vectorvector product in the CGFFT method, which can be parallelized by call MPI_Allreduce() as shown in Algorithm 1.

3. Numerical Results
To illustrate the accuracy and efficiency of the proposed parallel CGFFT algorithm, we first consider the EM scattering by a dielectric sphere illuminated by plane waves, which has a closedform solution. In the following examples, the background is just free space. The dielectric sphere with m is illuminated by a plane wave. The incident wave is polarized in the direction and propagating in the direction, in which the operating frequency is 0.3 GHz. The comparison of numerical results of the internal electric fields between parallel CGFFT and analytical results is illustrated in Figure 1, which shows that the numerical results have good agreement with the analytical results. We have also computed the scattered electric fields from the dielectric object on the observation plane m and compared such results with the exact solutions as shown in Figure 2.
(a) 
(b) 
(a)
(b)
Then, we do the parallel performance testing on a HPC which has 27 nodes shown in Table 1, in which nodes are connected by 10 Gbps Ethernet. The benchmark model is a homogenous cubic dielectric object with , and the edge of cubic is 0.4 m. The incident wave is the same plane wave as that in Figure 1. We compare the network latency inside node and internode, which means that we test the network latency on one node and between two nodes, respectively. Figure 3 shows the testing results for internode and inside node. From Figure 3, we can see that the speed of network inside node is about 4 times of the inter node, which will be a bottleneck for the parallel CGFFT method. To evaluate the performance of the parallel CGFFT code, we define the performance as follows:
The parallel CGFFT methods performance testing result is demonstrated in Figure 4, and the detail data is listed in Table 2. From Figure 4, we can obtain that the performance goes up when no more than 8 nodes are used, and the performance goes down using 10 nodes. The reason is that the network latency plays an important role when we use more than 8 nodes. The parallel efficiency is also tested, which is defined as where is the number of processes, is the running time used by one process, and is running time used by processes. Figure 5 shows the parallel efficiency of parallel CGFFT with different discretization and processes, and the detail results are listed in Table 3. From Figure 5 and Table 3, we can see that the parallel efficiency is above 60% when no more than 8 nodes are used.
Finally, we use the proposed parallel CGFFT method to simulate EM scattering problem from 3D anatomically realistic human head model exposed to the plane wave working at 900 Mhz. The popular HUGO model [20] with a resolution of mm, as shown in Figure 6, includes 16 different tissues and organs. The electromagnetic properties ( and ) of 16 tissues in the model can be obtained from FCC published data [21], as listed in Table 4. In our simulation, 4 nodes are used and the computation time is about 65 minutes. Figure 7 shows the electric field on head surface. With the object oriented HUGO model, the field distribution over a specific object can be investigated. The electric field distributions on eyes, bone, and brain are demonstrated in Figures 8, 9, and 10, respectively.
4. Conclusion
In this paper, we have analyzed the performance of an efficient MPI parallel implementation of the CGFFT algorithm on HPC computers. In the proposed method, the codes can run not only on share memory systems machine but also on distributed ones, which present high scalability behavior. Special attention was paid to communications during the matrixvector product and vectorvector product, which are a key point for the parallel performance. We solved a problem with more than 400 million unknowns on a HPC including 27 nodes.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This work was supported in part by the National Science Foundation of China under Grant no. 61372057, in part by Natural Science Foundation of the Jiangsu Higher Education Institutions under Grant no. 10KJD180004, and in part by Postgraduate Innovation Project of Jiangsu Province under Grant no. CXZZ13_0973.