Analysis of Electromagnetic Propagation from MHz to THz with a Memory-Optimised CPML-FDTD Algorithm
FDTD method opened a fertile research area on the numerical analysis of electromagnetic phenomena under a wide range of media and propagation conditions, providing an extensive analysis of electromagnetic behaviour like propagation, reflection, refraction, and multitrajectory phenomena. In this paper, we present an optimised FDTD-CPML algorithm, focused in saving memory while increasing the performance of the algorithm. We particularly implement FDTD-CPML method at high frequency bands, used in several telecommunications applications as well as in nanoelectromagnetism. We show an analysis of the performance of the algorithm in single and double precision, as well as a stability of the algorithm analysis, from where we conclude that the implemented CPML ABC constitutes a robust choice in terms of precision and accuracy for the high frequencies herein considered. It is important to recall that the CPML ABC parameters provided in this paper are fixed for the tested range of frequencies, that is, from MHz to THz.
In the last 30 years, a wide range of absorption techniques in computational boundaries has been developed for FDTD as well as for other methods as finite and spectral elements. The main purpose of absorbing boundary conditions (ABCs) is to avoid spurious reflection of energy within the physical domain, at any incidence angle and operation frequencies, considering the thinner possible absorption zone. Boundary conditions for electromagnetic phenomena simulations are absolutely important as they allow to truncate the physical domain in the regions of interest, to simulate an infinite region without spending infinite computational resources (as RAM memory). A good choice of ABCs involves three key features: easiness of implementation, efficiency of the algorithm, and precision for long time periods. Actually, most solvers in the FDTD field implement PML ABCs , due to its extremely implementation easiness. Nevertheless, Berenger’s method heavily relies on the splitting of the fields in the boundary region, so for large domains, it constitutes a computationally expensive method. Convolutional PML  alleviates this feature due to the fact that it does not split any field in the boundary region, so the number of memory arrays for it to be implemented is reduced with respect to Berenger’s formulation.
Besides, Berenger’s PML ABC  is not fully capable of avoiding all spurious reflections [3, 4]. CPML ABC introduced by Komatitsch and Martin  is a convenient version of the convolutional PML for its implementation in different computational architectures, because it reduces the RAM memory usage in absorption regions, while providing an excellent precision. More recently, Martin and Couder-Castañeda introduced an unsplit convolutional perfectly matched layer (CPML) technique to efficiently absorb the compressible viscous flow without causing any numerical instability over long periods of time .
Nevertheless, most of the research done in the numerical propagation of electromagnetic phenomena with FDTD-CPML method has been focused on providing CPML ABC parameters to effectively absorb electromagnetic energy for relatively low frequencies (from Hz to MHz). Actually, FDTD studies for high and ultrahigh frequencies pose numerical difficulties as the memory and processing times required to resolve accurately the problem for such frequencies are still much higher than those provided by current computers, particularly when propagating in dispersive media [6, 7]. This problem has derived in two research directions. The first one has tried to use the recent progress in high performance parallel computing, such as those provided by the development of GPUs, [8–11] as well as multiprocessing coprocessors like Xeon Phi [12–16]. The other branch constitutes the extreme optimisation of current FDTD-CPML algorithms to be able to be successfully executed in current computers. This work belongs to the latter research branch.
In this paper, we particularly tackle the problem of drastically reducing the memory requirements at the absorption boundaries by optimising the CPML ABC developed by Komatitsch and Martin  for the Maxwell equations in FDTD method, in order to study its precision and accuracy in different propagation scenarios for the electric transverse propagation mode. We determine a set of CPML ABC parameters which effectively absorb all the energy in the boundary region for the whole frequency range from MHz to THz, which as to our knowledge has not been previously reported in literature. We tested the FDTD-CPML implemented method in single and double precision, in two different implementations of the algorithm, in order to minimise the memory usage as well as computing time.
This paper is organised as follows: in Section 2, we present the electromagnetic theory that gives rise to the FDTD method herein implemented. In Section 3, we introduce the optimised CPML ABC formulation we use, while in Section 4, we present numerical experiments: electromagnetic propagation in free space, with a parabolic reflector, in a coplanar nanowaveguide, as well as in common office indoors, along with their respective error analysis. In Section 5, we provide a performance study of the memory usage with double and single precision, as well as of computing time. Finally, in Section 6, we condense the results herein obtained.
2. Propagation Equations and FDTD Method
Electromagnetic phenomena are governed by Maxwell equations, which for a linear, homogeneous, and isotropic media, in the international system of units, are given by  where and are the electric and magnetic fields, respectively, and are permittivity and permeability constants that depend on the medium in which the electromagnetic field is propagated, and is the electric conductivity of the material. It is necessary to remark that the speed of light in such a medium is given by . The external current and electric charge densities, and , respectively, are the sources of electromagnetic field, and they can be functions of space and time, that is, and .
For the sake of simplicity, the system of (1), (2), (3), and (4) can be reduced to a bidimensional system, for instance to an electromagnetic propagation in plane, by confining the energy to propagate in such a plane (i.e., the Poynting vector). The resulting equations for the transverse electric mode (TE) in the absence of sources, and considering Cartesian coordinates, are thus
In order to cast (5), (6), and (7) in a second order in time and space FDTD method, the index notation for the considered staggered mesh shown in Figure 1 is followed. If time evolution is discretised with the index , then the finite differences version of such equations are, respectively,
3. CPML ABC Formulation
A very suitable version of the CPML ABC is given in ; this is because its implementation in a FDTD code without a PML ABC is simply obtained by replacing the spatial derivative , with , where is the so-called convolutional term of the CPML ABC—whose time evolution is the same as the time evolution of the other variables—and is the stretching function always greater than 1  and given by (13).
According to , can be implemented as a recursive sum in time, so we can update the memory variable of the ( or ) field in the direction for each time step as
For instance, the auxiliary equation for takes the form
The vectors , , and are estimated, within the absorption region, as where = , is the CPML thickness, , and is the number of layers of the CPML ABC. It is important to note that within the physical domain, , , and ; within the CPML where is the vacuum permittivity, is the damping function profile, is the shift in the frequency domain , and and are the degree of polynomial variation for and , respectively.
The vectors , , and are calculated in the intermediate steps , while the vectors , , and are calculated in the intermediate steps .
4. Numerical Experiments
We considered four different propagation scenarios: (i)Free space propagation(ii)Parabolic reflector(iii)Coplanar nanowaveguides (CNWG)(iv)Indoor propagation
To corroborate the quality of the numerical solution, it is necessary to determine the reflection error due to the presence of the CPML ABC, so reference solutions must be obtained for each of the applications previously mentioned. These reference solutions consist in performing the respective electromagnetic propagation with an extended physical domain, not allowing the waves to reach the boundaries of the extended domain to avoid spurious reflected waves to propagate within the extended domain.
An advantage of the usage of the presented CPML ABC consists in a drastic reduction of the number of memory arrays in the bidimensional algorithm , so it can be easily implemented for GPU processing, as those cards possess a limited amount of memory. The CPML ABC for the FDTD method requires to allocate the value of the time derivatives in a memory variable, which is implemented in two ways in this paper: the first, by allocating the memory variables for all the domain, and the second, by allocating them only in the absorption region (see Figure 2). Clearly, by allocating them just in the CPML region, we save memory, but with an increasing complexity in its implementation. By allocating the memory variables in all the domain, we require three nested cycles for calculation, but those cycles might be divided to save memory, producing eleven additional cycles to reach a total of fourteen nested cycles, which might decrease the performance of the vectorisation processes [19, 20].
The novel set of CPML parameters used for the herein reported numerical experiments are shown in Table 1. These parameters have shown to be optimised for the whole range of frequencies tested in this paper. They were obtained by successive numerical experiments as follows: clearly, the convolutional term of the CPML given by (11) fully depends on the vectors and ((14) and (15)), which in turn fully depends on , and ((13), (16), and (17), respectively,). Thus, the parameters to be adjusted to study the appropriateness and efficiency of the CPML ABC are those involved in calculating , , and , which from (13), (16), and (17), turn to be , , , , and . In order to find the values of the parameters reported in Table 1, it is better to start from those reported in . With those parameters fixed, we first varied the parameters regarding the vector for the range of frequencies herein considered until the parameters which do not distort field wavefronts are found. Then, we varied the parameters regarding the vector for all the range of frequencies until those which better absorb the incident fields are found.
The FDTD-CPML method was implemented in FORTRAN; beside its high speed for numerical algorithms, it eases the implementation because indexes might take positive or negative values, so they can start in (0,0) in the physical domain, so if the number of discrete points in the direction is IMAX and in the direction is JMAX, then the whole domain including the CPML might be labelled in the direction with and in the direction with , where , , , , and is the number of layers of the CPML ABC. The index notation can be observed in Figure 1.
Following the first implementation scheme, the flux diagram of the algorithm is shown in Figure 3, while the respective flux diagram by allocating memory variables just in the absorption region is shown in Figure 4. We compare the performance in both implementations for each numerical experiment presented in this paper.
The spatial increments and are estimated considering ( being the propagated wavelength), while is estimated with the Courant stability criterion . The sources of electromagnetic field are modelled as , where is the frequency and is the time step number.
For free space and parabolic reflector propagations, three operating frequencies were explored: 2.45, 5.00, and 20.00 GHz. The CNWG was tested at 100 THz while the indoor propagation was explored at a 2.45 GHz Wi-Fi frequency.
For each numerical experiment, results are presented as follows: snapshots of field propagation are shown at different time steps for both the CPML and the reference solutions; for reasons of the paper’s length, we only show snapshots for an operating frequency of 2.45 GHz (except for the CNWG, which is of 100 THz), provided their behaviour for the other operating frequencies is similar. We also perform an analysis of the energy decay within the physical domain when the source is turned off (at the middle of the simulation time). The global energy within the physical domain, at time step , is given by
To quantitatively verify the CPML performance, we set numerical viewers at the same positions in both the CPML and reference solutions for all the numerical experiments, to record the electromagnetic field in such points as a function of time. We then compare the recorded values of the viewers for both the reference and the ABC cases. For the reasons of the paper’s length as well as due to the likeness between them, only the viewer plots for 2.45 GHz are shown (100 THz for the CNWG).
4.1. Free Space Propagation
The physical domain set up is 1.25 m × 2.25 m while the reference (extended) domains are of 20 m × 20 m, 10 m × 10 m, and 5 m × 5 m for 20.00 GHz, 5.00 GHz, and 2.45 GHz, respectively, with the source located in the centre. Numerical viewer locations are shown in Figure 5. The results of the simulation for the CPML and reference solutions are shown in Figure 6, where the absorption in the CPML region is observed.
The comparison of the viewer-recorded values of electric field for both the reference and the ABC cases is shown in Figure 7. To perform the analysis of the energy decay, we turned off the source at time steps (the simulation lasted for 1160 time steps). In Figure 8, we show the behaviour of such an energy within the physical domain as a function of time, demonstrating that the CPML ABC accurately dissipates the energy introduced by the source.
In Table 2, we present the mean squared error between the viewer values of reference and CPML solutions.
4.2. Parabolic Plate Reflector
In this experiment, we propagated electromagnetic waves upon the surface of a parabolic antenna, in order to obtain the distribution of electromagnetic field in such a surface as well as the reflected field in the direction of the primary radiator, which is located in the focus of the parabola.
Both the physical and extended domains are of the same size than in the previous experiment. The parabolic plate is simulated to be made of silver (). The main radiator consists of a spherically symmetric source of electromagnetic field, within a hood feeder of the same material. The remaining computational domain is considered as free space. The considered geometry is shown in Figure 9, where the green dots represent the position of the numerical viewers.
The electromagnetic propagation for this experiment is shown in Figure 10. The behaviour of the electromagnetic field within the physical domain occurs as expected, and it features correctly the conversion from spherical to plane wavefronts. It can be clearly observed that the greater energy concentration occurs in the parabola’s focus, when it is illuminated by the plane wavefront. Figure 10 features four snapshots at different simulation times, 100, 400, 750, and 1160, for both the CPML and the reference cases. The source was turned off at time steps.
Viewers 1 and 3 are located in such positions (see Figure 9) to record the field of the plane wavefront generated by the parabolic reflector. Of particular interest are viewers 5 and 6, whose position is designed to record border effects. Viewer 6 records refraction phenomena occurred in the edge of the parabolic plate, due to the energy propagated on the parabolic surface. Viewer 5 accurately records the backscattered electromagnetic field in the backwards of the parabolic plate; despite of its low intensity, this effect has already been observed . Being the parabolic reflector of great interest for telecommunications engineering, its radiation pattern has been obtained in .
Figure 11 shows the behaviour of each of the viewers described above, comparing the values of the CPML and reference solutions. Figure 12 shows the energy decay after time steps (when the source is turned off), where it is possible to observe an adequate dissipation of energy within the computational domain, demonstrating the efficiency of the implemented CPML ABC.
Table 3 presents the mean square error for each viewer, between the CPML and the reference solutions.
4.3. Coplanar Nanowaveguide
In this section, we propagate electromagnetic waves with frequencies lying in the THz band. The basic structure of the CNWG consists of three metallic rectangular parallel plates, a central one and two at its sides, equidistantly separated from the central plate. The plates are placed on a squared dielectric substrate.
Figure 13 shows the configuration of the waveguide, which is characterised at a frequency of 100 THz. The three silver plates are located over a dielectric squared substrate of side m, while the size of the plates is of m long by m width. The separation between the plates is m. The orange dots represent sources and are located at a distance of from the plates. The thickness of the substrate is m. We located strategically four numerical viewers (green dots) within the physical domain, so to record electromagnetic field in such positions. Two are located between the plates to analyse the behaviour of the field within the waveguide, while the other three are located outside the waveguide. The sources were located strategically so to obtain the better propagation of the waves along the waveguide (see Figure 13). For the CNWG, the domain is extended from [, ] m to [, ] m.
Figure 14 shows electric field propagating through the CNWG. We observe the field propagates symmetrically along the waveguide. TE mode provides low dispersion in CNWGs, so it is optimal to build circuits and wideband electronic components. Figure 14 presents three snapshots at different time steps, for both the CPML and the reference solutions.
4.4. Indoor Propagation
In this numerical experiment, we propagate Wi-Fi waves () in an indoor setup of two contiguous offices, based on the experiments reported in . We characterised the different materials that are usually found in office indoors such as metals, wood, glass, and plastics, by their conductivity, permeability, and permittivity. Ten different materials were identified and characterised and are shown with a colour code in Figure 16, which is featured along with the corresponding values of their physical parameters in Table 5. For this experiment, the domain is of 3.728 m × 5.696 m. Due to the complexity and the size of physical domain with respect to the propagated wavelength as well as due to RAM requirements, we did not perform the reference solution.
Figure 17 shows the electric field through four snapshots at different time steps. In order to probe the quality of the CPML ABC, its stability is tested by extending the time domain up to 100,000 time steps and by turning off the source at . With this numerical experiment setup, we show in Figure 18 the energy decay due to the presence of the CPML ABC, demonstrating its correct operation for large time domains.
In Figures 19 and 20, we show the behaviour of electric field () recorded at selected viewers in both offices, respectively. As it can be clearly observed, after the source is turned off, the field clearly decreases as a function of time, fact that confirms the stability of the ABC herein implemented, for both long time domains and a wide range of propagating materials. As it can be observed in such figures, the six viewers show a very interesting feature; notwithstanding its positions on the indoor setup, for early times in the simulation, they show the transients before arriving to a quasistationary field propagation status. Such transients are also present when the source is turned off at 50,000 time steps, arriving to another quasistationary state of no field, after the energy is dissipated by the CPML ABC.
Viewers in the lower office (viewers 1, 2, and 3) exhibit greater values in the recorded electric field values, mainly due to the fact that the source is located within this office, so the field is more intense within this part of the propagation domain. The first viewer shows that it receives the first wavefront coming directly from the source, which travels and leaves the viewer number one. But after certain time, the viewer records the arriving of further wavefronts coming from some edges that scattered waves towards the viewer 1 direction, as well as some reflected waves entering into this region.
As viewer 2 is located in a free space region, adjacent to two boundary regions with a CPML ABC, the amount of scattered and reflected waves arriving at this viewer is very much lower, which is clearly shown in the fact that the field recorded by this viewer is more regular than the field recorded by viewers 1 and 3. This is not the case for viewer 3, which is closer to several edges and material interfaces, so it is subject to record wavefronts at different time steps, coming from different scatter and reflection processes.
For the case of viewers 4, 5, and 6, located in the upper (second) office, the behaviour of the recorded field is different. These viewers clearly record lower values of the electric field , as the waves arrive attenuated to them. Particularly, viewer 5 behaves as viewer 2, provided that it is surrounded by two CPML ABCs, which prevents most of scattered and reflected waves to arrive to it. Viewers 4 and 6 show clearly different wavefronts arriving from different processes of reflection and scattering of waves from the surroundings.
5. Performance Tests
To test the performance of the CPML ABC herein implemented, we programmed a conventional version (algorithm shown in Figure 3) and our memory-optimised version (algorithm shown in Figure 4), running the experiment with single- and double-precision variables, as well as comparing different compilers.
Usage of single-precision variables yields to saving almost 50% of RAM memory as well as about 30% of processing time. Nevertheless, it might introduce numerical differences of about .
We compared the memory saving of our memory-optimised version against the conventional one. For instance, for the parabolic reflector experiment for a propagation frequency of 2.45 GHz, the memory-optimised version requires 14.4 MB and 7.3 MB for double and single precision, respectively. The not-optimised version uses 16.7 MB and 8.7 MB in double and single precision, respectively. The version of the algorithm that uses splitted ABC (Berenger’s PML ) used 17.6 MB and 9.0 MB in double and single precision, respectively. We can observe a memory saving of 14% with respect to the nonoptimised version, as well as an 18% with respect to the nonsplit version, for double precision. For single precision, we save 50% for all the cases. Nevertheless, the more the discrete points, the memory saving is about 30% with respect to nonsplit version and about 40% with respect to the split version.
Table 6 shows the RAM memory allocated by the more memory-demanding experiments. As it can be observed, there is a memory saving of 18.4%, 18.3%, and 15.6% with double-precision variables for free space, parabolic reflector, and CNWG, respectively, while for the single-precision mode, the memory savings are of about 18.4%, 18.3%, and 15.8%, respectively.
Therefore, the saving memory version saves at least 15% of RAM memory. This percentage increases as the computational domain increases too. Single precision offers more memory saving, although its precision limits to seven digits, so it could introduce numerical errors. In Figure 21, we show the absolute error produced by the usage of single-precision variables with respect to the double-precision ones; the parabolic reflector, the CNWG, and the indoor propagation numerical experiments absolute errors are featured in Figures 21(a), 21(b), and 21(c), respectively.
(a) Parabolic reflector numerical experiment. L2 norm error is of 7.403291 × 10−07 (2.45 GHz, snapshot at time steps (7.09 s))
(b) CNWG numerical experiment. L2 norm error is of 2.630922 × 10−06 (100 THz, snapshot at time steps (1.740295218690000 × 10−04 s))
(c) Indoor propagation numerical experiment. L2 norm error is of 3.0817 × 10−11 (100 THz, snapshot at time steps). It can be clearly observed that for such long time periods, single-precision version is numerically stable
To test computing time in serial mode, we performed experiments on a quad-core Intel Core i5-4670 at 3.40 GHz (without HT), over a Linux kernel 3.13.0-65-generic SMP operating system. Comparing memory-optimised versus conventional versions, there is no observable difference in computing times. Nevertheless, some differences between different compilers are observed. For the parabolic reflector at 20 GHz, by compiling with gfortran 4.9.2 with optimisation and vectorisation flags activated, we obtain computing times of 10 m, 55 s and 04 m, 36 s for the memory-optimised version in double and single precision, respectively. By compiling with PGI FORTRAN, we obtain computing times of 11 m, 02 s and 05 m, 13 s for the conventional version for double and single precision. The difference in computing times between the compilers is negligible; actually, memory-optimised version generates more cycles, providing a similar performance.
Table 7 features computing times using the memory-optimised version for double and single precision, for the different numerical experiments herein performed, where we can observe a computing time saving of about 40%.
We analysed the behaviour and performance of CPML ABC version by Komatitsch and Martin , for different frequencies in the band from MHz to THz, observing an excellent absorption and energy dissipation and lacking spurious reflected waves being propagated within the physical domain. Moreover, we found ad hoc CPML parameters that work perfectly for all numerical experiments herein performed, for the whole range of explored frequencies. It ought to also be remarked that this CPML ABC implementation features stability for long time periods. Our implementation of the FDTD-CPML algorithm provides at least a 15% of memory saving versus a conventional implementation, without sacrificing performance. We also find that single-precision mode saves about 50% of RAM memory and computing time, without introducing sensitive numerical error due to the usage of single-precision variables.
Furthermore, although for the sake of clarity, our implementation of the herein developed algorithm was done in 2D, it can be straightforwardly extended to 3D just by building the vectors , , and in (13), (14), and (15) carefully in each boundary of the problem for them to correspond to the absorbed physical field on each boundary, taking into account their corresponding physical parameters. It ought to be remarked that as our implementation of the CPML ABC saves memory through the drastic reduction of the allocated arrays with respect to conventional PML ABC implementations, the achieved optimisation in 3D could be of at least 64% .
Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this paper.
The authors acknowledge EDI grant by SIP/IPN. This research was partially supported by projects SIP 20160105, 20171536, 20170721, and 20171027 and IPN. The authors also acknowledge FS0005 computational time grant at FSF by CDA/IPN.
R. Martin and C. Couder-Castañeda, “An improved unsplit and convolutional perfectly matched layer absorbing technique for the navier-stokes equations using cut-off frequency shift,” Computer Modeling in Engineering and Sciences, vol. 63, no. 1, pp. 47–77, 2010.View at: Publisher Site | Google Scholar
R. Vuduc and K. Czechowski, “What GPU computing means for high-end systems,” System, vol. 8, p. 10, 2011.View at: Google Scholar
J. Gomes, G. Teodoro, A. Melo, J. Kong, T. Kurc, and J. Saltz, “Efficient irregular wavefront propagation algorithms on Intel(R) Xeon Phi(TM),” in 2015 27th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD), pp. 25–32, Florianopolis, Brazil, 2016.View at: Publisher Site | Google Scholar
G. Civario, S. Delaney, and M. Lysaght, “Preparing a seismic imaging code for the Intel Knights Landing Xeon Phi processor,” Advances in Parallel Computing, vol. 27, pp. 585–590, 2016.View at: Google Scholar
J. D. Jackson, Classical Electrodynamics, John Wiley & Sons. Inc., New York, NY, USA, 1999.
M. Arroyo, C. Couder-Castañeda, A. Trujillo-Alcantara, I.-E. Herrera-Diaz, and N. Vera-Chavez, “A performance study of a dual Xeon-Phi cluster for the forward modelling of gravitational fields,” Scientific Programming, vol. 2015, Article ID 316012, 14 pages, 2015.View at: Publisher Site | Google Scholar
C. Couder-Castañeda, H. Barrios-Piña, I. Gitler, and M. Arroyo, “Performance of a code migration for the simulation of supersonic ejector flow to SMP, MIC, and GPU using OpenMP, OpenMP+LEO, and OpenACC directives,” Scientific Programming, vol. 2015, Article ID 739107, 20 pages, 2015.View at: Publisher Site | Google Scholar
C. Couder-Castañeda, C. Ortiz-Alemán, M. Orozco-Del-Castillo, and M. Nava-Flores, “TESLA GPUs versus MPI with OpenMP for the forward modeling of gravity and gravity gradient of large prisms ensemble,” Journal of Applied Mathematics, vol. 2013, Article ID 437357, 15 pages, 2013.View at: Publisher Site | Google Scholar