Abstract

This paper presents a novel efficient high-resolution two-dimensional direction-of-arrival (2D DOA) estimation method for uniform circular arrays (UCA) using convolutional neural networks. The proposed 2D DOA neural network in the single source scenario consists of two levels. At the first level, a classification network is used to classify the observation region into two subregions (0°, 180°) and (180°, 360°) according to the azimuth angle degree. The second level consists of two parallel DOA networks, which correspond to the two subregions, respectively. The input of the 2D DOA neural network is the preprocessed UCA covariance matrix, and its outputs are the estimated elevation angle to be modified by postprocessing and the estimated azimuth angle. The purpose of the postprocessing is to enhance the proposed method’s robustness to the incident signal frequency. Moreover, in the inevitable array imperfections scenario, we also achieve 2D DOA estimation via transfer learning. Besides, although the proposed 2D DOA neural network can only process one source at a time, we adopt a simple strategy that enables the proposed method to estimate the 2D DOA of multiple sources in turn. Finally, comprehensive simulations demonstrate that the proposed method is effective in computation speed, accuracy, and robustness to the incident signal frequency and that transfer learning could significantly reduce the amount of required training data in the case of array imperfections.

1. Introduction

Direction-of-arrival (DOA) estimation, as one of the crucial technologies in antenna array systems, has been widely applied in many fields, such as sonar, seismology, radar, and mobile communication [1–3]. Multiple signal classification (MUSIC) [4, 5] and estimation of signal parameters via rotational invariance techniques (ESPRIT) [6, 7] are two classic one-dimensional (1D) DOA estimation algorithms. MUSIC achieves high accuracy but suffers from high computational complexity. ESPRIT avoids the spectral peak search and requires less computation, but its estimation accuracy is lower. In practice, signals arrive from three-dimensional space, so locating a source requires estimating at least two angles. Compared with 1D DOA estimation, two-dimensional (2D) DOA estimation therefore provides more complete location information and is more widely applicable. With a uniform circular array (UCA), MUSIC and ESPRIT can be extended to 2D, namely, 2D-MUSIC [8, 9] and UCA-ESPRIT [10, 11]. However, the computational complexity of these conventional algorithms is much higher in the 2D scenario. Conventional DOA algorithms are also inconvenient in practical applications because of the inevitable array imperfections.

In recent years, neural network-based methods have been extensively developed to improve the speed and adaptability of DOA estimation. For 1D DOA estimation, references [12–15] employ a uniform linear array (ULA), with some using the multilayer perceptron (MLP) and the others the support vector machine (SVM). Likewise, reference [16] achieves 1D DOA estimation based on a Y-shaped array by using the radial basis function (RBF). For 2D DOA estimation, references [17, 18] both employ a rectangular array. The former uses RBF, and the latter MLP. Reference [19] achieves 2D DOA estimation based on four ULAs by using RBF. The MLP, SVM, and RBF methods require the input feature of the neural network to be a vector, which might make the input feature dimension excessively high. To avoid this problem, references [12, 17–19] take only the first row of the preprocessed array covariance matrix as the input feature vector, which ignores the noise interference on each element of the covariance matrix. To slightly reduce the noise interference, reference [14] averages the diagonal elements of the preprocessed array covariance matrix. Because the array covariance matrix is Hermitian, references [13, 15, 16] use half of the preprocessed array covariance matrix elements as the input feature vector to fully account for noise. This strategy, however, increases the input feature dimension, which complicates the neural networks [20]. The weight sharing concept of convolutional neural networks (CNN), which has achieved great success in computer vision, can overcome the high dimension of input features. References [21–23] use CNN to achieve 1D DOA estimation based on UCA and ULA and obtain satisfactory results. However, the existing CNN-based DOA estimation methods are developed under the ideal condition without array imperfections. More importantly, they address only 1D estimation, which limits their practical applicability. It is therefore necessary to extend CNN-based DOA estimation to 2D and to provide an effective solution for the case of array imperfections.

Motivated by the aforementioned analysis, we present a novel 2D DOA estimation method for uniform circular arrays (UCA) using CNN. To do this, we propose a 2D DOA estimation model consisting of three modules: preprocessing, 2D DOA neural network, and postprocessing. The preprocessing provides appropriate input features for the 2D DOA neural network. The 2D DOA neural network outputs the estimated elevation and azimuth angle. The postprocessing modifies the elevation angle output by the 2D DOA neural network. Moreover, in the inevitable array imperfections scenario, we also achieve 2D DOA estimation through transfer learning [24]. Besides, although the proposed 2D DOA neural network can only process one source at a time, we adopt a simple strategy that enables the proposed method to estimate the 2D DOA of multiple sources in turn. Finally, some numerical examples demonstrate the superiority and effectiveness of the proposed method.

The main contributions of this paper are as follows: (1) DOA estimation for UCA using CNN is extended from 1D to 2D; (2) the robustness of the proposed method to the incident signal frequency is effectively improved by simple postprocessing; (3) the feasibility of using transfer learning to reduce the amount of training data in DOA estimation is verified.

The remainder of this paper is organized as follows. Section 2 reviews the concept of CNN and defines the problem of interest. Section 3 proposes the 2D DOA estimation model using CNN for UCA. Section 4 presents the simulation results, and Section 5 summarizes the conclusions and future work.

The main notations used in the paper are listed in Table 1.

Other terms used in the study follow the general notations unless otherwise stated.

2. Preliminary and Problem Formulation

2.1. Convolutional Neural Network

Generally, a CNN consists of convolutional layers, pooling layers, and fully connected layers [25]. The convolutional layer, as an essential part of CNN, is composed of convolution kernels. A convolution kernel maps an input image to a feature map, which can be used as the input of the next layer.

Figure 1 illustrates a convolution process. To keep the feature map the same size as the input image, the 8 × 8 input image is expanded to 10 × 10 by padding zeros around it. The 3 × 3 convolution kernel slides over the padded image with a stride of 1. The value $a_{j,k}$ in the j-th row and k-th column of the 8 × 8 feature map can be expressed as

$$a_{j,k} = \delta\left(b + \sum_{J=j}^{j+2}\sum_{K=k}^{k+2} \omega_{J-j+1,\,K-k+1}\, I_{J,K}\right), \tag{1}$$

where δ signifies a nonlinear activation function, b is the shared bias, and $\omega_{J-j+1,\,K-k+1}$ is the shared weight in row J − j + 1 and column K − k + 1 of the convolution kernel. $I_{J,K}$ denotes the value in the J-th row and K-th column of the input image after zero padding. Generally, a convolutional layer has more than one convolution kernel. The number of convolution kernels is equal to that of feature maps. The channel number of a convolution kernel is equal to that of the input image.
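The convolution step described above can be sketched in plain Python. This is a minimal illustration, not the implementation used in this paper: the function name `conv2d_same`, the ReLU default, and the toy image and kernel are assumptions made for the example.

```python
def conv2d_same(image, kernel, bias=0.0, activation=lambda v: max(v, 0.0)):
    """Stride-1 convolution with zero padding so the output matches the input size.

    `image` and `kernel` are square lists of lists; names are illustrative only.
    """
    size = len(image)
    k = len(kernel)          # kernel size, 3 in the text
    pad = k // 2             # one ring of zeros for a 3 x 3 kernel
    padded = [[0.0] * (size + 2 * pad) for _ in range(size + 2 * pad)]
    for r in range(size):
        for c in range(size):
            padded[r + pad][c + pad] = image[r][c]

    feature_map = []
    for j in range(size):
        row = []
        for q in range(size):
            acc = bias       # shared bias
            for dj in range(k):
                for dq in range(k):
                    # shared weights: the same kernel is reused at every position
                    acc += kernel[dj][dq] * padded[j + dj][q + dq]
            row.append(activation(acc))
        feature_map.append(row)
    return feature_map


# Toy 8 x 8 image and an identity kernel (hypothetical values for illustration).
image = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
identity_kernel = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
fmap = conv2d_same(image, identity_kernel)
```

With the identity kernel and nonnegative pixel values, the feature map reproduces the input, which is a quick way to confirm that padding and indexing are aligned.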

Different regions of an input image can share the weights of convolution kernels, which is conducive to reducing network parameters and training networks. Compared with MLP, SVM, and RBF, CNN does not need to lengthen input features into vectors, which effectively overcomes the excessive dimension of input features.

2.2. Problem Formulation

This study employs a UCA to achieve 2D DOA estimation on account of its omnidirectional coverage and almost identical beam width [26]. A far-field stable electromagnetic signal s with known carrier frequency f, as Figure 2 depicts, impinges on a UCA of M identical omnidirectional elements. The elevation angle of the incident signal is θ (0° ≤ θ ≤ 90°), and the azimuth angle is φ (0° ≤ φ ≤ 360°). The radius of the UCA is R, and the angle between the m-th element and the first element is τm (τm = 2π(m − 1)/M). The phase reference point is located at the circle center O. The M × 1 steering vector can be expressed as

$$a(\theta,\varphi) = \left[e^{-j2\pi fR\sin\theta\cos(\varphi-\tau_1)/c},\ \ldots,\ e^{-j2\pi fR\sin\theta\cos(\varphi-\tau_M)/c}\right]^{T}, \tag{2}$$

where c denotes the propagation speed. The single-snapshot M × 1 observation vector can be expressed as

$$x(n) = a(\theta,\varphi)\,s(n) + e(n), \tag{3}$$

where n is the snapshot index, s(n) denotes the signal, and e(n) signifies the M × 1 uncorrelated additive Gaussian noise vector. The M × N observation matrix can be expressed as

$$x = a(\theta,\varphi)\,s + e, \tag{4}$$

where s and e denote the 1 × N signal vector and M × N noise matrix, respectively, and N is the number of snapshots. The observation matrix x contains information about the power and angles of the incident signal, but it is corrupted by noise. The problem addressed here is to establish the mapping from the observation matrix x to the elevation and azimuth angles by means of the proposed 2D DOA estimation model.
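The signal model of this subsection can be simulated with a few lines of plain Python. The sketch below is only an illustration under stated assumptions: the sign of the exponent is chosen so that the covariance phase in Section 3.1 comes out as 2πfR sinθ[cos(φ − τj) − cos(φ − τi)]/c, the source is a unit-power random-phase signal, and the function names and default values are hypothetical.

```python
import cmath
import math
import random

C = 3.0e8  # propagation speed in m/s


def steering_vector(theta_deg, phi_deg, f=500e6, radius=0.6, m_elems=8):
    """M x 1 UCA steering vector (sign convention is an assumption, see lead-in)."""
    theta = math.radians(theta_deg)
    phi = math.radians(phi_deg)
    vec = []
    for m in range(m_elems):
        tau_m = 2.0 * math.pi * m / m_elems  # tau_m = 2*pi*(m-1)/M with m counted from 1
        psi = 2.0 * math.pi * f * radius * math.sin(theta) * math.cos(phi - tau_m) / C
        vec.append(cmath.exp(-1j * psi))
    return vec


def observation_matrix(theta_deg, phi_deg, n_snapshots=200, snr_db=10.0, seed=0):
    """M x N observation matrix x = a s + e for one unit-power source."""
    rng = random.Random(seed)
    a = steering_vector(theta_deg, phi_deg)
    # Noise power follows from the SNR; split equally between real and imaginary parts.
    noise_std = math.sqrt(10.0 ** (-snr_db / 10.0) / 2.0)
    x = [[0j] * n_snapshots for _ in a]
    for n in range(n_snapshots):
        s_n = cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi))  # assumed signal model
        for m, a_m in enumerate(a):
            e_mn = complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std))
            x[m][n] = a_m * s_n + e_mn
    return x


x = observation_matrix(45.0, 120.0)
```

Every steering element has unit modulus, and at θ = 0° all elements equal 1, so the observation carries no azimuth information there.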

3. 2D DOA Estimation Model

As shown in Figure 3, the proposed 2D DOA estimation model consists of three modules: preprocessing, the 2D DOA neural network, and postprocessing. The input of the model is the observation matrix x, and the outputs are the estimated elevation angle $\hat{\theta}$ and azimuth angle $\hat{\varphi}$. The preprocessing feeds appropriate input features into the 2D DOA neural network. The 2D DOA neural network, as the core of the model, is composed of the classification network, DOA0–180 network, and DOA180–360 network. According to the classification network response (0 or 1), the input feature is fed into the DOA0–180 network or the DOA180–360 network, respectively. The postprocessing modifies the elevation angle estimate $\hat{\theta}_0$ output by the 2D DOA neural network so as to improve the robustness of the model to the incident signal frequency.

3.1. Preprocessing

The input of the preprocessing module is the observation matrix x of UCA, and the output is the feature appropriate for the 2D DOA neural network.

From the observation matrix x, the estimated array output covariance matrix Rxx can be expressed as

$$R_{xx} = \frac{1}{N}\,x\,x^{H} = \left[R_{ij}\right]_{M\times M}, \qquad R_{ij} \approx \begin{cases} \sigma_s^2 + \sigma_n^2, & i = j, \\ \sigma_s^2\, e^{j\phi_{ij}}, & i \neq j, \end{cases} \tag{5}$$

where $R_{ij}$ and $\phi_{ij}$ ($\phi_{ij} = 2\pi fR\sin\theta[\cos(\varphi-\tau_j) - \cos(\varphi-\tau_i)]/c$) signify the element and its phase, respectively. i signifies the i-th row, and j the j-th column of Rxx. $\sigma_s^2$ and $\sigma_n^2$ denote the signal and noise power, respectively. The signal-to-noise ratio (SNR) is defined as $10\log_{10}(\sigma_s^2/\sigma_n^2)$. Rxx is a Hermitian matrix. Equation (5) indicates that the main diagonal elements of Rxx are real numbers determined only by the signal power and noise power; they do not contain any angle information. The modulus of the off-diagonal elements of Rxx is determined by the signal power, and their phase is determined by the signal frequency, array radius, and incident angles.

If Rxx is directly used as the input feature of the 2D DOA neural network, the input features may differ at the same incident angles because of the signal and noise power. This inconsistency makes the neural network harder to train. First, the main diagonal elements of Rxx are replaced with zeros to eliminate the signal and noise power. Then the modulus of the off-diagonal elements is normalized to further eliminate the signal power and the interference of noise. Next, the real part of the lower triangular elements and the imaginary part of the upper triangular elements are used to extract the phase. Finally, the preprocessing module outputs an M × M real matrix as the input feature of the 2D DOA neural network. We use N − Rxx to signify the input feature, which is expressed as

$$\left[\text{N-}R_{xx}\right]_{ij} = \begin{cases} \operatorname{Re}\left(R_{ij}/|R_{ij}|\right), & i > j, \\ 0, & i = j, \\ \operatorname{Im}\left(R_{ij}/|R_{ij}|\right), & i < j, \end{cases} \tag{6}$$

where $\operatorname{Re}(\cdot)$ and $\operatorname{Im}(\cdot)$ denote the real and imaginary parts, respectively.
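The preprocessing steps listed above can be sketched as follows. For a deterministic illustration, the sketch builds the ideal model covariance (signal power times the steering outer product plus noise power on the diagonal) instead of a sample estimate from snapshots; the function names, the steering sign convention, and the numerical values are assumptions for the example only.

```python
import cmath
import math


def model_covariance(theta_deg, phi_deg, sig_pow=1.0, noise_pow=0.1,
                     f=500e6, radius=0.6, m_elems=8, c=3.0e8):
    """Ideal single-source covariance: sig_pow * a a^H + noise_pow * I (assumed model)."""
    theta, phi = math.radians(theta_deg), math.radians(phi_deg)
    a = []
    for m in range(m_elems):
        tau_m = 2.0 * math.pi * m / m_elems
        psi = 2.0 * math.pi * f * radius * math.sin(theta) * math.cos(phi - tau_m) / c
        a.append(cmath.exp(-1j * psi))
    return [[sig_pow * a[i] * a[j].conjugate() + (noise_pow if i == j else 0.0)
             for j in range(m_elems)] for i in range(m_elems)]


def preprocess(r_xx):
    """Map a complex M x M covariance matrix to the real M x M input feature."""
    m = len(r_xx)
    feature = [[0.0] * m for _ in range(m)]      # main diagonal stays 0
    for i in range(m):
        for j in range(m):
            if i == j:
                continue
            unit = r_xx[i][j] / abs(r_xx[i][j])  # normalize the modulus to 1
            # lower triangle keeps the real part, upper triangle the imaginary part
            feature[i][j] = unit.real if i > j else unit.imag
    return feature


feature = preprocess(model_covariance(45.0, 120.0, sig_pow=1.0, noise_pow=0.1))
feature_strong = preprocess(model_covariance(45.0, 120.0, sig_pow=25.0, noise_pow=2.0))
```

The two features agree although the signal and noise powers differ, which is the consistency the preprocessing is meant to provide. A practical version would also need to handle an off-diagonal element with zero modulus.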

3.2. 2D DOA Neural Network Architecture

The input of the 2D DOA neural network is N − Rxx from the preprocessing, and the outputs are the estimated elevation angle to be modified by postprocessing and the estimated azimuth angle. The 2D DOA neural network consists of two levels. The first level is a classification network, and the second level is the parallel DOA0–180 network and DOA180–360 network. According to the azimuth angle degree, the first level aims to classify the observation region into two subregions, 0° < φ < 180° and 180° < φ < 360°. The first level response determines the corresponding DOA network of the second level, into which the input feature N − Rxx will be fed.

The two-level design is based on the following considerations. When the azimuth angle approaches 0° or 360°, the input features are very close to each other, whereas the desired outputs differ by nearly 360°. A single network without classification therefore has to fit an output jump, which makes training more difficult. As reference [27] illustrates, the resulting DOA estimation accuracy near 0° or 360° is not high enough. Reference [21] adopts an ingenious algebraic postprocessing step to overcome the problem, but this might introduce calculation errors. The proposed 2D DOA neural network avoids both the output jump and the calculation error.

Furthermore, the optimal architecture of the 2D DOA neural network cannot be developed in one step. We tried numerous network architectures and then selected the network with the fewest parameters that still ensures the estimation accuracy.

3.2.1. Classification Network Architecture

In essence, the classification network can compress the dimension of input features from M × M to 1. To validate the feasibility, the range of the elevation angle θ is set from 0° to 90° (10° resolution) and that of the azimuth angle φ from 0° to 180° and 180° to 360° (1° resolution), respectively. A total of 3,620 pieces of data are sampled. Principal component analysis (PCA) is performed after preprocessing these data. Figure 4 displays the projections of the second, third, and seventh principal components. Noticeably, a classification network can be developed to classify φ into (0°, 180°) and (180°, 360°).

The classification network architecture, as Figure 5 depicts, consists of 5 layers. The first layer is the input one, where the input is N − Rxx. The second and third layers are convolutional ones, where the number of convolution kernels is set to 16 and 8, respectively. The convolution kernel size is set to 3 × 3, and the stride is set to 1. The padding mode is set to “same.” The fourth layer is a fully connected one with 16 neurons. The fifth layer is the output one with 1 neuron, and the output is 0 or 1. To be precise, 0 indicates the azimuth angle φ ∈ (0°, 180°), and 1 indicates φ ∈ (180°, 360°).

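The first layer has no activation function. The second, third, and fourth layers adopt ReLU [28] as the activation function. The fifth layer adopts Sigmoid [29] as the activation function. The cost function adopts the cross-entropy, which can be expressed as

$$J(\omega,b) = -\frac{1}{m}\sum_{i=1}^{m}\left[y_i\log\hat{y}_i + (1-y_i)\log\left(1-\hat{y}_i\right)\right] + \frac{\lambda}{2m}\sum_{l=1}^{L}\left\lVert\omega^{[l]}\right\rVert_F^2, \tag{7}$$

where $\hat{y}_i$ denotes the classification network response to the i-th sample, and $y_i$ the ground truth label. m signifies the total number of samples, and L the number of layers of the network. ω denotes the network weights ($\omega^{[l]}$ being those of the l-th layer), and b the network biases. λ is the regularization parameter (λ = 0.0001).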

In addition, when the azimuth angle is 0°, 180°, or 360°, the corresponding response may be 0 or 1 due to the interference of noise. In this case, N – Rxx may be fed into the DOA0–180 network or DOA180–360 network, but the uncertainty of the response does not affect the estimation accuracy of the proposed model.

3.2.2. DOA Network Architecture

In the case of φ ∈ (0°, 180°), the classification network outputs 0, and then N – Rxx is fed into the DOA0–180 network, while in the case of φ ∈ (180°, 360°), the classification network outputs 1, and then N – Rxx is fed into the DOA180–360 network.
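The routing between the two levels can be sketched as a small dispatch function. The three networks are passed in as plain callables, so the stubs below merely stand in for trained models; the 0.5 threshold on the Sigmoid output and all names are assumptions made for this illustration.

```python
def estimate_2d_doa(feature, classifier, doa_0_180, doa_180_360, threshold=0.5):
    """Route the input feature through the two-level network.

    classifier(feature)  -> Sigmoid output in [0, 1]
    doa_*(feature)       -> (elevation estimate before postprocessing, azimuth estimate)
    """
    response = 1 if classifier(feature) >= threshold else 0
    doa_net = doa_180_360 if response == 1 else doa_0_180
    theta0_hat, phi_hat = doa_net(feature)
    return response, theta0_hat, phi_hat


# Stub networks with fixed outputs (hypothetical values, for illustration only).
def stub_classifier(feature):
    return 0.93


def stub_doa_0_180(feature):
    return 30.0, 90.0


def stub_doa_180_360(feature):
    return 30.0, 270.0


result = estimate_2d_doa([[0.0]], stub_classifier, stub_doa_0_180, stub_doa_180_360)
```

Only one of the two DOA networks runs per sample, so the second level adds no inference cost compared with a single DOA network of the same size.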

Figure 6 shows the DOA0–180 network architecture, which consists of 12 layers. The first layer is the input layer, where the input is N − Rxx. The second to ninth layers are convolutional layers, where the number of convolution kernels is set to 64, 64, 32, 32, 16, 16, 8, and 8 in turn. The convolution kernel size is set to 3 × 3, and the stride is set to 1. The padding mode is set to “same.” The tenth and eleventh layers are fully connected layers with 32 and 16 neurons, respectively. The twelfth layer is the output layer with two neurons, which correspond to the elevation angle estimate $\hat{\theta}_0$ to be modified by postprocessing and the estimated azimuth angle $\hat{\varphi}$.

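The first layer has no activation function, while the other layers adopt ReLU as the activation function. The cost function adopts the mean squared error (MSE), which can be expressed as

$$J(\omega,b) = \frac{1}{m}\sum_{i=1}^{m}\left\lVert\hat{y}_i - y_i\right\rVert_2^2, \tag{8}$$

where $\hat{y}_i$ and $y_i$ are the two-element network output and label (elevation and azimuth angle) of the i-th sample, m is the total number of samples, ω denotes the network weights, and b the network biases.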

The DOA180–360 network architecture is the same as the DOA0–180 network architecture (no repetition to avoid redundancy).
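As a rough size check on the architectures in Sections 3.2.1 and 3.2.2, the trainable parameters can be counted layer by layer. The sketch assumes M = 8, a single-channel M × M input, no pooling layers, and a plain flatten between the last convolutional layer and the first fully connected layer; none of these details is stated explicitly above, so the totals are indicative only.

```python
def count_parameters(conv_kernels, dense_units, m_elems=8, kernel_size=3, in_channels=1):
    """Count weights and biases of a conv stack followed by fully connected layers."""
    total = 0
    channels = in_channels
    for n_kernels in conv_kernels:
        # each kernel has kernel_size^2 * channels weights plus one shared bias
        total += (kernel_size * kernel_size * channels + 1) * n_kernels
        channels = n_kernels
    # "same" padding with stride 1 keeps the M x M spatial size (assumed: no pooling)
    features = m_elems * m_elems * channels
    for units in dense_units:
        total += (features + 1) * units
        features = units
    return total


classification_params = count_parameters([16, 8], [16, 1])
doa_params = count_parameters([64, 64, 32, 32, 16, 16, 8, 8], [32, 16, 2])
```

Under these assumptions the classification network has roughly 9.5 thousand parameters and each DOA network roughly 91 thousand, most of them in the first convolutional layers and the first fully connected layer.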

3.3. 2D DOA Neural Network Training

The training sets, validation sets, and test sets used in this study are composed of simulated data. The simulation conditions are as follows: (1) the incident signal frequency is set to 500 MHz, and the number of snapshots 2,000; (2) the UCA element number is set to 8; (3) the UCA radius is set to 0.6 meter, which is equal to the electromagnetic wave wavelength of 500 MHz; (4) the signal amplitude is randomly generated to enhance the robustness of the neural network to the signal amplitude [30]. The network’s initial settings are as follows: (1) the weights are initialized with samples drawn from a truncated normal distribution centered on 0 with standard deviation of sqrt(1/fan_in), called lecun_normal [31], and the biases are initialized to 0’s; (2) Adam [32] is adopted in the backpropagation; (3) the minibatch size is set to 512. The classification network and DOA network are trained independently.

The training samples are taken on a discrete angle grid. We conducted numerous experiments and found that the denser the sampling, the higher the estimation accuracy. However, once the sampling density reaches a certain level, the estimation accuracy no longer improves. Therefore, the number of training samples should be minimized while still ensuring the estimation accuracy. In addition, at excessively small elevation angles, the phase of each element in the steering vector approaches zero, and the interference of noise may enlarge the azimuth angle error. In this case, however, the signal direction is almost parallel to the z-axis of the Cartesian coordinate system in Figure 2, regardless of the azimuth angle. Furthermore, if the training sets contain excessively small elevation angles, the interference of noise may produce samples whose input features do not match their labels, which affects the convergence of the network. Consequently, the elevation angles in the training sets and validation sets start at 1° rather than 0°.

3.3.1. Classification Network Training

The elevation angles are sampled as {1° : 1° : 4°, 5° : 5° : 90°}. The azimuth angles are sampled as {0° : 1° : 4°, 5° : 5° : 175°, 176° : 1° : 180°} and {180° : 1° : 184°, 185° : 5° : 355°, 356° : 1° : 360°}, with the labels 0 and 1, respectively. The data are generated at SNRs of −10 dB and 20 dB. A total of 2 × 22 × 90 = 3,960 samples constitute the training set. The validation set also contains 3,960 samples. Its sampling settings are the same as those of the training set, but it is generated independently to ensure no duplicate data. The learning rate is set to 0.001, and the number of epochs to 800. Regularization and early stopping are used to prevent overfitting.

Figure 7 illustrates the variation of cost and accuracy in the training process. The accuracy of the training set and validation set cannot reach 100% because of the uncertain response when the azimuth angle is 0°, 180°, or 360°. However, the uncertain response does not affect 2D DOA estimation accuracy.

3.3.2. DOA Network Training

Figure 8 shows the variation of cost in the training process of the DOA0–180 network. The sampling settings of elevation angles and azimuth angles are {1° : 1° : 90°} and {0° : 2° : 180°}, respectively. The data is generated with the SNR of −5 dB and 20 dB, respectively. A total of 2 × 90 × 91 = 16,380 pieces of data constitute the training set. The validation set data is also 16,380. Its composition is the same as that of the training set, but it is generated independently to ensure no duplicate data. The learning rate is set to 0.001, 0.0001, 0.00005, and 0.00001 in turn, and the epoch for each learning rate is set to 400. The strategy of early stopping is adopted to prevent overfitting.
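The sample counts quoted in Sections 3.3.1 and 3.3.2 can be checked by enumerating the angle grids. The helper `grid` and the variable names are illustrative; the DOA count below is for the DOA0–180 network only.

```python
def grid(start, step, stop):
    """Angles {start : step : stop} in whole degrees, both ends included."""
    return list(range(start, stop + 1, step))


# Classification network: two SNR levels, 22 elevations, 45 + 45 azimuths.
elev_cls = grid(1, 1, 4) + grid(5, 5, 90)
azim_cls_low = grid(0, 1, 4) + grid(5, 5, 175) + grid(176, 1, 180)      # label 0
azim_cls_high = grid(180, 1, 184) + grid(185, 5, 355) + grid(356, 1, 360)  # label 1
n_cls = 2 * len(elev_cls) * (len(azim_cls_low) + len(azim_cls_high))

# DOA0-180 network: two SNR levels, 90 elevations, 91 azimuths.
elev_doa = grid(1, 1, 90)
azim_doa = grid(0, 2, 180)
n_doa = 2 * len(elev_doa) * len(azim_doa)
```

The denser 1° spacing near 0°, 180°, and 360° in the classification grids concentrates samples at the subregion boundaries, where the two classes are hardest to separate.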

The training set, validation set, and training process of the DOA180–360 network are similar to those of the DOA0–180 network (no repetition to avoid redundancy).

3.4. Postprocessing

The input of the postprocessing module is the elevation angle estimate $\hat{\theta}_0$ output by the 2D DOA neural network, and the output is the modified elevation angle estimate $\hat{\theta}$. The purpose is to enhance the robustness of the neural network to the incident signal frequency. If the actual signal frequency differs from the training set signal frequency, the actual data and the training set data do not follow the same distribution, which results in inaccurate DOA estimation. Only a few references take the robustness to the incident signal frequency into account. Reference [18] selects 17 frequency points in the range of 2.41 GHz to 2.47 GHz with a step of 3.6 MHz and takes the frequency as an input feature of the network. However, this expands the training set 17 times. Reference [21] enhances the frequency robustness of the network by randomly sampling frequencies from 100 MHz to 500 MHz in the training set. However, random sampling might cause the same input feature to correspond to multiple different incident angles. The current study uses simple algebraic postprocessing to improve the frequency robustness of the 2D DOA estimation model.

Let f and θ represent the actual incident signal frequency and elevation angle, respectively, and let $f_0$ and $\theta_0$ represent the corresponding training set parameters. The four parameters satisfy

$$f\sin\theta = f_0\sin\theta_0. \tag{9}$$

In this case, based on the analysis of each element phase in equation (5), the 2D DOA neural network estimates the elevation angle according to the input feature corresponding to $\theta_0$ rather than θ. Consequently, the estimated value $\hat{\theta}_0$ should be modified. The modified elevation angle estimate is expressed as

$$\hat{\theta} = \arcsin\left(\frac{f_0}{f}\sin\hat{\theta}_0\right). \tag{10}$$

If $f > f_0$, there may not be a $\theta_0$ in the training set that satisfies equation (9).
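The postprocessing amounts to one arcsine evaluation. The sketch below assumes the relation f sin θ = f_0 sin θ_0 between the actual and training-set parameters, which follows from the phase expression in Section 3.1; the function names are illustrative, and clamping the arcsine argument to [−1, 1] is an added safeguard against noisy network outputs rather than part of the described method.

```python
import math


def equivalent_training_elevation(theta_deg, f_actual, f_train=500e6):
    """Elevation angle at the training frequency that yields the same input feature."""
    return math.degrees(math.asin(f_actual * math.sin(math.radians(theta_deg)) / f_train))


def modify_elevation(theta0_hat_deg, f_actual, f_train=500e6):
    """Map the network's elevation estimate back to the actual signal frequency."""
    ratio = f_train * math.sin(math.radians(theta0_hat_deg)) / f_actual
    ratio = max(-1.0, min(1.0, ratio))  # assumed safeguard, see lead-in
    return math.degrees(math.asin(ratio))


# A 250 MHz signal at 60 deg elevation looks like a shallower angle at 500 MHz.
theta0 = equivalent_training_elevation(60.0, 250e6)
theta_recovered = modify_elevation(theta0, 250e6)
```

Because the correction multiplies sin of the network output by f_0/f, any error in the network output is scaled up as the actual frequency drops below the training frequency.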

3.5. Multisource Scenario

The 2D DOA estimation model proposed above is only suitable for single source estimation, which will be greatly limited in practice. Theoretically, a neural network suitable for multisource scenarios can be trained as long as the training set is extended. However, because the required data will increase exponentially with the number of the sources, the approach may be difficult to implement. In this subsection, we adopt a simple strategy to achieve multisource 2D DOA estimation based on the proposed 2D DOA estimation model.

The array receiving model is as described in Section 2.2. If L signals impinge on the UCA, equation (5) should be revised as

$$R_{ij} \approx \begin{cases} \sum_{l=1}^{L}\sigma_l^2 + \sigma_n^2, & i = j, \\ \sum_{l=1}^{L}\sigma_l^2\, e^{j\phi_{ij}^{l}}, & i \neq j, \end{cases} \tag{11}$$

where $\sigma_l^2$ and $\phi_{ij}^{l}$ signify the l-th signal power and phase, respectively. θl and φl denote the elevation and azimuth angle of the l-th signal. First, the main diagonal elements are averaged to reduce the interference of noise, and then the noise power is subtracted to obtain the sum of the signal powers. Then the main diagonal elements are replaced with zeros, and the modulus of the off-diagonal elements is normalized by the sum of the signal powers. Next, the real part of the lower triangular elements and the imaginary part of the upper triangular elements are fed into the 2D DOA neural network. Finally, the 2D DOA estimate of the signal corresponding to the maximum eigenvalue of Rxx is obtained.

In the next step, the component of the signal is removed from Rxx to obtain a new input feature, which is fed into the 2D DOA neural network again to obtain the 2D DOA estimation of the signal corresponding to the second largest eigenvalue of Rxx. By analogy, the 2D DOA estimation of the L signals can be obtained sequentially. In the multisource scenario, equation (6) should be revised, and the input feature Nl − Rxx of the l-th signal can be expressed aswhere and . represent eigenvalues of Rxx in descending order. The L larger eigenvalues can be used as an effective basis for determining the level of SNR [33]. The M − L smaller eigenvalues are approximately equal to the noise power , which can be expressed as . and have no specific meaning and are set to 0 for the convenience of calculation.

4. Simulation Results

In this section, the classification network response is presented first. Then, Section 4.2 illustrates the 2D DOA estimation model response and analyzes the effect of the SNR and the number of snapshots, the robustness to the signal frequency, and the processing time. Next, we show that transfer learning can reduce the amount of training data required in the case of array imperfections. Finally, we analyze 2D DOA estimation in the multisource scenario. Unless otherwise stated, the simulation conditions are as described in Section 3.3.

4.1. Performance of the Classification Network

Figure 9 shows the classification network response to the incident signals with different frequencies. 50 points are randomly sampled in the range of SNR ∈ {−10 dB : 0.01 dB : 20 dB}, θ ∈ {1° : 0.01° : 90°}, and φ ∈ {0° : 0.01° : 360°} for each frequency.

Under different frequencies, the classification network response is 0 in the case of φ ∈ (0°, 180°), while the response is 1 in the case of φ ∈ (180°, 360°). Obviously, the classification network can well classify the observation region into two subregions.

4.2. Performance of the 2D DOA Estimation Model
4.2.1. 2D-DOA Estimation Model Response

To intuitively show the performance of the proposed method, the SNR is set to 5 dB, and 200 points are randomly sampled in the range of θ ∈ {1°: 0.01°: 90°} and φ ∈ {0° : 0.01° : 360°}. Figure 10(a) shows the response of these samples. Figures 10(b) and 10(c) show the correlation between the observed angle and estimated angle of the elevation and azimuth, respectively. The Pearson product-moment correlation coefficients (rppm) are 0.9999.

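Furthermore, to verify the necessity of the classification network, we also train a DOA network without classification, with slightly more parameters, as a comparison. The root mean square periodic error (RMSPE) is defined as the evaluation criterion, which is expressed as

$$\mathrm{RMSPE} = \sqrt{\frac{1}{2K}\sum_{k=1}^{K}\left[\left(\theta_k-\hat{\theta}_k\right)^2 + \left(\min\left\{\left|\varphi_k-\hat{\varphi}_k\right|,\ 360^\circ-\left|\varphi_k-\hat{\varphi}_k\right|\right\}\right)^2\right]}, \tag{13}$$

where K is the number of samples, and $(\theta_k, \varphi_k)$ and $(\hat{\theta}_k, \hat{\varphi}_k)$ are the actual and estimated angles of the k-th sample.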

Compared with the root mean square error (RMSE), the RMSPE better reflects the 2D DOA estimation accuracy because it accounts for the periodicity of the azimuth angle. For example, if θ = 50°, φ = 359°, $\hat{\theta}$ = 50°, and $\hat{\varphi}$ = 0.1°, then the RMSPE is about 0.778°, while the RMSE is about 253.781°.
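The numerical example above can be reproduced with a short script. The sketch assumes that the periodic error wraps the azimuth difference to the shorter arc of the circle and that the squared errors are averaged over both angles and all samples; this reading is inferred from the quoted numbers, and the function names are illustrative.

```python
import math


def rmse(true_angles, est_angles):
    """Plain RMSE over (elevation, azimuth) pairs in degrees."""
    total = 0.0
    for (theta, phi), (theta_hat, phi_hat) in zip(true_angles, est_angles):
        total += (theta - theta_hat) ** 2 + (phi - phi_hat) ** 2
    return math.sqrt(total / (2 * len(true_angles)))


def rmspe(true_angles, est_angles):
    """RMSE with the azimuth error wrapped to the shorter arc (assumed definition)."""
    total = 0.0
    for (theta, phi), (theta_hat, phi_hat) in zip(true_angles, est_angles):
        d_phi = abs(phi - phi_hat) % 360.0
        d_phi = min(d_phi, 360.0 - d_phi)
        total += (theta - theta_hat) ** 2 + d_phi ** 2
    return math.sqrt(total / (2 * len(true_angles)))


truth = [(50.0, 359.0)]
estimate = [(50.0, 0.1)]
example_rmspe = rmspe(truth, estimate)
example_rmse = rmse(truth, estimate)
```

The wrapped azimuth error is 1.1° instead of 358.9°, which is why the two criteria differ by more than two orders of magnitude for an estimate that is actually very close.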

The SNR is set to 5 dB. The elevation angle is set to 45°, and the azimuth angle is set to {0° : 20° : 360°} in turn. For each azimuth angle, 500 Monte Carlo runs are performed. Figure 11 shows that the proposed method is superior to the method without classification, especially when the azimuth angle is 0° or 360°.

4.2.2. Effect of the SNR and Snapshot Number

To highlight the superiority of the proposed method, we now compare the proposed method with 2D-MUSIC, UCA-ESPRIT, and RBF. Both 2D-MUSIC and UCA-ESPRIT are classic conventional 2D-DOA estimation methods. RBF networks can approximate any nonlinear function and have an excellent convergence rate [17, 19]. Therefore, the three methods are selected as the baseline methods.

In this subsection and the remainder of this paper, the simulation settings of the three baseline methods are as follows. To balance the accuracy and search speed of 2D-MUSIC, a coarse search with a step of 1° is followed by a fine search with a step of 0.01° within ±1° of the coarse search result. To ensure the accuracy of UCA-ESPRIT, the array element number is set to 19, and the maximum mode order to 6. Three RBF networks replace the classification network, DOA0–180 network, and DOA180–360 network, respectively. The training sets and validation sets of the three RBF networks are the same as those of the proposed method. The RBF spread is searched in the range of [1, 10], and the desired MSE is searched in the range of [0, 5] to obtain the best performance on the validation sets.
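The two-stage search used for the 2D-MUSIC baseline can be illustrated with a coarse-to-fine grid search. To stay within the standard library, the sketch does not compute the MUSIC pseudospectrum; it searches the conventional (Bartlett) beamformer spectrum of a noise-free single source as a stand-in objective. The function names, the steering sign convention, and the test angles are assumptions for the example.

```python
import cmath
import math


def steering(theta_deg, phi_deg, f=500e6, radius=0.6, m_elems=8, c=3.0e8):
    k_r = 2.0 * math.pi * f * radius * math.sin(math.radians(theta_deg)) / c
    phi = math.radians(phi_deg)
    return [cmath.exp(-1j * k_r * math.cos(phi - 2.0 * math.pi * m / m_elems))
            for m in range(m_elems)]


def make_spectrum(theta_true, phi_true):
    """Bartlett spectrum |a^H a0|^2 for a noise-free source (stand-in for MUSIC)."""
    a0 = steering(theta_true, phi_true)

    def spectrum(theta_deg, phi_deg):
        a = steering(theta_deg, phi_deg)
        return abs(sum(u.conjugate() * v for u, v in zip(a, a0))) ** 2

    return spectrum


def grid_argmax(spectrum, thetas, phis):
    return max(((t, p) for t in thetas for p in phis), key=lambda tp: spectrum(*tp))


def coarse_to_fine(spectrum, coarse=1.0, fine=0.01, window=1.0):
    # Stage 1: coarse search over the whole observation region.
    thetas = [i * coarse for i in range(int(round(90 / coarse)) + 1)]
    phis = [i * coarse for i in range(int(round(360 / coarse)))]
    t0, p0 = grid_argmax(spectrum, thetas, phis)
    # Stage 2: fine search within +/- window of the coarse peak.
    n = int(round(window / fine))
    fine_t = [t0 + i * fine for i in range(-n, n + 1) if 0.0 <= t0 + i * fine <= 90.0]
    fine_p = [(p0 + i * fine) % 360.0 for i in range(-n, n + 1)]
    return grid_argmax(spectrum, fine_t, fine_p)


theta_hat, phi_hat = coarse_to_fine(make_spectrum(40.37, 123.45))
```

The two stages evaluate about 33 thousand and 40 thousand grid points, respectively, whereas a single-stage search at 0.01° over the whole region would need several hundred million evaluations. The estimates remain quantized to the fine step.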

Figure 12 reveals the relationship between the RMSPE and SNR of each method. SNR ∈ {−10 dB : 1 dB : 20 dB}. 500 points are randomly sampled in the range of θ ∈ {1° : 0.01° : 90°} and φ ∈ {0° : 0.01° : 360°} for each SNR. Figure 13 reveals the relationship between the RMSPE and snapshot number of each method with the SNR of 5 dB. N ∈ {100 : 100 : 2000}. 500 points are randomly sampled in the range of θ ∈ {1° : 0.01° : 90°} and φ ∈ {0° : 0.01° : 360°} for each snapshot number.

Figures 12 and 13 illustrate that the proposed method performs even slightly better than the 2D-MUSIC algorithm. Although the 2D-MUSIC algorithm breaks through the Rayleigh limit and approaches the Cramér–Rao bound, its estimates are still discrete values tied to the search step, whereas the outputs of the proposed method are continuous. The proposed CNN-based method is also superior to RBF. Although RBF fits functions well and converges quickly, its generalization ability in 2D DOA estimation is inferior to that of CNN.

4.2.3. Robustness to the Signal Frequency

Figure 14 illustrates the relationship between the RMSPE and signal frequency of each method (including the proposed method without postprocessing) with the SNR of 10 dB. f ∈ {100 MHz : 50 MHz : 500 MHz}. 500 points are randomly sampled in the range of θ ∈ {1° : 0.01° : 90°} and φ ∈ {0° : 0.01° : 360°} for each frequency.

As Figure 14 shows, 2D-MUSIC is superior to UCA-ESPRIT in the frequency range from 250 MHz to 500 MHz, while it is inferior to UCA-ESPRIT in the frequency range from 100 MHz to 200 MHz. The reason is that, with a fixed array radius, decreasing the frequency is equivalent to reducing the array aperture for 2D-MUSIC, which affects the sharpness of the spectral peaks [34]. RBF is not robust enough to the frequency because of its poor 2D DOA estimation performance. The error of the proposed method without postprocessing is excessively large: the RMSPE already reaches 7.40° when the incident signal frequency deviates from the training set frequency by only 10%, i.e., at 450 MHz, so the 2D DOA cannot be estimated properly. The proposed method is superior to the 2D-MUSIC algorithm in the range of 400 MHz to 500 MHz and inferior to it in the range of 100 MHz to 350 MHz, because the lower the frequency, the larger the ratio $f_0/f$ in equation (10), which amplifies the estimation error of the proposed method. However, even when the frequency deviates from the training set frequency by 80% (i.e., at 100 MHz), the RMSPE of the proposed method remains below 1°. Therefore, the proposed method is reasonably robust to the incident signal frequency.

4.2.4. Processing Time

This subsection highlights the speed advantage of the proposed method by comparing the processing time of each method. The computations are executed on a PC with an Intel Core i7-9700K CPU and 16 GB of DDR4 RAM. The processing time of the proposed method and RBF includes the preprocessing, network running, and postprocessing time. Table 2 shows the results from 500 Monte Carlo runs.

4.3. Array Imperfections and Transfer Learning

In this subsection, we consider the array imperfections that are unavoidable in practical applications due to antenna manufacturing, the equipment environment, etc. Typical array imperfections include gain and phase inconsistency, sensor position error, and intersensor mutual coupling. Following [13], the gain biases in this paper are set as

The phase biases are set as

The position biases are set as

The mutual coupling coefficient vector is set as

In equations (14)–(17), ρ ∈ {0 : 0.1 : 1} is introduced to control the strength of the imperfections; for 2D-MUSIC, RBF, and the proposed method, a = 4 and b = 3; for UCA-ESPRIT, a = b = 9. In light of the array imperfections, equation (2) should be revised as

Figure 15 illustrates the relationship between the RMSPE and ρ for each method at an SNR of 10 dB. For each ρ, 500 points are randomly sampled in the range of θ ∈ {1° : 0.01° : 90°} and φ ∈ {0° : 0.01° : 360°}. When ρ < 0.5, the proposed method performs best. However, as ρ increases, the performance of every method degrades, to the point where the methods fail to estimate the 2D DOA. Consequently, it is necessary to calibrate the array imperfections.

Generally, conventional calibration methods lack adaptability, and array imperfections are challenging to model accurately [35–37]. CNN-based methods are data-driven and therefore do not require prior assumptions about the array imperfections. Furthermore, to reduce the training data required in practical applications, this study adopts transfer learning to address the array imperfection problem. Transfer learning aims at improving the performance of target learners on target domains by transferring the knowledge contained in different but related source domains. In this way, the dependence on a large amount of target domain data for constructing target learners can be reduced [24]. None of the weights and biases of the 2D DOA neural network trained in Section 3.3 are frozen during transfer learning, and early stopping is used to prevent the possible overfitting caused by transfer learning.
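The early stopping strategy mentioned above can be sketched generically as follows. This is not the training code used in this paper: the class name, the `patience` and `min_delta` parameters, and the validation-loss values are hypothetical and serve only to illustrate the stopping rule applied while all weights are fine-tuned.

```python
class EarlyStopping:
    """Stop training once the validation loss has not improved for
    `patience` consecutive epochs (hypothetical helper)."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Made-up validation losses standing in for a fine-tuning run
val_losses = [0.90, 0.60, 0.45, 0.40, 0.41, 0.43, 0.42, 0.44, 0.50]

stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.step(epoch, loss):
        stopped_at = epoch
        break
```

In practice, the weights saved at `stopper.best_epoch` would be restored after stopping, so that the fine-tuned network corresponds to the lowest validation loss.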

Following the training set sampling settings of the classification network and the DOA networks in Section 3.3, the corresponding training sets are regenerated for the case of ρ = 1. Table 3 lists the three trained two-level neural networks, which have the same structure as the 2D DOA neural network proposed in Section 3.2.

Figure 16(a) displays the relationship between the RMSPE and the SNR for these three neural networks and 2D-MUSIC when ρ = 1. Figure 16(b) compares the RMSPE versus frequency. The test sample generation patterns in Figures 16(a) and 16(b) are the same as those in Figures 12 and 14, respectively. All three neural networks are superior to 2D-MUSIC, which shows that the CNN can effectively address the array imperfection problem. The performance of T-Network-20% is comparable to that of R-Network-100%, which indicates that the amount of training data can be reduced through transfer learning in practical applications.

In addition, we studied the case in which the data usage percentage is less than 20%. As the data volume decreases, the accuracy of the network trained by transfer learning gradually decreases, but it is always higher than that of the retraining mode with the same amount of training data. When the data usage percentage is more than 20%, the transfer learning mode reaches the accuracy of R-Network-100% or T-Network-20% in fewer epochs as the data volume increases.

4.4. Performance in the Multisource Scenario

In this subsection, we consider the multisource scenario. M eigenvalues can be obtained by eigenvalue decomposition of the covariance matrix of a UCA with M elements. To obtain the input characteristic Nl − Rxx of the 2D DOA neural network, i.e., equation (12), at least one eigenvalue must correspond to the noise power. Therefore, the maximum number of targets that can be estimated by the proposed method is M − 1. However, similar to the MUSIC algorithm, when the number of targets approaches the theoretical maximum M − 1, false negatives or false positives may occur. Assume that five stationary signals impinge on the UCA in the range of 1° ≤ θ ≤ 90° and 0° ≤ φ ≤ 360°. 500 independent Monte Carlo experiments are performed in the case of array imperfections (ρ = 1). T-Network-20% runs five times in each experiment. After postprocessing, the 2D DOA estimates of the five signals are output in turn. For brevity, Figure 17 shows only the first and third outputs of each experiment. The RMSPE is 0.14° and 0.27°, respectively, which indicates that the estimation accuracy of the proposed method is high.
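The eigenvalue argument above can be illustrated with a small numerical sketch. It assumes an ideal covariance matrix (no finite-snapshot error), uncorrelated unit-power sources, and a standard UCA steering model with elevation measured from the array normal; the element count M = 8, the radius-to-wavelength ratio of 0.5, the noise power, and the five directions are hypothetical values rather than the settings of this paper.

```python
import numpy as np

def uca_steering(theta_deg, phi_deg, n_elements, radius_over_lambda):
    """Steering vector of an ideal UCA (assumed standard model)."""
    theta = np.deg2rad(theta_deg)
    phi = np.deg2rad(phi_deg)
    gamma = 2 * np.pi * np.arange(n_elements) / n_elements  # element angles
    phase = (2 * np.pi * radius_over_lambda
             * np.sin(theta) * np.cos(phi - gamma))
    return np.exp(1j * phase)

M = 8                # hypothetical number of array elements
noise_power = 0.1    # hypothetical noise power
# Five hypothetical (elevation, azimuth) pairs in degrees
doas = [(20, 30), (35, 110), (50, 200), (65, 280), (80, 340)]

# Ideal covariance: R = A A^H + noise_power * I
A = np.column_stack([uca_steering(t, p, M, 0.5) for t, p in doas])
R = A @ A.conj().T + noise_power * np.eye(M)

# M real eigenvalues in ascending order (R is Hermitian)
eigvals = np.linalg.eigvalsh(R)

# Eigenvalues equal to the noise power span the noise subspace
n_noise = int(np.sum(np.isclose(eigvals, noise_power, rtol=0.0, atol=1e-8)))
```

With K sources, M − K eigenvalues equal the noise power in this idealized setting, so at least one noise eigenvalue exists only when K ≤ M − 1.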

Furthermore, the influence of the angle distance between targets on the estimation performance is analyzed. The elevation angle θ1 and azimuth angle φ1 of the first target are set to [45° : −1° : 1°] and [90° : −1° : 46°] in sequence, respectively. The elevation angle θ2 and azimuth angle φ2 of the second target are set to [46° : 1° : 90°] and [91° : 1° : 135°] in sequence, respectively. Thus, the angle distance (θ2 − θ1 + φ2 − φ1) between the two targets is [2° : 4° : 178°] in turn. In the case of array imperfections (ρ = 1), 500 Monte Carlo experiments are performed for each angle distance using T-Network-20%. Figure 18 reveals that the angle distance has little effect on the estimation accuracy.
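As a quick check of the sweep described above, the following sketch enumerates the two targets' angles and computes their separation as (θ2 − θ1) + (φ2 − φ1), which reproduces the stated sequence [2° : 4° : 178°]. The variable names are illustrative.

```python
# First target: elevation 45..1 and azimuth 90..46, in steps of -1 degree
theta1 = list(range(45, 0, -1))
phi1 = list(range(90, 45, -1))

# Second target: elevation 46..90 and azimuth 91..135, in steps of +1 degree
theta2 = list(range(46, 91))
phi2 = list(range(91, 136))

# Angle distance at each step of the sweep, in degrees
angle_distance = [(t2 - t1) + (p2 - p1)
                  for t1, p1, t2, p2 in zip(theta1, phi1, theta2, phi2)]
```

The sweep therefore contains 45 target pairs, with both the elevation gap and the azimuth gap growing by 2° per step.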

5. Conclusions

In this paper, we have presented a 2D DOA estimation model composed of three modules: the preprocessing, the 2D DOA neural network, and the postprocessing. The preprocessing effectively eliminates the influence of the signal power and the interference of noise, providing appropriate input features for the 2D DOA neural network. The 2D DOA neural network consists of the classification network, the DOA0–180 network, and the DOA180–360 network. The classification network divides the observation region into two parts according to the azimuth angle, which avoids the possible jump of the output when the azimuth angle is near 0° or 360°. The parallel DOA0–180 and DOA180–360 networks output the estimated elevation angle to be modified and the estimated azimuth angle. The postprocessing modifies the estimated elevation angle to enhance the robustness to the signal frequency. In addition, the feasibility of applying transfer learning to overcome array imperfections is validated.

The experiments lead to the following conclusions: (1) the proposed method is superior to 2D-MUSIC, UCA-ESPRIT, and RBF in accuracy and processing speed, and it is reasonably robust to the incident signal frequency; (2) the CNN-based approach can address the array imperfection problem, and the amount of training data can be reduced by means of transfer learning.

The following problems remain to be solved: (1) implementing transfer learning to train neural networks in a real environment; (2) studying the robustness of the neural networks when the actual signal frequency is greater than the training set signal frequency; (3) further improving the 2D DOA estimation method for multisource scenarios.

Data Availability

The simulation data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Research Foundation of Korea (NRF) funded by the Korea Government (MSIT) (2016R1A6A1A03013567 and 2018R1A2A2A14023632) and by the Korea Institute of Energy Technology Evaluation and Planning (KETEP) and the Ministry of Trade, Industry & Energy (MOTIE) of the Republic of Korea (20194030202300).