#### Abstract

Unmanned vehicles are widely used in industrial scenarios, and their positioning information is vital for the emerging industrial internet of things (IIOT); vehicle positioning has therefore attracted considerable interest. Cooperative vehicle positioning using multiple-input multiple-output (MIMO) radars is one of the most promising techniques, the core of which is to measure the direction-of-arrival (DOA) of the vehicle from various viewpoints. Owing to power limitations, a MIMO radar may be unable to use all of its antenna elements to transmit/receive (Tx/Rx) signals. Consequently, it is necessary to deploy a full array and select an optimal Tx/Rx subset. Thanks to industrial big data (IBD), a massive labeled dataset containing all possible DOAs and the corresponding array measurements can be obtained offline. To achieve fast and reliable Tx/Rx selection, a convolutional neural network (CNN) framework is proposed in this paper, in which antenna selection is formulated as a multiclass-classification problem. We assume that the DOA of the vehicle is known a priori, and the optimization criterion is to minimize the Cramér–Rao bound (CRB) on DOA estimation obtained with the selected Tx/Rx subarrays. The proposed framework is flexible and energy friendly. Simulation results verify its effectiveness.

#### 1. Introduction

The industrial internet of things (IIOT) is acknowledged as the trend of the manufacturing industry [1]; it aims to promote product innovation, improve operation levels, and enable novel business models. In industrial logistics, unmanned vehicles are widely used. For safe driving, it is important to connect the vehicles to the internet of vehicles (IOV), an important branch of IIOT [1]. Vehicle positioning is one of the most important tasks in IOV and has gained extensive attention in the past decades. Several frameworks have been put forward, and the most commonly used methods rely on the global positioning system (GPS). However, high latency reduces the implementation potential of GPS for the IOV. Moreover, the line-of-sight transmission property of the GPS signal makes it unavailable in tunnels, and it may fail to work due to cloud cover. To develop a robust and reliable vehicle positioning system, some advanced sensors have been investigated, for instance, cameras, lidars, and radars [2]. Considering cost, latency, and reliability together, the radar approaches are promising. Generally, radar approaches measure the vehicle position according to one of four principles [3, 4], i.e., radio-signal strength (RSS), time-of-arrival (TOA), time-difference-of-arrival (TDOA), and direction-of-arrival (DOA). Nevertheless, it is difficult for RSS techniques to obtain the vehicle position accurately due to the complexity of the wireless channel. The TOA and TDOA approaches rely on latency measurement, but it is usually very hard to obtain a high-accuracy time difference. The DOA approaches are appealing, since an antenna array provides DOA measurements with adequate accuracy [5].

The concept of DOA is shown in Figure 1, in which the radar nodes measure the angle of the incoming radio signal from various viewpoints. To achieve superior angular resolution, an antenna array is employed to transmit/receive (Tx/Rx) signals at the radar node. Usually, antenna elements are arranged in uniform shapes, e.g., a uniform linear array, a uniform circular array, or a uniform rectangular array, so that spatial Nyquist sampling is available. Combined with existing estimation algorithms [6], e.g., multiple signal classification (MUSIC), estimation of signal parameters via rotational invariance techniques (ESPRIT), and tensor estimators, the DOA can be easily obtained. It is well known that the angular resolution of a uniform array can be improved by increasing the array aperture, at the expense of higher cost, larger physical area, and additional computational burden. To mitigate this issue, several nonuniform alternatives have been proposed, such as minimum redundancy arrays, coprime arrays, and nested arrays. These arrays are claimed to achieve more degrees of freedom, but their antenna elements are fixed. To pursue radar cognition, reconfigurable circuitry is required, e.g., for waveform agility and array adaptivity. To this end, the compressive sensing concept was introduced [7], in which the Tx/Rx elements are randomly chosen from a full array. The DOA can be accurately recovered with high probability by solving an optimization problem, i.e., DOA estimation can be linked to the popular sparse representation methods, e.g., [8]. Nonetheless, such an approach can hardly achieve an optimal array manifold, since it is agnostic to the current target scenario. An adaptive framework was investigated in [9–11], the goal of which is to minimize the determinant of the estimation error covariance matrix; it was treated as a convex optimization problem.
In [12], the sensors were dynamically adjusted by minimizing the Cramér–Rao bound (CRB) on parameter estimation, and a greedy search algorithm was adopted. Another reconfigurable receive array selection framework was addressed in [13], in which the conditional Bobrovsky–Zakai bound (BZB) on DOA estimation was chosen as the performance metric. A reconfigurable Tx/Rx pair methodology was proposed for the multiple-input multiple-output (MIMO) radar in [14], where the selection task is likewise solved via a greedy search algorithm. A common feature shared by the approaches in [9, 14] is that a mathematical optimization problem needs to be solved, which is time-consuming.

With the explosive growth of industrial sensors, a tremendous amount of complex real-time data can be obtained from physical and man-made environments, which leads to industrial big data (IBD). The IBD provides an unprecedented opportunity to apply data-driven prediction techniques to array selection. A few works have cast the antenna selection problem as multiclass-classification learning. Various classifiers are available for this purpose, e.g., the support vector machine (SVM), k-nearest neighbor (KNN), decision tree (DT), and multilayer perceptron (MLP). In [15], an SVM architecture was presented, which links antenna selection to supervised machine learning. Inspired by artificial neural networks, convolutional neural networks (CNN) have become some of the most popular models in machine learning [16, 17]. The CNN is a powerful tool, since it excels at automatic feature extraction from massive data. Its superiority has been demonstrated in a wide range of applications, such as text and voice recognition, image processing, and industrial manufacturing. More recently, a deep learning network was adopted for antenna selection in [18]. In that work, a CNN is trained offline, with the real part, the imaginary part, and the angle of the covariance matrix used as the input of the CNN; the objective is to minimize the error bound on DOA estimation (e.g., BZB and CRB). Compared with the SVM approach, the CNN method provides more accurate and faster classification. Similar to [18], CNN approaches have been exploited for Tx/Rx selection in MIMO communications [19]. Unlike MIMO communications, MIMO radars illuminate an area of interest by emitting diverse waveforms and receive the echoes using multiple antennas. Owing to the noncooperative operation mode, a MIMO radar needs much more transmit power than a MIMO communication system.
In practical IOV networks, however, most of the radar nodes are far away from the power grid. Usually, the radars are powered by solar panels, so the transmit power of the radar system is limited. The closest prior study to our work is [18], but it is only suitable for receive antenna selection in passive radars. Moreover, its CNN training is inefficient.

In this paper, we investigate the problem of optimal Tx/Rx selection for vehicle positioning in IIOT. The objective is to minimize the CRB on the MIMO radar DOA estimation error with limited transmit power. The Tx/Rx selection issue is treated as a multiclass-classification problem, and a CNN-auxiliary framework is proposed. Instead of the real part, the imaginary part, and the angle of the covariance matrix, the amplitude values and phase values of the covariance matrix are used as the input of the CNN. Simulation results show that the proposed framework offers faster and more accurate selection performance. The main contributions of this paper are as follows:

(a) A cooperative vehicle positioning architecture relying on DOA estimation is presented. The core is to measure the DOA of the vehicle via MIMO radar nodes. Combined with the location information of the nodes, the vehicle position can be accurately recovered. Unlike the RSS, TOA, and TDOA approaches, the DOA approaches are insensitive to the environment. Benefiting from the virtual aperture of the MIMO radar, high-precision DOA estimation can be easily obtained.

(b) A practical scenario in which the MIMO radar has limited power is considered. We assume the target vehicles are slow-moving, so that their DOA information is known a priori to the radar nodes. To achieve efficient and high-accuracy vehicle positioning, an optimal Tx/Rx pair needs to be chosen from a full array. The antenna selection is transformed into a multiclass-classification problem.

(c) A CNN-auxiliary framework is proposed for fast classification. To obtain the training data, the greedy search strategy is adopted; the Tx/Rx pair that attains the minimum CRB is accepted as optimal. Thereafter, the upper triangle measurement of the array covariance matrices is designated as the input of the CNN to indicate the well-chosen Tx/Rx pairs. Finally, the CNN is utilized for online Tx/Rx selection.

The paper is organized as follows. In Section 2, the problem of vehicle positioning using the colocated MIMO radar is formulated. In Section 3, the derivation of CRB is given. In Section 4, the details of the proposed CNN framework are described. In Section 5, the simulation results are outlined. Finally, a brief conclusion is given in Section 6.

*Notations*. Lowercase italic letters, e.g., *a*, boldface lowercase letters, e.g., **a**, and boldface capital letters, e.g., **A**, are reserved for scalars, vectors, and matrices, respectively; the superscripts, , , and stand for the operations of transpose, Hermitian transpose, inverse and pseudoinverse, respectively; , ⊙, and represent, respectively, the Kronecker product and the Khatri-Rao product and the Hadamard product; is to get the mathematical expectation; denotes the vectorization operation; denotes the identity matrix; denotes the combination of selecting r terms out of . accounts for the block diagonal matrix with the diagonal blocks in the bracket.

#### 2. Problem Formulation

The architecture of the proposed vehicle positioning system is illustrated in Figure 2. The MIMO radar nodes are fixed at the roadside at known positions, and they are connected to the cloud platform via low-latency optical fiber. The position of the target vehicle barely changes during consecutive scans. In the first scan, a full Tx/Rx array is utilized to detect the vehicle position. The covariance matrix of the MIMO radar node is then fed to a CNN to find an optimal Tx/Rx pair for the subsequent scans. All the nodes are well synchronized, and the measured DOAs are uploaded to the cloud to calculate the vehicle position.

Now, we consider a colocated MIMO radar scenario, as depicted in Figure 3. Each radar node is equipped with *M* transmit antennas and *N* receive antennas, both of which are uniform linear arrays (ULA) with the same interelement distance *d*. Taking the first Tx/Rx antenna as reference element, the position sets of the Tx/Rx are given by

Due to power limitations, only *M*1 transmit antennas and *N*1 receive antennas can be chosen from the full Tx/Rx array (the reference elements are always selected). Thus, the position sets corresponding to the selected Tx/Rx are and , respectively. Suppose that *K* target vehicles appear in the far field of the radar nodes, so that they can be regarded as point targets. Let denote the DOA of the vehicle, and let and denote the steering vectors corresponding to the selected transmit array and the selected receive array, respectively. The element of and the entry of are respectively given by

where *λ* denotes the carrier wavelength and and are the and entries of and , respectively. Assume that the selected transmit antennas emit mutually orthogonal pulse waveforms . For any , there exists

where *t* accounts for the fast time index, *T*_{p} denotes the pulse duration, and stands for the Kronecker delta. The echoes received by the array can be written as

where is the pulse index, denotes the reflection coefficient of the vehicle, denotes the transmit waveform vector, and **w**(*t*) is the noise vector, which is white Gaussian with variance , i.e.,

Matching with yields

where is the virtual response matrix with the virtual vector given by , is the reflection coefficient vector, and denotes the matched array noise. According to [6], is still white Gaussian with variance , i.e.,

Consequently, the covariance matrix of can be expressed as

where accounts for the covariance matrix of the reflection coefficients. In the presence of *L* samples , **R** can be estimated via
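The signal model above can be illustrated with a minimal sketch in plain Python. It assumes half-wavelength element spacing, a phase sign convention of exp(−j·2π·d·p·sin θ/λ), a virtual steering vector formed as the Kronecker product of the Tx and Rx steering vectors in that order, and a sample covariance computed as the average of the snapshot outer products; all function names, the selected element indexes, and the noise level are hypothetical choices made only for illustration.

```python
import cmath
import math
import random

def steering_vector(positions, theta_deg, d_over_lambda=0.5):
    """Steering vector of a linear array whose elements sit at integer
    multiples `positions` of the spacing d (reference element at 0)."""
    phase = 2.0 * math.pi * d_over_lambda * math.sin(math.radians(theta_deg))
    return [cmath.exp(-1j * phase * p) for p in positions]

def virtual_steering_vector(tx_positions, rx_positions, theta_deg):
    """Kronecker product of the Tx and Rx steering vectors (assumed order)."""
    a_t = steering_vector(tx_positions, theta_deg)
    a_r = steering_vector(rx_positions, theta_deg)
    return [t * r for t in a_t for r in a_r]

def simulate_snapshots(tx_positions, rx_positions, doas_deg, L, noise_std, rng):
    """Matched-filter outputs: random reflection coefficients plus white noise."""
    vectors = [virtual_steering_vector(tx_positions, rx_positions, th)
               for th in doas_deg]
    n = len(vectors[0])
    snapshots = []
    for _ in range(L):
        coeffs = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) / math.sqrt(2)
                  for _ in doas_deg]
        noise = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) * noise_std / math.sqrt(2)
                 for _ in range(n)]
        snapshots.append([sum(b * v[i] for b, v in zip(coeffs, vectors)) + noise[i]
                          for i in range(n)])
    return snapshots

def sample_covariance(snapshots):
    """Average of the outer products y y^H over all snapshots."""
    n = len(snapshots[0])
    L = len(snapshots)
    R = [[0j] * n for _ in range(n)]
    for y in snapshots:
        for i in range(n):
            for j in range(n):
                R[i][j] += y[i] * y[j].conjugate() / L
    return R

rng = random.Random(0)
# Hypothetical selection: Tx elements {0, 2, 3}, Rx elements {0, 1, 3}.
Y = simulate_snapshots((0, 2, 3), (0, 1, 3), (-10.0, 20.0),
                       L=200, noise_std=0.1, rng=rng)
R_hat = sample_covariance(Y)
```

With three selected Tx and three selected Rx elements, the virtual array has nine elements, so `R_hat` is a 9 × 9 Hermitian matrix.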

To estimate the DOA from or its covariance matrix **R**, numerous estimators are available. Typical algorithms include MUSIC, ESPRIT, the propagator method (PM), and tensor-aware approaches [20–24]. Besides, additional information can be exploited to improve the estimation accuracy [25–30]. How to estimate the DOA is an interesting topic, but it is beyond the scope of this paper. Once the DOA of the target vehicle is obtained, the position information can be accurately recovered by solving an inverse problem [3]. In this paper, we only focus on how to select the optimal Tx/Rx pair.
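As a rough illustration of how DOAs from several nodes determine a position, the sketch below intersects the bearing lines of two roadside nodes. The geometry is an assumption made for this example only: both nodes lie on the x-axis, their array broadside points along the +y direction, and each DOA is measured from broadside, positive toward +x. The function name `locate_from_two_doas` is hypothetical, and the inverse problem solved in [3] may differ (for instance, by using more nodes and a least-squares fit).

```python
import math

def locate_from_two_doas(x1, theta1_deg, x2, theta2_deg):
    """Intersect two bearing lines from nodes at (x1, 0) and (x2, 0).

    Each DOA satisfies tan(theta_i) = (x - x_i) / y under the assumed geometry.
    """
    t1 = math.tan(math.radians(theta1_deg))
    t2 = math.tan(math.radians(theta2_deg))
    if math.isclose(t1, t2):
        raise ValueError("bearing lines are parallel; position is not observable")
    y = (x2 - x1) / (t1 - t2)
    x = x1 + y * t1
    return x, y

# Vehicle at (3, 10) observed by nodes at x = 0 and x = 20.
theta_a = math.degrees(math.atan2(3.0 - 0.0, 10.0))
theta_b = math.degrees(math.atan2(3.0 - 20.0, 10.0))
position = locate_from_two_doas(0.0, theta_a, 20.0, theta_b)
```

In this noise-free example the intersection reproduces the vehicle position; with noisy DOAs, the position error grows as the two bearing lines become closer to parallel.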

#### 3. CRB Derivation

The CRB provides a lower bound on the variance of any unbiased estimator, and it is commonly used to evaluate parameter estimation accuracy. In what follows, we show how it is derived. Firstly, we rewrite (6) as

where , and . Next, we construct a vector . Suppose that are deterministic but unknown to the MIMO system. Then, the mean and covariance matrix of **y** are

where , . Next, let us define the following vectors , . The unknown parameter vector can be formulated as . According to [31], the CRB matrix for is given by

where .

We now focus on each part of **Ψ**. Firstly, it is easy to find that

Going a step further, we have with

where is the *k*th element of . Therefore, . Furthermore, we can obtain

Since we are only interested in the CRB on DOA estimation, by means of diagonalization, we can extract those counterparts from **J**. Define

Thanks to the nonsingular property of , is valid. Now, we define

It can be found that

Define as the orthogonal projection of onto null space

Obviously, . Then, we have

with , . Based on the properties of a partitioned diagonal matrix, we obtain

where stands for the irrelevant part. Inserting (21) and (15) in (12) and removing all the unaffected parts, we can get the CRB on DOA estimation as

Define , and let . Recalling (14), we can find that **Δ** can be rewritten as . Thus, we have

where and account for the column of **D** and **F**, respectively. Hence, can be expressed as

where denotes the element of and . Finally, we can get the CRB on DOA estimation as
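For orientation, the widely used deterministic (conditional) CRB for DOA estimation from *L* snapshots has the closed form below. It is the textbook expression written with assumed symbols and is offered as a reference point, not as a reproduction of the derivation above: $\mathbf{A}$ is the (virtual) steering matrix, $\mathbf{D}$ collects the derivatives of its columns with respect to the DOAs, $\hat{\mathbf{R}}_{b}$ is the sample covariance of the reflection coefficients, $\sigma^{2}$ is the noise power, and $\circ$ is used here for the Hadamard product.

```latex
\mathrm{CRB}(\boldsymbol{\theta}) = \frac{\sigma^{2}}{2L}
\left\{ \operatorname{Re}\!\left[ \left( \mathbf{D}^{H} \boldsymbol{\Pi}_{\mathbf{A}}^{\perp} \mathbf{D} \right) \circ \hat{\mathbf{R}}_{b}^{T} \right] \right\}^{-1},
\qquad
\boldsymbol{\Pi}_{\mathbf{A}}^{\perp} = \mathbf{I} - \mathbf{A}\left(\mathbf{A}^{H}\mathbf{A}\right)^{-1}\mathbf{A}^{H}
```

The diagonal entries of this matrix bound the variances of the individual DOA estimates, which is why a selection criterion can be built from them.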

#### 4. The Proposed TX/RX Selection Framework

##### 4.1. The Proposed CNN Architecture

The CNN model utilized in this paper is depicted in Figure 4. It consists of four parts: the input layer, the convolutional layers, the fully-connected layers, and the output layer. The output of the CNN can be formulated as a nonlinear mapping of the input. The activation function is a key component of each neuron; it is linear for the input and output layers and sigmoid for the hidden layers. The input layer receives a dataset of samples and their associated labels. The convolutional layers extract features from the input; herein, two convolutional layers are depicted. The convolutional kernel size is a key design parameter in a CNN, and it should be chosen according to the input data. The fully-connected part is fundamentally a classifier, and three dense layers are shown here. The output layer is a probability distribution, which contains the probability of each Tx/Rx pair.

The rectified linear unit (ReLU) serves as the activation function after each layer that precedes the last dense layer. In the last dense layer, SoftMax is applied. Let be the output of the neuron in a given layer, then the ReLU and SoftMax functions can be formulated as
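The two activation functions named above have standard definitions: ReLU returns max(0, x), and SoftMax maps a vector of scores to exp(x_i) / Σ_j exp(x_j). A minimal sketch follows; subtracting the maximum score before exponentiation is a common numerical-stability choice added here, not something stated in the text.

```python
import math

def relu(x):
    """Rectified linear unit: passes positive inputs, clamps the rest to zero."""
    return max(0.0, x)

def softmax(values):
    """Map raw scores to a probability distribution over classes."""
    shift = max(values)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

hidden = [relu(v) for v in (-1.2, 0.0, 2.5)]
probabilities = softmax([1.0, 2.0, 3.0])
```

The SoftMax output sums to one, so each entry can be read as the predicted probability of one candidate Tx/Rx pair.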

##### 4.2. Training Data Generation

To find the optimal Tx/Rx pair, the CRB on DOA estimation is adopted to evaluate each selection result. The CRBs on the DOAs are given by the diagonal elements of the CRB matrix. For the multiple-target scenario, the optimal Tx/Rx selection criterion is defined as minimizing the average CRB.

In order to train the CNN, we need enough labeled training data. To this end, the spatial domain is first discretized into *Q* grid points, and we assume that all possible DOAs lie on the grid. For different DOA combinations, we calculate the CRBs corresponding to various Tx/Rx pairs and pick the Tx/Rx pair that attains the minimum CRB. The antenna selection problem can be interpreted as a problem of permutation and combination; the optimal Tx/Rx pair can be obtained by greedy search. Since the reference antenna must be chosen, a total of searches is needed. To alleviate the search burden, a random strategy can be adopted [32]. The labeled data include two essential factors: the DOA pair and the Tx/Rx indexes. Some of the selected results are shown in Table 1, in which the signal-to-noise ratio (SNR) is set to −10 dB and *L* = 200 snapshots are considered; all the results are obtained from 100 independent trials. Some of the above details are shown in Figures 5 and 6.
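The label-generation search can be sketched as an exhaustive enumeration over all candidate subarrays that contain the reference element. With the reference element fixed, choosing *M*1 of *M* Tx elements and *N*1 of *N* Rx elements gives C(*M*−1, *M*1−1) · C(*N*−1, *N*1−1) candidates. The cost used below is a stand-in, not the CRB of Section 3: for a single source, the deterministic CRB is inversely proportional to the spread of the (virtual) element positions about their centroid, so the sketch simply maximizes that spread. The array sizes and all function names are hypothetical.

```python
from itertools import combinations
from math import comb

def candidate_pairs(M, N, M1, N1):
    """All (Tx, Rx) index sets that keep the reference element 0."""
    tx_sets = [(0,) + rest for rest in combinations(range(1, M), M1 - 1)]
    rx_sets = [(0,) + rest for rest in combinations(range(1, N), N1 - 1)]
    return [(tx, rx) for tx in tx_sets for rx in rx_sets]

def virtual_spread(tx, rx):
    """Sum of squared deviations of the virtual element positions."""
    virtual = [t + r for t in tx for r in rx]
    mean = sum(virtual) / len(virtual)
    return sum((v - mean) ** 2 for v in virtual)

def select_pair(M, N, M1, N1, cost):
    """Return the candidate pair with the smallest cost."""
    return min(candidate_pairs(M, N, M1, N1), key=cost)

# Hypothetical sizes: choose 2 of 4 Tx elements and 2 of 4 Rx elements.
pairs = candidate_pairs(4, 4, 2, 2)
num_candidates = comb(3, 1) * comb(3, 1)
# Stand-in cost: a larger position spread means a smaller single-source bound.
best = select_pair(4, 4, 2, 2, cost=lambda p: -virtual_spread(*p))
```

In an actual label-generation run, the `cost` argument would be replaced by the average CRB evaluated at the DOA combination of interest, so the selected pair would generally depend on the DOAs and the SNR.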

Before we train the CNN using the optimal Tx/Rx pairs, we need to label the dataset. According to the CRB expression, the SNR and the target number must be known in the training. Therefore, it is necessary to preprocess the matched data, i.e., to estimate the target number *K* and the noise power . Thereafter, labeled datasets can be generated from the preprocessing results. Finally, all the datasets are divided into two subsets: the training datasets and the test datasets. The training datasets are utilized to learn the weight values of the CNN, while the test datasets are used to evaluate the classification results.
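The division into training and test subsets can be sketched as a shuffled split. The sample contents below are placeholders, and the function name and the fixed seed are hypothetical; the sizes follow the figures quoted in the conclusions (1200 sets per DOA pair, of which 200 are used for training).

```python
import random

def split_dataset(samples, num_train, seed=0):
    """Shuffle a copy of the labeled samples and split it into two subsets."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    return shuffled[:num_train], shuffled[num_train:]

# Placeholder samples: (feature identifier, class label) tuples.
samples = [("covariance_%d" % i, i % 3) for i in range(1200)]
train_set, test_set = split_dataset(samples, num_train=200)
```

Shuffling before the split keeps the class proportions of the two subsets roughly similar when the samples were generated class by class.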

It should be noted that the array data are complex valued, whereas a typical CNN accepts only real-valued datasets. It is therefore necessary to convert the complex values to real ones. A common way is to extract the real part, the imaginary part, and the angle of the covariance matrix, as illustrated in [18]. In this paper, however, the amplitude values and phase values of the covariance matrix are used as the input of the CNN instead. The detailed steps of training the CNN are depicted in Table 2.
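A minimal sketch of this conversion is given below. It takes the amplitude and phase of the upper-triangular entries of a Hermitian covariance matrix, which is sufficient because the lower triangle is the complex conjugate of the upper one. Returning two flat lists is an assumption made for illustration; the exact tensor layout fed to the CNN is not specified here, and the function name is hypothetical.

```python
import cmath

def covariance_to_features(R):
    """Amplitude and phase of the upper triangle (diagonal included)."""
    n = len(R)
    amplitude, phase = [], []
    for i in range(n):
        for j in range(i, n):
            amplitude.append(abs(R[i][j]))
            phase.append(cmath.phase(R[i][j]))
    return amplitude, phase

# Toy 2 x 2 Hermitian matrix standing in for an estimated covariance matrix.
R = [[2 + 0j, 1 + 1j],
     [1 - 1j, 3 + 0j]]
amp, ph = covariance_to_features(R)
```

For an *n* × *n* covariance matrix this yields *n*(*n* + 1)/2 amplitude values and the same number of phase values, so the two channels have equal size.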

#### 5. Simulation Results

To verify the effectiveness of the improved CNN-based Tx/Rx selection framework for the MIMO radar (marked as CNN-MIMO), computer simulations have been carried out. In the simulation, we consider a monostatic MIMO radar setup configured with *M* transmit elements and *N* receive elements in total, both of which are ULAs with half-wavelength spacing. Suppose there are *K* uncorrelated sources whose reflection coefficients follow the Swerling II model, and *L* snapshots are available. To save power, *M*1 Tx and *N*1 Rx are chosen from the transmit array and the receive array, respectively.

In the first example, we compare the estimation performance of the CNN-MIMO framework and the CS-MIMO framework in [7]. The root mean square error (RMSE) is defined as

where *T* denotes the total number of trials, and represent the *k*th DOA and its estimate for the Monte Carlo trial. Herein, we assume there are two DOAs at directions and , respectively. , , and *L* = 200 are fixed in the simulation. Figure 7 illustrates the RMSE performance of the CNN-MIMO framework and the CS-MIMO method. The proposed CNN framework clearly provides much better estimation performance than the CS-MIMO method.
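The RMSE over Monte Carlo trials can be computed as in the sketch below. One common convention is assumed: the squared errors are averaged over all *T* trials and all *K* targets before the square root is taken. Other normalizations exist (for example, averaging per-target RMSE values), so this should be read as an illustration and not as the exact definition used for Figure 7. The function name and the toy numbers are hypothetical.

```python
import math

def rmse(true_doas, estimates):
    """estimates[t][k] is the estimate of true_doas[k] in trial t."""
    T = len(estimates)
    K = len(true_doas)
    squared_error = sum((trial[k] - true_doas[k]) ** 2
                        for trial in estimates for k in range(K))
    return math.sqrt(squared_error / (T * K))

# Two targets, two trials, every estimate off by exactly one degree.
value = rmse([10.0, 20.0], [[11.0, 19.0], [9.0, 21.0]])
```

Because every estimate in the toy example deviates by one degree, the resulting RMSE is one degree.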

In the second example, we evaluate the DOA estimation performance of both methods for various snapshot numbers *L*, where the SNR is set to 0 dB; the other simulation conditions are the same as those in the first example. As Figure 8 shows, and similar to the previous result, the performance of both methods gradually improves as *L* increases. Besides, the CNN-MIMO framework offers much better RMSE performance than the CS-MIMO method. This improvement benefits from the fact that the CNN-MIMO framework aims to select the optimal Tx/Rx pair, while the array geometry of the CS-MIMO method is randomly generated.

In the third example, we show the loss function of the proposed CNN-MIMO framework during one training run. Figure 9 shows that the loss decreases quickly in the first few iterations. However, once the iteration count reaches a certain point, the loss decreases only slowly as the iterations continue. Finally, the loss reaches the given threshold and the weight values are settled temporarily. After all the training sets have been processed, the weight values are fixed.

In the last example, we present the confusion matrices of the CNN framework and the traditional SVM method, in which only ten optimal array configurations have been utilized. Herein, 10000 datasets have been used. Figure 10 summarizes the prediction results of the CNN framework and the SVM method. Each row of the confusion matrix shows the proportions of correct and incorrect predictions on the test set. From this, we observe that the proposed framework provides better accuracy than the SVM scheme, especially in class 4.
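For readers who want to reproduce this kind of evaluation, a row-normalized confusion matrix can be built as in the sketch below. Rows are indexed by the true class and columns by the predicted class, which is an assumed orientation; the labels are toy values and the function names are hypothetical.

```python
def confusion_matrix(true_labels, predicted_labels, num_classes):
    """counts[i][j] = number of samples of true class i predicted as class j."""
    counts = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, predicted_labels):
        counts[t][p] += 1
    return counts

def row_normalize(counts):
    """Convert each row of counts into proportions (empty rows stay zero)."""
    rows = []
    for row in counts:
        total = sum(row)
        rows.append([c / total if total else 0.0 for c in row])
    return rows

def overall_accuracy(counts):
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in counts)
    return sum(counts[i][i] for i in range(len(counts))) / total

true_labels = [0, 0, 1, 1, 2, 2, 2, 2]
predicted = [0, 1, 1, 1, 2, 2, 0, 2]
counts = confusion_matrix(true_labels, predicted, num_classes=3)
proportions = row_normalize(counts)
accuracy = overall_accuracy(counts)
```

The diagonal of `proportions` gives the per-class accuracy, which is the quantity compared between the two classifiers in a plot such as Figure 10.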

#### 6. Conclusions

In this paper, we considered a realistic scenario in IIOV, in which the MIMO sensor has limited power for calculating the vehicle position. The MIMO sensor therefore needs to find the optimal Tx/Rx pair, with a limited number of Tx/Rx elements, from all possible Tx/Rx pairs. This task can be formulated as a multiclass-classification problem, and a CNN-based framework is proposed to solve it. Firstly, the optimal Tx/Rx pair is determined by finding the minimum CRB on DOA estimation. Thereafter, DOAs are classified into various pairs, and the optimal Tx/Rx combinations are mapped one-to-one to the DOAs. Then, 1200 sets are randomly generated for each DOA pair, of which 200 groups are used for training and the remainder for testing. The CNN is trained on the training set to learn the best weight values. After this, the CNN can quickly determine the best array configuration once it receives the matched array measurement. The proposed framework provides much better performance than the existing CS-based method, and it is promising for future IIOV applications.

#### Data Availability

The data used to support the findings of this study are included within the article.

#### Conflicts of Interest

The authors declare that they have no conflicts of interest.