Abstract

In this paper, we propose a Doppler spread estimation approach based on machine learning for an OFDM system. We present a carefully designed neural network architecture to achieve good performance in a mixed-channel scenario in which channel characteristic variables such as the Rician K factor, azimuth angle of arrival (AOA) width, mean direction of azimuth AOA, and channel estimation errors are randomly generated. When preprocessing the channel state information (CSI) collected under the mixed-channel scenario, we propose the averaged power spectral density (PSD) sequence as high-quality training data in machine learning for Doppler spread estimation. We detail the intermediate mathematical derivations of the machine learning process, making it easy to graft the derived results into other wireless communication technologies. Through simulation, we show that the machine learning approach using the averaged PSD sequence as training data outperforms a machine learning approach using the channel frequency response (CFR) sequence as training data, as well as two existing Doppler spread estimation approaches.

1. Introduction

Information about Doppler spread can be used for handoff, adaptive modulation, equalization, power control, etc., in wireless communication systems. Various Doppler spread estimation approaches have been studied for isotropic scattering and Rayleigh fading channels [1–6]. The Doppler spread estimation approaches of [5, 6] in single-antenna systems have been shown to outperform previous Doppler spread estimation approaches in isotropic scattering and Rayleigh fading channels. However, the performances of previous Doppler spread estimation approaches designed for isotropic scattering and Rayleigh fading channels degrade when the channels are shaped by nonisotropic scattering and line-of-sight (LOS) channel components. Therefore, it is still worthwhile to develop powerful new techniques for nonisotropic scattering and Rician fading channels for single-antenna systems. Meanwhile, machine learning has been the focus of extensive research in recent years because of its empirical success in various fields [7–24]. Notably, machine learning has been used for visual object recognition and speech recognition [7, 8]. Machine learning-based approaches have been successfully applied to wireless communication systems [9–18]. Recently, efforts have been made to apply machine learning to estimate the angle of arrival (AoA) [19–22] and apply it to indoor positioning [23, 24]. In [25], machine learning was used to improve the performance of human detection and activity classification based on the information of micro-Doppler signatures, i.e., Fourier transforms of the Doppler radar data measured under LOS conditions. In [26], machine learning was used to improve the performance of vehicle collision avoidance services based on information from the Doppler profiles, i.e., spectral representations of temporal non-line-of-sight (NLOS) Doppler energy.
In [27], machine learning was used to estimate Doppler spread based on channel state information (CSI) in a high-speed train (HST) system. The channel considered in [27] differs from a typical mobile radio communication channel in that it features a LOS component formed based on the position of the HST moving on the same track with respect to adjacent radio head units. To the best of our knowledge, machine learning has not been used for Doppler spread estimation in the context of general mobile radio communication channels. This motivated the study in this paper of applying machine learning to Doppler spread estimation for mobile radio communication channels.

Machine learning can work efficiently even when the system model is unknown or the parameters cannot be accurately estimated. Doppler spread estimation techniques designed for specific channel conditions do not work well in real-world channel environments where channel characteristic variables such as the Rician K factor, azimuth angle of arrival (AOA) width, average azimuth AOA, and channel estimation error are arbitrarily generated. Therefore, it is interesting to see if additional gains in performance can be obtained by applying a machine learning approach to Doppler spread estimation, especially when the channel characteristic variables are randomly generated. In [28], it was mentioned that machine learning requires a very large amount of training data to effectively train the weights of a neural network for accurate classification. In other words, machine learning will not perform well if the amount of training data is not sufficient. However, it is often difficult to obtain a very large amount of training data in real mobile radio communication systems. Given limited training data, preprocessing them into high-quality training data by using feature selection and feature extraction can improve the performance of machine learning [29, 30]. This motivated us to investigate how to preprocess the collected CSI into high-quality training data to improve the machine learning performance for Doppler spread estimation.

In this paper, we propose a Doppler spread estimation approach based on machine learning for an OFDM system. We present a carefully designed neural network architecture to achieve good performance in a mixed-channel scenario in which channel characteristic variables such as the Rician K factor, azimuth AOA width, mean direction of azimuth AOA, and channel estimation errors are randomly generated. When preprocessing the CSI collected under the mixed-channel scenario, we propose the averaged power spectral density (PSD) sequence as high-quality training data in machine learning for Doppler spread estimation. We detail the intermediate mathematical derivations of the machine learning process, making it easy to graft the derived results into other wireless communication technologies. Through simulation, we show that the machine learning approach using the averaged PSD sequence as training data outperforms a machine learning approach using the channel frequency response (CFR) sequence as training data, as well as two existing Doppler spread estimation approaches. The main contributions of this paper are summarized as follows: (1) This paper applies machine learning to Doppler spread estimation in the context of mobile radio communication channels in a mixed-channel scenario. (2) The machine learning process detailed in this paper can be easily applied to other wireless communication technologies as well.

The rest of this paper is organized as follows. Section 2 describes the system model. Section 3 proposes two machine learning approaches and presents comprehensive algorithms for grafting machine learning into Doppler spread estimation. Section 4 presents the simulation results and compares the performance of the proposed and conventional approaches. Finally, Section 5 provides concluding remarks.

2. System Model

We consider an OFDM system with FFT size . We assume that the number of OFDM symbols required to compute the PSD function is , and the number of test user equipments (TUEs) required to collect training data for machine learning is . The channel impulse response of the -th TUE for is modeled as a tapped delay line with taps, for , where denotes the OFDM symbol index and denotes the coefficient of the -th channel path. To consider the effects of nonisotropic scattering and LOS channel components on Doppler spread estimation, we model the channel coefficients as in [1] by where and denote non-LOS and LOS components, respectively, denotes the Rician K-factor defined on a linear scale meaning the ratio of LOS component power to non-LOS (or fluctuating) component power, and denotes the power delay profile. For Rician fading, most estimated K-factor values are less than 3 dB in urban areas [31]. For the simulation, we assume an exponential power delay profile where for is given by with . The autocorrelation function of was presented in [1, 5, 32] as where denotes the Doppler spread, denotes the azimuth AOA width factor which controls the width of the azimuth AOA, denotes the mean direction of the azimuth AOA, denotes the OFDM symbol period, and denotes the zero-th order modified Bessel function of the first kind. The -th Rician channel component, , can be written as where is the parameter controlling the azimuth AOA direction of the LOS component and is the parameter representing the phase of the LOS component with a uniform distribution between and . The autocorrelation function of was presented in [5, 32] as The CFR coefficient over the -th subcarrier is given by the discrete Fourier transform (DFT) of , for . The OFDM demodulated signal over the -th subcarrier of the -th OFDM symbol for the -th TUE can be written as where denotes the transmitted symbol and denotes a zero-mean circularly symmetric complex Gaussian noise with variance . 
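As an illustration of this channel model, the following sketch generates Rician fading taps with an exponential power delay profile. All function and parameter names are hypothetical, and the sum-of-sinusoids Rayleigh generator is a common stand-in rather than the paper's exact fading model:

```python
import numpy as np

def rician_taps(n_sym, n_taps, K, f_d, T, tau_rms=1.0, rng=None):
    """Sketch of the tapped-delay-line channel of Section 2 (illustrative
    names): each tap combines a fixed LOS component (weight sqrt(K/(K+1)))
    and a Rayleigh-fading NLOS component (weight sqrt(1/(K+1))), scaled by
    an exponential power delay profile."""
    rng = np.random.default_rng(rng)
    # Exponential power delay profile, normalized to unit total power.
    pdp = np.exp(-np.arange(n_taps) / tau_rms)
    pdp /= pdp.sum()
    t = np.arange(n_sym) * T
    taps = np.empty((n_sym, n_taps), dtype=complex)
    for l in range(n_taps):
        # LOS component: deterministic rotation at the Doppler frequency
        # with a random initial phase, uniform on (-pi, pi).
        phi = rng.uniform(-np.pi, np.pi)
        los = np.exp(1j * (2 * np.pi * f_d * t + phi))
        # NLOS component: sum-of-sinusoids Rayleigh fading (Jakes-style).
        n_sin = 64
        theta = rng.uniform(-np.pi, np.pi, n_sin)
        psi = rng.uniform(-np.pi, np.pi, n_sin)
        arg = 2 * np.pi * f_d * np.cos(theta)[None, :] * t[:, None] + psi
        nlos = np.exp(1j * arg).sum(axis=1) / np.sqrt(n_sin)
        taps[:, l] = np.sqrt(pdp[l]) * (np.sqrt(K / (K + 1)) * los
                                        + np.sqrt(1.0 / (K + 1)) * nlos)
    return taps
```

With the normalized power delay profile, the average total power summed over taps is close to one regardless of K.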
is defined as assuming that the transmitted symbol has unit average power. If the number of pilot symbols in the OFDM block is , the CFR coefficient over the subcarrier of the -th pilot symbol for can be estimated by the least square detection method in [33] as where denotes the subcarrier index of the -th pilot symbol. We use the minimum mean square error-based channel interpolation method in [34] to estimate the CFR coefficients, . With this channel estimation method, the channel estimation error increases as the SNR decreases. The ideal PSD function of the channel can be obtained as in [35] by taking the Fourier transform of , for . The Doppler spread index is defined by where denotes the period of collecting information of consecutive OFDM symbols and denotes the frequency resolution required to determine the Doppler spread index with the Doppler spread. Since base stations (BSs) in macrocells generally suffer from severe nonisotropic scattering, has its maximum value at [5]. For severe nonisotropic scattering, it is reasonable to assume for all [36]. The Doppler spread is found by transforming to based on for and finding [5]. For severe nonisotropic scattering, the angle spread is mostly less than 30°, and for very severe nonisotropic scattering, it is less than 10° [5]. If is not too small, can be approximated as [37]. We assume that the transmitted signal undergoes severe nonisotropic scattering and that has a uniform distribution between 10° and 30°. Although there are several ways to estimate the PSD function [38], periodogram-based nonparametric techniques are often used because of their simplicity. Using the periodogram-based nonparametric technique, multiple PSD functions can be computed with multiple series of CFR coefficients; the -th PSD function computed using a series of the -th estimated CFR coefficient, , can be written as for and .
Because of channel estimation error and the lack of channel coefficients used to compute the PSD function, the accuracy of the PSD function decreases. With an inaccurate PSD function, the maximum value of given in (12) does not always occur at the ideal Doppler spread index . Since , a slight deviation in the value of results in a times greater deviation in the value. Therefore, it is important to set to a small value to improve the performance of the nonparametric Doppler spread estimation approach. In order for to be small with fixed, should be larger because . The sampling theorem states that in the time dimension, the channel sampling rate must be at least twice the Doppler spread, i.e., . A good rule of thumb for finding the maximum of that can be estimated by a Doppler spread estimation approach is to assume that the maximum of lies between and ([35], pp. 845–846). We set the maximum value of to 0.1. Therefore, we assume that , or equivalently, .
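The periodogram step described above can be sketched as follows. This is a minimal one-sided peak search over the PSD of a single per-subcarrier CFR time series, not the paper's full estimator; names are illustrative:

```python
import numpy as np

def doppler_from_cfr(cfr, T):
    """Periodogram-based Doppler spread estimate (a sketch of the
    nonparametric technique of Section 2): take the FFT of a time series
    of CFR coefficients for one subcarrier, form the PSD, and return the
    frequency of its one-sided maximum."""
    n = len(cfr)
    psd = np.abs(np.fft.fft(cfr)) ** 2 / n
    freqs = np.fft.fftfreq(n, d=T)
    # Under severe nonisotropic scattering the PSD peaks near +/- f_D,
    # so search the positive-frequency half only (skip the DC bin).
    half = slice(1, n // 2)
    return abs(freqs[half][np.argmax(psd[half])])
```

The frequency resolution is 1/(nT), which is exactly why the text argues for making the observation window long relative to the Doppler spread.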

3. Proposed Doppler Spread Estimation

Figure 1 shows the block diagram of the proposed neural network structure to realize forward signal propagation in machine learning. The proposed neural network structure consists of an input layer with nodes (or neurons), an output layer with nodes, and three hidden layers. Since , we set to and let the output layer act as a multiclass classifier generating a one-hot encoded binary sequence of multiple 0’s and a single 1. The position of the single 1 in the one-hot encoded binary sequence represents . The output signals of the three hidden layers are processed with the batch normalization function [39], the dropout function [40], and the rectified linear unit (ReLU) activation function [41]. The output signal of the output layer is processed with a softmax regression (SoftMax) activation function [42]. Since a bias factor of 1 is added to every sequence entering each hidden layer, the dimensions of the four weight matrices, , , , and , are chosen as , , , and , respectively, where is selected as 1000. We randomly initialize all components of , , , and using a Gaussian distribution with mean 0 and variance as in [43], where denotes the column number of the matrix to initialize. The CFR is a kind of raw CSI, whereas the averaged PSD sequence is a kind of preprocessed CSI. In this paper, as two representative machine learning approaches, we consider a machine learning approach using the CFR sequence (i.e., raw CSI) as training data and a machine learning approach using an averaged PSD sequence (i.e., preprocessed CSI) as training data. We name the two machine learning approaches ML1 and ML2, respectively. In ML1, the training data are selected from CFR sequences of length (i.e., ). The -th training data for can be written as where

In ML2, the training data are selected from the averaged PSD sequences of length (i.e., ). The -th training data for can be written as where
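A minimal sketch of the ML2 preprocessing, assuming the averaged PSD sequence is formed by averaging per-subcarrier periodograms and normalizing; function and argument names are illustrative, not from the paper:

```python
import numpy as np

def averaged_psd_sequence(cfr_matrix):
    """ML2-style preprocessing sketch: average the periodograms of
    several per-subcarrier CFR time series into a single PSD training
    vector (illustrative names)."""
    # cfr_matrix: (n_subcarriers, n_symbols) estimated CFR coefficients.
    n_sym = cfr_matrix.shape[1]
    psds = np.abs(np.fft.fft(cfr_matrix, axis=1)) ** 2 / n_sym
    avg = psds.mean(axis=0)
    return avg / avg.max()  # normalize so features share one scale
```

Averaging over subcarriers reduces the variance of the periodogram estimate, which is the intuition behind using the averaged PSD sequence as higher-quality training data than raw CFR.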

The following is a summary of the notations used in machine learning algorithms: (i) represents the -th component of a vector or sequence, (ii) represents the component-wise product of the two vectors, and (iii) represents the -th row of the matrix (iv) represents the length of the vector(v) represents a column vector whose components are the components of the vector from the -th position to the -th position(vi)Given two vectors and of equal length, represents a vector whose -th component is 1 if and 0 if for (vii) represents a randomly generated matrix whose components follow a Gaussian distribution with zero mean and unit variance(viii) represents the transpose operator(ix)For input vector , represents a function whose output is given as (x) represents a square matrix whose diagonal components are given by the components of the vector (xi)For two column vectors, and , represents a matrix created by stacking two row vectors and

A full overview of the proposed machine learning algorithm is given in Algorithm 1. In both ML1 and ML2, the forward signal propagation algorithm, the backward error propagation algorithm, and the parameter update algorithm are performed times in two “for” loops. In the forward signal propagation algorithm, the input vector is initialized with . The output vector resulting from is used to yield the final loss (or final cost) . To compute , the sequence of that satisfies is prepared in advance, where denotes the -th target Doppler spread index sequence written as a one-hot encoded binary sequence consisting of multiple 0’s and a single 1. One example of can be written as where the index of a single 1 in represents the Doppler spread index defined in (11). The whole data that needs to be prepared in advance to run the proposed machine learning algorithm for Doppler spread estimation can be arranged in two matrices; first, the target Doppler spread index sequence matrix, and second, the training data matrix,

To gather information about , the BS can have a TUE move with a constant velocity and have that TUE report its Doppler spread index, , where (m/sec) and is the carrier frequency. To gather information about , the BS needs to estimate the CSI from that TUE and use the CSI to compute the training data. This method of collecting information by the BS entails system overhead. However, unless there are significant changes in Doppler spread statistics, reusing trained machine learning weights for Doppler spread estimation can reduce the overhead. Since it is difficult to actually collect the measured CSI at various times and places that can generate different Doppler spreads, in this paper, the collection of CSI necessary for machine learning or performance evaluation is limited to work through simulation.
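The mapping from a TUE's reported velocity to a one-hot target described above can be sketched as follows; the function name and class count are illustrative, and the speed-to-Doppler conversion assumes the stated relation with the carrier frequency:

```python
import numpy as np

def target_one_hot(v, f_c, n_sym, T, n_classes):
    """Sketch of preparing a training target: the TUE's speed v (m/s)
    maps to a Doppler spread f_D = v * f_c / c, which is quantized to an
    index using the frequency resolution 1 / (n_sym * T) and one-hot
    encoded (illustrative names)."""
    c = 3e8                               # speed of light (m/s)
    f_d = v * f_c / c
    idx = int(round(f_d * n_sym * T))     # frequency-bin index of f_D
    t = np.zeros(n_classes)
    t[idx] = 1.0
    return idx, t
```

For example, a TUE at 12.5 m/s with a 2.4 GHz carrier gives a 100 Hz Doppler spread, which lands in bin 100 at a 1 Hz resolution.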

1 Input:
2  which denotes the data of the averaged PSD sequences
3  which denotes the data of the target Doppler
4 shift index sequences
5
6 Fixed Parameters:
7 
8
9 Weight Matrix Initialization:
10 
11 
12 
13 
14 
15
16 for do
17 for do
18  
19  
20  Forward Signal Propagation (Algorithm 2)
21  Backward Error Propagation (Algorithm 4)
22  Parameter Update (Algorithm 6)
23 end
24 end
[Algorithm 2: Forward Signal Propagation (listing not recovered)]
[Algorithm 3: Batch Normalization (listing not recovered)]

By the double “for” loops of Algorithm 1, three subalgorithms, namely, “Forward Signal Propagation” (Algorithm 2), “Backward Error Propagation” (Algorithm 4), and “Parameter Update” (Algorithm 6) are executed iteratively. In Algorithm 2, the training data are propagated through the proposed neural network structure. Firstly, the input sequence is incremented by a bias factor 1 to produce the increased input sequence . Then, is multiplied by to produce . Then, is batch-normalized to yield . The batch normalization function [39] is written in Algorithm 3. Then, is activated by the ReLU function [41] to generate . The ReLU function can be written as where

Then, is applied to the dropout function [40] resulting in . Dropout is a technique to solve the overfitting problem by omitting hidden nodes with predefined probabilities. The dropout function can be written as where denotes the dropout rate given by 0.02. for denotes a randomly generated vector of zeros and ones that satisfy the condition that the number of zeros is equal to the product of the value of and the number of components in . In (22), for the purpose of maintaining the same power of as in , is multiplied by . Then, is obtained by inserting a bias factor of 1 into . Similar to the process above, and will also be obtained. Then, is multiplied by to produce . Finally, is created by activating with SoftMax function ([42]), i.e., , where for . In Algorithm 4, the back-propagation error is derived using the backward error propagation technique [44, 45]. To determine the final cost (or final loss) , cross-entropy function [42] is used with two input vectors and as
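The forward propagation just described can be sketched as follows. This is an illustrative reimplementation (scalar per-layer batch-norm parameters, Bernoulli dropout rather than a fixed drop count), not the paper's exact Algorithm 2:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def batch_norm(x, gamma, beta, eps=1e-5):
    # Normalize to zero mean / unit variance, then scale and shift.
    return gamma * (x - x.mean()) / np.sqrt(x.var() + eps) + beta

def softmax(z):
    e = np.exp(z - z.max())  # subtract max for numerical stability
    return e / e.sum()

def forward(x, weights, gammas, betas, drop_rate=0.02, rng=None):
    """Forward pass sketch of Figure 1: three hidden layers with batch
    normalization, ReLU, and dropout, then a SoftMax output layer. The
    bias-augmentation convention follows Section 3; everything else is
    an illustrative simplification."""
    rng = np.random.default_rng(rng)
    a = np.asarray(x, dtype=float)
    for W, g, b in zip(weights[:-1], gammas, betas):
        a = np.concatenate(([1.0], a))    # prepend the bias factor 1
        z = batch_norm(W @ a, g, b)
        a = relu(z)
        # Dropout: zero roughly a fraction drop_rate of nodes and
        # rescale the survivors to keep the expected power unchanged.
        mask = (rng.random(a.size) >= drop_rate).astype(float)
        a = a * mask / (1.0 - drop_rate)
    a = np.concatenate(([1.0], a))
    return softmax(weights[-1] @ a)
```

The output is a valid probability vector over the Doppler spread indices, as required by the one-hot classification setup.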

When an error of amplitude 1 is back-propagated from the output position of the cross-entropy function to the -th input position of the SoftMax function, a back-propagation error is induced as

Therefore, if an error of amplitude 1 propagates back from the output position of the cross-entropy function to the input vector position of the SoftMax function, the resulting back-propagation error can be written as

Since , we derive
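The derivation above reduces the back-propagated error at the SoftMax input to the difference between the output vector and the one-hot target, a standard result for the SoftMax/cross-entropy pair. It can be verified numerically on a toy example:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def cross_entropy(y, t):
    return -np.sum(t * np.log(y))

# Analytic gradient at the softmax input: y - t.
z = np.array([0.2, -1.0, 0.7])
t = np.array([0.0, 1.0, 0.0])          # one-hot target
analytic = softmax(z) - t

# Central finite differences of the composed loss as a check.
numeric = np.zeros_like(z)
eps = 1e-6
for i in range(3):
    zp, zm = z.copy(), z.copy()
    zp[i] += eps
    zm[i] -= eps
    numeric[i] = (cross_entropy(softmax(zp), t)
                  - cross_entropy(softmax(zm), t)) / (2 * eps)

assert np.allclose(analytic, numeric, atol=1e-6)
```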

[Algorithm 4: Backward Error Propagation (listing not recovered)]
[Algorithm 5: Backward Batch Normalization (listing not recovered)]

By the chain rule [46] accompanying (26) and (27), back-propagation error generated at the input position of can be written as

Taking into account the bias factor contained in , we derive where denotes a zero-column vector of length . Therefore, the back-propagation error at the input position of the dropout function can be written as

Since , we can write for . By expanding (31), we derive

From the result of (30) and (32), the back-propagation error at the input position of the ReLU function can be written as

[Algorithm 6: Parameter Update (listing not recovered)]

The backward error propagation function for batch normalization is defined in Algorithm 5, whose detailed derivation is provided in the Appendix. Applying to that function will result in the back-propagation error, , at the input vector position of the batch normalization function. Similar to the process above, and can also be obtained. In Algorithm 6, batch normalization parameters, , , , , , and , and weight matrices, , , , and , are updated according to stochastic gradient descent [47]. The batch normalization parameters, and for , are updated according to where denotes the learning rate chosen as 0.001. The values of and are generated by the backward batch normalization function, the detailed derivation of which is presented in the Appendix. is updated according to the stochastic steepest descent method as where denotes the learning rate chosen as 0.01. To find , we first derive

Since , we derive , where denotes a vector of length whose -th component is given as 1 and whose other components are given as 0’s. By substituting the result of (37) into (36), we derive

By stacking the results of (38) for all indices of and , we can write

By substituting the result of (39) into (35), we derive

Similar to the process above, the formula for updating for can also be derived as
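The resulting weight update for a fully connected layer is a rank-one (outer-product) correction: the gradient of the loss with respect to the weight matrix is the outer product of the back-propagated error at the layer output and the bias-augmented layer input. A minimal sketch, with illustrative names:

```python
import numpy as np

def sgd_step(W, delta, a_in, eta=0.01):
    """One stochastic gradient descent weight update in the spirit of
    (35)-(40): delta is the back-propagated error at the layer output,
    a_in is the (bias-augmented) layer input, eta the learning rate."""
    return W - eta * np.outer(delta, a_in)
```

Each of the four weight matrices is updated the same way, with its own delta and input vector from the backward pass.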

Note that the BS can train the weight matrices offline using Algorithm 1. The trained weight matrix can be used by the BS to estimate the Doppler spread of the target user equipment (UE) online. When the weight matrices are trained by ML1, the BS can find the Doppler spread of the target UE by computing the CFR sequence based on the CSI of the target UE and using the CFR sequence as in Algorithm 2 to generate . When the weight matrices are trained by ML2, the BS can find the Doppler spread of the target UE by calculating the averaged PSD sequence based on the CSI of the target UE for and applying to Algorithm 2 to find . Since the maximum component index of represents the Doppler spread index , the Doppler spread of the target UE is given by based on (11).
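The online estimation step can be sketched as follows. Names are illustrative; the index-to-frequency mapping uses the resolution 1/(n_sym·T) from Section 2, and the optional speed conversion assumes v = f_D·c/f_c as above:

```python
def doppler_from_output(y, n_sym, T, f_c=None):
    """Online inference sketch per (11): the argmax of the trained
    network's output vector is the Doppler spread index; the frequency
    resolution 1 / (n_sym * T) converts it to f_D in Hz, and optionally
    to a speed estimate v = f_D * c / f_c (illustrative names)."""
    idx = max(range(len(y)), key=y.__getitem__)
    f_d = idx / (n_sym * T)
    if f_c is None:
        return f_d
    return f_d, f_d * 3e8 / f_c
```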

4. Simulation Results

In the simulation, OFDM system parameters are chosen as , , and  sec. The number of pilot subcarriers is chosen to be , which means that the subcarrier spacing between two neighboring pilots is given by . The machine learning settings for two machine learning approaches, ML1 and ML2, are summarized in Table 1.

To evaluate the Doppler spread estimation performance, we consider the root mean square error (RMSE) and the normalized RMSE (NRMSE) where denotes the estimated Doppler spread at the -th TUE and denotes the Monte Carlo simulation number, which is set to . For comparison, the two previous Doppler estimation approaches introduced in [5, 6] are also considered and named PREV1 and PREV2, respectively. PREV1 estimates Doppler spread by referring to the maximum position of the maxima of the multiple PSD functions derived from channel multipaths [5]. PREV2 estimates Doppler spread by referencing the maximum position of the averaged PSD values greater than 10% of the maximum averaged PSD value [6].
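The two metrics can be sketched as follows. The exact normalization used in the paper's NRMSE formula was not recoverable here, so normalization by the true Doppler spread is assumed:

```python
import numpy as np

def rmse(f_hat, f_true):
    """Root mean square error over Monte Carlo trials, as in Section 4."""
    f_hat, f_true = np.asarray(f_hat, float), np.asarray(f_true, float)
    return np.sqrt(np.mean((f_hat - f_true) ** 2))

def nrmse(f_hat, f_true):
    """RMSE of the estimate normalized by the true Doppler spread (an
    assumed convention; the paper's exact formula was not recovered)."""
    f_hat, f_true = np.asarray(f_hat, float), np.asarray(f_true, float)
    return np.sqrt(np.mean(((f_hat - f_true) / f_true) ** 2))
```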

In Figure 2, we evaluated the RMSEs of ML1, ML2, PREV1, and PREV2 in terms of SNR in a Rayleigh and isotropic channel scenario, where , , , and were assumed to be 40 Hz, 0, 0, and 0°, respectively. The same channel scenario was used to generate the CSI used for training ML1 and ML2 in the simulation. In the figure, the curve for ML2 is not shown because all RMSEs for ML2 are given as zero. This means that ML2 perfectly estimated the Doppler spread at all SNRs, unlike other approaches. PREV2 outperformed PREV1 and ML1 at SNRs above -0.8 dB, while PREV2 degraded as SNR decreased, performing worse than PREV1 and ML1 at SNRs below -2.5 dB. The reason for the poor performance of PREV2 at low SNR is that the maximum position of the averaged PSD values greater than 10% of the maximum averaged PSD value becomes more uncertain due to the PSD values adversely affected by small SNRs or large channel estimation errors. Although ML1 had higher implementation complexity than ML2, it performed much worse than ML2 because it could not effectively train the weight matrices using 2000(=) CFR sequences. This indicates the effectiveness of using averaged PSD sequences as training data in machine learning for Doppler spread estimation compared to using CFR sequences when the number of training data is limited.

Depending on the location and velocity of the UE and the scattering environment between the UE and the BS, channel characteristic variables such as SNR, , , , and can be randomly generated. For the purpose of making ML1 and ML2 work well anytime and anywhere, in the following simulations, ML1 and ML2 were trained based on the information of the channels generated assuming a mixed-channel scenario in which SNR, , , , and have a uniform distribution between -6 dB and 30 dB, between 0 Hz and 100 Hz, between 10° and 30°, between 0 and 2, and between -180° and 180°, respectively.

In Figure 3, we evaluated the RMSEs of ML1, ML2, PREV1, and PREV2 in terms of SNR for the channels whose characteristic variables, , , , and , have a uniform distribution between 0 Hz and 100 Hz, between 0 and 2, between 10° and 30°, and between -180° and 180°, respectively. was chosen equal to as in [36]. It was observed that ML2 outperformed the other approaches at all SNRs because ML2 effectively trained the weight matrices in the mixed-channel scenario. ML2 yielded RMSE values of less than 21 for SNR above 0 dB, which means that when the carrier frequency is 2.4 GHz and the SNR is above 0 dB, the amount of error in user speed estimation is less than 9.45 km/hr. In contrast, ML1 performed worst for SNRs higher than -1 dB because the 2000 CFR sequences were not sufficient to properly train ML1’s weight matrices. Since ML1 used unprocessed CSI (i.e., CFR sequences) for Doppler spread estimation, it yielded a constant RMSE curve. PREV2 outperformed PREV1 at SNRs above 6 dB, while PREV2 degraded as SNR decreased, performing worse than PREV1 at SNRs below 6 dB. The reason for the poor performance of PREV2 at low SNR is that the maximum position of the averaged PSD values greater than 10% of the maximum averaged PSD value becomes more uncertain due to the PSD values adversely affected by small SNRs or large channel estimation errors.

In Figure 4, we evaluated the NRMSEs of ML1, ML2, PREV1, and PREV2 in terms of for the channels whose characteristic variables, SNR, , , and , have a uniform distribution between -6 dB and 30 dB, between 0 and 2, between 10° and 30°, and between -180° and 180°, respectively. It was observed that ML2 outperformed the other approaches for all values of because ML2 effectively trained the weight matrices in the mixed-channel scenario. ML1 yielded a constant NRMSE curve because it used unprocessed CSI (i.e., CFR sequences) for Doppler spread estimation. ML1 performed worst when was greater than 53 Hz. PREV2 performed worse than PREV1 at all values because PREV2 tended to yield very large NRMSEs at low SNRs, which led the overall performance of PREV2 to be worse than PREV1 when the SNR was uniformly generated between -6 dB and 30 dB.

In Figure 5, we evaluated the RMSEs of ML1, ML2, PREV1, and PREV2 in terms of for the channels whose characteristic variables, SNR, , , and , have a uniform distribution between -6 dB and 30 dB, between 0 Hz and 100 Hz, between 0 and 2, and between -180° and 180°, respectively. Note that representing the azimuth AOA width can be expressed as through [37]. It was observed that ML2 outperformed the other approaches at all values of because ML2 effectively trained the weight matrices in the mixed-channel scenario. PREV1 outperformed ML1 and PREV2 for all values. As increased, the performance of PREV2 decreased, which can be explained as follows. As increases, the channel autocorrelation takes a sharper shape according to (5). With sharper shaped channel autocorrelation, the PSD function forms a smoother shape according to (10), resulting in a smaller maximum PSD value. Therefore, the maximum position of the averaged PSD values greater than 10% of the maximum averaged PSD value becomes more uncertain by a larger , especially if the PSD values are adversely affected by a small SNR or large channel estimation error.

In Figure 6, we evaluated the RMSEs of ML1, ML2, PREV1, and PREV2 in terms of Rician K-factor for the channels whose characteristic variables, SNR, , , and , have a uniform distribution between -6 dB and 30 dB, between 0 Hz and 100 Hz, between 10° and 30°, and between -180° and 180°, respectively. It was observed that ML2 outperformed the other approaches at all values because ML2 effectively trained the weight matrices in the mixed-channel scenario. PREV2 degraded as increased, which can be explained as follows. As increases, the effect of the fixed channel component becomes more significant than the fading (or fluctuating) channel component. The greater the effect of the fixed channel component, the more imprecise it is to find the Doppler spread by referencing the maximum position of the averaged PSD values greater than 10% of the maximum averaged PSD value. Therefore, the performance of PREV2 degrades as increases. At high SNR, PREV1, which estimates the Doppler spread by referring to the maximum position of the maxima of the multiple PSD functions derived from channel multipaths, outperformed ML1 and PREV2 because at least one channel multipath can have a strong fading component with high probability [5] and PREV1 could benefit from that multipath. Very small values of cause very large RMSEs in PREV1 and PREV2. This is because for very small values, a low SNR or a large channel estimation error greatly deteriorates their Doppler spread estimation performance and thus causes a large overall Doppler spread estimation error even if the SNR is uniformly generated.

In Figure 7, we evaluated the RMSEs of ML1, ML2, PREV1, and PREV2 in terms of mean direction of the azimuth AOA for the channels whose characteristic variables, SNR, , , and , have a uniform distribution between -6 dB and 30 dB, between 0 Hz and 100 Hz, between 10° and 30°, and between 0 and 2, respectively. was chosen equal to as in [36]. It was observed that ML2 outperformed the other approaches for most values. However, for values with , ML2 performed worse than PREV1 and PREV2. The reason for this phenomenon is that ML2 trained the weight matrices with the aim of improving the overall Doppler spread estimation performance for all values while effectively avoiding the overfitting problem. It was observed that PREV1, PREV2, and ML2 produced RMSE curves that fluctuate in terms of because PREV1, PREV2, and ML2 used PSD sequences for Doppler spread estimation. Note that the maximum position of the PSD function fluctuates due to the term as shown in (6). However, ML1 yielded a constant RMSE curve because it used unprocessed CSI (i.e., CFR sequences) for Doppler spread estimation. Because PREV1 and PREV2 assumed and estimated the Doppler spread, the performance of PREV1 and PREV2 deteriorated as value changed from 0° to 90°.

In Figures 8(a) and 8(b), we investigated the impact of on the performance of ML2. In Figure 8(a), we evaluated the RMSE in terms of SNR using the same simulation parameters as given in Figure 3. In Figure 8(b), we evaluated the NRMSE in terms of using the same simulation parameters as given in Figure 4. It can be seen from Figures 8(a) and 8(b) that the RMSE and NRMSE performance improved with increasing . However, increasing to a value greater than 2000 produced only minor gains. From the above simulation results, we conclude that is a good choice for the value in ML2.

5. Conclusion

A neural network structure in machine learning for Doppler spread estimation was proposed for an OFDM system. The weight matrix update algorithm was derived with the help of the backward error propagation technique and the stochastic steepest descent method. Numerical simulations have shown that averaged PSD sequences can be used to effectively train machine learning weights for Doppler spread estimation. The proposed machine learning approach using averaged PSD sequence as training data outperformed other Doppler spread estimation approaches under various channel conditions.

Appendix

A. The Derivation of the Back-Propagation Errors at All Edges of the Batch Normalization Function

The backward error propagation of the batch normalization function was derived in [48]. We present the back-propagation error at every edge of the batch normalization function with respect to the parameters defined in this paper. First, we summarize the backward error propagation properties [48] for some basic functions as follows: (i)Given two vector inputs, and , the back-propagation error through the “component wise addition” function arriving at the position of is the same as the error given at the output position of that function. This property is used to derive the results between ① and ⑬ in Figure 9(ii)Given two scalar inputs, and , the back-propagation error through the “addition” function arriving at the position of is the same as the error given at the output position of that function. This property is used to derive the results between ⑥, ⑩, ⑭, and ⑮ in Figure 9(iii)Given two vector inputs, and , the back-propagation error through the “component wise multiplication” function arriving at the position of is given by the component wise multiplication of and the error given at the output position of that function. This property is used to derive the results between ①, ②, and ⑨ in Figure 9(iv)Given two scalar inputs, and , the back-propagation error through the “multiplication” function arriving at the position of is given by the product of and the error given at the output position of that function. This property is used to derive the results between ③ and ⑪ in Figure 9(v)Given a vector input of length , the back-propagation error through the “mean” function arriving at the input position of that function is given by the product of and the error given at the output position of that function, where denotes a vector of length consisting of all s.
This property is used to derive the results between ⑦ and ⑫ in Figure 9(vi)Given a scalar input , the back-propagation error through the “reciprocal” function arriving at the position of is given by the product of and the error given at the output position of that function. This property is used to derive the results from ④ in Figure 9(vii)Given a scalar input , the back-propagation error through the “square root” function arriving at the position of is given by the product of and the error given at the output position of that function. This property is used to derive the results from ⑤ in Figure 9(viii)Given a vector input , the back-propagation error through the “componentwise-square” function arriving at the position of is given by the product of and the error vector given at the output position of that function. This property is used to derive the results from ⑧ in Figure 9

Second, we summarize the back-propagation errors at all edges denoted by circled numbers as shown in Figure 9, which were derived based on the properties mentioned above.

The fact that is used when deriving the result of ⑩. Finally, the backward propagation error arriving at the input of the batch-normalization function is given as the resulting sum of ⑫ and , which can be written as
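The end-to-end result of this appendix, the back-propagation error at the input of the batch normalization function, can be collapsed into the standard compact formula. The following sketch (illustrative names, scalar scale and shift parameters) implements that formula and verifies it against central finite differences:

```python
import numpy as np

def bn_forward(x, gamma, beta, eps=1e-5):
    """Batch normalization forward pass over one vector."""
    mu, var = x.mean(), x.var()
    xhat = (x - mu) / np.sqrt(var + eps)
    return gamma * xhat + beta

def bn_backward(dout, x, gamma, eps=1e-5):
    """Back-propagation through batch normalization (the standard compact
    result, consistent with the edge-by-edge properties summarized above):
    gradients w.r.t. the input x, the scale gamma, and the shift beta."""
    mu, var = x.mean(), x.var()
    std = np.sqrt(var + eps)
    xhat = (x - mu) / std
    dgamma = np.sum(dout * xhat)
    dbeta = np.sum(dout)
    dxhat = dout * gamma
    dx = (dxhat - dxhat.mean() - xhat * np.mean(dxhat * xhat)) / std
    return dx, dgamma, dbeta

# Finite-difference check of dx on a toy vector.
rng = np.random.default_rng(0)
x = rng.standard_normal(5)
gamma, beta = 1.3, 0.2
dout = rng.standard_normal(5)
dx, dgamma, dbeta = bn_backward(dout, x, gamma)
eps_fd = 1e-6
for i in range(5):
    xp, xm = x.copy(), x.copy()
    xp[i] += eps_fd
    xm[i] -= eps_fd
    num = (np.sum(dout * bn_forward(xp, gamma, beta))
           - np.sum(dout * bn_forward(xm, gamma, beta))) / (2 * eps_fd)
    assert abs(num - dx[i]) < 1e-5
```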

Data Availability

The data is available at http://home.konkuk.ac.kr/~ecyoon/DATA_PAPER/Data_for_Figures.pdf.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This research was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (No. NRF-2018R1D1A1B07051392, No. NRF-2018R1D1A1B07050232, No. NRF-2021R1F1A1047578).