Abstract
The health management of weather radar plays a key role in achieving timely and accurate weather forecasting. The current practice mainly relies on fixed thresholds prespecified for some monitoring parameters for fault detection. This causes abundant false alarms due to the evolving working environments, the increasing complexity of modern weather radar, and the neglect of the dependencies among monitoring parameters. To address the above issues, we propose a deep learning-based health monitoring framework for weather radar. First, we develop a two-stage approach for problem formulation that addresses the issues of fault scarcity and abundant false fault alarms in processing the databases of monitoring data, fault alarm records, and maintenance records. The temporal evolution of weather radar under healthy conditions is represented by a long short-term memory network (LSTM) model. As such, any anomaly can be identified according to the deviation between the LSTM-based prediction and the actual measurement. Then, we construct a health indicator based on the portion of the occurrence of deviation beyond a user-specified threshold within a time window. The proposed framework is demonstrated by a real case study for the Chinese S-band weather radar (CINRAD-SA). The results validate the effectiveness of the proposed framework in providing early fault warnings.
1. Introduction
A weather radar is a type of radar used to find precipitation, calculate its motion and intensity, and estimate the precipitation type [1]. Typically, a weather radar consists of five main subsystems including signal processor, transmitter, antenna, receiver, and control/communication processor. The data is sent to the data centre and is essential for timely and accurate weather forecasting. Modern weather radar has become more advanced with higher levels of digitization and integration, which poses challenges to maintaining its operational efficiency [2]. There is a need for proactive maintenance programs to monitor and manage the health condition of weather radar in a cost-effective manner. In the current practice, a fault alarm scheme is implemented by setting a predefined threshold for some state parameters, which often causes a large number of false alarms mainly due to the evolving working environments and the weakness of fixed threshold strategies [3].
Recent advances in machine learning provide powerful tools to explore the value of operational data in the health monitoring of weather radar. However, the meteorological community is still at an early stage in using the data accumulated through the operating experience of weather radar [3–5]. Indeed, a wide range of methods has been applied in other industrial sectors that utilize data analytics to extract knowledge from historical data [6]. For instance, researchers have developed a knowledge-based system approach for sensor fault detection [7], diagnosed faults of rotating machinery based on decision trees and principal component analysis [8], monitored the condition of bridges based on a clustering approach [9], and identified contamination sources using a sequential Bayesian approach [10].
Growing attention has been paid to deep learning techniques for end-to-end health management frameworks because of their capability to handle large datasets and to automatically learn hidden features. Various deep learning architectures and their variants are employed for fault detection, diagnosis, and prognostics, such as feedforward neural networks, convolutional neural networks, recurrent neural networks, and autoencoders [11]. For instance, previous studies have developed a convolutional neural network-based model to detect wafer structural defects using wafer images [12], adopted a stacked autoencoder for the fault diagnosis of rotating machinery [13], and addressed variable working conditions through deep transfer learning [14, 15]. Note that most current studies are developed using simulation or lab-testing datasets. There is still a gap for real field applications in aggregating data from various sources, which raises the challenges of data preparation [16]. The literature on deep learning applications in prognostics and health management (PHM) is large and spans a broad range of sectors. We refer interested readers to [17, 18] for comprehensive reviews on this topic.
In this paper, we propose a deep learning-based health monitoring framework with real applications to the Chinese S-band weather radar (CINRAD-SA). In particular, we develop a long short-term memory (LSTM) network-based predictive model to capture the temporal patterns of the working conditions of weather radar. The multidimensional time-series data collected in the normal condition are used to train the LSTM network, so that the trained network represents the healthy state of the weather radar. At any future time instant, an anomaly can be identified based on the deviation between the actual measurement and the prediction provided by the LSTM. Once the degree of deviation goes beyond a user-specified threshold, the weather radar is considered to be in a fault condition. Ultimately, the number of occurrences beyond the threshold within a time window signifies the severity of abnormalities and is used to construct the health indicator of the weather radar.
The effectiveness of our proposed framework is demonstrated using the operational data obtained at a radar station from 2019/01/01 to 2020/10/14. We discuss the issues in problem formulation and our solution based on a two-stage approach. Then, we validate the proposed framework by successfully showing an early warning of a severe fault that occurred on 2020/01/14. The results indicate the potential value of the proposed framework in support of practical maintenance planning.
The rest of this paper is organized as follows. Section 2 describes the background of recurrent neural networks and the data acquired through the operational experience of CINRAD-SA. Section 3 presents the proposed framework to predict and assess the health condition of weather radar. Section 4 demonstrates the proposed framework using a real case study. Section 5 presents the conclusions and discusses future research.
2. Background
2.1. Data Available in the Chinese S-Band Weather Radar
There are three types of data collected through the operation of the CINRAD-SA: (1) the real-time monitoring data are multidimensional floating-point time series collected by the built-in sensors of the weather radar, and there are typically hundreds of monitoring parameters for a weather radar; (2) the fault alarm records are stored in a binary format and are generated once some monitoring parameters go beyond a prespecified threshold value. Note that most of the alarm records are false alarms without the actual occurrence of a fault; and (3) the maintenance records contain the repair start and end times, the fault description, the replacement part, and the affiliated subsystem. The maintenance records can be used to locate the actual occurrence of a fault.
2.2. Long Short-Term Memory
An LSTM is a type of recurrent neural network (RNN) that is specialized for sequential data such as time series, text streams, and audio clips. An RNN can learn the sequential characteristics of data by formulating a looping mechanism through a stack of building units called cells. Specifically, a cell memorizes information given the current input, and the data pass through the same cell sequentially to produce a single output for each step, namely, the hidden state. This hidden state and the new data input are then fed to the next step. This allows the cell in the next step to learn from the previous steps and thus to capture the sequential characteristics of the data.
As the simplest RNN, the vanilla RNN has only one hidden gate and commonly suffers from vanishing gradients on long sequences of data. Hence, various cell designs have led to variants of the RNN, such as the gated recurrent unit and the LSTM [11], among which the latter is the most commonly used. An LSTM cell consists of three gates: the forget, input, and output gates, the operations of which are shown in Figure 1. The previous hidden state and the current data are fed into the forget gate to remove the information from the previous cell state that is no longer relevant. The input gate then updates the cell state given the previous hidden state and the current input, and the output gate generates a new hidden state based on the information in the updated cell state.
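To make the gate operations concrete, the following is a minimal sketch of a single LSTM cell step in plain Python. It follows the standard textbook LSTM formulation rather than any implementation detail of this study; the function name `lstm_step`, the parameter layout, and the toy weights are illustrative assumptions.

```python
import math


def _sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def _affine(W, z, b):
    """Return W @ z + b for a list-of-rows matrix W and a vector z."""
    return [sum(w * zj for w, zj in zip(row, z)) + bi for row, bi in zip(W, b)]


def lstm_step(x, h_prev, c_prev, params):
    """One step of a standard LSTM cell (hypothetical helper, not the paper's code).

    params maps each gate name ("f", "i", "g", "o") to a (W, b) pair, where W
    acts on the concatenation [h_prev, x].
    """
    z = list(h_prev) + list(x)  # previous hidden state followed by current input

    def gate(name, activation):
        W, b = params[name]
        return [activation(v) for v in _affine(W, z, b)]

    f = gate("f", _sigmoid)   # forget gate: how much of the old cell state to keep
    i = gate("i", _sigmoid)   # input gate: how much of the candidate to write
    g = gate("g", math.tanh)  # candidate cell content
    o = gate("o", _sigmoid)   # output gate: how much of the cell state to expose

    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


# Toy example: 2 inputs, 1 hidden unit, all weights and biases set to zero,
# so every sigmoid gate outputs 0.5 and the candidate content is 0.
zero = ([[0.0, 0.0, 0.0]], [0.0])
params = {name: zero for name in ("f", "i", "g", "o")}
h, c = lstm_step([1.0, -1.0], [0.0], [2.0], params)
```

In this toy setting the forget gate halves the previous cell state (2.0 becomes 1.0), and the output gate exposes half of `tanh(1.0)` as the new hidden state.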

3. Proposed Health Monitoring Framework
This section presents the proposed framework as illustrated by the flowchart in Figure 2. There are two main parts: offline development and online deployment. In the offline stage, the historical data are prepared and preprocessed for the development of the LSTM-based predictive model. In the online stage, the health indicator of the weather radar is constructed by leveraging the LSTM-based prediction and the online measurement. This results in a health monitoring framework that tracks the health condition of the weather radar and provides early warnings based on the health indicator. The details of each part are discussed in the following sections.

3.1. Problem Formulation
We develop a two-stage approach to address the challenges involved in problem formulation. There are two main challenges as follows:
(i) The first challenge is data preparation, that is, labelling the available monitoring data as either faulty or healthy conditions. The monitoring data and fault alarm records have the same timestamps, so ideally the fault alarm records could be applied to annotate the radar condition. However, most of the fault alarm records are false alarms and cannot properly represent the actual radar condition, due to the methodological deficiency of the fixed threshold strategy in the current practice. On the other hand, maintenance records represent the actual fault occurrence but have timestamps different from those of the real-time monitoring data and fault alarm records.
(ii) The second challenge is the limited number of actual fault occurrences in the field. As such, it is not feasible to formulate a classification problem that directly uses the binary state variable as the response variable indicating the radar state.
Figure 3 illustrates a flowchart of the proposed two-stage approach. In the first stage, we address the issue of abundant false alarms by synchronizing the fault alarm records with the maintenance records. Specifically, we calibrate the fault alarm records by labelling an alarm record as 1 if its timestamp matches a maintenance record, and as 0 otherwise. This results in the calibrated fault alarm records, which are representative of the actual radar faults. In the second stage, we explore the association between the monitoring parameters and the calibrated fault alarm records and then formulate the health monitoring task as a regression problem. The response variable is set as the most relevant monitoring parameter, and the other associated monitoring parameters are used as the explanatory variables. Overall, proper annotation of radar states and proper problem formulation are important to assure the quality of data for model development and to enable satisfactory predictive performance.

3.2. Predictive Model Development
Note that the working condition of weather radar evolves continuously, which poses challenges in learning predictive models under heterogeneity. Using the data of a long period would introduce much heterogeneity and hence compromise the predictive performance. Therefore, we recommend using the data from the two to three weeks before the time of interest for model training, so that the model represents the latest working condition of the weather radar. Suppose the condition of weather radar can be represented by multidimensional time-series data. A sliding window is first applied to segment the data into batches and reshape the data into the format (number of windows, window length, number of monitoring parameters). This produces the data vector corresponding to each LSTM cell as shown in Figure 4. Denote the training dataset as $D = \{(X_i, y_i)\}_{i=1}^{N}$, where $N$ is the number of windows, $X_i$ represents the multidimensional time-series data in the $i$th window, and $y_i$ represents the state parameter describing the health state of the weather radar. The network training aims to estimate the weights and bias parameters that characterize the predictive model. This is conducted by gradient-based optimization algorithms.
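The sliding-window segmentation described above can be sketched as follows. This is a minimal illustration in plain Python; the helper name `make_windows` is hypothetical, and pairing each window with the state parameter at the window's last time step is an assumption, since the text does not state the alignment.

```python
def make_windows(features, target, window_length):
    """Segment a multivariate series into overlapping windows.

    features: list of T rows, each a list of P monitoring parameters.
    target:   list of T state-parameter values.
    Returns (X, y), where X has shape (T - window_length + 1, window_length, P)
    and y[k] is the target at the last time step of window k (an assumption).
    """
    if not 1 <= window_length <= len(features):
        raise ValueError("window_length must be between 1 and the series length")
    X, y = [], []
    for end in range(window_length, len(features) + 1):
        X.append([list(row) for row in features[end - window_length:end]])
        y.append(target[end - 1])
    return X, y


# Toy series: T = 5 time steps, P = 2 parameters, window length 3.
features = [[t, 10 * t] for t in range(5)]
target = [100 + t for t in range(5)]
X, y = make_windows(features, target, window_length=3)
shape = (len(X), len(X[0]), len(X[0][0]))  # (number of windows, window length, parameters)
```

With 5 time steps and a window length of 3, the toy series yields 3 windows, each holding 3 consecutive rows of the 2 parameters.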

3.3. Health Indicator Construction
Suppose the LSTM-based predictive model is well trained and is then deployed online. At any time instant $t$, one can make a prediction $\hat{y}_t$ of the state parameter from the newly arrived real-time monitoring data $X_t$ within a time window of length $L$, where $X_t = (x_{t-L+1}, \ldots, x_t)$. The deviation $d_t$ between the prediction $\hat{y}_t$ and the actual measurement $y_t$ is then calculated as shown in Equation (1):
$$d_t = \frac{|y_t - \hat{y}_t|}{\sigma}, \tag{1}$$
where $\sigma$ measures the dispersion under healthy conditions and is calculated based on the entire training dataset.
At a given time instant $t$, a fault alarm would be triggered once the deviation $d_t$ is greater than a user-specified threshold $\tau$ (0.78 in the case study of Section 4.2). However, the likelihood of false alarms would be high due to the turbulence of the working condition. To alleviate this issue, a health indicator is constructed by aggregating the deviations within a time duration. The key idea is that the number of occurrences beyond the threshold within a time duration can be regarded as a precursor of the possibility or severity of abnormalities. Let $W$ denote the length of the time window for health indicator construction. We define the health indicator as the portion of occurrences below the threshold within the window, as in Equation (2):
$$HI_t = \frac{1}{W} \sum_{i=t-W+1}^{t} I(d_i < \tau), \tag{2}$$
where $HI_t$ is the health indicator at time $t$, and $I(\cdot)$ is the indicator function, which equals 1 if the deviation is less than the threshold and 0 otherwise. The health indicator ranges from 0 to 1; the higher the health indicator, the better the performance of the weather radar. The health indicator can be used to track the health evolution of the weather radar and provides a reference to support maintenance planning.
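The construction of the health indicator can be illustrated with a short sketch. It assumes that the deviation is the absolute prediction error scaled by the healthy-condition dispersion, and that the indicator is the fraction of deviations below the threshold within the most recent window; the function names and the toy numbers are hypothetical and are not taken from the case study.

```python
def deviations(actual, predicted, sigma):
    """Absolute prediction error scaled by sigma (assumed form of the deviation)."""
    return [abs(a - p) / sigma for a, p in zip(actual, predicted)]


def health_indicator(dev, threshold, window):
    """Fraction of the last `window` deviations that fall below `threshold`.

    Returns one value per time step from index window - 1 onwards.
    """
    hi = []
    for t in range(window - 1, len(dev)):
        recent = dev[t - window + 1:t + 1]
        hi.append(sum(1 for d in recent if d < threshold) / window)
    return hi


# Toy data: a flat healthy signal and predictions that drift away from it.
actual = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
predicted = [1.0, 1.1, 2.0, 3.0, 1.2, 4.0]
dev = deviations(actual, predicted, sigma=0.5)
hi = health_indicator(dev, threshold=0.78, window=4)
```

Averaging over a window means that a single large deviation lowers the indicator only by `1 / window`, which is what dampens isolated false alarms.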
4. Case Study
This section demonstrates the real application of the proposed framework to monitor the health evolution of weather radar. Section 4.1 describes the problem formulation and model development. Section 4.2 presents the results and discussions. The proposed framework was developed based on Python v3.6 and TensorFlow v1.5.0 using a desktop with an Intel Core i7 9700 CPU and 32 GB DDR4 RAM.
4.1. Problem Formulation and Model Development
For problem formulation, we apply the two-stage approach proposed in Section 3.1 to process the historical data as follows. (1) We use the maintenance records to calibrate the fault alarm records. Any fault alarm record that occurs within 6 minutes before or after a maintenance record is annotated as an actual faulty condition (1), and otherwise as a healthy condition (0). (2) We investigate the association between the monitoring parameters and the calibrated fault alarm records using the stepwise regression method [20]. We start with no monitoring parameter and test the addition of each monitoring parameter based on a linear regression model. A monitoring parameter is added if its inclusion yields a statistically significant improvement in the model fit. This process is repeated until no parameter can further improve the model.
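The calibration in step (1) can be sketched as a timestamp match with a 6-minute tolerance. The sketch below is illustrative only: the function name `calibrate_alarms` and the timestamps are hypothetical, each maintenance record is reduced to a single timestamp, and the tolerance is treated as inclusive, none of which is specified in the text.

```python
from datetime import datetime, timedelta


def calibrate_alarms(alarm_times, maintenance_times, tolerance=timedelta(minutes=6)):
    """Label each alarm 1 if it lies within `tolerance` of any maintenance record, else 0."""
    labels = []
    for alarm in alarm_times:
        matched = any(abs(alarm - m) <= tolerance for m in maintenance_times)
        labels.append(1 if matched else 0)
    return labels


# Hypothetical timestamps, for illustration only.
alarms = [
    datetime(2020, 1, 14, 9, 54),   # 6 minutes before the maintenance record
    datetime(2020, 1, 14, 10, 0),   # exact match
    datetime(2020, 1, 14, 10, 12),  # 12 minutes after: treated as a false alarm
    datetime(2020, 1, 15, 8, 0),    # unrelated alarm on the next day
]
maintenance = [datetime(2020, 1, 14, 10, 0)]
labels = calibrate_alarms(alarms, maintenance)
```

Only the first two alarms fall inside the tolerance, so they are the only ones kept as actual faults.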
Indeed, the parameters considered important for fault detection may vary across radar stations. This would detrimentally affect the stability of the prognostic model. Therefore, the feature selection is conducted using the historical data from 31 radar stations as summarized in Table 1. For instance, in the first radar station, we analyzed the data collected between 2019/01/01 and 2020/10/14. The data contain 177814 examples with 139 monitoring parameters, each of which is collected every 6 minutes, 57625 fault alarm records, and 8 maintenance records. Comparing the number of fault alarm records with the number of maintenance records shows that false fault alarms far outnumber actual faults. This highlights the limitation of the current practice of using a predefined threshold, as discussed in Section 3.1.
We proceed to identify the top monitoring parameters for each radar station, namely, those that account for 95% of the sum of the absolute values of the regression coefficients. Then, we aggregate all the identified monitoring parameters. For the parameters shared by multiple stations, we sum up their regression coefficients across stations to obtain the aggregated regression coefficient. The importance of each parameter can be further measured by its percentage contribution to the total aggregated regression coefficient. As such, 38 monitoring parameters are identified as summarized in Table 2 in descending order. Expert judgment from the radar specialist is used as additional input to aid feature selection. Accordingly, we screen out the parameters with an index within the ranges [24, 25] and [30, 38]. This results in 27 monitoring parameters in this study. This helps eliminate the parameters that are not important from the perspective of radar operation and also reduces the dimensionality of the data to facilitate the subsequent model development.
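The per-station selection and cross-station aggregation can be sketched as below. This is one possible reading of the procedure: absolute coefficient values are summed across stations, which is an assumption, and the parameter names and coefficient values are made up for illustration.

```python
def top_parameters(coefs, coverage=0.95):
    """Smallest set of top-ranked parameters whose |coefficient| sum reaches `coverage` of the total."""
    ranked = sorted(coefs.items(), key=lambda kv: abs(kv[1]), reverse=True)
    total = sum(abs(value) for value in coefs.values())
    selected, running = {}, 0.0
    for name, value in ranked:
        if running >= coverage * total:
            break
        selected[name] = value
        running += abs(value)
    return selected


def aggregate_importance(stations, coverage=0.95):
    """Sum |coefficients| of the selected parameters across stations and normalise to shares."""
    aggregated = {}
    for coefs in stations:
        for name, value in top_parameters(coefs, coverage).items():
            aggregated[name] = aggregated.get(name, 0.0) + abs(value)
    total = sum(aggregated.values())
    ranked = sorted(aggregated.items(), key=lambda kv: kv[1], reverse=True)
    return {name: value / total for name, value in ranked}


# Hypothetical regression coefficients for two stations.
stations = [
    {"PARAM_A": 8.0, "PARAM_B": 1.8, "PARAM_C": 0.2},
    {"PARAM_A": 6.0, "PARAM_B": -3.0, "PARAM_D": 1.0},
]
importance = aggregate_importance(stations)
```

In the toy data, `PARAM_C` is dropped because the two larger coefficients of its station already cover more than 95% of the total, so it never enters the aggregate.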
The parameter of ANT_AVG_PWR has the strongest relationship with a regression coefficient far greater than that of the other monitoring parameters. Therefore, the parameter of ANT_AVG_PWR is set as the response variable to represent the radar state, referred to as the state parameter in the following discussions. The other monitoring parameters constitute a 27-dimensional feature matrix describing the operating condition of weather radar and are used as the explanatory variables in the predictive model.
To demonstrate the proposed framework, we examine whether it can provide an early warning of the occurrence of an actual fault on 2020/01/14. A detailed failure analysis report showed that the root cause is the inverse peak overcurrent resulting from a transmitter modulator failure. We adopt the data collected both before and after the fault occurrence, that is, from 2019/11/01 to 2020/01/20, for model development. Specifically, the radar worked under healthy conditions from 2019/11/01 to 2020/01/02, and hence the corresponding data are used as the training dataset with 15435 examples; the data between 2020/01/03 and 2020/01/20 are used as the test dataset with 4022 examples in either faulty or healthy condition. The training and test datasets are standardized and reshaped with a time window length of 15. The model architecture consists of an LSTM layer.
4.2. Results and Discussions
Figure 5 shows a comparison between the actual state parameter and the LSTM-based prediction in both the training and testing phases, where the x-axis is the operational time of the weather radar and the y-axis is the state parameter. The prediction shows a good fit to the actual measurement in the training phase. A significant difference is observed between the prediction and the actual measurement in the testing phase, which indicates the occurrence of a radar fault.

We further calculate the deviation between the actual measurement and the prediction according to Equation (1). Figure 6 displays the temporal evolution and the distribution of the deviation. In Figure 6(a), the deviation fluctuates below the threshold value, illustrated by the red dotted line, under healthy conditions and then tends to increase rapidly once an anomaly happens. As expected, the distribution of the deviation is positively skewed in Figure 6(b). A bimodal distribution is observed, which shows the existence of two different modes (i.e., healthy and faulty conditions) in the operational weather radar. Only a few occurrences go beyond the threshold value of 0.78, and these provide the basis to construct the health indicator.

The health indicator is derived by examining the deviation calculated within a duration of 80 measurements. This is equivalent to 8 hours since the measurement interval is 6 minutes. In other words, we determine the radar condition based on its overall performance within an 8-hour duration to alleviate the impact of the turbulence of working conditions. The resulting health indicator is illustrated in Figure 7, where the yellow line marks the time of the actual fault occurrence and the red line marks the fault prediction given a health indicator threshold of 0.05. Specifically, the fault prediction is triggered on 2020/01/04 at 17:15, which is nearly 10 days ahead of the actual fault occurrence on 2020/01/14. This would provide an early warning and hence help avoid serious consequences and unscheduled downtime. Also, an early warning would provide sufficient time to order repair or replacement parts as needed. Note that specifying the threshold of the health indicator involves a trade-off between the operational economics and the risk of missed detections of actual faults. A higher threshold value leads to a lower risk of missed detection but raises operational costs.
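The window duration and the triggering rule can be checked with a small sketch. It assumes that a warning is raised the first time the health indicator falls below its threshold, which is consistent with the trade-off described above but is not stated explicitly; the function name `first_alarm` and the health indicator values are hypothetical.

```python
from datetime import datetime, timedelta


def first_alarm(timestamps, health, hi_threshold=0.05):
    """Return the first timestamp at which the health indicator drops below the threshold."""
    for ts, hi in zip(timestamps, health):
        if hi < hi_threshold:
            return ts
    return None  # no warning raised


step = timedelta(minutes=6)   # measurement interval from the case study
window_duration = 80 * step   # 80 measurements per health indicator window

# Hypothetical health indicator trace sampled every 6 minutes.
start = datetime(2020, 1, 1, 0, 0)
health = [0.9, 0.4, 0.1, 0.04, 0.0]
timestamps = [start + k * step for k in range(len(health))]
alarm = first_alarm(timestamps, health)
```

Raising `hi_threshold` makes the rule fire earlier and more often, which mirrors the trade-off between missed detections and operational cost.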

5. Conclusions
In this paper, we developed a deep learning-based health monitoring framework based on the real-time monitoring parameters of weather radar. Specifically, we proposed a two-stage approach to address the issues of fault scarcity and abundant false fault alarm records in the current practice. Then, we formulated the health monitoring task as a regression problem based on the monitoring parameter most relevant to actual radar faults. An LSTM model was developed to represent the temporal evolution of the radar under healthy conditions. In doing so, any anomaly can be captured by the deviation between the actual measurement and the prediction provided by the LSTM. Ultimately, a health indicator of weather radar was constructed based on the portion of the occurrence of deviation beyond a user-specified threshold within a time window. The effectiveness of the proposed framework was validated using the data collected from 2019/01/01 to 2020/10/14. The results showed that the proposed framework successfully provided an early warning of the actual fault occurrence on 2020/01/14. Future work will include the development of maintenance planning based on the health monitoring framework and case studies using the monitoring data collected at other radar stations.
Data Availability
The data are not publicly available due to privacy restrictions.
Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this paper.
Acknowledgments
This research was funded by the Meteorological Observation Center of China Meteorological Administration [2020330401000754].