#### Abstract

Current satellite management relies mainly on manual work. If small faults are not found in time, they may develop into system-level faults and degrade the accuracy of satellite data and the service quality of the meteorological satellite. If the operation trend of the satellite can be predicted, such faults can be avoided. However, the satellite system is complex, and the telemetry signal is nonstationary, nonlinear, and time-dependent, so it is difficult to predict with a single fixed model. For these reasons, this paper proposes a bidirectional long short-term memory (BiLSTM) deep learning model to predict the operation trend of a meteorological satellite. The model has two BiLSTM layers and takes two inputs: the historical data and the future trend data forecast by an LSTM network. The dataset is generated and transmitted by the Advanced Geostationary Radiation Imager (AGRI), a payload of the FY4A meteorological satellite. To demonstrate the advantage of the BiLSTM prediction model, it is compared with LSTM on the same dataset. The results show that the BiLSTM method achieves lower prediction error than LSTM on the satellite telemetry data.

#### 1. Introduction

A meteorological satellite is a complicated system that operates in a complex space environment and carries many kinds of payloads whose components are coupled with one another. Abnormalities and faults occur easily. Small faults may develop into system-level faults, degrade the accuracy of satellite data and services, and even cause significant economic losses and serious harm to people's livelihood and the country. If the operation trend of the satellite can be predicted, such faults can be avoided. Trend prediction is therefore of great significance for the intelligent health management of meteorological satellites and has reference value for the health management of other satellites.

At present, trend prediction methods mainly include statistical prediction, mathematical prediction, intelligent prediction, and information fusion prediction. Among them, the Auto Regression Moving Average (ARMA), Support Vector Regression (SVR), Back Propagation (BP) network, and Long Short-Term Memory (LSTM) prediction models are widely studied and applied. The ARMA prediction model is a linear model with finite parameters. For short-term prediction, it has high fitting accuracy [1, 2], but it is not suitable for nonlinear and nonstationary sequences. The SVR prediction model, which is based on the structural risk minimization criterion, has better generalization ability on small training sets, higher prediction accuracy, and better robustness than the ARMA model [3], but it can only be used as a short-term prediction algorithm. In recent years, SVR has been widely combined with other algorithms to solve practical problems in various fields [4–6]. The BP network has become one of the most widely used neural network models because of its strong nonlinear mapping and self-learning abilities [7–9]. However, it suffers from slow convergence, the lack of a theoretical basis for choosing the number of hidden layer nodes, and a tendency to fall into local minima [10]. Deep learning is a branch of neural networks. The recurrent neural network (RNN), which is deep along the time dimension, has been widely used as a prediction model for sequence data [11–13]. Although RNN can handle time series problems, it suffers from a serious vanishing gradient problem. To weaken the adverse effects of vanishing gradients and long-distance dependence, the long-term memory ability of RNN is improved by replacing the RNN chain unit with the LSTM chain unit [14]. LSTM, which uses an additional memory cell to store states, shows better time series prediction ability than RNN. In recent years, LSTM has achieved significant success in many fields [15–17].

Although the LSTM model predicts nonlinear time series well [18], it takes only historical time series data as input, neglects the availability of future time series data, and lacks deep data mining. Moreover, the telemetry signal that the satellite generates and transmits, which reflects the operation of the satellite, is nonstationary, nonlinear, and time-dependent, so it is difficult to predict satellite operation with a single fixed model. Bidirectional Long Short-Term Memory (BiLSTM) can process sequence data in both the forward and reverse directions and provide past and future sequence information for each time step in the sequence [19]. Therefore, this paper proposes a BiLSTM model to predict the operation trend of the satellite. Increasing the number of layers of the neural network can help enrich the feature set the network learns, increase its processing capacity, and improve the accuracy of the model. In addition, to supply the future time series data, the paper uses LSTM to predict the future trend and takes the prediction results together with the historical data as the input of BiLSTM.

The remainder of this paper is organized as follows. The next section gives a brief description of the problem definitions, followed by an introduction to the proposed BiLSTM network architecture. Then, the application of the BiLSTM deep learning model to a meteorological satellite and our experimental results are presented. Finally, conclusions are drawn.

#### 2. Problem Definitions

Satellite management plays an important role in satellite safety. However, current satellite management relies mainly on manual work, which takes time and effort, so faults may not be found in time. This paper applies artificial intelligence algorithms to this problem: by deeply mining and analyzing telemetry data, it studies an automatic method for predicting the operation trend of a meteorological satellite, addressing the problems that restrict stable satellite operation and the abnormalities that cannot be found in time.

To meet the safety and reliability requirements of a satellite in orbit, sensors are built into the main functional modules of each key subsystem during satellite development. These sensors provide telemetry parameter data from launch through in-orbit operation to retirement. The satellite telemetry system collects the working conditions and values of the various subsystems on the satellite at a fixed sampling period and forms telemetry data after A/D conversion and coding. It then transmits the data to the ground after modulation and amplification, and the telemetry data are recovered on the ground through the inverse process. Satellite telemetry data are divided into two categories: digital quantities and analog quantities. A digital quantity reflects the functional state of the measured unit on the satellite. An analog quantity is the numerical measurement value of the measured unit, which usually reflects its performance state. Satellite telemetry parameters include thermodynamic parameters, power system parameters, and dynamic parameters, so a large amount of data is generated. These telemetry data, which are stored as time series, reflect the state of the satellite payload and the operation of the satellite platform. Therefore, the operation trend of a meteorological satellite is related to a large amount of spatiotemporal data.

The telemetry data contain many objective laws and much knowledge that can be used for trend prediction. The operation trend of the satellite platform and payload can be predicted from these heterogeneous, coupled, and large-scale telemetry data (Figure 1 shows some parameters). However, the research faces the following challenges:

(1) Feature representation is difficult: there are thousands of satellite telemetry variables; the telemetry data contain outliers and are sampled at uneven time intervals; and, because of the coupling and correlation among satellite components, it is difficult for a single parameter to describe the overall performance and difficult to determine which parameters accurately describe a given performance.

(2) Trend prediction is difficult: the satellite system is complex, the telemetry signal is nonstationary and nonlinear, and the telemetry parameters follow three different variation patterns: stationary, abrupt, and periodic.

To address the difficulty of feature representation, this paper selects key features according to their contribution to the operation trend of the satellite payload and the relationships among the features. An analysis of traditional machine learning algorithms for time series prediction shows that they are no longer suitable for predicting the operation trend of a complex meteorological satellite. Therefore, this paper builds a new prediction model for satellite telemetry data.

The central problem of predicting satellite operation can be stated as follows. The prediction uses the spatiotemporal variable sequence from $X_{t-T}$ to $X_{t}$. The model can be denoted by

$$Y_{t+T} = f\left(X_{t-T}, \ldots, X_{t}\right) \tag{1}$$

where $Y_{t+T}$ is the predicted object in the next $T$ hours, $f$ represents the final model learnt from the historical data, $X_{t}$ denotes the datasets at the predicting moment, and $X_{t-T}$ denotes the datasets $T$ hours before the predicting moment.
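As an illustration of this formulation, the following minimal sketch turns a multivariate series into (history window, future target) pairs that a model $f$ could be trained on. The function name `make_windows` and the separate `history` and `horizon` parameters are assumptions made for this example; the paper uses a single $T$ for both and does not describe its windowing code.

```python
from typing import List, Sequence, Tuple


def make_windows(
    series: Sequence[Sequence[float]],
    history: int,
    horizon: int,
    target_col: int = 0,
) -> Tuple[List[List[List[float]]], List[float]]:
    """Pair each window X_{t-history}..X_t with the target value at t + horizon."""
    inputs: List[List[List[float]]] = []
    targets: List[float] = []
    # t is the predicting moment: it needs `history` samples behind it
    # and `horizon` samples ahead of it.
    for t in range(history, len(series) - horizon):
        window = [list(row) for row in series[t - history : t + 1]]
        inputs.append(window)
        targets.append(series[t + horizon][target_col])
    return inputs, targets


# Toy series with 2 variables per sample (the paper uses 26 per sample).
toy = [[float(i), float(10 * i)] for i in range(8)]
X, y = make_windows(toy, history=2, horizon=3)
```

Each element of `X` holds `history + 1` consecutive samples, and the matching element of `y` is the value of the chosen variable `horizon` steps after the last sample in the window.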

#### 3. The Proposed Scheme

##### 3.1. The Introduction of LSTM Model

Compared with the traditional RNN, LSTM has not only a hidden state but also a cell state. It also adds an input gate, an output gate, and a forget gate in each layer to control how much information is added or deleted. The model can learn long-term dependence information and avoid the vanishing gradient problem [20, 21], with smaller error and higher prediction accuracy. At present, the most widely used LSTM network replaces the neural nodes in the hidden layer of the RNN with LSTM units [22]. Its structure is shown in Figure 2, where $c_{t-1}$ denotes the cell state of the previous moment, $h_{t-1}$ is the output of the former LSTM unit, $x_{t}$ and $h_{t}$ represent the input and the state output for the current moment, respectively, and $f_{t}$, $i_{t}$, and $o_{t}$ represent the output values of the forget gate, the input gate, and the output gate, respectively. The LSTM unit update process is as follows:

$$f_{t} = \sigma\left(W_{f} \cdot [h_{t-1}, x_{t}] + b_{f}\right) \tag{2}$$

$$i_{t} = \sigma\left(W_{i} \cdot [h_{t-1}, x_{t}] + b_{i}\right) \tag{3}$$

$$\tilde{c}_{t} = \tanh\left(W_{c} \cdot [h_{t-1}, x_{t}] + b_{c}\right) \tag{4}$$

$$c_{t} = f_{t} \odot c_{t-1} + i_{t} \odot \tilde{c}_{t} \tag{5}$$

$$o_{t} = \sigma\left(W_{o} \cdot [h_{t-1}, x_{t}] + b_{o}\right) \tag{6}$$

$$h_{t} = o_{t} \odot \tanh\left(c_{t}\right) \tag{7}$$

where $\tilde{c}_{t}$ and $c_{t}$ represent the candidate cell state and the current cell state, respectively. $W_{c}$, $W_{f}$, $W_{i}$, and $W_{o}$ represent the weights of the candidate input gate, the forget gate, the input gate, and the output gate, respectively. $b_{c}$, $b_{f}$, $b_{i}$, and $b_{o}$ represent the biases of the candidate input gate, the forget gate, the input gate, and the output gate, respectively. $\sigma$ and $\tanh$ represent the sigmoid activation function and the hyperbolic tangent activation function, respectively, and $\odot$ denotes element-wise multiplication.
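The gate updates described above can be sketched as a single forward step in plain Python. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the function names, the dictionary layout of `params`, and the all-zero toy weights are chosen only for this example.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine(W: List[Vector], v: Vector, b: Vector) -> Vector:
    """Return W v + b for a row-major matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(x_t: Vector, h_prev: Vector, c_prev: Vector,
              params: Dict[str, list]) -> Tuple[Vector, Vector]:
    """One LSTM update: returns the new hidden state h_t and cell state c_t."""
    z = h_prev + x_t  # list concatenation, i.e. [h_{t-1}, x_t]
    f_t = [sigmoid(a) for a in affine(params["W_f"], z, params["b_f"])]      # forget gate
    i_t = [sigmoid(a) for a in affine(params["W_i"], z, params["b_i"])]      # input gate
    c_hat = [math.tanh(a) for a in affine(params["W_c"], z, params["b_c"])]  # candidate state
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]
    o_t = [sigmoid(a) for a in affine(params["W_o"], z, params["b_o"])]      # output gate
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


# Toy check with all-zero parameters (hypothetical values): every gate
# outputs 0.5 and the candidate state is 0, so c_t is half of c_{t-1}.
hidden, n_inputs = 2, 1
zeros: Dict[str, list] = {
    name: [[0.0] * (hidden + n_inputs) for _ in range(hidden)]
    for name in ("W_f", "W_i", "W_c", "W_o")
}
zeros.update({name: [0.0] * hidden for name in ("b_f", "b_i", "b_c", "b_o")})
h_t, c_t = lstm_step([0.3], [0.0, 0.0], [1.0, -2.0], zeros)
```

With zero weights the step simply halves the previous cell state, which makes the role of the forget gate easy to see before real weights are learned.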

The training process of the LSTM model adopts the BPTT algorithm, which is similar to the classic back propagation (BP) algorithm. It can be roughly divided into four steps:

(1) Calculate the output values of the LSTM cells by the forward calculation method (formulas (2)–(7)).

(2) Compute the error terms of each LSTM cell in reverse, which involves two directions of back propagation: through time and through the network layers.

(3) Calculate the gradient of each weight from the corresponding error term.

(4) Update the weights with a gradient-based optimization algorithm.

There are many gradient-based optimization algorithms, such as Stochastic Gradient Descent (SGD) [23], Adaptive Gradient (AdaGrad) [24], Root Mean Square Prop (RMSProp) [25], and Adaptive Moment Estimation (Adam) [26]. For the same number of iterations, the Adam algorithm converges faster than the other algorithms, reaches the lowest loss value, and takes the least time among these stochastic optimization methods. Therefore, this paper chooses Adam as the optimizer.
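For reference, the Adam update can be sketched on a one-dimensional toy problem. The sketch below follows the standard published update rule with its usual default decay rates; the function name `adam_minimize`, the learning rate of 0.1, and the quadratic objective are assumptions chosen for illustration and are not the settings used in the experiments.

```python
import math
from typing import Callable


def adam_minimize(grad: Callable[[float], float], w0: float, lr: float = 0.1,
                  beta1: float = 0.9, beta2: float = 0.999,
                  eps: float = 1e-8, steps: int = 500) -> float:
    """Minimize a scalar function with Adam, given its gradient."""
    w, m, v = w0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1.0 - beta1) * g        # first-moment (mean) estimate
        v = beta2 * v + (1.0 - beta2) * g * g    # second-moment estimate
        m_hat = m / (1.0 - beta1 ** t)           # bias correction
        v_hat = v / (1.0 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w


# Toy objective (w - 3)^2, whose gradient is 2 (w - 3) and minimum is at w = 3.
w_star = adam_minimize(lambda w: 2.0 * (w - 3.0), w0=0.0)
```

The per-parameter scaling by the second-moment estimate is what distinguishes Adam from plain SGD: the step size adapts to the recent gradient magnitude.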

#### 4. The Proposed BiLSTM Deep Learning Model

The operation of the satellite is characterized by the telemetry data it transmits, which have a certain spatial and temporal correlation. BiLSTM can not only handle the long-term dependence between the current moment and past moments but also capture the correlation between the current moment and future moments. For this reason, the paper introduces BiLSTM to predict the operation trend of the satellite. In practical applications, however, the future time series data are not available, so the paper uses an LSTM model to predict the future trend and takes the prediction results together with the historical data as the input of BiLSTM. Although BiLSTM is a deep neural network along the time dimension, its structure is very shallow in the number of layers. Increasing the number of layers can increase the processing capacity of the neural network. The paper therefore stacks multiple BiLSTM layers to form a deep bidirectional long short-term memory neural network. By stacking multiple hidden layers, the deep BiLSTM is extended to predict the operation trend of the meteorological satellite. The trend prediction framework is shown in Figure 3.

The Deep BiLSTM network model contains multiple hidden layers, and the output of each layer is used as the input of the next layer, which helps mine the relationship between earlier and later time series data, enrich the feature set the network learns, and improve the accuracy of the model. In the model, the transmission of network parameters in the input layer is the same as in BiLSTM. At the output of each layer, however, Deep BiLSTM passes the state to the next hidden layer as input, so the states of the multiple hidden layers are transmitted upward layer by layer and finally reach the output layer, as shown in Figure 4.

The historical time series data is input as the forward data of Deep BiLSTM network model, and the future time series feature data predicted by LSTM is input as the backward data of Deep BiLSTM network model.

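In the BiLSTM network, the historical time series data $X = [x_{1}, x_{2}, \ldots, x_{T}]$ is input to the forward network unit of BiLSTM, which yields the forward hidden layer state $\overrightarrow{h}_{t}$. The same historical data $X$ is also the input of the LSTM network, which predicts the future time series data $X' = [x'_{1}, x'_{2}, \ldots, x'_{T}]$. The predicted data $X'$ is input to the backward network unit of BiLSTM, which yields the backward hidden layer state $\overleftarrow{h}_{t}$. The output $p_{t}$ is obtained from $\overrightarrow{h}_{t}$ and $\overleftarrow{h}_{t}$, which are expressed as follows:

$$\overrightarrow{h}_{t} = \mathcal{H}\left(W_{x\overrightarrow{h}}\, x_{t} + W_{\overrightarrow{h}\overrightarrow{h}}\, \overrightarrow{h}_{t-1} + b_{\overrightarrow{h}}\right) \tag{8}$$

$$\overleftarrow{h}_{t} = \mathcal{H}\left(W_{x\overleftarrow{h}}\, x'_{t} + W_{\overleftarrow{h}\overleftarrow{h}}\, \overleftarrow{h}_{t+1} + b_{\overleftarrow{h}}\right) \tag{9}$$

where $\mathcal{H}$ is the nonlinear activation function of the hidden layer, $W_{x\overrightarrow{h}}$ and $W_{x\overleftarrow{h}}$ are the weights from the input of the current neuron to the hidden layer at this moment, $W_{\overrightarrow{h}\overrightarrow{h}}$ and $W_{\overleftarrow{h}\overleftarrow{h}}$ are the weights from the state quantity at the previous moment of the iteration to the current state quantity, $\overrightarrow{h}_{t-1}$ and $\overleftarrow{h}_{t+1}$ are the output values of the hidden layer at the previous moment of the forward and backward iterations, respectively, and $b_{\overrightarrow{h}}$ and $b_{\overleftarrow{h}}$ are the offset terms.

The BiLSTM model is iterated in the two directions at the same time and calculates the predicted value $p_{t}$ by weighting the hidden layer states, which is expressed as

$$p_{t} = W_{\overrightarrow{h}p}\, \overrightarrow{h}_{t} + W_{\overleftarrow{h}p}\, \overleftarrow{h}_{t} + b_{p} \tag{10}$$

Here, $W_{\overrightarrow{h}p}$ is the weight from the forward hidden layer to the output, $W_{\overleftarrow{h}p}$ is the weight from the backward hidden layer to the output, and $b_{p}$ is the offset term.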
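The two-direction iteration and the weighted output can be illustrated with a deliberately simplified sketch. To keep it short, a scalar tanh recurrent unit stands in for each LSTM cell, so this is not the paper's network; the function names and the `(w_x, w_h, b)` parameter tuples are hypothetical. The forward branch reads the historical series, and the backward branch reads the LSTM-predicted series from its last step to its first.

```python
import math
from typing import List, Sequence, Tuple

Params = Tuple[float, float, float]  # (input weight, recurrent weight, bias)


def run_direction(seq: Sequence[float], w_x: float, w_h: float, b: float) -> List[float]:
    """Iterate a scalar tanh recurrent unit over seq and return every hidden state."""
    h, states = 0.0, []
    for x in seq:
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


def bidirectional_predict(history: Sequence[float], predicted: Sequence[float],
                          fwd: Params, bwd: Params, out: Params) -> List[float]:
    """Combine forward states (from history) and backward states (from predictions)."""
    forward = run_direction(history, *fwd)
    # Walk the predicted series in reverse, then flip the states back so that
    # index t of `backward` lines up with index t of `forward`.
    backward = run_direction(list(reversed(predicted)), *bwd)[::-1]
    w_f, w_b, b_p = out
    return [w_f * hf + w_b * hb + b_p for hf, hb in zip(forward, backward)]


# Toy inputs (hypothetical values). The predicted series is all zeros and the
# backward bias is zero, so every backward state is 0 and p_t = 2 tanh(x_t) + 0.5.
history = [0.1, 0.2, 0.3]
p = bidirectional_predict(history, [0.0, 0.0, 0.0],
                          fwd=(1.0, 0.0, 0.0), bwd=(1.0, 0.0, 0.0),
                          out=(2.0, 5.0, 0.5))
```

Stacking a second layer would feed the per-step forward and backward states of this layer into another pair of recurrences before the final weighted sum.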

#### 5. Experiments

##### 5.1. Datasets

FY4A is an in-orbit satellite of the second generation of geostationary meteorological satellites. It carries four payloads: the Advanced Geostationary Radiation Imager (AGRI), the Geostationary Interferometric Infrared Sounder (GIIRS), the Lightning Mapping Imager (LMI), and the Space Environment Monitoring Instrument Package (SEP). Among these, the operation trend of AGRI, whose main function is the quantitative observation of Earth's surface and clouds, is used to evaluate the performance of the approach proposed in this paper. The telemetry data that AGRI generated and transmitted on certain days are used to validate the algorithm.

From the telemetry data, the 26 observation variables most likely to be relevant to the operation trend are selected as features. Each sample for trend prediction therefore consists of 26 variables, that is, *X* = [*TMC1, TMC2,* …, *TMC26*], which are listed in Table 1. The dataset is generated at a sampling interval of 2 seconds. A total of 43189 data samples are collected as training and test samples. Figure 5 shows how the variables change with time from 01:00 to 01:59 UTC on February 23, 2019. Figures 5(a)–5(d) show, respectively, how the variables *TMC3*, *TMC6*, *TMC11*, and *TMC19* change with time. Figure 5 shows that different variables can have different operation trends over time.

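Figure 5: Variation of telemetry variables with time, 01:00 to 01:59 UTC on February 23, 2019. **(a)** *TMC3*. **(b)** *TMC6*. **(c)** *TMC11*. **(d)** *TMC19*.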

#### 6. Evaluation Metric

In order to evaluate the prediction effect of BiLSTM deep learning model proposed by this paper, the Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) are selected as the criteria.

The MAE calculation formula is as follows:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_{i} - y_{i}\right| \tag{11}$$

where $\hat{y}_{i}$ and $y_{i}$ represent the predicted value and the actual value, respectively, and $n$ is the number of samples. The smaller the MAE is, the more accurate the prediction is.

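The RMSE calculation formula is as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_{i} - y_{i}\right)^{2}} \tag{12}$$

where $n$ is the number of samples, $\hat{y}_{i}$ is the predicted value, and $y_{i}$ is the actual value. The smaller the RMSE is, the more accurate the prediction is.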

Both MAE and RMSE measure the error of the predictions. They differ in that RMSE squares each error before averaging and therefore penalizes large errors more heavily than MAE, which weights all errors in proportion to their size.
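The difference between the two metrics can be seen in a small worked example. The helper names `mae` and `rmse` and the toy values below are illustrative assumptions, not data from the experiments: two prediction sets have the same total absolute error, but one concentrates it in a single large miss.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(p - y) for y, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(sum((p - y) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


actual = [0.0, 0.0, 0.0, 0.0]
even = [1.0, 1.0, 1.0, 1.0]    # four errors of size 1
spiky = [0.0, 0.0, 0.0, 4.0]   # one error of size 4

mae_even, mae_spiky = mae(actual, even), mae(actual, spiky)      # 1.0 and 1.0
rmse_even, rmse_spiky = rmse(actual, even), rmse(actual, spiky)  # 1.0 and 2.0
```

MAE cannot tell the two cases apart, while RMSE doubles for the set with the single large error.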

#### 7. Results and Discussion

To verify the effectiveness of the proposed BiLSTM deep learning model, it is compared with LSTM using the same training and test data in the same operating environment. All experiments are carried out on an Intel(R) Core(TM) i7 CPU at 1.99 GHz with 8.00 GB of RAM running Windows 10. The development environment is Python 3.6 with TensorFlow 2.0.0.

To reduce the differences in scale among the features and their possible impact, all data are normalized. After data preprocessing, the roulette method is used to divide the dataset: 70% is used as the training set, 20% as the validation set, and the remaining 10% as the test set. Because the experimental parameter settings strongly influence the results, a fixed-parameter method is used in the experiments. Experiments are run with dropout values of 0.1, 0.2, …, 0.9. Among them, a dropout value of 0.5 gives the smallest RMSE, that is, the highest prediction accuracy, as shown in Figure 6. A comparison of the results of many experiments shows that the proposed BiLSTM deep learning model predicts best with the parameters in Table 2: the number of epochs is 100, the optimizer is Adam, the batch size is 64, the number of training steps is 200, and the learning rate is 0.0001.
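The preprocessing described above can be sketched as follows. This is a minimal illustration under stated assumptions: the paper does not specify its normalization formula or the details of the roulette method, so min-max scaling and a seeded random shuffle are used here as stand-ins, and the function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def min_max_normalize(column: Sequence[float]) -> List[float]:
    """Scale one feature column to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def split_indices(n: int, train: float = 0.7, val: float = 0.2,
                  seed: int = 0) -> Tuple[List[int], List[int], List[int]]:
    """Randomly assign n sample indices to training, validation, and test sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)  # seeded shuffle as a stand-in for the roulette method
    n_train = int(n * train)
    n_val = int(n * val)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


scaled = min_max_normalize([2.0, 4.0, 6.0])
train_idx, val_idx, test_idx = split_indices(43189)  # dataset size reported above
```

The test set receives whatever remains after the training and validation shares are rounded down, so the three sets always cover every sample exactly once.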

Because a deeper neural network can improve the representation ability of the model, the number of BiLSTM layers is set to two. This number is determined by evaluating the loss value and the prediction error for different numbers of BiLSTM layers, which are shown in Figure 7 and Table 3, respectively. Figure 7 illustrates that the loss value with two layers is smaller than that with one layer. In Table 3, the RMSE and MAE with two layers are 0.0231 and 0.0188, respectively, which are smaller than those with one layer. These results indicate that two layers are superior to one layer. Thus, the number of layers of the proposed BiLSTM model is chosen as two.

In the experiment, the following quantities, which indicate the operation of the meteorological satellite, are predicted: infrared main amplifier −12 V, medium wave photovoltaic preamplifier +1.5 V, medium wave photovoltaic preamplifier −1.5 V, motor control +5 V, motor drive +8 V, temperature control +5 V, excitation +5 V, excitation −5 V, induct synchronizer +15 V, induct synchronizer −15 V, simulation conditioning +5 V, and simulation conditioning −5 V. They are labeled *R*1, *R*2, …, *R*12, respectively. The training set is used to train the BiLSTM deep learning model, the LSTM model, and the RNN model. The validation set is used to adjust the model parameters, and the trained model is then used to predict the test set. The evaluation error indexes of each method are calculated from its predicted values and the real values. The comparison results of the three methods are shown in Table 4, which shows that the RMSE of BiLSTM is smaller than that of LSTM and RNN. This means that the BiLSTM deep learning model proposed in this paper predicts the operation trend of the meteorological satellite better than the LSTM and RNN models.

#### 8. Conclusions

To avoid major satellite faults and provide complete and accurate satellite data, this paper proposes a method for predicting the operation of a meteorological satellite. Considering the nonstationary, nonlinear, and temporal characteristics of satellite telemetry data, a BiLSTM deep learning model for satellite operation trend prediction is put forward. The method uses multidimensional telemetry variables as the input, uses LSTM to predict the future trend, and then takes the prediction results together with the historical data as the input of the BiLSTM deep learning model. Increasing the number of layers of the neural network can enrich the feature set the network learns, increase its processing capacity, and improve the accuracy of the model. Experiments on the dataset transmitted from the AGRI payload of the FY4A satellite show that the proposed BiLSTM deep learning model achieves higher prediction accuracy and better performance than the classical LSTM model.

Further research will aim to improve the accuracy of the current method using more data. Future work will also investigate applying the method to real-time prediction for satellites, rather than focusing only on existing historical datasets.

#### Data Availability

The telemetry data used to support the findings of this study are included within the article.

#### Conflicts of Interest

The authors declare no conflicts of interest.

#### Acknowledgments

This research was supported by the National Key Research and Development Program of China (No.2018YEC1507803).