RAEF: An Imputation Framework Based on a Gated Regulator Autoencoder for Incomplete IIoT Time-Series Data
The number of intelligent applications available for IIoT environments is growing, but when the time-series data these applications rely on are incomplete, their performance suffers. Unfortunately, incomplete data are an all-too-frequent phenomenon in the world of IIoT. A common workaround is imputation. However, current methods are largely designed to reconstruct a single missing pattern, whereas a robust and flexible imputation framework should be able to handle many different missing patterns. Hence, the framework presented in this study, RAEF, is designed to process multiple missing patterns. Based on a recurrent autoencoder, RAEF houses a novel neuron structure, called a gated regulator, which reduces the negative impact of different missing patterns. In a comparison with state-of-the-art time-series imputation frameworks at a range of missing rates, RAEF yielded lower errors than all its counterparts.
Today’s IIoT sensors are capable of collecting an inordinate amount of data, and the applications built to process these data are allowing us to monitor, analyze, and understand how things in our physical world are changing over time. However, to continue improving our capacity for time-series analysis, it is not enough to improve just the analysis methods with better context recognition, expanded service recommendations, improved anomaly detection, and so on. The quantity and quality of the time-series data also need to be improved. For the most part, improving data quality means making sure a data stream is comprehensive and complete. Scope tends to be the easier of these two to address: simply adding more and different types of sensors will get the job done. Unfortunately, completeness is often a more common and difficult issue to overcome. Data can be incomplete due to noise, sensor malfunctions, equipment error, human error, incorrect measurements, and other unavoidable circumstances. As such, almost every data stream produced by a sensor will be incomplete to some degree.
Because incomplete data are so common, several methods of dealing with them exist. The first is to install redundant sensors as backups. If one sensor fails to capture some data, the other may not. The main drawback with this solution is that two sensors cannot be in exactly the same place, nor do they tend to operate on exactly the same timing, so it can be difficult to align the temporal and spatial characteristics of the data. Hence, a more common remedy has been some form of data manipulation: generally, either deletion or imputation.
Deletion is a simple and efficient answer when the amount of missing data is very small in comparison with the total. However, in applications that are very sensitive to time series, deleting a small number of records can be enough to destroy the coherence of a sequence and may seriously affect the correctness of the results. Further, most data analysis methods, especially machine-learning methods, require a complete set of time stamps and are not robust to missing data. In contrast, imputing missing data can reduce sensitivity and provide a complete set of time stamps. Hence, imputation has commanded the bulk of the research focus in recent decades [10, 11].
The easiest methods of imputation simply replace the missing information with statistically reasonable values, such as means, modes, medians, or any predefined value. However, while straightforward and convenient, the accuracy of such methods depends on the complexity of the samples. With small, basic samples, they work fine, but when the features become complex, these methods are not reliable. For example, imputing multivariate data generally requires an algorithm based on clustering. Similar samples are grouped into the same cluster and then used to evaluate missing values group-by-group. Rahman and Islam’s rough fuzzy k-means algorithm is one example of this type of clustering imputation. Here, the researchers exploited fuzzy expectation-maximization and fuzzy clustering to build a missing value imputation framework for data preprocessing. Raja and Sasirekha’s method was designed to handle missing values, while Zhao et al. developed a local similarity imputation method that estimates missing data based on the stacked autoencoder (SAE) fast clustering algorithm and the top k-nearest neighbors. There is no doubt that these clustering-based imputation methods yield excellent results. However, clustering an entire time series is hugely time-consuming, to the extent that these approaches cannot keep pace with today’s dramatic increases in data volume. Additionally, there comes a point when the data may be too incomplete for these methods to work with any level of accuracy.
In the IIoT paradigm, sensor data have special properties. For instance, multiple sensors are often used to record the same or similar measurements in many systems. Sensors that are geographically close to each other tend to be highly correlated for certain periods of time. This means that missing data can sometimes be imputed from the associated sensors, whether spatially or temporally. In these situations, modeling time series and then applying an imputation method such as smoothing or interpolation can be a good choice.
Generally, smoothing or interpolation methods have a low computational overhead and are simple to implement, although they are not suitable for finding long-term correlations in time-series data. Machine-learning techniques, such as generative adversarial models [19–22] and recurrent neural networks (RNNs), can correlate features, which can improve imputation performance. Among them, RNNs are known to be good at modeling time series, and for this reason, many hybrid-RNN methods have been developed. This is because vanilla RNNs estimate missing values from the data immediately preceding the gap. For instance, Kim et al. devised an RNN model to impute missing medical examination data. The time series are modeled by RNNs, which compensate for the missing measurements and then predict future values. Minseok et al. developed an imputation framework called DeepIN based on this type of correlation information. DeepIN uses a deep network consisting of multiple LSTMs arranged according to the correlation information of each IIoT device. Ma et al.’s LIME-RNN (linear memory vector recurrent neural network) models incomplete time series. A learnable linear combination of previous history states means gradient information can be propagated efficiently. In this way, LIME-RNN can take full advantage of the previously observed information to reduce the negative impact of missing values. Alternatively, Li et al. proposed a multi-view learning method for estimating missing values in time-series traffic data that combines RNNs and collaborative filtering techniques. There is a large body of papers on imputing incomplete time series that assume any missing data at the current time step are the same as at the previous time step [25, 27, 28] or that apply a decay mechanism to a hidden state to impute the missing data [29–31]. Yet, with RNNs, imputation performance suffers when the missing values become continuous.
Further, the above imputation strategies can lead to instability during training, and with high missing rates, decay mechanisms will not find sufficient hidden information.
Another branch of investigation in the search to improve imputation performance is missing patterns. In this stream, Minseok et al. compared the effects of missing continuity and discontinuity on imputation performance. Anindita et al. and Tsai and Chang considered the missing patterns of arbitrariness and monotonicity in medical data. Insuwan et al. found that rating data present a special missing pattern caused by user preference genres. Tak et al. distinguished and contrasted the missing patterns in traffic data caused by prolonged physical damage to the sensors and by measurement noise. However, to the best of our knowledge, no special missing patterns for IIoT environments have been proposed. This study is an attempt to change that. As such, our contributions are as follows:
(1) We propose a framework based on a recurrent autoencoder, called RAEF. The encoder turns an incomplete time series into vector representations of both local information and global information. The decoder then initializes using the global information, decoding the local information into complete time-series data.
(2) As an alternative to decaying the hidden state, inside RAEF, a gated regulator focuses on discriminating between ground truth information and fictitious information. This mechanism is better able to reduce the negative impact of increased missing rates in different missing patterns.
(3) In empirical evaluations in a real IIoT environment, RAEF proves to be effective. Additionally, comparisons between RAEF and several state-of-the-art frameworks demonstrate that RAEF results in fewer errors at each missing rate tested.
The remainder of this study is organized as follows. Section 2 presents the problem formulation and some necessary preliminaries. Section 3 describes RAEF’s structure. Sections 4 and 5 present the details of the experiments and results, and Section 6 concludes the study.
2.1. Incomplete Time-Series Data
Sequential time-series data are a sequence of observations. At each time step , the observation has features . A sequential binary missing mask is applied when generating representations of the data, where denotes which features are missing at time step . The features missing at time step can be described as follows:
Thus, an incomplete sequential time series is denoted as .
The following rule is applied when training the model to create an artificial incomplete time series:
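As an illustration of how a binary missing mask can be used to create an artificial incomplete time series, the following minimal sketch blanks out entries of a complete series. The convention that a mask value of 1 means "observed" and 0 means "missing", the use of `None` as the placeholder, and the helper names `random_mask` and `apply_mask` are all assumptions made for this example rather than definitions taken from the study.

```python
import random

def random_mask(num_steps, num_features, missing_rate, seed=0):
    """Draw a binary mask: 1 = observed, 0 = missing (assumed convention)."""
    rng = random.Random(seed)
    return [[0 if rng.random() < missing_rate else 1
             for _ in range(num_features)]
            for _ in range(num_steps)]

def apply_mask(series, mask, fill=None):
    """Replace every entry whose mask bit is 0 with the placeholder `fill`."""
    return [[x if m == 1 else fill for x, m in zip(row, mask_row)]
            for row, mask_row in zip(series, mask)]

# Toy complete series: 4 time steps, 2 features (values are made up).
series = [[20.1, 0.5], [20.3, 0.6], [20.2, 0.4], [20.6, 0.7]]
mask = random_mask(len(series), len(series[0]), missing_rate=0.25)
incomplete = apply_mask(series, mask)
```

Keeping the mask alongside the blanked series means the original values remain available as ground truth for evaluation.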
2.2. Analysis of Missing Pattern
Our analysis of a large amount of time-series data from a real IIoT environment indicates that incomplete time series (ITS) exhibit two main missing patterns: univariate missing and common-mode missing.
Univariate missing data are the most common pattern, which often appears as a series of reading losses in a single sensor over a short period of time, as shown in Figure 1. The usual cause is a fault in the sensor itself. For simplicity, we have only considered recoverable cases in this study—namely where the data collection can be recovered in a limited time. Here, is the maximum length of continuous missing data, noting that, in general, .
The other type of missing pattern is common-mode missing data, also known as common-mode failure. In these cases, a large number of sensors fail to upload their readings at the same time. Usually, this is caused by some external factor, such as a disk error, a network communications error, or human intervention. Figure 2 shows an example of this type of missing pattern.
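The two missing patterns can be mimicked on a complete series with simple mask generators. The sketch below is only illustrative: the function names, the convention that 0 marks a missing reading, and the simplification that a common-mode failure affects every sensor (rather than "a large number" of them) are assumptions of this example.

```python
def univariate_mask(num_steps, num_features, start, length, feature):
    """One sensor (`feature`) loses `length` consecutive readings."""
    mask = [[1] * num_features for _ in range(num_steps)]
    for t in range(start, min(start + length, num_steps)):
        mask[t][feature] = 0
    return mask

def common_mode_mask(num_steps, num_features, start, length):
    """All sensors lose their readings over the same interval (simplified)."""
    mask = [[1] * num_features for _ in range(num_steps)]
    for t in range(start, min(start + length, num_steps)):
        mask[t] = [0] * num_features
    return mask

# Six time steps, three sensors.
um = univariate_mask(6, 3, start=2, length=2, feature=1)
cm = common_mode_mask(6, 3, start=4, length=5)  # gap is cut off at the end
```

In the univariate case only one column of the mask contains zeros, whereas in the common-mode case entire rows are zero, which is why row-wise similarity measures have nothing to work with during such a gap.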
2.3. Recurrent Neural Networks
Recurrent neural networks (RNNs) are especially suited to dealing with temporally and spatially correlated information because they process historical information recursively and model historical memory. RNNs are neural networks that work on a variable-length sequence by maintaining a hidden state over time. At each time step , the hidden state is updated by the following equation: where is an activation function. Often is as simple as performing a linear transformation on the input vectors, summing them, and applying an element-wise logistic sigmoid function. is an internal intermediate state, and the model parameters are symbolized by , , , , and . Further, we can simplify the RNN at time step as a function formulated by the following equation: where encapsulates the different RNN variants. LSTMs and gated recurrent units (GRUs) are both very popular RNN variants.
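To make the recurrent update concrete, here is a minimal sketch of a vanilla RNN step of the kind described above: a linear transformation of the input and of the previous hidden state, summed with a bias and passed through an element-wise logistic sigmoid. The parameter names `W_x`, `W_h`, and `b_h` are placeholders chosen for this example and are not necessarily the symbols used in the study's equations; the output layer is omitted.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def rnn_step(x_t, h_prev, W_x, W_h, b_h):
    """One update: intermediate state z_t, then element-wise sigmoid."""
    z_t = [a + b + c for a, b, c in
           zip(matvec(W_x, x_t), matvec(W_h, h_prev), b_h)]
    return [sigmoid(z) for z in z_t]

def run_rnn(inputs, W_x, W_h, b_h):
    """Process a sequence of any length, starting from an all-zero state."""
    h = [0.0] * len(b_h)
    states = []
    for x_t in inputs:
        h = rnn_step(x_t, h, W_x, W_h, b_h)
        states.append(h)
    return states
```

Because the same parameters are reused at every step, `run_rnn` accepts sequences of any length; LSTM and GRU cells would replace `rnn_step` with a gated update while keeping the same loop.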
3. The RAEF Imputation Framework
Figure 3 shows the structure of RAEF. It learns to encode a sequence that may contain missing data and then to decode those vectors back into sequential time-series data without missing data. Note that the basic neuron used in RAEF includes a novel gated regulator (GR).
3.1. RNN Encoder
The encoder is a model based on an RNN or variant. In our case, since may have missing data, it cannot be used to update as per Equation (4). So, when is missing, the output of the previous time step is used instead. The information in this previous time step is a type of local information. Further, the mean of across time steps, denoted as and . , can be described as follows:
Formally, the initial hidden state is initialized as an all-zero vector. From to , the model is updated by the following equation: where is a learnable scalar, initialized as 0. Introducing a learnable allows the network to first rely on the cues in . Gradually, it learns to assign more weight to . Hence, the encoder can be described as , where is the sequential input, is a hidden state, and is a differentiable function represented by Equation (6) with the parameters . Once the sequential time-series data have been fed into the encoder, is recorded, and a vector is generated that contains global information about the full sequence of time-series data input: where are some nonlinear functions. Here, we consider a simple deployment and so assume that . The loss function of the encoder is as follows: where is a weighting coefficient, which represents the importance of the previous imputation at each time step. Intuitively, it does not need to be overly precise for the first few time steps of training the encoder. Commonly, it is assumed that:
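As a rough illustration of the substitution idea in this subsection, the sketch below fills each missing input with a blend of the per-feature mean and the previous time step's output, controlled by a scalar that starts at 0. This blending form, the names `observed_mean`, `fill_missing`, and `gamma`, and the convention that a mask value of 0 marks a missing entry are assumptions made for this example; they are one plausible reading of the description, not the study's Equation (6).

```python
def observed_mean(series, mask):
    """Per-feature mean computed over observed entries only."""
    num_features = len(series[0])
    means = []
    for d in range(num_features):
        values = [row[d] for row, m in zip(series, mask) if m[d] == 1]
        means.append(sum(values) / len(values) if values else 0.0)
    return means

def fill_missing(x_t, m_t, prev_output, feature_mean, gamma):
    """Keep observed entries; replace missing ones with a blend.

    gamma = 0 relies entirely on the feature mean, gamma = 1 entirely on
    the previous output (assumed form of the learnable weighting).
    """
    filled = []
    for x, m, y_prev, mu in zip(x_t, m_t, prev_output, feature_mean):
        substitute = (1.0 - gamma) * mu + gamma * y_prev
        filled.append(x if m == 1 else substitute)
    return filled

series = [[1.0, 10.0], [None, 20.0], [3.0, None]]
mask = [[1, 1], [0, 1], [1, 0]]
mean = observed_mean(series, mask)
step = fill_missing(series[1], mask[1], prev_output=[4.0, 0.0],
                    feature_mean=mean, gamma=0.5)
```

Starting the scalar at 0 means the filled value initially equals the global mean, which is a stable cue early in training; as the scalar grows, the locally predicted value takes over.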
3.2. RNN Decoder
The decoder is also a model based on an RNN or variant that aims to decode the sequence from the encoder back into sequential time-series data without missing data. is used to initialize the hidden state of the decoder. Note that, according to Equation (6), is considered to be a replacement for . Hence, the decoder works backwards, reading the sequence in the reverse order (i.e., from to ). The sequential outputs of the decoder can be derived using Equation (3), denoted as .
Hence, the decoder can be described as . Finally, the decoder trains the parameters by minimizing the errors between the output and the input sequential time-series data . The loss function is defined as follows: where uses the absolute error.
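An absolute-error reconstruction loss of this kind can be sketched as follows. Restricting the error to entries that were actually observed in the input is an assumption of this example (the input is incomplete, so missing entries have no target to compare against), as are the function name and the mask convention of 1 for observed values.

```python
def masked_l1_loss(outputs, targets, mask):
    """Mean absolute error between decoder outputs and observed inputs."""
    total, count = 0.0, 0
    for out_row, tgt_row, m_row in zip(outputs, targets, mask):
        for y, x, m in zip(out_row, tgt_row, m_row):
            if m == 1:  # only observed entries contribute
                total += abs(y - x)
                count += 1
    return total / count if count else 0.0

outputs = [[1.0, 2.0], [3.0, 4.0]]
targets = [[1.5, None], [2.0, 4.0]]
mask = [[1, 0], [1, 1]]
loss = masked_l1_loss(outputs, targets, mask)

# The decoder reads the sequence in reverse, so its outputs would be
# produced in this order and flipped back before computing the loss.
reverse_order = list(reversed(range(len(targets))))
```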
3.3. Gated Regulator
Because the encoder operates as described in Equations (5) and (6), the input at each time step is a mix of genuine observations and substituted values, so the inputs are not equally authentic. Intuitively, if the imputation framework can evaluate the authenticity of the input data at an early stage, before calculating the candidate state, the hidden state can absorb less inaccurate information. A gated structure, i.e., a gated regulator, is therefore integrated into the encoder, as shown in Figure 4. The motivation is to allow the encoder to decide how much of the current hidden state will gain its information from the current input without introducing extra information. Formally, this can be described as follows:
Equation (6) becomes
Note that the gated regulator is an independent structure, which means it is compatible with an RNN or any of its variants. As an example, LSTM-GR means an LSTM with the gated regulator.
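To give a feel for what an input-side gate of this kind might look like, the sketch below computes a sigmoid gate from the filled input and the missing mask and uses it to scale the input before it reaches the recurrent cell. This is a hypothetical construction for illustration only: the gate's inputs, the parameter names `W_r` and `b_r`, and the element-wise scaling are assumptions, not the study's gated regulator equations.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def gated_input(x_filled, m_t, W_r, b_r):
    """Scale each input feature by a gate in (0, 1).

    The gate sees both the (filled) input and the mask, so it can learn to
    trust observed entries more than substituted ones.
    """
    features = list(x_filled) + [float(m) for m in m_t]
    gate = [sigmoid(sum(w * f for w, f in zip(row, features)) + b)
            for row, b in zip(W_r, b_r)]
    regulated = [g * x for g, x in zip(gate, x_filled)]
    return regulated, gate

# Two features, so W_r has 2 rows and 4 columns (input + mask).
W_r = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
regulated, gate = gated_input([2.0, 4.0], [1, 0], W_r, b_r=[0.0, 0.0])
```

Because the gate only rescales the input and does not depend on the internals of the cell, the regulated input can be passed to an LSTM, a GRU, or a vanilla RNN without changing the cell itself.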
3.4. Training Process
To prevent vanishing or exploding gradients while backpropagating through RAEF, the training algorithm, Algorithm 1, prescribes that the encoder and decoder are trained asynchronously. The training process is therefore divided into three parts:
(1) Input into the encoder, and update the encoder by descending its gradient .
(2) Record the encoder’s output
(3) Input into the decoder, and update the decoder by descending its gradient .
Throughout, weight clipping is used to limit changes in the encoder’s gradient.
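A clipped update step can be sketched as below. Clipping the gradient by value before a plain gradient-descent update is an assumption of this example; the text does not specify the clipping rule, the threshold, or the optimizer, and the function names are hypothetical.

```python
def clip(values, limit):
    """Clamp every value to the interval [-limit, limit]."""
    return [max(-limit, min(limit, v)) for v in values]

def sgd_step(params, grads, learning_rate, limit):
    """Descend the clipped gradient for one set of parameters."""
    clipped = clip(grads, limit)
    return [p - learning_rate * g for p, g in zip(params, clipped)]

# In the asynchronous scheme, the encoder and decoder would each call
# sgd_step with their own parameters and their own gradients.
encoder_params = sgd_step([1.0, 1.0], [10.0, -0.5],
                          learning_rate=0.1, limit=1.0)
```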
4. Experimental Details
Our experience of real-world IIoT data, as shown in Figure 5, is that a great many data points can be missing from time series collected in these environments. The levels shown in Figure 5 indicate just how widespread the problem of missing data is in IIoT environments. This disturbing phenomenon not only affects the ability to monitor devices in real time but also reduces the accuracy of any subsequent analysis done by downstream applications.
In a series of analyses, we compare imputation with RAEF to several state-of-the-art imputation frameworks based on RNNs. Then, we illustrate how incomplete time-series imputation can improve the effectiveness of data applications. Last, we discuss the choice of .
4.1. Dataset and Experiment Setup
The datasets used in the experiment are summarized in Table 1.
4.1.1. UCI Air Quality Data (UAQ)
The UCI dataset contains 9358 records of average hourly responses from an array of 5 metal oxide chemical sensors embedded in an air quality chemical multi-sensor device taken between March 2004 and February 2005. The air quality data points have 12 features, and 7.5% of the values are missing. After removing the records with missing data, we randomly selected 20% of the data for testing and the others for training. Pearson’s correlations between each feature are shown in Figure 6. This dataset can be thought of as an incomplete time-series dataset of a real IIoT environment that is rich in information and has a low- to middle-level missing rate.
4.1.2. Base Station Status Data (BSS)
This dataset was collected from an ePLCM002FR edge node, developed by Hangzhou Yiyitaidi Information Technology Co., Ltd. and deployed in a base station located at the Spring Shopping Mall in the Zhangdian District, Zibo, Shandong Province (see Figure 7). The dataset comprises 14,820 data readings taken between February 2018 and February 2019. Every data point contains six attributes: the temperature and current intensity of two rectifiers, the air conditioning setting temperature, and environmental temperature. 18.2% of the values are missing. We used the data collected for May and September 2018 and February 2019 for testing. The remaining data were used for training.
Pearson’s correlations between each feature are shown in Figure 8. Compared with the UAQ dataset, the BSS dataset has shorter collection cycles, lower data dimensionality, and a higher missing rate. To stabilize the training with each dataset, we normalized the raw data via a linear transformation using the maximum and minimum (min-max normalization) before the experiment. However, because the BSS dataset does not contain any ground truth labels, experimenting with the actual missing values was not possible. Thus, we simulated missing data by randomly omitting data according to different missing rates and used the real values as ground truth labels. Table 1 provides the details.
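Min-max normalization rescales each feature linearly so that its minimum maps to 0 and its maximum to 1. The following minimal sketch shows the transformation and its inverse for a single feature column; the function names and the handling of a constant column are choices made for this example.

```python
def min_max_normalize(column):
    """Scale a list of numbers to [0, 1]; also return the bounds used."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0] * len(column), lo, hi
    return [(v - lo) / (hi - lo) for v in column], lo, hi

def denormalize(scaled, lo, hi):
    """Map normalized values back to the original scale."""
    return [v * (hi - lo) + lo for v in scaled]

scaled, lo, hi = min_max_normalize([10.0, 20.0, 30.0])
restored = denormalize(scaled, lo, hi)
```

Storing the bounds makes it possible to report imputed values in the original units after the model has run on normalized data.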
The results were assessed in terms of mean absolute error (MAE) and mean relative error (MRE), calculated as follows: where denotes the index set of the missing values, and denotes its size. is the ground truth of the th missing item, and is its imputed value.
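The two metrics can be computed over the missing entries as in the sketch below. MAE is the average absolute difference between imputed and true values. For MRE, this example assumes the common definition used in the time-series imputation literature, namely the sum of absolute errors divided by the sum of absolute ground truth values; the study's exact formula may differ.

```python
def mae(truth, imputed):
    """Mean absolute error over the missing items."""
    return sum(abs(p - t) for t, p in zip(truth, imputed)) / len(truth)

def mre(truth, imputed):
    """Mean relative error: total absolute error over total |truth|.

    This normalization is an assumption of the example.
    """
    denominator = sum(abs(t) for t in truth)
    if denominator == 0:
        raise ValueError("MRE is undefined when all true values are zero")
    return sum(abs(p - t) for t, p in zip(truth, imputed)) / denominator

truth = [2.0, 4.0]     # ground truth of the missing items
imputed = [1.0, 5.0]   # values produced by an imputation method
error_mae = mae(truth, imputed)
error_mre = mre(truth, imputed)
```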
Brief descriptions of the comparators we chose as baselines are as follows:
(1) Border mean (BM): uses the average of the records immediately before and after the missing values as the imputed value.
(2) K-nearest neighbor (KNN): uses KNN with a fixed to find similar samples and imputes the missing values according to the weighted average of the neighbors.
(3) BRITS: a novel method based on RNNs that combines a bidirectional recurrent hidden state decay mechanism and forward imputation. This approach can impute the missing values in a bidirectional recurrent dynamical system without any specific assumptions.
(4) LIME-LSTM: a novel framework for modeling incomplete time series based on LIME-RNN using an LSTM, where a network learns the residual connection between time steps and implements a linear combination of previous historical states.
Note that BM and KNN are common approaches to imputation. BRITS and LIME-LSTM are both imputation frameworks for time series based on RNNs.
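The border mean baseline is simple enough to sketch in a few lines. The version below works on a single feature with `None` marking missing readings and fills each gap with the average of the nearest observed values on either side. Falling back to the single available neighbor at the start or end of the series is an assumption of this example.

```python
def border_mean_impute(values):
    """Fill each None with the mean of its nearest observed neighbors."""
    n = len(values)
    result = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        # Nearest observed value before and after position i, if any.
        prev = next((values[j] for j in range(i - 1, -1, -1)
                     if values[j] is not None), None)
        nxt = next((values[j] for j in range(i + 1, n)
                    if values[j] is not None), None)
        if prev is not None and nxt is not None:
            result[i] = (prev + nxt) / 2.0
        elif prev is not None:
            result[i] = prev
        elif nxt is not None:
            result[i] = nxt
    return result

filled = border_mean_impute([1.0, None, None, 4.0])
```

Every reading inside the same gap receives the same value, which helps explain why this baseline degrades as gaps become longer.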
4.3. Implementation Details
We developed two implementations of RAEF, one with an LSTM and the other with a GRU, and configured each with and without a gated regulator. This resulted in four RAEF variants:
(1) RAEF-LSTM (RFL): both the encoder and decoder were implemented as LSTMs.
(2) RAEF-LSTM-GR (RFLR): the encoder was an LSTM incorporating a gated regulator, and the decoder was an LSTM.
(3) RAEF-GRU (RFG): both the encoder and decoder were implemented as GRUs.
(4) RAEF-GRU-GR (RFGR): the encoder was implemented as a GRU with a gated regulator, and the decoder was a GRU.
For all methods, we fixed the parameters of the RNNs to be the same. The dimensions of the hidden state were , and the learning rate was . In deploying the RNN-based models, we cut the datasets into sequences with a fixed length of and input samples at once for training. The settings for the values of and are shown in the last two lines of Table 1 and were applied to all RNNs consistently. Note that, in the training process, instead of using a validation set, we ended the training when the training loss leveled off.
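Cutting a long series into fixed-length sequences and grouping them into batches can be sketched as follows. The non-overlapping default stride, the dropping of a trailing partial window, and the function names are assumptions of this example; the actual sequence length and batch size are those listed in Table 1.

```python
def make_windows(series, length, stride=None):
    """Cut a series into fixed-length windows (non-overlapping by default)."""
    stride = stride or length
    return [series[s:s + length]
            for s in range(0, len(series) - length + 1, stride)]

def make_batches(windows, batch_size):
    """Group windows into batches; the last batch may be smaller."""
    return [windows[i:i + batch_size]
            for i in range(0, len(windows), batch_size)]

series = [[float(t)] for t in range(10)]  # 10 time steps, 1 feature
windows = make_windows(series, length=4)
batches = make_batches(windows, batch_size=2)
```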
Our experimental procedure had three main steps. First, we randomly deleted data from the complete time series to mimic the different missing patterns at different missing rates. We then split the data into training and testing sets according to the proportions mentioned in Section 4.1. Second, we trained all the frameworks. Third, we used the different frameworks to generate imputation results for the testing set and evaluated the frameworks by comparing the results with the ground truth data in terms of the evaluation metrics.
All experiments were run on the TensorFlow platform using an Intel Core i7-8700K, 3.60-GHz CPU with 16-GB RAM, and a GeForce RTX 2080 8G.
5. Results and Discussion
5.1. Imputation Performance for Single Missing Pattern
Tables 2 and 3 show the results of the imputations, where MP denotes missing pattern and MR stands for missing rate. From these results, we drew the following observations.
(1) Border mean (BM) was quite inaccurate and became less accurate as the missing rate increased.
(2) KNN was not effective for imputing missing values with the common-mode pattern because the distance between the samples often could not be measured given the complete loss of all attributes. KNN was able to achieve a result with the univariate-type missing patterns at low missing rates but was sensitive to changes in this rate, and its performance grew worse as the rate increased.
(3) LIME-LSTM, with its unidirectional RNNs, did not perform as well as the frameworks that contain a bidirectional RNN, i.e., BRITS and RAEF.
(4) LIME-LSTM and BSS could not cope with high missing rates. At low missing rates, RAEF and BRITS demonstrated similar performance. However, as the missing rate increased, RAEF performed significantly better than BRITS, especially the LSTM-GR implementation.
(5) The LSTM versions of RAEF generally outperformed the GRU versions.
(6) Focusing on the common-mode patterns, the RNN approaches clearly show how much greater the impact of this type of missing data is than that of the univariate type. For example, looking at BSS with a high missing rate (20%–30%), the performance of all frameworks deteriorated rapidly.
In addition to these basic observations, we also noted some distinguishing performance features when comparing the gated regulator variants of RAEF to the plain versions. In general, the gated regulator implementations of RAEF both outperformed the other frameworks within a limited range of missing rates and had obvious advantages under higher missing rates. At 5% and 15% missing rates on UAQ, the improvement in MRE over the non-gated versions of RAEF with the common-mode pattern was 7.46% and 23.54%, respectively. With the univariate pattern, the improvements were 8.57% and 20.23%. We can see the same trend with the BSS dataset. These results suggest that the gated regulator was able to reduce the negative impact of increased missing rates with both types of missing patterns.
5.2. Imputation Performance for Mixed Missing Pattern
Ideally, the missing data in a time series will conform to a single pattern, either univariate mode or common mode. However, there are occasions where both patterns will be present. For this series of experiments, we fixed the missing rates at 10% for the UAQ dataset and at 20% for BSS. We then simulated the following patterns of missing data in the time series: 100% univariate (UM), 20% common mode (CM)/80% UM, 40% CM/60% UM, 60% CM/40% UM, 80% CM/20% UM, and 100% CM. Figure 9 shows the results. RAEF-LSTM-GR was the clear top performer, with significantly better results than the others.
5.3. Task: Imputing Missing Values in an Incomplete Time Series
To more clearly show the importance of data imputation for downstream applications, we undertook a prediction task using incomplete time-series data and compared the results to the same task using imputed data. To approximate different real application scenarios, we performed the tasks with a range of missing rates. More specifically, we prepared versions of the UAQ dataset with missing rates of 5%, 10%, and 15% and conducted three groups of experiments A, B, and C as follows:
(1) A: impute with RAEF-LSTM and then use an LSTM for prediction
(2) B: impute with BRITS and then use an LSTM for prediction
(3) C: impute with LIME-LSTM and then use an LSTM for prediction
Groups A–C all used the same LSTM for predictions, which contained 64 neurons and was trained with a complete dataset. The difference between the prediction results and the ground truth data was measured using MAE. Each experiment was repeated 50 times, and the results were recorded as shown in Figure 10.
5.4. The Choice of
To assess this choice, we varied the value of . As shown in Figure 11, RAEF-LSTM-GR generally delivered the best performance for each dataset. However, as the missing rate changed, the optimal value of differed slightly. At higher rates, RAEF-LSTM-GR preferred a larger to obtain more information from the input time series. When was too large, however, performance dropped, indicating that the model was affected by an exploding gradient.
This study presents RAEF, an imputation framework for IIoT environments based on a recurrent autoencoder. RAEF identifies the missing patterns in incomplete time series and uses them as a guide to impute the values that are missing. As part of this research, we, for the first time, summarized the missing patterns in incomplete IIoT time-series data. Unlike some other methods, which decay the hidden state, RAEF uses a gated regulator to reduce the negative impact of larger missing rates. Tests on both synthetic and real data with this approach show that RAEF has greater robustness, more flexibility, and returns fewer errors than other state-of-the-art imputation frameworks designed for time-series data.
The BSS and UAQ data used to support the findings of this study are available from the corresponding author upon request.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
This work was supported in part by the National Key R&D Program of China (2020YFB2010901) and in part by the Science and Technology Program of Zhejiang Province (no. 2020C01031).
B. S. Panda and R. Kumar Adhikari, “A method for classification of missing values using data mining techniques,” in Proceedings of the 2020 International Conference on Computer Science, Engineering and Applications (ICCSEA), pp. 1–5, Odisha, India, March 2020.
M. R. Malarvizhi and A. S. Thanamani, “K-nearest neighbor in missing data imputation,” International Journal of Engineering Research and Development, vol. 5, no. 1, 2013.
I. Arous, M. Khayati, P. Cudré-Mauroux, Y. Zhang, M. Kersten, and S. Stalinlov, “Recovdb: accurate and efficient missing blocks recovery for large time series,” in Proceedings of the 2019 IEEE 35th International Conference on Data Engineering (ICDE), pp. 1976–1979, IEEE, Macao, China, April 2019.
S. Guastello, Nonlinear Dynamical Systems Analysis for the Behavioral Sciences Using Real Data, Routledge, Oxford, England, 2011.
H. G. Kim, G. J. Jang, H. J. Choi, M. Kim, Y. W. Kim, and J. Choi, “Recurrent neural networks with missing information imputation for medical examination data prediction,” in Proceedings of the 2017 IEEE International Conference on Big Data and Smart Computing (BigComp), pp. 317–323, Jeju, South Korea, February 2017.
L. E. E. Minseok, A. N. Jihoon, and L. E. E Younghee, “Missing-value imputation of continuous missing based on deep imputation network using correlations among multiple iot data streams in a smart space,” IEICE-Transactions on Info and Systems, vol. 102, no. 2, 2019.
W. Cao, D. Wang, J. Li, H. Zhou, Y. Li, and L. Li, “Brits: bidirectional recurrent imputation for time series,” in Proceedings of the 32nd International Conference on Neural Information Processing Systems, ser. NIPS’18, pp. 6776–6786, Curran Associates Inc., Red Hook, NY, USA, May 2018.
T. Feng and S. Narayanan, “Imputing missing data in large-scale multivariate biomedical wearable recordings using bidirectional recurrent neural networks with temporal activation regularization,” in Proceedings of the 2019 41st Annual International Conference of the IEEE Engineering in Medicine & Biology Society, Berlin, Germany, July 2019.
N. Anindita, H. A. Nugroho, and T. B. Adji, “A combination of multiple imputation and principal component analysis to handle missing value with arbitrary pattern,” in Proceedings of the 2017 7th International Annual Engineering Seminar (InAES), Yogyakarta, Indonesia, August 2017.
W. Insuwan, U. Suksawatchon, and J. Suksawatchon, “Improving missing values imputation in collaborative filtering with user-preference genre and singular value decomposition,” in Proceedings of the 2014 6th International Conference on Knowledge and Smart Technology (KST), pp. 87–92, Chonburi, Thailand, January 2014.