Abstract

The compressors used in today’s natural gas production industry have an essential role in keeping the production line operational. Each of the compressors’ components has routine maintenance tasks to avoid sudden failures. Hence, the significant advantages of performing preventive maintenance tasks in time are decreased equipment downtime, saved costs, and improved safety and reliability of the whole system. In this paper, an anomaly detection and classification method based on a hybrid neural network model, the Long Short-Term Memory (LSTM) Autoencoder (AE), is proposed to detect anomalies in the sequence patterns of audio data collected by multiple sound sensors deployed at different components of each compressor system for predictive maintenance. The methodology includes experiments with different RNN architectures, such as GRU, LSTM, Stacked LSTM, and Stacked GRU, using various functions to create a baseline for model evaluation. Each architecture was evaluated on an audio signal dataset collected from the compressor system to assess each neural network model’s performance. Based on the performance results, an optimal model for anomaly detection with the best performance scores is proposed in this research. In the experiments, spectral centroid (SC) and Mel spectrogram features extracted from one-dimensional raw audio signals were combined and fed to the deep learning models to evaluate performance. Such hybrid methods can effectively detect normal and anomaly audio signals collected from a compressor system, increasing the compressor system’s reliability and the sustainability of the gas production line. The combination of multiple-source features in the proposed hybrid model achieved a 100% score on all four evaluation metrics (accuracy, precision, recall, and F1) for the LSTM-based autoencoder on both test and train results.

1. Introduction

Natural gas compressors are mainly used to provide the pressure needed to transport gas in pipelines. The compression system links upstream gas production and downstream consumer use by pressurizing natural gas in the channels. The compressors used in today’s natural gas production industry have an essential role in keeping the production line operational. Each compressor unit consists of multiple components and subsystems: an engine or electric motor, a compressor system, a crankcase, a valve body, one or several compression cylinders, a cooling system, a turbo unit, a fan, and a water pump. Which components are present depends on the type of compressor. Each subsystem contributes to the overall noise produced by the compressor [1]. Each of these components has routine maintenance tasks to avoid sudden failures. Since the installation cost of a compressor is high, loss of the compression system can be costly due to repair costs and the economic effect of lost production [2]. Hence, the significant advantages of performing preventive maintenance tasks in time are decreased equipment downtime, saved costs, and improved safety and reliability of the whole system [3]. There are several approaches to preventive maintenance in oil and gas organizations for compressor systems, such as time-based and periodic maintenance, in which compressors are checked at regular intervals to perform troubleshooting and avoid potential failures. However, predictive maintenance could be a better solution, since such components can also face unexpected or unscheduled downtime due to faults that negatively affect gas production. Predictive maintenance is based on predicting future failures of such components before they occur; it works with the real-time condition of compressor systems and can reduce costs and unexpected downtime [2]. Every maintenance task aims to increase production line reliability; the efficiency and functionality of gas production lines depend on the reliability of each component and system, compressors included [4]. The current solution for predictive maintenance relies on traditional human-based inspection: experienced technicians listen to compressor sounds and noises by ear to identify defects manually. Afterward, the technician reports the inspection information, such as the timestamp of the event, the type of identified failure (if found), and the timestamp of any maintenance task performed. However, such a method has several disadvantages [5].

For instance, such a traditional inspection method is very time-consuming, inaccurate, and challenging, considering that the decibel level of a compressor can exceed that of a jet engine, making it difficult for technicians to find defects properly. Additionally, compressors and pumping stations are usually located far from town, placing the components out of easy reach of the technicians. Hence, an efficient automated approach is needed to diagnose defects and classify normal and abnormal signals appropriately and solve these challenges [6, 7]. Therefore, this research paper aims to develop an automated system that accurately detects and classifies normal and anomaly audio signals for midstream compressor systems. If such methods can classify and detect anomalies accurately, they would significantly reduce the risk of sudden failures and losses in the production line.

This paper is organized as follows: related studies for the literature review and the research contribution are presented in the next section. Basic definitions are reviewed in Section 3, the methodology is presented in Section 4, and Section 5 shows the experimental results and discussion, followed by the conclusions and suggested future work.

2. Related Studies

There is an in-depth literature review of the methods used in similar studies, based on statistical, supervised machine learning, and unsupervised deep learning approaches for anomaly detection. Several method families exist for such classification tasks, including traditional statistical (non-neural network) methods such as K-means clustering and random forest, as well as machine learning and deep learning methods for anomaly detection [68], system prognostics [9], prediction [10], classification [11], and system reliability improvement [7]. For instance, random forest (RF) is a supervised learning algorithm in which several decision trees are constructed during training [12]. RF takes the mean of the individual trees’ predictions as the output and can be used for nonlinear regression tasks [13]. As a research example, Munir et al. discuss several statistical modeling approaches for detecting and classifying anomalies [14]. The K nearest neighbor (KNN) method, for instance, is one of the most commonly used distance-based methods for anomaly detection: for every data point in the dataset, the K nearest neighbors are considered in the classification task [14, 15]. In other words, based on these points, abnormalities in a sequence of time series data can be detected and classified. As presented in several works in the literature, the histogram-based outlier score (HBOS) is also used as a traditional anomaly detection method, computing an abnormality score for outlier detection [15, 16]. In this method, a histogram is first generated for each feature, and the dataset instances are then multiplied by the inverse height of the bins of all features. HBOS is an unsupervised statistical method based on histograms; its advantage is that it is less computationally expensive than distance-based and clustering-based anomaly detection [16]. In other research, such as Liu et al., the isolation forest has been used for outlier detection. Rather than relying on density or distance measures, this approach isolates normal instances from anomalies: a binary tree called an isolation tree is used to perform the anomaly detection task and isolate the abnormal instances [17]. Another method for anomaly detection is extreme gradient boosting outlier detection (EGBOD), a semi-supervised learning method for anomaly detection tasks. Extreme gradient boosting is an extension of the gradient boosting algorithm, defined to reduce the error caused by deviation from generalization [18]; it can be used for classification and regression modeling problems [13]. Another classification method is the support vector machine (SVM), which has been proposed for anomaly detection and classification. SVM is a supervised learning method used in classification models; for instance, one-class SVM is an unsupervised variant that aims to find the maximum-margin hyperplane that best separates the data from the origin [19]. Other traditional algorithms can also perform classification tasks effectively. One example is the Naïve Bayes algorithm, a supervised learning method that can solve classification problems [20].
It performs fast classification and quick predictions. Naïve Bayes predicts the probability of the different classes based on the various attributes that have been identified and can be used in classification problems with several classes [15, 20]. On the other hand, tree-based classification techniques such as decision trees and random forests [13, 21] are simple and very popular methods in data mining problems; they can map nonlinear relationships effectively and efficiently [22, 23]. In a decision tree, for instance, each leaf node is assigned a label according to the dataset classes, while the nonterminal nodes, consisting of the root node and the other internal nodes, contain test conditions for separating instances with different characteristics [15, 23]. Both traditional machine learning and deep learning methods [6, 9] have advantages. However, given this project’s specification, scope, and dataset, and the higher accuracy, efficiency, and reliability compared with traditional methods and other machine learning algorithms, deep learning–based anomaly detection methods are proposed in this research. Furthermore, deep learning technology has grown significantly in the past few years and has had an impressive impact on various research areas, including artificial intelligence, computer vision, deep medicine [24], pattern recognition, predictive modeling [25], and image and signal processing [26], with wide use in fields such as healthcare, the energy industry, automation, and manufacturing. Various deep learning and machine learning models have been used for similar anomaly detection and classification tasks in related research. For example, Kavitha et al. used support vector regression (SVR) for classification. SVR is a supervised learning method that addresses classification and regression problems; the process is similar to a support vector machine (SVM) with some modifications, the line for fitting the model being defined based on acceptable predefined residuals [27]. As another deep learning–based model for anomaly detection, DeepAnT has been proposed [14, 28]. This unsupervised anomaly detection method for time series data consists of two modules: the first is a predictor for the time series data, whose predicted values are fed into the anomaly detector module, which distinguishes normal instances from anomalies. The predictor module is based on a convolutional neural network (CNN). In addition, FuseAD was introduced, combining DNN and statistical methods for unsupervised anomaly detection in two modules: a forecasting module based on ARIMA [29], whose forecast output is fed into a CNN module along with the time series data. CNNs are appropriate for analyzing imagery data and have a high capability for feature extraction [30]. The outputs of these modules are finally combined by a summation layer [14, 28]. The ARIMA model used in the mentioned method forecasts and analyzes time series data in linear form; the autoregressive integrated moving average model extends ARMA by using lagged observations to predict future observations in time series data.
ARIMA [31] is one of the most effective machine learning models for forecasting time series data; it essentially performs regression over previous time steps to predict the next instance [25].

As mentioned regarding CNN models, before the method proposed in this research phase, ResNet50 was applied to two datasets collected from compressors in the first phase of this project, published previously [30, 32]. In that study, the ResNet50 convolutional neural network extracted high-level features from the input data [30, 33]. ResNet50 consists of 50 layers, including max pooling, convolutional, and fully connected layers, and is pretrained on the ImageNet database, which contains more than 14 million images across a wide range of categories [34, 35]. In the CNN architecture that used ResNet50 as a deep learning layer [34], Mel-frequency cepstral coefficients (MFCC) were computed from the input audio signals, creating a two-dimensional feature matrix; SC features were also obtained from the input audio signals. Next, the pretrained ResNet50 deep learning network was used to extract high-level deep features from the MFCC. Both the deep MFCC features and the SC features were then fed to a principal component analysis (PCA) unit as a final feature extraction and dimensionality reduction step. The extracted MFCC and SC features were combined to train a support vector machine (SVM) classifier that performs the normal and anomaly audio signal classification task. As described in that study, MFCC [36] is one of the most familiar representations of an audio signal and can be used for further processing and feature extraction [30, 32]. In this feature extraction method, the spectrum of the input audio signal after the fast Fourier transform (FFT) is filtered by Mel filters, creating the Mel spectrum; cepstral analysis is then carried out on the logarithm of the Mel spectrum. The cepstral coefficients obtained by this process are the Mel-frequency cepstral coefficients (MFCC) [37, 38]. In addition, PCA is part of the method: PCA is used for data compression and dimensionality reduction, preserving the important information in the data while converting it from a higher-dimensional space into a lower-dimensional one. A 2-fold cross-validation was used to evaluate the performance of the ResNet50 architecture on the datasets [28, 30]. The mentioned model is summarized in Figure 1 [30].
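As an illustration of the tail end of that earlier pipeline, the following scikit-learn sketch chains PCA with an SVM classifier; the variable names and component count are hypothetical, and the ResNet50 deep-feature stage is omitted for brevity.

```python
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# X_train rows: per-signal feature vectors (deep MFCC features concatenated
# with SC features); y_train: 0 = normal, 1 = anomaly. Names are hypothetical.
clf = make_pipeline(
    PCA(n_components=50),   # dimensionality reduction; component count is illustrative
    SVC(kernel="rbf"),      # SVM classifier for normal vs. anomaly
)
clf.fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))
```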

In the conducted experiments, two datasets were used for the classification and detection tasks: dataset 1 with 2343 raw audio signals (156 of them anomalies) in the OGG format, and dataset 2 with 7853 audio signals (1085 anomalies). Both datasets were very unbalanced, with many more normal samples than anomaly samples. The output of the PCA was a feature set containing the combined deep MFCC features extracted by the ResNet50 model and the SC features; this feature set was then used to train the SVM classifier to classify the audio signals into normal and anomaly classes, as described in the method. In that research [30, 32], the experiments demonstrated that the combined deep MFCC and SC features achieved the best performance for normal and anomaly classification on all four evaluation metrics used in the experiments: precision, recall, accuracy, and F1 [39].

Along with the methods and models mentioned above, there are also deep learning–based hybrid models for similar research questions. For example, CNN-LSTM [40, 41] is one of the most widely used hybrid machine learning models for increasing performance and prediction capability. An LSTM-based model [42] can extract the required information from sequence patterns and time series data through its gated architecture [43], while a CNN can extract important features through noise reduction and filtering [30, 44]. Additionally, the Convolutional LSTM is a sequential model based on the LSTM in which the internal operations are replaced with convolutional ones; it can extract the best features to feed the classifier for better classification performance. As another example, the stacked autoencoder (SAE) has been proposed in similar research for unsupervised learning. An SAE includes input, output, and hidden layers in its neural network architecture, with an encoder and a decoder carrying out the training process. In this model, each autoencoder is trained on the input data and feeds the next layer for training until training of the network is complete; finally, the backpropagation algorithm is used to minimize the cost function, and the layer weights are updated on the training set for fine-tuning [45]. The advantage of hybrid neural network models is their capability to overcome the challenges of massive, high-dimensional datasets [46]. According to the similar literature considered, a hybrid deep learning neural network model for classifying and detecting anomalies is an appropriate model and baseline for predicting mechanical component failures in this paper [9].

2.1. Research Contribution

In comparison to the similar literature mentioned above, in this research project, sound sensors (e.g., smart microphones) have been widely used in the time series data collection process. Multiple sound sensors are deployed at different components of a compressor system and can be controlled by Internet of Things (IoT) techniques to collect and transfer the auditory data over a specific period for storage in a database. Next, advanced data processing and the mentioned deep learning models are used for the audio classification of normal and anomaly signals. The defined deep learning network model needs to be trained on each individual component using normal and abnormal sound signals for anomaly detection and classification [2, 6]. Therefore, the main contributions of this paper are the following:

(i) This study proposes a different hybrid neural network method applied to collected audio signals to detect anomalies and classify them accurately. According to the research problem stated in this paper, an LSTM-Autoencoder architecture [45, 46] is designed to analyze one-dimensional raw audio signals. In other words, audio signal features consisting of Mel spectrogram and SC features are extracted from the raw data to train the network, and anomaly detection on test data is then used to evaluate the performance of the proposed method and compare it with other architectures [31, 32]. Raw audio data collected from the compressors’ audio sensors are used to conduct experiments on the mentioned model, to determine whether it can classify normal and anomaly signals effectively using appropriate functions, configurations, and classification methods, with the results evaluated against a validation dataset to ensure that the model is well fitted without overfitting [17, 33].

(ii) The proposed hybrid model is compared with baseline models such as GRU, Stacked GRU, LSTM, and Stacked LSTM, with different hyperparameters and functions in the same experiments, to ensure that it performs better than the other RNN-based models.

(iii) This study highlights the importance of this research area and of applying deep learning methods to anomaly detection in time series data for predictive maintenance (PdM). PdM reduces maintenance costs and keeps operational components such as compressors sustainable. The core of PdM is predicting the next failure so that maintenance tasks can be scheduled before it happens [31]. Such methods are typically based on statistical estimation of the time to failure from maintenance data; this paper, however, proposes a more accurate deep learning–based approach.

(iv) In the previous study mentioned in the related studies section, we used a pretrained ResNet-50 network and employed MFCC and SC as feature sets for anomaly detection on smaller datasets [30]. This research is a complementary study with an in-depth investigation of different RNN variant architectures as baseline models, ultimately proposing a hybrid LSTM-AE as the best-scoring model for anomaly detection. This study uses a bigger dataset and employs Mel spectrogram and SC feature sets to train the models and to evaluate the performance of different RNN models, hyperparameters, and layers in the anomaly detection and classification of unseen audio signals. Additionally, in the previous study, the MFCC features were processed by ResNet50 and then combined with the SC features; PCA was applied for dimensionality reduction, and an SVM performed the classification. In this study, by contrast, both Mel spectrogram and SC features are fed directly to the network; in the LSTM-AE, dimensionality reduction and information gain are handled within the network, and the last layer, with SoftMax, performs the classification.

(v) In this study, a hybrid deep learning architecture based on the LSTM-AE is proposed for the analysis of operational data, such as audio signals from a working compressor, which could form the basis for increasing the reliability of the operational components of compressor systems in a more efficient way. According to the performance analysis in this paper, such methods can be more accurate, faster in analysis, and more effective for a more sustainable production line [37, 47]. Therefore, this research highlights the value of applying deep learning methods over other similar approaches. A time and computational cost analysis is also presented for the proposed method.

3. Definitions and Notations

This section describes the terms, features, and functions used in the paper.

3.1. Features: Spectral Centroid and Mel Spectrogram

The spectral centroid (SC) [38] relates to the brightness of a sound. In other words, the SC indicates where the center of mass of the spectrum of the audio signal is located [48]. In the Mel spectrogram, on the other hand, the frequency of the signal is converted to a log scale and the amplitudes form the spectrogram; the frequency domain is then mapped onto the Mel scale to form the Mel spectrogram [49, 50].
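As a minimal sketch of how these two features can be extracted from a raw audio file, the following uses the librosa library; the paper does not state which tool was used, so the library choice and the file name are assumptions.

```python
import librosa
import numpy as np

# Load one 3-second OGG clip (file name is hypothetical)
y, sr = librosa.load("turbo_clip.ogg", sr=None)

# Spectral centroid: center of mass of the spectrum, per frame
sc = librosa.feature.spectral_centroid(y=y, sr=sr)        # shape: (1, n_frames)

# Mel spectrogram: power spectrogram mapped onto the Mel scale, in dB
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128)
mel_db = librosa.power_to_db(mel, ref=np.max)              # log-scaled amplitudes
```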

3.2. Recurrent Neural Network (RNN)

Deep learning methods have been widely applied to time series data in many areas, such as fault detection in manufacturing machines [51, 52]. An RNN is a neural network architecture that uses time series data for prediction while remembering important information. The most commonly used RNN cells are LSTM and GRU [47, 53]. GRU is very similar to the LSTM architecture but has less complexity and computation time [53, 54].

3.3. Deep Learning Models’ Functions

Different functions and layers have been used in the models in this paper to achieve better performance and tuning [55].

3.4. Activation Functions

The sigmoid function has the characteristic sigmoid curve

$$\sigma(x) = \frac{1}{1 + e^{-x}},$$

where $\sigma$ is the sigmoid function and $e$ is Euler’s number; the output lies between 0 and 1, which makes it a good choice for prediction [56].

The SoftMax function generalizes logistic regression and is used in classification tasks with multiple classes:

$$\sigma(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}, \quad i = 1, \ldots, K,$$

where $\sigma$ is the softmax, $z$ is the input vector, $e^{z_i}$ is the standard exponential function applied to the input vector, $K$ is the number of classes, and $e^{z_j}$ is the standard exponential function applied to the output vector [57, 58].
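A minimal NumPy sketch of the two activation functions defined above:

```python
import numpy as np

def sigmoid(x):
    # Squashes any real input into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    # Subtract the max for numerical stability, then normalize the exponentials
    e = np.exp(z - np.max(z))
    return e / e.sum()

print(sigmoid(0.0))                         # 0.5
print(softmax(np.array([2.0, 1.0, 0.1])))   # probabilities summing to 1
```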

3.5. Loss Functions

Categorical cross entropy is used as the loss function for classification with multiple classes:

$$L(\theta) = -\frac{1}{N} \sum_{i=1}^{N} y_i \log \hat{y}_i,$$

where $\theta$ denotes the model parameters such as the weights of the neural network, $N$ is the number of data points, $y_i$ are the true labels, and $\hat{y}_i$ are the predicted labels [59, 60].

The difference between the categorical cross entropy and sparse categorical cross entropy [57] loss functions lies in the format of the true labels: if the true labels are integers, sparse categorical cross entropy can be used. Additionally, the mean squared error (MSE) measures the average of the squared errors between the true labels $y_i$ and the predicted labels $\hat{y}_i$ over $N$ data points [61]:

$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2.$$
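The following sketch illustrates the label-format difference and the three loss functions using the Keras losses API; the exact loss objects used in the paper’s code are not shown, so this is illustrative:

```python
import tensorflow as tf

y_pred = tf.constant([[0.9, 0.1]])   # predicted class probabilities

# Categorical cross entropy expects one-hot true labels
cce = tf.keras.losses.CategoricalCrossentropy()(tf.constant([[1.0, 0.0]]), y_pred)

# Sparse categorical cross entropy expects integer true labels
scce = tf.keras.losses.SparseCategoricalCrossentropy()(tf.constant([0]), y_pred)

# Mean squared error averages the squared differences
mse = tf.keras.losses.MeanSquaredError()(tf.constant([[1.0, 0.0]]), y_pred)

print(float(cce), float(scce), float(mse))
```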

4. Methodology

In this section, the proposed method for the classification and detection tasks, as well as the baseline methods, is presented to give an overview of the experiments. The proposed method is based on the LSTM-Autoencoder, and the baseline methods are based on other recurrent neural networks (GRU, LSTM, Stacked LSTM, and Stacked GRU) that share a similar core architecture, to demonstrate the qualification and efficiency of the LSTM-based autoencoder. Figure 2 shows the step-by-step methodology. The methodology section is followed by a detailed description of the proposed hybrid model and the baseline model architectures.

4.1. LSTM-Based Autoencoders’ Anomaly Detection

An LSTM-based autoencoder is a neural network architecture that uses the autoencoder structure to encode and decode a sequence of input data; in the LSTM-AE, the encoder-decoder process is performed by LSTM networks. For a better understanding of the model, each part, the autoencoder and the LSTM, is described separately as follows.

4.2. Long Short-Term Memory (LSTM)

A plain RNN cannot memorize important information over long sequences of data; therefore, an extension of the RNN named LSTM can be used. This architecture can process an entire sequence of data and has a longer memory than a plain RNN. The LSTM has feedback connections, meaning that entire data sequences, as well as single data points, can be processed by the network, and each cell unit is updated at every time step. The LSTM has been used in part of the audio anomaly detection and classification experiments in this research. Each cell remembers the past data sequence and combines that information with the current input sequence. An LSTM cell has three gates: the input gate, the output gate, and the forget gate. The functionality of these gates is given by the mathematical representations in the following equations.

The input gate of an LSTM network is responsible for deciding which information needs to be transferred to the cell. The following equation, defined previously and in related studies, describes this process [53]:

$$i_t = \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right)$$

The forget gate decides which information to keep from the previous memory state:

$$f_t = \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right)$$

The information in each cell is updated by the update gate through the candidate cell state and the cell state update:

$$\tilde{c}_t = \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right)$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$

The output gate updates the hidden state from the previous time step, as well as the output for the given data:

$$o_t = \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right)$$
$$h_t = o_t \odot \tanh\left(c_t\right)$$

where $x_t$ represents the input of the cell, $h_t$ and $h_{t-1}$ describe the hidden states, and $c_t$ and $c_{t-1}$ the cell states. The remaining variables ($W$, $U$, and $b$) describe the trainable weights and biases used in the LSTM model [47, 53].
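A NumPy sketch of a single LSTM time step following the gate equations above; the stacked weight layout is an implementation choice for illustration, not the paper’s code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b stack the parameters of the
    input (i), forget (f), candidate (c), and output (o) gates."""
    n = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b      # all four gate pre-activations at once
    i = sigmoid(z[0 * n:1 * n])       # input gate
    f = sigmoid(z[1 * n:2 * n])       # forget gate
    g = np.tanh(z[2 * n:3 * n])       # candidate cell state
    o = sigmoid(z[3 * n:4 * n])       # output gate
    c_t = f * c_prev + i * g          # updated cell state
    h_t = o * np.tanh(c_t)            # updated hidden state
    return h_t, c_t
```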

4.3. Autoencoder

An AE consists of two modules: an encoder and a decoder. The data are fed to the encoder, which learns the underlying features of a process; these features are in a reduced dimension. The decoder, on the other hand, recreates the original data from these underlying features, with reduced noise and dimensionality. An autoencoder can therefore effectively compress features from high-dimensional data. AEs are very popular in anomaly detection problems, since they can significantly increase the accuracy of abnormality detection where other methods, such as PCA, fail [39]. Another advantage of AEs is that they are easy to train and avoid the computational complexity that methods like kernel PCA face. The autoencoder architecture can handle large-scale data and perform feature selection, which makes it an appropriate choice for this research domain [46, 62].

4.4. LSTM-Autoencoder

Based on the advantages of LSTMs and autoencoders mentioned above, the LSTM-based autoencoder [46, 63] is defined as the main proposed hybrid model; it can effectively perform feature selection among the fed features based on information importance, as well as the anomaly detection and classification task on audio data. Figure 3 gives an overview of the defined hybrid deep neural network architecture [15].

To clarify the process, the original one-dimensional auditory data are used to extract spectral centroid (SC) features, which are fed as input to the LSTM-based AE for training; the network can temporarily maintain state information over a long number of time steps. Mel spectrogram features created from the auditory data are also fed as input to the LSTM-AE. These features were selected because combined deep features showed the best performance for anomaly detection in similar research in the same domain [30]. Semi-supervised training is applied to train the model. An overview of the process steps is given in Figure 4.
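A hedged Keras sketch of an LSTM-AE-style classifier along these lines is shown below; the layer counts, dropout rate, and pooling choice are our assumptions for illustration, not the authors’ exact architecture (the paper reports 50 units and 259-dimensional features).

```python
import tensorflow as tf
from tensorflow.keras import layers, models

TIMESTEPS, N_FEATURES = 50, 259   # per the experiment specification

def build_lstm_ae_classifier():
    inputs = layers.Input(shape=(TIMESTEPS, N_FEATURES))
    # Encoder: compress the feature sequence into a fixed-size representation
    encoded = layers.LSTM(50)(inputs)
    encoded = layers.Dropout(0.2)(encoded)           # dropout rate is illustrative
    # Decoder: expand the representation back over the time axis
    repeated = layers.RepeatVector(TIMESTEPS)(encoded)
    decoded = layers.LSTM(50, return_sequences=True)(repeated)
    # Classification head: SoftMax over the two classes (normal, anomaly)
    pooled = layers.GlobalAveragePooling1D()(decoded)
    outputs = layers.Dense(2, activation="softmax")(pooled)
    model = models.Model(inputs, outputs)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_lstm_ae_classifier()
model.summary()
```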

4.5. Anomaly Detection Using GRU and Stacked GRU

GRU is an extension of the RNN that is less complex than LSTM and uses two gates to manage the information in the gated recurrent unit. To train the GRU, the original one-dimensional auditory data are used to extract spectral centroid (SC) features, which are fed as input to the GRU; Mel spectrogram features created from the auditory data are also used as input to the GRU layers. Semi-supervised training is applied to train the model. The same process is applied in the experiments on the Stacked GRU model architecture. In a Stacked GRU, multiple GRU layers are stacked; the main reason for defining such a model is its greater capacity, as increasing the number of layers increases the ability to perform anomaly detection and classification more accurately. An overview of the process is given in Figure 5, and a sketch of both baselines follows below.
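A minimal Keras sketch of the GRU and Stacked GRU baselines under the same assumed input shape; the unit counts are illustrative:

```python
import tensorflow as tf

def build_gru(stacked=False, timesteps=50, n_features=259, units=50):
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Input(shape=(timesteps, n_features)))
    if stacked:
        # The first GRU layer returns full sequences to feed the second GRU layer
        model.add(tf.keras.layers.GRU(units, return_sequences=True))
    model.add(tf.keras.layers.GRU(units))
    model.add(tf.keras.layers.Dense(2, activation="softmax"))
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

gru_model = build_gru()
stacked_gru_model = build_gru(stacked=True)
```

The Stacked LSTM baseline in the next subsection follows the same pattern, with `GRU` layers swapped for `LSTM` layers.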

4.6. Anomaly Detection Using LSTM and Stacked LSTM

To train the LSTM model, the original one-dimensional auditory data are used to extract spectral centroid (SC) features as input to the LSTM model, which maintains temporal information from the data in each cell over a longer number of time steps and has three gates and more complexity than GRU networks. Mel spectrogram features extracted from the auditory data are also fed as input to the LSTM network. Semi-supervised training is applied to train the model. The same process is applied to the Stacked LSTM design. Stacked LSTM models are deep networks with multiple connected LSTM hidden layers, in which the output of one LSTM hidden layer is fed into the next LSTM layer as input. This stacked design improves the learning capability of the neural network and fits the model more effectively [64]. The process for both LSTM and Stacked LSTM is shown in Figure 6.

5. Experimental Study

Following the proposed deep learning methodologies, this study used a dataset from the Turbo component of the selected compressor. The Turbo is one of the most important components of a compressor system, and its full functionality is crucial for the production line to remain operational. This section explains the details of the dataset used in the experiments on the models described in the methodology, as well as the specification of each experiment.

5.1. Experimental Data

The specification and details of the dataset used in this research are as follows.

5.2. Dataset’s Specification

The following dataset, collected by Well Checked Systems International from the Turbo component of the selected compressor system, was used for the experiments in the proposed anomaly detection methodology.

(i) Dataset: this dataset contains a total of 12,190 raw audio signals collected from the Turbo component of the selected compressor over a specific period as time series data, with two class labels, normal and anomaly. There are 8559 normal audio signals and 3631 anomaly audio signals. Each signal is 3 seconds long and saved in the OGG format. The dataset is unbalanced, meaning there are many more normal samples than anomaly samples.

The normal data were defined based on the normal working conditions of the Turbo component of the compressor system; the sounds of the Turbo component in normal condition were used to label the normal class. The anomalies, on the other hand, were cases in which the working-condition audio signals contained a noise. In this research, the two classes, normal and anomaly, are defined to detect and classify anomalies; the fault reasons and root cause of an anomaly can be checked by technicians while conducting the maintenance task.

To give an overview of the data feature representation, the following figures show the difference between an anomaly sample and a normal sample of audio signals in each class. The selected features, SC and Mel spectrogram, are represented as follows.

5.3. Spectral Centroid Representation

Figure 7 shows the SC representation generated from normal and anomaly sample data from the original audio signals. The difference between normal and anomaly signals is clear in the log power spectrum and SC representation.

5.4. Mel Spectrogram Representation

Figure 8 shows the Mel spectrogram representation of normal and anomaly samples from the original audio dataset, highlighting their differences in the Mel frequency spectrogram.

5.5. Experiments

The experiment specifications and a detailed view of the steps in the proposed method, as well as in the baseline models, are presented in this section.

5.6. Experiment Specification Using LSTM Auto-Encoder

In this research, the network was trained on 80% of the dataset, consisting of the features extracted from both normal and anomaly audio signals. The remaining data were used for validation and testing, each consisting of 10% of the data in shuffled form. The extracted SC and Mel spectrogram features have 259 dimensions and feed the LSTM-Autoencoder. The network consists of 50 units, and the designed network input shape is defined by the number of units, the number of features, and the data dimensions.
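A sketch of the 80/10/10 shuffled split using scikit-learn; the paper does not state the exact splitting tool, and the random seed is illustrative:

```python
from sklearn.model_selection import train_test_split

# X: extracted SC + Mel spectrogram feature arrays; y: 0 = normal, 1 = anomaly
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.2, shuffle=True, random_state=42)            # 80% train
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, shuffle=True, random_state=42)  # 10% / 10%
```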

SoftMax [57] and sigmoid functions were used as activation functions in the output layer in different experiments. These functions work on the probability distribution of the output to predict the resulting label. They were used for the detection of normal and anomaly signals, and the final confusion matrix shows the performance of the network on the test and train datasets.

However, to create an optimal model, different functions, such as sparse categorical cross entropy as the loss function and various activation functions, were tested, and the Adam optimization algorithm was employed for training the deep learning model to adjust the required attributes and reduce the model’s loss. The results of the experiments on the train and test datasets are presented in the next section.

There are several approaches to validating the model, such as splitting the dataset into train, test, and validation sets. By considering the model’s loss and accuracy on the train and validation datasets, the model can be evaluated against overfitting and underfitting. The dropout technique was also employed to avoid overfitting by dropping out some of the units in the network at a defined rate.

The accuracy and loss plots and their interpretation are included in the results section for each experiment with the LSTM model. The audio signal classification and anomaly detection program using the LSTM-AE network was implemented in Python; the Keras, NumPy, Pandas, and TensorFlow packages were used to implement the program and build the neural network models. The experiments were run on a MacBook Pro laptop with a 2.4 GHz 8-core Intel Core i9, 32 GB of memory, and an AMD Radeon Pro 5500M 4 GB GPU.
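A sketch of training with a validation set and plotting accuracy/loss curves of the kind shown in Figures 9 and 10; the batch size is an assumption, and the paper reports 50 epochs:

```python
import matplotlib.pyplot as plt

history = model.fit(X_train, y_train,
                    validation_data=(X_val, y_val),
                    epochs=50, batch_size=32)   # batch size is illustrative

# Accuracy and loss curves for train vs. validation
for metric in ("accuracy", "loss"):
    plt.figure()
    plt.plot(history.history[metric], label="train")
    plt.plot(history.history["val_" + metric], label="validation")
    plt.xlabel("epoch")
    plt.ylabel(metric)
    plt.legend()
plt.show()
```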

The model hyperparameters and configuration, as well as a detailed view of the model operations and dimensions, are given in Tables 1 and 2 for the conducted experiments.

5.7. Experiment Specification Using GRU

The method used the extracted Mel spectrogram and SC features to train the GRU network, and different loss functions, such as mean squared error, and activation functions, such as sigmoid and SoftMax, were tested for this model. To create a baseline demonstrating the efficiency of the proposed LSTM-AE-based hybrid model, the experiment specification remained the same as for the LSTM-AE. The model hyperparameters and configuration, as well as a detailed view of the model operations and dimensions, are given in Tables 3–5.

5.8. Experiment Specification Using Stacked GRU

The method used the extracted Mel spectrogram and SC features to train the Stacked GRU. To create a baseline demonstrating the efficiency of the proposed LSTM-AE-based hybrid model, the experiment specification remained the same as for the LSTM-AE. The model hyperparameters and configuration, as well as a detailed view of the model operations and dimensions, are given in Tables 6 and 7.

5.9. Experiment Specification Using LSTM

The method used the extracted Mel spectrogram and SC features to train the LSTM network. To create a baseline demonstrating the efficiency of the proposed LSTM-AE-based hybrid model, the experiment specification remained the same as for the LSTM-AE. The model hyperparameters and configuration, as well as a detailed view of the model operations and dimensions, are given in Tables 8–10.

5.10. Experiment Specification Using Stacked LSTM

The method used the extracted Mel spectrogram and SC features to train the Stacked LSTM network. To create a baseline demonstrating the efficiency of the proposed LSTM-AE-based hybrid model, the experiment specification remained the same as for the LSTM-AE.

The model hyperparameters and configuration, as well as a detailed view of the model operations and dimensions, are given in Tables 11 and 12.

5.11. Evaluation Metrics

Four evaluation metrics were defined to evaluate the anomaly detection performance of each experiment, as well as of the proposed LSTM-Autoencoder network, in the train and test experiments. Accuracy, precision, recall, and F1 were used as the evaluation metrics [65], defined in the following equations [47]:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\text{Precision} = \frac{TP}{TP + FP}$$

$$\text{Recall} = \frac{TP}{TP + FN}$$

$$F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$

In these evaluation metrics, TP is the number of true positives, TN the number of true negatives, FP the number of false positives, and FN the number of false negatives. F1 combines the two metrics precision and recall and measures the overall performance of the experiments; if both precision and recall reach 1, then F1 reaches 1 as well [39, 65]. Since the dataset is imbalanced, all of the evaluation metrics are required to judge whether the model is well fitted, based on the TP, TN, FN, and FP performance. For the dataset used in this research, a single metric such as accuracy is not sufficient; the F1 score is therefore a reasonable evaluation metric based on precision and recall. Lower numbers of FP and FN show that the model performs better, and a higher F1 score confirms the good fit of the employed model [66]. Therefore, using all four evaluation metrics gives a better overview of model performance, as presented in the results.
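The four metrics can be computed from the trained model’s predictions, for example with scikit-learn; this is illustrative, as the paper does not name its evaluation tooling:

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, confusion_matrix)

# argmax turns the SoftMax probability outputs into predicted class labels
y_pred = model.predict(X_test).argmax(axis=1)

print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F1       :", f1_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))   # rows: TN FP / FN TP
```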

5.12. Experiments Results and Discussion

According to the performance metrics explained in the previous section, the results of the experiments with different model parameters and configurations are presented in the following tables for both test and train performance.

5.13. Experiment Results on Train and Test Data

The train dataset performance of the different model architectures, with various activation and loss functions applied to the mentioned dataset, is given in Table 13.

The test dataset performance of the LSTM-AE, as well as of the other model architectures with varied activation and loss functions applied by the trained models, is given in Table 14, showing the best-scoring model summary across all evaluation metrics.

5.14. Model Accuracy

The following plots (Figures 9(a)–9(g)) show the model accuracy for the models with acceptable performance.

5.15. Model Loss

The following plots (Figures 10(a)–10(g)) show the model loss for the models with proper results, to compare the loss output.

6. Discussion

Several experiments were conducted with the different models for parameter optimization, configuration, and architecture revision, such as defining the number of units, layers, and epochs, the dropout rate, and the loss and activation functions of the classification layer. As mentioned, tuning the model is an empirical process, and the reported results are the best scores achieved for each proposed architecture. The LSTM-AE, the model proposed in this paper, showed the best performance among all the RNN-based models. Although the stacked GRU and LSTM networks have more trainable parameters, the LSTM-based autoencoder improved the evaluation metric performance with fewer trainable parameters. For instance, although the Stacked LSTM has more parameters (102,502) than all the other models, it does not show the best performance, whereas adding an autoencoder does. This means that selecting the right architecture, as well as the right parameters, can improve performance. Figure 11 shows the differences between the models in terms of the number of trainable parameters in their architectures.

Additionally, as mentioned in the results section, both train and test summary tables are presented; the reason is to compare test and train performance and ensure the model’s ability to classify the detected labels correctly without overfitting or underfitting. As the tables show, different configurations achieve different performance on the evaluation metrics. For instance, the mean squared error loss function has the lowest performance in the GRU model, while sparse categorical cross entropy performs best in all models, including the LSTM-AE, which achieved the optimal performance on both train and test, reaching 100% on all evaluation metrics. The LSTM-AE performance shows that it could be a proper model to generalize to anomaly detection and classification on unseen data.

Additionally, model accuracy and model loss are presented in different plots in the results sections. The model loss shows a good fit: by definition, in a good fit, the training and validation losses decrease to a point of stability in the plot with a minimal gap; for instance, the final loss values of the validation and training instances should approach zero. The model accuracy plots, on the other hand, show the performance of the model and whether it exhibits any overfitting or underfitting; in this case, the LSTM-AE showed a very good fit. However, the GRU-1 and GRU-2 accuracy plots show that these models do not perform as well as the LSTM-AE, since their validation and training curves ultimately diverge.

The computation time and model comparison are given in Table 15. For a component that requires continuous condition monitoring and accurate failure detection, the LSTM-AE computation time is reasonable given the model’s high performance, although other models run faster. The computation time is calculated over 50 epochs, excluding the feature extraction time. The GRU- and LSTM-based models with different hyperparameters and configurations have the same computation time and are therefore grouped as LSTM and GRU in Table 15.

Figure 12 shows an overview of the computational train time.

Figure 13 gives a better overview, highlighting the fact that adding an AE to the LSTM architecture improves performance on the audio dataset in this paper. This model could be generalized to larger time series datasets, given the AE’s suitability for handling complex datasets.

It is also worth highlighting that deep learning models perform better on larger datasets and can outperform other learning algorithms, such as statistical methods or machine learning algorithms, on similar tasks. As shown in Figure 14 [67], the accuracy of deep learning methods increases with larger datasets, which is the reason for employing deep learning models for the current anomaly detection challenge on compressor audio signals [67].

Finally, to underline the importance of deep learning models in the predictive maintenance of such systems, a similar research project employed an LSTM-based encoder-decoder for anomaly detection. Malhotra et al. used different numbers of hidden layers, hyperparameters, and functions for anomaly detection on time series datasets consisting of space shuttle, engine, and ECG multisensor data, detecting anomalies by computing their likelihood [68]. Additionally, Yu et al. analyzed different RNN-based variants, such as PLSTM and BiLSTM, for SVM-based classification and estimated the remaining useful life of machines; that research employed random search for hyperparameter tuning, and the bidirectional RNN autoencoders performed better than the unidirectional ones [69].

To summarize, in our research we employed an LSTM-Autoencoder with 8 hidden layers, fed with Mel spectrogram and SC features extracted from audio signals of the compressor system’s Turbo component, for the anomaly detection and classification task with the SoftMax function. We also created a baseline comparing anomaly detection performance across other RNN variants, and the proposed hybrid model reached a 100% score on all four evaluation metrics. We further conducted empirical experiments with different functions and parameters to find the best configuration for the proposed network.

7. Conclusion

In this research, we used deep learning neural network models for anomaly detection and classification instead of the statistical, machine learning–based, or other non-neural network methods traditionally used in reliability analysis, predictive maintenance (PdM) analysis, and system health and condition monitoring. This paper proposed a hybrid deep learning method based on the LSTM-Autoencoder architecture that extracts high-level features to feed the network, in which each cell can remember the important features for predicting the output while minimizing the reconstruction error. The proposed hybrid deep learning–based method can effectively detect and classify the normal and anomaly audio signals collected from a compressor system. The model was compared with baseline RNN-based neural network architectures such as GRU, LSTM, Stacked GRU, and Stacked LSTM. After experiments with different hyperparameters and model configurations, the best performance was achieved by adding an autoencoder to the LSTM network as a hybrid model. Such a method can be proposed for increasing the reliability of compressor systems by detecting defects in time.

In the proposed model, Mel spectrogram and SC features were used to feed the network for training. The combination of multiple-source features and the hybrid model achieved a 100% score on all four evaluation metrics (accuracy, precision, recall, and F1) for the LSTM-based autoencoder on both test and train results, making it the best-scoring model.

This paper also highlights that employing deep learning methods in predictive maintenance tasks can maintain the safety of technicians and staff by avoiding the dangerous risks of compressor system component failures that could harm them. Additionally, in-time anomaly detection can reduce the risk of compressor system failures and thereby increase system reliability.

8. Future Work

Using more balanced datasets, as well as increasing the dataset size, could be a baseline for future research. In addition, using an autoencoder along with an LSTM enables defining a reconstruction error, which could serve as a threshold for anomaly detectors to generalize to and detect unseen anomalies, as sketched below. Besides the AE, a variational autoencoder (VAE) could also be used for anomaly detection in combination with RNN-based networks; in comparison to an AE, a VAE works with distributions and reconstructs probabilities from the input data, and low probabilities could serve as a baseline for anomaly detection.
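As a sketch of the reconstruction-error thresholding idea, where `autoencoder` is a hypothetical reconstruction-only LSTM-AE trained on normal data, and the 3-sigma rule is one common, illustrative threshold choice:

```python
import numpy as np

# Fit the threshold on reconstruction errors over normal training sequences
recon = autoencoder.predict(X_train_normal)
errors = np.mean(np.square(recon - X_train_normal), axis=(1, 2))
threshold = errors.mean() + 3 * errors.std()     # e.g., mean + 3*std

# Flag unseen sequences whose reconstruction error exceeds the threshold
new_errors = np.mean(np.square(autoencoder.predict(X_new) - X_new), axis=(1, 2))
is_anomaly = new_errors > threshold
```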

Data Availability

The auditory data used to support the findings of this study have not been made available because of third-party rights and commercial confidentiality.

Disclosure

This research project was performed as part of the employment of the authors as graduate student research assistants and faculty members of Lamar University.

Conflicts of Interest

The authors declare that they have no conflict of interest regarding this work.

Acknowledgments

This research was conducted in the Departments of Industrial and Systems Engineering and Computer Science, Lamar University; the dataset was provided by Well Checked Systems.