Abstract

The detection and classification of power quality disturbances (PQDs) is essential for devices consuming electricity and for meeting today's energy trends. This study presents an effective artificial intelligence (AI) framework for analyzing single or composite defects in power quality. For this purpose, a convolutional neural network (CNN) architecture whose output stage is powered by a gated recurrent unit (GRU) is designed. The proposed framework first obtains a matrix from the PQD signals using the short-time Fourier transform (STFT). This matrix represents the signal in both the time and frequency domains and is suitable as CNN input. Features are extracted automatically from these matrices by the proposed CNN architecture without any preprocessing, and these features are then classified by the GRU. The performance of the proposed framework is tested on a dataset containing a total of seven single and composite defect classes, with noise levels in the samples varying between 20 and 50 dB. The proposed method outperforms current state-of-the-art methods, achieving 98.44% ACC, 98.45% SEN, 99.74% SPE, 98.45% PRE, 98.45% F1-score, 98.19% MCC, and 93.64% kappa. A novel PQD classification system and its application are thus presented in this study, and the proposed system is suitable for deployment in industrial and factory settings.

1. Introduction

With the pace of technological development and easier access to products, technological devices have come into widespread use. These devices consume energy, mostly as electrical energy, and all of these distributed devices connected to the grid can cause PQDs [1]. The leading causes of PQDs include nonlinear loads, flexible AC transmission devices, variable frequency drives, arc devices, and converters used in power electronics [2]. Ideally, grid voltages and currents should be clean sinusoids. If disturbing components enter the network, power losses and various disturbances occur. In this case, almost all electronic devices, from industrial equipment to household appliances, are adversely affected, and energy providers suffer as well [3]. Understanding the causes of these situations enables action to be taken, and classifying the problems that arise is one of the most effective approaches [4].

Pattern recognition (PR) applications are of critical importance for PQD detection. This process has become easier thanks to machine learning (ML) methods and AI applications that have recently become widespread [5]. Some studies in the literature divide the PQD classification process into three parts: feature extraction, feature selection, and classifier design [6]. In fact, these three stages are interconnected. In the feature extraction stage, problem-specific characteristics are obtained. Although more effective solutions are used for this stage today, in the past the feature extraction step was highly dependent on the expert's experience and statistical capability. Uyar et al. [7] proposed a wavelet entropy-based feature extraction approach for PQD classification. Jayasree et al. [8] presented a PQD classification framework consisting of two steps: in the first step, an envelope detector based on the Hilbert transform is used; in the second step, an artificial neural network classifies the information from the first step. Reaz et al. [9] used discrete wavelet transforms as a feature extractor. Fengzhan and Rengang [10] used the S-transform as a feature extractor, and Janik and Lobos [11] used radial basis function (RBF) networks. Lopez-Ramirez et al. [12] used empirical mode decomposition (EMD) for the classification of PQD data. Most of these methods are tailored to the properties of a specific dataset; when applied to another dataset, their performance decreases. In addition, the dimensions of some features are not suitable for classifiers, or classification takes a long time. For this reason, some researchers use feature selection methods to overcome these problems. Feature selection yields a low-dimensional representation of the problem while preserving the representational power of the features. Lee and Shen [13] proposed a feature selection algorithm named probabilistic neural network-based feature selection (PFS) for PQD data. Panigrahi and Pandi [14] used a genetic algorithm for feature vector selection to increase PQD classification performance. Singh and Singh [15] used the ant colony optimization technique to select optimal features. Huang et al. [16] presented a feature selection framework for PQDs that uses an entropy-importance (EnI)-based random forest (RF) model for the selection process. Feature selection algorithms contribute positively to classification success and reduce computation time, but they are computationally complex and model-sensitive.

The purpose of feature extraction and selection in the PQD classification process is to increase classification performance. Binary and multiclass classifiers are used together with the aforementioned hand-crafted features and feature selection methods. The hidden Markov model (HMM) [17], decision trees [18], rule-based systems [19], the support vector machine (SVM) [11], the probabilistic neural network [20], the ANN [21], independent component analysis (ICA) [22], and the K-nearest neighbor (kNN) classifier [23] are the algorithms most used for PQD classification. Studies with these classifiers continued intensively until deep learning became popular. Deep learning methods produce striking results in almost all AI problems, and, inspired by this success, deep learning approaches have been applied to PQD classification.

CNN, the most popular deep learning technique, is particularly effective in analyzing two-dimensional matrices and images. The robust feature representation power of the CNN architecture on 2D matrices comes from the 2D kernels in the convolution layers, in addition to other factors such as the remaining layers of the architecture and the activation functions [24]. Among the current PQD classification studies in the literature, those based on CNN are briefly reviewed below. Wang and Chen [25] proposed a closed-loop deep learning method to classify PQD data. Liu et al. [26] used a deep CNN and an SVM together to classify PQDs. Cai et al. [27] combined the Wigner–Ville distribution (WVD) with a CNN for a PQD dataset. Deng et al. [28] proposed a sequence-to-sequence deep learning model with a bidirectional GRU for PQD classification. Shen et al. [29] proposed an improved principal component analysis-guided 1D-CNN for PQDs. Rodriguez et al. [30] presented a convolutional autoencoder compression framework with a stacked long short-term memory (LSTM). Subudhi and Dash [31] proposed a grey wolf optimization- (GWO-) based extreme learning machine (ELM) algorithm to classify PQD signals with limited data. Bashawyah and Subasi [32] classified five PQD signals with different machine learning algorithms. Biswal and Dash [33] used the fast dyadic S-transform algorithm with a fuzzy decision tree for power quality disturbances. Khokhar et al. [34] proposed an optimal feature selection algorithm to classify PQDs. Li et al. [35] detected and classified PQDs by using DAG-SVMs with a double-resolution S-transform. CNN-based studies perform well on real samples as well as artificial ones. However, CNN algorithms are not suitable for datasets that do not contain enough samples [36–38].

In this work, an efficient CNN architecture [39–41] is presented to classify single or composite PQD problems. First, the dataset is constructed so as to prevent the CNN architecture from falling into the overfitting problem; as the literature on this subject shows, a relatively large and sufficient dataset is essential for training a CNN. Second, a high-performance CNN architecture designed for PQD sample classification is presented. The proposed architecture is a GRU-supported linear CNN, and the proposed CNN-GRU combination improves performance compared with its counterparts in the literature. The fully connected layers (FCLs) of the linear CNN architecture have been removed and replaced with GRU layers. The contributions of the proposed method can be summarized as follows:
(i) Thanks to the proposed GRU-supported CNN architecture, high classification performance is obtained for PQD datasets containing fewer samples
(ii) It is faster because it contains fewer parameters than CNN architectures with FCLs
(iii) It is suitable for end-to-end training
(iv) It performs more effectively than current state-of-the-art methods

The rest of this paper is organized as follows: Section 2 includes background information of related algorithms and the proposed method. Section 3 provides experimental details and results. Finally, the conclusion is presented in Section 4.

2. Methodology

2.1. The Background in the Deep Learning Models

Deep learning aims to acquire a relatively deep understanding of a problem, ideally becoming as knowledgeable as a domain expert. This is made possible by deeper networks, larger numbers of training samples, and certain mathematical refinements to the architecture. The most widely used deep learning architecture today is the CNN [42]. CNNs are ideal for analyzing one-, two-, and three-dimensional matrices or vectors; this section focuses on the 2D-CNN. The best way to understand a CNN architecture is to understand its layers, so the operation of the CNN layers is covered in this section.

The layer most associated with the CNN architecture is the convolution layer. A CNN architecture contains many convolution layers with various kernel sizes. These convolution layers learn the features of the problem and are applied to the image by a convolution operation. One of the most significant advantages of sliding a convolution kernel over the image is parameter sharing, which reduces the total number of parameters in the network. Another layer in a basic CNN architecture is the pooling layer, usually placed after the convolution layer or after the rectified linear unit (ReLU) layer. The most important task of the pooling layer is downsampling, which reduces the total number of parameters while preserving the essential features of the matrix. There are variants such as max-pooling, sum-pooling, and average-pooling; max-pooling is used in this study. The other essential layer of a basic CNN architecture is the ReLU, and generally one ReLU follows almost every convolution layer. The task of the ReLU layer is to break the linear structure of the network and sparsify its activations; the traditional ReLU pulls negative parameters to zero. The FCL is a kind of artificial neural network structure consisting of many neurons in which all nodes are interconnected; it is generally used for classifying the extracted features. A CNN architecture consisting of these three basic layers is calculated as in the following equation:

$l_{\text{next}} = \text{pool}_n\left(\sigma\left(W \ast D_{\text{in}} + b\right)\right)$

where $l_{\text{next}}$ represents the input of the next layer, pool is the pooling layer, $n$ is the pooling window, $\sigma$ denotes the ReLU function, $W$ represents the convolution kernel, $D_{\text{in}}$ represents the input, and $b$ is the bias. The softmax function is generally used at the end of classifier networks. It calculates the probability distribution over an $m$-class output using the following equation:

$\text{softmax}(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{m} e^{z_j}}, \quad i = 1, \ldots, m$
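To illustrate how these layers compose, the following is a minimal PyTorch sketch of one convolution–ReLU–max-pooling stage followed by a softmax over class scores. The channel counts, kernel size, and input size here are illustrative placeholders, not the values used in the proposed architecture.

```python
import torch
import torch.nn as nn

# One basic CNN stage, i.e., l_next = pool_n(sigma(W * D_in + b)).
block = nn.Sequential(
    nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, padding=1),  # W * D_in + b
    nn.ReLU(),                                                            # sigma(.)
    nn.MaxPool2d(kernel_size=2),                                          # pool with window n = 2
)

x = torch.randn(1, 1, 64, 64)          # a dummy 2D input matrix D_in
features = block(x)                    # shape: (1, 16, 32, 32) after downsampling

# Softmax over m = 7 class scores (e.g., the seven PQD classes).
logits = torch.randn(1, 7)
probs = torch.softmax(logits, dim=1)   # probabilities summing to 1
```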

In addition to these base layers, many new layers have been proposed, and more continue to appear. However, the layers above are sufficient for a clear understanding of the CNN architecture proposed in this study.

2.2. Gated Recurrent Unit (GRU)

The GRU emerged to effectively avoid the gradient explosion and vanishing problems of recurrent neural networks (RNNs). It can be thought of as a simpler version of long short-term memory (LSTM) [43–45]. Two gates are defined in a GRU unit: the reset gate and the update gate. The reset gate, denoted $r$, combines the new input with the previous memory; the update gate, denoted $z$, is responsible for preserving the previous memory value. The transition functions of a GRU are calculated with the following equations:

$z_t = \sigma\left(W_z x_t + U_z h_{t-1} + b_z\right)$
$r_t = \sigma\left(W_r x_t + U_r h_{t-1} + b_r\right)$
$\tilde{h}_t = \tanh\left(W_h x_t + U_h \left(r_t \odot h_{t-1}\right) + b_h\right)$
$h_t = \left(1 - z_t\right) \odot h_{t-1} + z_t \odot \tilde{h}_t$

where $\odot$ denotes the element-wise product, $k$ represents the dimensionality of the hidden vectors $h_t \in \mathbb{R}^k$, and $W$, $U$, and $b$ are the shared parameters.
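A minimal NumPy sketch of a single GRU time step implementing the gate equations above may help make them concrete; the input and hidden dimensions below are arbitrary illustrative choices.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h_prev, Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh):
    """One GRU transition: returns h_t given input x_t and previous state h_prev."""
    z = sigmoid(Wz @ x_t + Uz @ h_prev + bz)               # update gate z_t
    r = sigmoid(Wr @ x_t + Ur @ h_prev + br)               # reset gate r_t
    h_tilde = np.tanh(Wh @ x_t + Uh @ (r * h_prev) + bh)   # candidate state
    return (1.0 - z) * h_prev + z * h_tilde                # element-wise interpolation

# Illustrative dimensions: input size d = 8, hidden size k = 4.
d, k = 8, 4
rng = np.random.default_rng(0)
params = [rng.standard_normal(s) * 0.1 for s in
          [(k, d), (k, k), (k,), (k, d), (k, k), (k,), (k, d), (k, k), (k,)]]
h = gru_step(rng.standard_normal(d), np.zeros(k), *params)
```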

2.3. The Proposed Method

The proposed method is designed to investigate the effect of GRU layers on the CNN architecture. A performance comparison is made between the proposed model and the VGG-16, GoogleNet, and ResNet-50 models. The pretrained CNN networks start learning with a transfer learning approach and are trained shallowly on the generated power quality disturbance data. These data, in one-dimensional signal format, are converted into a two-dimensional matrix format by the STFT method; the data obtained through this transformation contain more features than the raw signal data. Feature extraction and classification are then performed automatically by feeding these matrices to the proposed GRU-based CNN network. In training the network, the mini-batch size is ten and the learning rate is set to 0.0001, with dropout set to 0.2. A high learning rate does not lead to convergence on the problem, while a very low learning rate requires a long training period [42, 46, 47]. The stochastic gradient descent algorithm is used to optimize the parameters of the proposed CNN model [48].
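The 1D-to-2D conversion step can be sketched with SciPy's STFT as below. The sampling rate, window length, and overlap are illustrative assumptions, not values reported here, and a clean 50 Hz sine stands in for an actual PQD signal.

```python
import numpy as np
from scipy.signal import stft

fs = 3200                                # assumed sampling rate (Hz): 64 samples per 50 Hz cycle
t = np.arange(0, 0.2, 1 / fs)            # a ten-cycle window
signal = np.sin(2 * np.pi * 50 * t)      # clean 50 Hz sine as a stand-in PQD signal

# Short-time Fourier transform: rows are frequency bins, columns are time frames.
f, tau, Z = stft(signal, fs=fs, nperseg=64, noverlap=48)
tf_matrix = np.abs(Z)                    # magnitude matrix fed to the CNN input
print(tf_matrix.shape)                   # e.g., (33, 41): frequency bins x time frames
```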

In the proposed CNN architecture, the GRU structure has replaced fully connected layers. Two-level GRU blocks are included in the proposed CNN architecture. The first GRU block consists of 200 hidden neurons, and the latter GRU block consists of 100 hidden neurons. A dropout layer has been added between GRU blocks to avoid overfitting problems. The proposed method is shown in Figure 1.

The basic building unit of the proposed method is the structure called a block. The block structure consists of one convolution layer, one batch normalization layer, and one rectified linear unit (ReLU) layer, respectively. There are a total of five block structures in the proposed architecture, with maximum pooling layers between the first four. Instead of fully connected layers, the architecture ends with two GRU layers, and the dropout layer between the GRU layers avoids the overfitting problem.
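The description above can be sketched in PyTorch as follows. The filter counts, kernel sizes, pooling placement, and the reshaping between the convolutional stack and the GRU layers are assumptions for illustration; the text specifies only the block composition, five blocks, pooling between the first four, GRU layers of 200 and 100 hidden units, and a dropout of 0.2.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """One block: convolution -> batch normalization -> ReLU."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
            nn.BatchNorm2d(c_out),
            nn.ReLU(),
        )
    def forward(self, x):
        return self.net(x)

class CnnGru(nn.Module):
    def __init__(self, n_classes=7):
        super().__init__()
        chans = [1, 8, 16, 32, 64, 64]       # assumed channel progression
        layers = []
        for i in range(5):                    # five conv blocks ...
            layers.append(ConvBlock(chans[i], chans[i + 1]))
            if i < 3:                         # ... pooling between the first four (placement assumed)
                layers.append(nn.MaxPool2d(2))
        self.features = nn.Sequential(*layers)
        self.gru1 = nn.GRU(input_size=64, hidden_size=200, batch_first=True)
        self.drop = nn.Dropout(0.2)           # dropout between the GRU layers
        self.gru2 = nn.GRU(input_size=200, hidden_size=100, batch_first=True)
        self.out = nn.Linear(100, n_classes)

    def forward(self, x):                     # x: (batch, 1, H, W) STFT magnitude
        f = self.features(x)                  # (batch, 64, H', W')
        # Treat the time axis as the sequence, averaging over frequency (assumed reshaping).
        seq = f.mean(dim=2).permute(0, 2, 1)  # (batch, W', 64)
        s, _ = self.gru1(seq)
        s, _ = self.gru2(self.drop(s))
        return self.out(s[:, -1, :])          # logits for the PQD classes

model = CnnGru()
logits = model(torch.randn(2, 1, 64, 64))     # -> shape (2, 7)
```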

3. Results and Discussion

3.1. Dataset Description

The dataset used within the scope of this study was generated in a simulation environment, and noise at levels of 20–50 dB was added. The dataset consists of 7 classes in total and includes 12336 signals, 75% of which are reserved for training and 25% for testing. The dataset comprises singular and composite power quality defects: the singular defects are sag, swell, oscillatory transient, flicker, and harmonics, while the composite defects are the sag + harmonics and swell + harmonics classes. The presence of varying, high levels of noise in the signals makes the dataset more realistic and challenging. Table 1 gives the number of signals in each of the seven PQD classes, all derived from pure sine waves, used to evaluate the performance of the GRU-based CNN network. Parameter changes were made in accordance with the IEEE-1159 standard [33]. Examples of the PQD signals are shown in Figure 2.
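As an illustration of how such signals are typically synthesized, the following sketch generates a voltage sag with additive white noise using a standard parametric model in the spirit of IEEE-1159. The sag depth, timing, and SNR below are assumptions for illustration, not the exact generation parameters of this dataset.

```python
import numpy as np

fs, f0 = 3200, 50                          # assumed sampling rate and fundamental frequency
t = np.arange(0, 0.2, 1 / fs)              # ten 50 Hz cycles

# Voltage sag: amplitude drops by alpha between t1 and t2 (parametric model).
alpha, t1, t2 = 0.4, 0.06, 0.14            # a 40% sag lasting four cycles
envelope = 1 - alpha * ((t >= t1) & (t <= t2))
sag = envelope * np.sin(2 * np.pi * f0 * t)

# Add white Gaussian noise at a chosen SNR (here 30 dB, within the 20-50 dB range).
snr_db = 30
p_signal = np.mean(sag ** 2)
p_noise = p_signal / (10 ** (snr_db / 10))
noisy_sag = sag + np.sqrt(p_noise) * np.random.randn(len(t))
```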

The number of PQD signals in each class is very important. This property, called data balance, is vital for training and testing CNN networks. The generated dataset contains 7 PQD signal classes in total: 1800 sag, 2000 swell, 1736 oscillatory transient, 1600 flicker, 2000 harmonic, 1600 sag + harmonic, and 1600 swell + harmonic signals. Table 1 gives the class distribution of these data.

3.2. Evaluation Metric

Seven classification metrics are used to evaluate the proposed CNN architecture and the pretrained CNN networks: accuracy (ACC), sensitivity (SEN), specificity (SPE), precision (PRE), F1-score, Matthews correlation coefficient (MCC), and kappa. The terms used in computing these metrics are true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). The metrics are computed with the following equations:

$\text{ACC} = \frac{TP + TN}{TP + TN + FP + FN}$
$\text{SEN} = \frac{TP}{TP + FN}$
$\text{SPE} = \frac{TN}{TN + FP}$
$\text{PRE} = \frac{TP}{TP + FP}$
$\text{F1} = \frac{2 \times \text{PRE} \times \text{SEN}}{\text{PRE} + \text{SEN}}$
$\text{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$
$\text{kappa} = \frac{p_o - p_e}{1 - p_e}$

where $p_o$ is the observed agreement (equal to ACC) and $p_e$ is the agreement expected by chance.
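A sketch of these computations from the confusion-matrix counts of a single class is given below; the counts in the usage line are invented for illustration, and the averaging scheme across the seven classes is not specified here.

```python
import math

def metrics(tp, tn, fp, fn):
    """Classification metrics from the confusion-matrix counts of one class."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    sen = tp / (tp + fn)                       # sensitivity (recall)
    spe = tn / (tn + fp)                       # specificity
    pre = tp / (tp + fp)                       # precision
    f1 = 2 * pre * sen / (pre + sen)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    n = tp + tn + fp + fn
    p_o = acc                                  # observed agreement
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2  # chance agreement
    kappa = (p_o - p_e) / (1 - p_e)
    return acc, sen, spe, pre, f1, mcc, kappa

print(metrics(tp=95, tn=880, fp=10, fn=15))    # illustrative counts
```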

3.3. Results

The proposed method was run on a computer with an Intel Core i7-7700K CPU (4.2 GHz), 32 GB of DDR4 RAM, and an NVIDIA GeForce GTX 1080 graphics card. To assess the proposed method, a comparison was made with pretrained CNN networks of three different architectures; the point of this comparison is to show the performance of the proposed from-scratch CNN architecture when supported by GRU modules. The training charts of each CNN network are shown in Figures 3–6. Blue lines on the accuracy curves show training accuracy and black lines show test accuracy; red lines on the loss curves represent training loss and black lines represent test loss. As can be seen from the graphs, the GRU-based CNN architecture showed the highest performance. Moreover, the overfitting problem does not occur in the proposed CNN + GRU architecture, which quickly begins to converge on the problem during the training process. The availability of a sufficient amount of training data led to high performance in all CNN networks.

According to Figures 4–6, the pretrained CNN networks using transfer learning and the proposed CNN architecture all showed very high performance. The main reason for this is that there are enough data for training and testing. A comparative analysis of the specified CNN models is presented in Table 2.

Table 3 contains the number of parameters of the CNN architectures. The parameter count of the proposed method is far lower than those of the pretrained networks.

3.4. Discussion

Considering both its performance and its number of parameters, the proposed CNN architecture stands out from the other pretrained CNN networks. According to the results in Table 2, GoogleNet showed the lowest performance: 96.73% ACC, 96.68% SEN, 99.45% SPE, 96.76% PRE, 96.71% F1-score, 96.17% MCC, and 86.63% kappa. The performance of the VGG-16 and ResNet-50 models is very close, with VGG-16 slightly higher: 97.83% ACC, 97.80% SEN, 99.64% SPE, 97.83% PRE, 97.81% F1-score, 97.45% MCC, and 91.13% kappa. The proposed CNN model has the highest performance, achieving better results than the other pretrained CNN models on all 7 classification metrics: 98.44% ACC, 98.45% SEN, 99.74% SPE, 98.45% PRE, 98.45% F1-score, 98.19% MCC, and 93.64% kappa. Moreover, when the number of parameters is considered, the proposed method achieves this higher performance with far fewer parameters. The VGG-16 model has the highest parameter count, with 138 million; among the pretrained CNN models, the minimum belongs to GoogleNet with 7 million. The proposed method has a very low parameter count of 245,284. The number of updated parameters is a factor that directly affects the training and test times of the model.

Table 4 lists state-of-the-art results. The lowest performance belongs to Khokhar et al. [34] with 86.86%, and the highest is 97.94% accuracy, obtained by the FST + fuzzy DT method [33]. The datasets in those studies do not contain any noise on the PQDs, whereas the PQD dataset in our work includes 20–50 dB noise levels, which makes it a challenging problem for AI algorithms. Moreover, the number of samples is much larger than in the previous studies. In conclusion, the proposed CNN + GRU algorithm achieves the highest performance in Table 4, with 98.44%.

4. Conclusions

In this study, a GRU-based CNN architecture is proposed for the identification of PQD defects. A CNN model has been designed that can analyze both individual and composite disturbances within PQD. The PQD signals are converted into two-dimensional form by applying the STFT. Despite the high level of noise in the PQD signals, significant classification performance has been achieved; the proposed CNN model thus maintains its feasibility even in noisy environments and exhibits an adaptive character. The algorithm and the study do have shortcomings: researchers who continue to work on this problem are encouraged to analyze more PQD error classes and to measure the timing of the disturbances as well.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that they have no conflicts of interest.