Abstract

Transient stability assessment plays a vital role in modern power systems. For this purpose, machine learning techniques have been widely employed to find critical conditions and recognize transient behaviors through massive data analysis. However, the ever-increasing volume of data generated by power systems poses a number of challenges to traditional machine learning techniques, which are computationally intensive when run on standalone computers. This paper presents a MapReduce-based high performance neural network that enables fast stability assessment of power systems. Hadoop, an open-source implementation of the MapReduce model, is first employed to parallelize the neural network. The parallel neural network is further enhanced with HaLoop to reduce the computation overhead incurred in the iterative training process. In addition, ensemble techniques are employed to compensate for the accuracy loss of the parallelized neural network in classification. The parallelized neural network is evaluated on both the IEEE 68-node system and a real power system in terms of computation speedup and stability assessment.

1. Introduction

In recent decades, dozens of large power blackouts have occurred, and loss of stability has been widely recognized as the most critical factor leading to power system collapse. Meanwhile, modern power systems are exposed to higher risks than ever before because of increasingly stressed operating conditions caused by renewable energy penetration, electricity market gaming, insufficient situational awareness techniques, and a shortage of investment [1]. These conditions reduce the dynamic stability of power systems when severe disturbances occur.

Transient stability assessment (TSA) is an effective means of evaluating dynamic security under various operating conditions in control centers. To facilitate TSA, machine learning technologies have been widely applied over the past two decades, as summarized in an early review [2]. Most existing works on transient stability identification focus on binary stability prediction using clustering and classification. For example, support vector machines, decision trees, and artificial neural networks (ANNs) are widely used to detect instability of power systems from postfault trajectories within a few cycles [3–5]. On the other hand, a few machine learning techniques have been investigated for dynamic coherency identification of power systems, providing critical information for system equivalents [6], islanding control [7], and area detection [8]. However, coherency analysis has limited ability to determine the most disturbed units, which may eventually lose synchronism.

Besides awareness of the global stability status, it is important for emergency control to know which generator or group of generators tends to lose synchronism. Traditional stability predictors cannot identify the leading units, while coherency-based classification needs a longer time window to observe disturbance trajectories. The most feasible solution is to establish a set of trained predictors, one for each generator, to enable individual identification [9]. However, this approach is computationally intensive, because a power system normally has hundreds of generators that generate massive volumes of data. Few machine learning techniques have considered the impact of critical unstable generators (CUGs) in the TSA of power systems. As a result, it has become a challenge for standalone machine learning techniques running on single computers to perform TSA that accounts for CUGs across a large number of generators [10]. High performance computing techniques have therefore become a necessity.

This paper presents HBPNN, a high performance back propagation neural network based on the MapReduce computing model. Hadoop [11–13], an open-source implementation of MapReduce, is first employed to parallelize the neural network. The parallelized neural network is further enhanced with HaLoop [14] to reduce the computation overhead incurred in the iterative training process. In addition, ensemble techniques are employed to maintain high classification accuracy when datasets are split into small data chunks and processed on parallel nodes. The parallelized neural network is evaluated on both the IEEE 68-node system and a real power system in terms of computation speedup and stability assessment.

The rest of the paper is organized as follows. Section 2 discusses related work on the application of machine learning techniques to TSA. Section 3 presents the design of HBPNN in detail. Section 4 evaluates the performance of the parallelized neural network and analyzes the experimental results. Section 5 concludes the paper.

2. Related Work

As wide area monitoring systems (WAMS) are being deployed in a large number of power systems, phasor measurement units (PMUs) play an increasingly vital role in dynamic security assessment [15]. A number of studies have assessed transient stability using PMU data. Among these efforts, PMU trajectory-based indicators are considered efficient estimators of the dynamic behavior of power systems, especially under severe disturbances. For example, Alvarez and Mercado proposed seven trajectory-based indices suitable for fuzzy inference on real-time dynamic vulnerability [16]. Furthermore, Makarov et al. [17] presented a review of PMU-based security assessment, offering a clear roadmap for further development.

Machine learning techniques have been widely employed for instability detection and stability margin estimation. However, few studies have addressed TSA by identifying CUGs, owing to the massive volumes of data generated when a large number of generators must be evaluated. For this purpose, this paper employs a back propagation neural network (BPNN) to identify CUGs in a timely manner.

BPNN has proven effective in classification owing to its strong function approximation capability, which it acquires through gradient-descent training. However, large-scale data processing poses a significant computational challenge to BPNN. Rizwan et al. [18] applied a neural network to solar energy estimation and noted that the large volume of data makes processing an extremely complex task, which severely affects training efficiency. Wang et al. [19] pointed out that large-scale neural networks have become one of the mainstream tools for processing massive data. Al-Masri et al. [10] applied an adaptive neural network to evaluate the stability of every single generator, aiming to provide more detailed stability information. However, real power systems usually have hundreds of generators, and standalone neural networks running on single computers can hardly handle the problem in a reasonable time.

To improve the efficiency of BPNN, distributed computing technologies have been employed [20–22]. Gu et al. [23] presented a parallel neural network that uses in-memory data processing techniques to accelerate training. However, in their work the training data is simply segmented into data chunks without considering the resulting accuracy loss. Liu et al. [24] presented a MapReduce-based parallel BPNN for processing a large set of mobile data, and further employed the AdaBoost algorithm to compensate for the accuracy loss of the parallelized neural network. Although AdaBoost is a popular ensemble technique, it increases the weights of misclassified instances, which may degrade the accuracy of the algorithm. Another major limitation of that work is that it does not consider the high overhead Hadoop incurs in handling input and output files during the iteration process.

To address large-scale data processing with BPNN for power system stability analysis, especially for CUG identification, the work presented in this paper employs HaLoop to reduce the high overhead incurred in computation iterations. It also demonstrates the feasibility of a MapReduce-based high performance neural network for efficient stability assessment, providing a general tool for parallelizing machine learning algorithms to facilitate coordinated training for a large number of generators.

3. The Design of HBPNN

3.1. BPNN

BPNN has been proved to be effective in classification. It employs feed-forward and back propagation mechanisms to train the parameters of the network.

In the feed-forward phase, let (i) $w_{ij}$ denote the weight from the $i$th neuron to the $j$th neuron, (ii) $\theta_j$ denote the bias for varying the activity of the $j$th neuron, (iii) $o_i$ denote the output of the $i$th neuron from the last layer, (iv) $o_j$ denote the output of the $j$th neuron of the current layer, and (v) $I_j$ denote the input of the $j$th neuron in the hidden and output layers.

Therefore, $I_j$ can be represented by

$$I_j = \sum_i w_{ij} o_i + \theta_j.$$

In each neuron, the nonlinear activation is the sigmoid function; therefore, the output $o_j$ of the $j$th neuron, passed from the current layer to the next layer, can be represented by

$$o_j = \frac{1}{1 + e^{-I_j}}.$$

The output layer finally outputs its $o_j$, which completes the feed-forward phase.

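In the back propagation phase, let (i) $Err_j$ denote the error-sensitivity of the $j$th neuron in a given layer, (ii) $t_j$ denote the desired output of the $j$th neuron in the output layer, (iii) $Err_k$ denote the error-sensitivity of the $k$th neuron in the last processed layer (the layer that follows the current one), and (iv) $w_{jk}$ represent the weight corresponding to $Err_k$.

Therefore, $Err_j$ in the output layer and in the hidden layers can be represented, respectively, by

$$Err_j = o_j (1 - o_j)(t_j - o_j),$$

$$Err_j = o_j (1 - o_j) \sum_k Err_k \, w_{jk}.$$

The weight and bias are then tuned as follows, where $\eta$ denotes the learning rate:

$$w_{ij} \leftarrow w_{ij} + \eta \, Err_j \, o_i,$$

$$\theta_j \leftarrow \theta_j + \eta \, Err_j.$$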

This completes the back propagation phase. The next round of training then starts, and BPNN terminates if (5) or (6) is satisfied or a preset maximum number of iterations has been reached.

To perform a classification task, a trained BPNN only needs to execute the feed-forward phase, and the classification result is read from the output layer of the network.
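The feed-forward and back propagation equations above can be sketched in a few lines of pure Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the class name `TinyBPNN`, the single hidden layer, the uniform weight initialization, one-hot targets, and the default learning rate are all hypothetical choices.

```python
import math
import random


def sigmoid(x):
    # o_j = 1 / (1 + e^{-I_j})
    return 1.0 / (1.0 + math.exp(-x))


class TinyBPNN:
    """Hypothetical one-hidden-layer BPNN trained online (one instance at a time)."""

    def __init__(self, n_in, n_hidden, n_out, eta=0.5, seed=0):
        rng = random.Random(seed)
        # w1[i][j]: weight from input neuron i to hidden neuron j; b1[j]: bias theta_j
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
        self.b1 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        # w2[j][k]: weight from hidden neuron j to output neuron k
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_out)] for _ in range(n_hidden)]
        self.b2 = [rng.uniform(-0.5, 0.5) for _ in range(n_out)]
        self.eta = eta  # learning rate

    @staticmethod
    def _layer(inputs, w, b):
        # I_j = sum_i w_ij * o_i + theta_j, followed by the sigmoid
        return [sigmoid(sum(o_i * w[i][j] for i, o_i in enumerate(inputs)) + b[j])
                for j in range(len(b))]

    def forward(self, x):
        hidden = self._layer(x, self.w1, self.b1)
        return hidden, self._layer(hidden, self.w2, self.b2)

    def train_one(self, x, target):
        hidden, out = self.forward(x)
        # Output layer: Err_k = o_k (1 - o_k)(t_k - o_k)
        err_out = [o * (1 - o) * (t - o) for o, t in zip(out, target)]
        # Hidden layer, computed with the weights before they are updated:
        # Err_j = o_j (1 - o_j) * sum_k Err_k w_jk
        err_hid = [h * (1 - h) * sum(e * self.w2[j][k] for k, e in enumerate(err_out))
                   for j, h in enumerate(hidden)]
        # Updates: w_ij += eta * Err_j * o_i and theta_j += eta * Err_j
        for j, h in enumerate(hidden):
            for k, e in enumerate(err_out):
                self.w2[j][k] += self.eta * e * h
        for k, e in enumerate(err_out):
            self.b2[k] += self.eta * e
        for i, xi in enumerate(x):
            for j, e in enumerate(err_hid):
                self.w1[i][j] += self.eta * e * xi
        for j, e in enumerate(err_hid):
            self.b1[j] += self.eta * e

    def classify(self, x):
        # Classification needs only the feed-forward phase; pick the largest output.
        _, out = self.forward(x)
        return max(range(len(out)), key=lambda k: out[k])
```

For example, training on the four input patterns of a logical OR (with one-hot targets) for a few thousand epochs and then calling `classify` on each pattern should reproduce the OR labels. Note that `err_hid` is computed before any weight changes, as the back propagation equations require.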

3.2. Time-Domain Simulation

For time-domain simulation, the power system is modeled by differential algebraic equations (DAEs); the details of the model can be found in [25]. The outputs of the simulation, namely the state trajectories, can be used as simulated PMU data for further analysis. In this study, the open-source package PST [26] is employed to simulate the dynamic trajectories of the parameters of interest for random faults over a specified number of cycles.
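PST solves the full DAE model of the network. As a much-simplified, hypothetical stand-in, the classical single-machine infinite-bus swing equation below shows how a time-domain simulation turns a fault scenario into a rotor angle trajectory sampled once per cycle. The model, its per-unit parameter values, and the function name `smib_trajectory` are assumptions for illustration only; they are not the PST models used in this paper.

```python
import math


def smib_trajectory(t_clear, t_end=5.0, f=50.0, H=5.0, Pm=0.8,
                    Pmax_pre=2.0, Pmax_fault=0.5, Pmax_post=1.5, D=0.0, dt=1e-3):
    """Rotor angle (degrees) of a classical machine against an infinite bus,
    sampled once per cycle. The fault is applied at t = 0 and cleared at t_clear.
    All parameter values are illustrative per-unit assumptions.
    """
    ws = 2.0 * math.pi * f                # synchronous speed (rad/s)
    delta = math.asin(Pm / Pmax_pre)      # pre-fault equilibrium angle (rad)
    dw = 0.0                              # speed deviation (p.u.)
    steps_per_cycle = round(1.0 / (f * dt))
    samples = [math.degrees(delta)]
    for n in range(round(t_end / dt)):
        t = n * dt
        pmax = Pmax_fault if t < t_clear else Pmax_post
        # Swing equation: 2H d(dw)/dt = Pm - Pmax*sin(delta) - D*dw,  d(delta)/dt = ws*dw
        dw += dt * (Pm - pmax * math.sin(delta) - D * dw) / (2.0 * H)
        delta += dt * ws * dw             # semi-implicit (symplectic) Euler step
        if (n + 1) % steps_per_cycle == 0:
            samples.append(math.degrees(delta))
    return samples
```

With these assumed values, a fast clearing (for example 0.1 s) keeps the angle swinging well below 180 degrees, whereas a very slow clearing (for example 1.0 s) lets the angle run away; the semi-implicit Euler step is chosen because it keeps the undamped oscillation bounded.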

3.3. BPNN Based Transient Stability Assessment

If the power angle difference between any two generators $i$ and $j$ exceeds a specified threshold, for example, 270 or 360 degrees, the system is considered unstable. Alternatively, a criterion based on the center of inertia (COI) is commonly applied to identify power system stability: the system is regarded as unstable if

$$\max_{i} \left| \delta_i - \delta_{COI} \right| > \delta_{th}, \qquad \delta_{COI} = \frac{1}{H_T} \sum_{i=1}^{N} H_i \delta_i, \qquad H_T = \sum_{i=1}^{N} H_i,$$

where $\delta_i$ and $H_i$ represent the rotor angle and inertia constant of generator $i$, $H_T$ is the sum of $H_i$, $N$ is the number of generators, and $\delta_{th}$ is the instability threshold, which is set to 180 degrees in this paper.
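A small sketch of the two instability criteria, assuming angles in degrees and a trajectory stored as a list of per-time-point snapshots (`trajectory[t][i]` is the angle of generator `i` at time point `t`). The function names are hypothetical.

```python
def coi_angle(deltas, inertias):
    """delta_COI = sum_i H_i * delta_i / H_T, with H_T = sum_i H_i."""
    h_total = sum(inertias)
    return sum(h * d for h, d in zip(inertias, deltas)) / h_total


def coi_unstable(deltas, inertias, threshold_deg=180.0):
    """COI criterion at one time point: max_i |delta_i - delta_COI| > delta_th."""
    d_coi = coi_angle(deltas, inertias)
    return max(abs(d - d_coi) for d in deltas) > threshold_deg


def pairwise_unstable(deltas, threshold_deg=360.0):
    """Pairwise criterion: largest angle gap between any two generators."""
    return max(deltas) - min(deltas) > threshold_deg


def trajectory_unstable(trajectory, inertias, threshold_deg=180.0):
    """Unstable if the COI criterion is violated at any time point."""
    return any(coi_unstable(snap, inertias, threshold_deg) for snap in trajectory)
```

For two generators with inertias 1 and 3 at 10 and 30 degrees, `coi_angle` gives 25 degrees, the inertia-weighted mean.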

The training phase of BPNN based TSA is illustrated in Figure 1.

In Figure 1, the selected features serve as the inputs of the network. The output is usually an integer value, with 0 indicating instability and 1 indicating stability. After training, if a fault occurs, the features obtained from a few cycles of the postfault trajectories are fed into the trained network to predict the stability status over the subsequent few seconds. Most existing works focus on improving the accuracy of global stability prediction, either by improving standalone BPNNs [8] or by proposing novel input features [27]. However, the stability margin, a value quantifying how far the current condition is from loss of synchronism, is a crucial indicator that provides clearer awareness of the severity of the dynamic impact.

In this work, two trajectory-based stability margin indicators, TSI and IS [28], are used as training targets. In their definitions, $\Delta\delta_{\max}$ is the maximal power angle difference between any generator pair during the observation period, and $\delta_i(t)$ is the power angle of generator $i$ at time point $t$.
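For orientation, the sketch below computes the widely used transient stability index, TSI = (360 - Δδ_max) / (360 + Δδ_max) × 100 with Δδ_max in degrees. That [28] uses exactly this form is an assumption, and the IS indicator is not reproduced here. The function names are hypothetical.

```python
def max_angle_separation(trajectory):
    """Largest angle gap (deg) between any generator pair over all time points.

    trajectory[t][i] is the angle of generator i at time point t.
    """
    return max(max(snap) - min(snap) for snap in trajectory)


def tsi(trajectory):
    """Commonly used TSI = (360 - d_max) / (360 + d_max) * 100.

    Positive values indicate a stable case, negative values an unstable one,
    and the magnitude acts as a margin-like score.
    """
    d_max = max_angle_separation(trajectory)
    return (360.0 - d_max) / (360.0 + d_max) * 100.0
```

For instance, a case whose largest separation is 180 degrees scores about 33.3, while a separation of exactly 360 degrees scores 0.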

Although a wide range of features has been proposed in previous works, most of them share similar parameters, and these studies show that combining them achieves adequate accuracy in stability prediction. Moreover, these features are related not only to the stability status but also carry inherent information about stability margins. Therefore, the same set of input features is selected for BPNN training.

3.4. CUG Identification

CUGs are defined as the first group of generators whose rotor angles deviate from those of the remaining generators by more than a given threshold. CUGs are the most likely candidates for generator tripping, which can be used to reduce transient power mismatch in a timely manner [29]. Figure 2 shows the power angle trajectories of different CUGs in the IEEE 68-node test system.

The CUGs belong to the unstable generators, because their leading (or lagging) rotor angle against the other units must exceed the given threshold, which is usually set equal to or slightly smaller than the widely accepted instability criterion. For example, Figures 2(a) and 2(b) illustrate rotor angle trajectories in which the CUGs contain all the unstable generators. In this situation, all of these generators are determined to be unstable at the end of the 150-cycle observation time window, but none of them reaches the CUG threshold before that. Therefore, the strict two-cluster instability pattern corresponds to the situation in which all the generators are CUGs, such as the case of Figure 2(d). However, unlike Figures 2(a), 2(b), and 2(d), Figure 2(c) shows a different pattern in which the CUGs are only part of the unstable units. Although they belong to the leading cluster, the two generators indicated in Figure 2(c) run ahead of the other leading generators and meet the CUG identification criterion at the very beginning of the time window. These two units are therefore the most effective targets for subsequent control strategies.

For this purpose, the postfault rotor angle trajectories are clustered to separate the CUGs from the other unstable generators, and the resulting labels are used as the target outputs of BPNN in the training process:
(1) Execute a five-second time-domain simulation of a permanent fault followed by a clearing action, and collect the output rotor angle trajectory of each generator.
(2) Scan every pair of rotor angle trajectories cycle by cycle from the start of the postfault period. If an angle difference exceeds the critical unstable threshold, the power system is considered critically unstable; record this time point.
(3) Extract the rotor angle trajectory of each generator over the CUG validation interval starting from this time point. If the interval is taken as a relatively long period, such as 3 s, it is almost impossible to distinguish the CUGs from the generators that become unstable subsequently. Empirically, the interval is preferably set to 50 cycles, that is, 1 s.
(4) Perform k-means clustering to divide all trajectories into two groups. Then calculate the COI trajectory of the clustered rotor angles of each group over the validation interval using (8).
(5) If constraint (10) cannot be satisfied, the generators in the group that breaks it are tagged as CUGs with a binary value of 1.
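Steps (2) to (5) can be sketched as follows, assuming angles in degrees, `traj[t][g]` sampled once per cycle, and a plain 2-means clustering of the windowed trajectories. Constraint (10) is not reproduced in this section, so the check in the last step is a clearly labeled hypothetical stand-in (a group is tagged if its COI drifts from the system COI by more than the threshold); all function names are hypothetical.

```python
def first_critical_index(traj, threshold):
    """Step (2): first time index at which any pairwise angle gap exceeds threshold."""
    for t, snap in enumerate(traj):
        if max(snap) - min(snap) > threshold:
            return t
    return None


def two_means(vectors, max_iter=100):
    """Step (4): plain Lloyd-style 2-means on equal-length trajectories."""
    # Deterministic seeding: trajectories with the largest and smallest final angle.
    hi = max(range(len(vectors)), key=lambda i: vectors[i][-1])
    lo = min(range(len(vectors)), key=lambda i: vectors[i][-1])
    centroids = [list(vectors[hi]), list(vectors[lo])]
    labels = None
    for _ in range(max_iter):
        new_labels = [min((0, 1), key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
                      for v in vectors]
        if new_labels == labels:
            break
        labels = new_labels
        for c in (0, 1):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def coi_series(window, gens, inertias):
    """COI angle of the generators in `gens` at each time point of the window."""
    h_total = sum(inertias[g] for g in gens)
    return [sum(inertias[g] * snap[g] for g in gens) / h_total for snap in window]


def identify_cugs(traj, inertias, threshold=180.0, window_len=50):
    """Return a 0/1 CUG tag per generator; traj[t][g] is the angle (deg)."""
    n_gen = len(traj[0])
    tc = first_critical_index(traj, threshold)
    if tc is None:                                   # never critically unstable
        return [0] * n_gen
    window = traj[tc:tc + window_len]                # step (3): validation interval
    vectors = [[snap[g] for snap in window] for g in range(n_gen)]
    labels = two_means(vectors)
    system_coi = coi_series(window, range(n_gen), inertias)
    tags = [0] * n_gen
    for c in (0, 1):
        gens = [g for g in range(n_gen) if labels[g] == c]
        if not gens:
            continue
        group_coi = coi_series(window, gens, inertias)
        # HYPOTHETICAL stand-in for constraint (10): the group's COI must stay
        # within `threshold` of the system COI; a group that breaks it is tagged.
        if max(abs(a - b) for a, b in zip(group_coi, system_coi)) > threshold:
            for g in gens:
                tags[g] = 1
    return tags
```

For example, with three heavy machines held near 0 to 10 degrees and one light machine accelerating at 5 degrees per cycle, only the light machine is tagged as a CUG.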

Following the above identification procedure, the CUGs of the 16-machine test system illustrated in Figure 2 are listed in Table 1.

In Table 1, the CUG status is tagged with binary values: 1 denotes a CUG, and 0 denotes a non-CUG.

3.5. Parallelizing BPNN
3.5.1. MapReduce, Hadoop, and HaLoop

MapReduce is a distributed computing model for big data processing. The model provides two types of functions: Map and Reduce. Map performs the major computing tasks, while Reduce collects and outputs the results. Data in the processing flow are modeled as key-value pairs $\langle k_1, v_1 \rangle$. Map processes each input pair and emits intermediate pairs $\langle k_2, v_2 \rangle$. The framework then shuffles and sorts the intermediate pairs so that pairs with the same key are grouped together, and Reduce merges each group $\langle k_2, \mathrm{list}(v_2) \rangle$. Finally, Reduce outputs the final results $\langle k_3, v_3 \rangle$.
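The key-value data flow can be imitated in a single Python process to make the three stages concrete. This sketch is not Hadoop code and uses no Hadoop API; `run_mapreduce`, `wc_map`, and `wc_reduce` are hypothetical illustrations.

```python
from collections import defaultdict


def run_mapreduce(records, map_fn, reduce_fn):
    """In-process imitation of the MapReduce data flow."""
    # Map: each input (k1, v1) yields zero or more intermediate (k2, v2) pairs.
    intermediate = []
    for k1, v1 in records:
        intermediate.extend(map_fn(k1, v1))
    # Shuffle/sort: group values by key (performed by the framework in Hadoop).
    groups = defaultdict(list)
    for k2, v2 in intermediate:
        groups[k2].append(v2)
    # Reduce: each (k2, [v2, ...]) group yields final (k3, v3) pairs.
    output = []
    for k2 in sorted(groups):
        output.extend(reduce_fn(k2, groups[k2]))
    return output


# Usage: the classic word count.
def wc_map(_, line):
    return [(word, 1) for word in line.split()]


def wc_reduce(word, counts):
    return [(word, sum(counts))]


result = run_mapreduce([(0, "a b a"), (1, "b c")], wc_map, wc_reduce)
```

Here `result` is `[('a', 2), ('b', 2), ('c', 1)]`: the mapper emits one pair per word, and the reducer sums the counts gathered under each key.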

The Hadoop framework is an open-source implementation of MapReduce [11]. It offers scalability, fault tolerance, load balancing, and other benefits for parallel and distributed computing in both homogeneous and heterogeneous environments. HaLoop [14] is also based on MapReduce and reuses most of Hadoop's source code, but it is tailored to iterative data-intensive applications, for example by caching loop-invariant data across iterations and scheduling tasks in a loop-aware manner.

3.5.2. Bootstrapping and Majority Voting

Bootstrapping is a resampling technique [30]. Because it samples with replacement, each bootstrapped sample approximates the distribution of the original dataset. Therefore, although the original training dataset is divided into subsets in our parallelization, the generalization of the trained neural networks can be maintained to some extent owing to bootstrapping. Majority voting selects the most frequent element in a set of votes. It enables HBPNN to build a strong classifier from a number of weak classifiers so that classification accuracy can be maintained.
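Both ensemble ingredients are short to express. The sketch assumes each mapper's chunk is drawn with replacement from the full training set; the optional `sample_size` parameter (hypothetical) allows chunks smaller than the original set, and ties in the vote are broken by first occurrence.

```python
import random
from collections import Counter


def bootstrap_samples(dataset, n_samples, sample_size=None, seed=None):
    """Draw n_samples bootstrapped samples by sampling with replacement."""
    rng = random.Random(seed)
    size = len(dataset) if sample_size is None else sample_size
    return [[rng.choice(dataset) for _ in range(size)] for _ in range(n_samples)]


def majority_vote(labels):
    """Return the most frequent label; equal counts keep first-seen order."""
    return Counter(labels).most_common(1)[0][0]
```

For example, `majority_vote([1, 0, 1, 1, 0])` returns 1. Each bootstrapped sample typically contains duplicates and omits some instances, which is what lets differently trained weak classifiers disagree and the vote average out their errors.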

3.5.3. HBPNN Design

Motivated by the MapReduce-based BPNN previously proposed by Liu et al. [32], the algorithm consists of two phases: generation of the bootstrapped samples and parallelization of the BPNN. Initially, HBPNN reads the original training dataset and generates a number of bootstrapped samples according to the number of mappers employed. Each sample is saved in one data chunk in HDFS. Each training instance saved in a data chunk has the structure ⟨instance_i, target_i, instance_type⟩, where instance_i represents the ith instance in a data chunk, target_i represents the class to which instance_i belongs, and the instance_type field is filled with the string "training" to indicate that instance_i is a training instance.
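One possible text encoding of the ⟨instance_i, target_i, instance_type⟩ record, assuming comma-separated features, an integer class label, and tab-separated fields. The actual on-disk layout used by HBPNN is not specified here, and the function names are hypothetical.

```python
def encode_record(features, target, instance_type="training"):
    """Serialize one <instance_i, target_i, instance_type> record as a text line."""
    feats = ",".join(repr(float(x)) for x in features)
    return "\t".join([feats, str(target), instance_type])


def decode_record(line):
    """Parse a line produced by encode_record back into its three fields."""
    feats, target, instance_type = line.rstrip("\n").split("\t")
    return [float(x) for x in feats.split(",")], int(target), instance_type
```

A line-oriented format like this suits Hadoop's default text input, where each mapper receives one line per record; `repr(float(x))` round-trips the feature values exactly.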

Afterward, the parallelization phase starts. Each mapper first initializes a BPNN and then reads one data chunk, so that the instances saved in the chunk are input into the mapper one by one. If the instance type is "training," the BPNN in the mapper performs a training step on the instance: instance_i is used to execute the feed-forward phase using (1) and (2), while its class label is used to execute the back propagation phase using (3) and (4). Once all the instances marked as "training" have been processed, the BPNN is trained. As a result, a number of trained classifiers (one per mapper) are created in the Hadoop cluster.

In the classification phase, each testing instance instance_t is input into all mappers. In each mapper, instance_t is classified by the BPNN using (1) and (2), and the mapper emits an intermediate output of the form ⟨instance_t, r_m⟩, where r_m denotes the classification result of instance_t produced by mapper m; thus, n mappers produce n outputs.

HBPNN starts one reducer to collect the intermediate outputs from the mappers. After sorting and merging, a collection containing the classification results of all mappers for instance_t is formed.

Majority voting is then executed within the collection to select the final classification result, which is output in the form ⟨instance_t, result⟩, where result represents the final classification result. The pseudocode of HBPNN is given in Algorithm 1.

Algorithm 1 (HBPNN).

In the training phase:
(1) HBPNN generates a number of bootstrapped training samples, which are saved in data chunks in HDFS.
(2) Each data chunk is input into one mapper.
(3) Each mapper initializes one BPNN.
(4) For each mapper, repeat until all the training instances are processed:
    BPNN inputs one instance instance_i;
    if instance_i is a "training" instance, BPNN trains its parameters.

In the classification phase:
(5) For each testing instance instance_t:
    all the mappers input instance_t;
    the BPNN in each mapper executes the feed-forward phase to classify instance_t;
    each mapper m outputs ⟨instance_t, r_m⟩, where r_m is its classification result.
(6) One reducer collects the classification results of instance_t from all mappers.
(7) In the reducer, a collection of the results for instance_t is formed.
(8) Majority voting is executed in the reducer to select the final classification result for instance_t.
(9) The algorithm terminates when all the testing instances have been classified.
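Algorithm 1 can be imitated in a single process to show its data flow. In this sketch, each mapper's learner is a nearest-centroid classifier standing in for BPNN purely to keep the example short; that substitution, the names `NearestCentroid` and `hbpnn_sketch`, and the default of five mappers are hypothetical, and no Hadoop or HaLoop code is involved.

```python
import random
from collections import Counter, defaultdict


class NearestCentroid:
    """Stand-in for the per-mapper BPNN (hypothetical, for brevity)."""

    def fit(self, instances):
        sums, counts = {}, Counter()
        for x, y in instances:
            acc = sums.setdefault(y, [0.0] * len(x))
            for k, v in enumerate(x):
                acc[k] += v
            counts[y] += 1
        self.centroids = {y: [v / counts[y] for v in acc] for y, acc in sums.items()}
        return self

    def predict(self, x):
        return min(self.centroids,
                   key=lambda y: sum((a - b) ** 2 for a, b in zip(x, self.centroids[y])))


def hbpnn_sketch(train, test, n_mappers=5, seed=0):
    rng = random.Random(seed)
    # Steps (1)-(2): one bootstrapped chunk per mapper.
    chunks = [[rng.choice(train) for _ in range(len(train))] for _ in range(n_mappers)]
    # Steps (3)-(4): each mapper trains its own classifier on its chunk.
    models = [NearestCentroid().fit(chunk) for chunk in chunks]
    # Step (5): every mapper classifies every test instance -> (t, r_m) pairs.
    intermediate = [(t, m.predict(x)) for t, x in enumerate(test) for m in models]
    # Steps (6)-(7): the reducer groups the results by test instance.
    collections = defaultdict(list)
    for t, r in intermediate:
        collections[t].append(r)
    # Step (8): majority voting yields the final result for each instance.
    return [Counter(collections[t]).most_common(1)[0][0] for t in range(len(test))]
```

Swapping `NearestCentroid` for a BPNN class with the same `fit`/`predict` shape would leave the map, group, and vote structure unchanged, which is the point of the sketch.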

3.6. Feature Selection

Assuming that a PMU is deployed on each generator bus, the full parameter trajectories of generators, as well as related indices proposed in previous literature, can be introduced as features. However, many features are strongly correlated with each other. Therefore, the Pearson correlation coefficient (PCC) method [33] is used to reduce the redundancy of statistical index-based features: any two features whose PCC exceeds a prescribed threshold in absolute value are regarded as highly correlated. Tables 2 and 3 list the selected features used to train HBPNN for CUG identification and global stability, respectively. Specifically, the features are observed over a time window spanning from the fault clearing time to 10 cycles afterward.
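A minimal sketch of PCC-based redundancy removal. The greedy keep-first order and the threshold of 0.9 are assumptions for illustration; the threshold used in this paper is not stated in this section, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:          # a constant feature has no defined PCC
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def drop_correlated(features, threshold=0.9):
    """Greedy filter: keep a feature only if |PCC| <= threshold with every
    feature kept so far. `features` maps a name to its values over samples."""
    kept = []
    for name, values in features.items():
        if all(abs(pearson(values, features[k])) <= threshold for k in kept):
            kept.append(name)
    return kept
```

Because the filter is greedy, the feature listed first in a correlated pair survives, so ordering the candidates by preference (for example, by physical interpretability) decides which of two redundant features is kept.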

Besides the referenced features, Tables 2 and 3 also include two indices defined in this work. They are formulated in terms of $\omega_i(t)$ and $\omega_{COI}(t)$, the rotor speeds of generator $i$ and of the COI at time point $t$, respectively, the time point of fault clearing, and the time window used to observe the features.

3.7. Automated Sample Generation

In this work, a random fault simulator has been developed to generate a large number of samples [34]. A random fault refers to a three-phase short circuit on a randomly selected transmission line, and the fault clearing time is randomly set between 0.1 s and 0.35 s. Samples are generated as follows:
(1) Load the base case; if an initial outage exists, trip the corresponding component and calculate the power flow.
(2) Change the active and reactive loads ($P$ and $Q$) on each bus by multiplying them by a random number in the range [0.8, 1.4] to simulate the load level, and distribute the resulting load imbalance to all the generators in proportion to their base generation.
(3) Apply a three-phase fault on a randomly selected component at time $t_0$, and clear the fault at $t_0 + \Delta t$, where $\Delta t$ is a random decimal in [0.1, 0.35] s.
(4) Perform a time-domain simulation for the randomly configured operating condition and fault scenario, and collect the output trajectories to calculate the features defined in Tables 2 and 3 as well as the related targets.
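Steps (2) and (3) can be sketched as a random scenario generator whose output a time-domain simulator would consume (the simulation itself is not shown). The sketch assumes one random factor per bus applied to both P and Q, and plain dictionaries for loads and generation; `random_scenario` and its argument names are hypothetical.

```python
import random


def random_scenario(bus_loads, gen_base, lines, rng,
                    load_range=(0.8, 1.4), clear_range=(0.1, 0.35)):
    """Draw one randomly configured operating point and fault.

    bus_loads: {bus: (P, Q)} base loads; gen_base: {gen: base P output};
    lines: candidate fault locations; rng: a random.Random instance.
    """
    new_loads = {}
    for bus, (p, q) in bus_loads.items():
        k = rng.uniform(*load_range)        # same factor for P and Q (assumption)
        new_loads[bus] = (p * k, q * k)
    # Load imbalance shared by generators in proportion to base generation.
    delta_p = sum(p for p, _ in new_loads.values()) - sum(p for p, _ in bus_loads.values())
    total_gen = sum(gen_base.values())
    new_gen = {g: pg + delta_p * pg / total_gen for g, pg in gen_base.items()}
    return {
        "loads": new_loads,
        "generation": new_gen,
        "fault_line": rng.choice(lines),
        "clearing_time_s": rng.uniform(*clear_range),  # delta t in step (3)
    }
```

Sharing the imbalance proportionally preserves the ratios between generator outputs, so a unit producing twice the base output of another still does so after rescaling.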

3.8. The Architecture of HBPNN

After the random fault simulations are completed, all samples are stored in HDFS. HBPNN separates the training data into pieces, employs bootstrapping to generate bootstrapped samples, and saves each piece in one data chunk. HBPNN then initializes distributed neural networks in multiple mappers. These networks fall into three types: CUG identification, stability assessment, and margin assessment. Afterward, each mapper reads one data chunk and trains on it. Once the stability, margin, and CUG networks are sufficiently trained, they can be used as enhanced TSA classifiers. When testing data are fed into HBPNN, the parallel neural network efficiently classifies each instance and outputs its final classification. Figure 3 shows the overall architecture of HBPNN.

4. Experimental Results

4.1. HBPNN Validation

To evaluate the performance of HBPNN, a number of experiments were carried out on a physical Hadoop cluster with 1 Gbps network bandwidth. The cluster contains five nodes: four DataNodes and one NameNode. Both Hadoop and HaLoop are deployed on the cluster. The cluster configuration and the details of the generated dataset are listed in Tables 4 and 5, respectively.

Because each input neuron of HBPNN accepts only values between 0 and 1, each instance is normalized before being input into HBPNN. For an instance $x$, let $x_{\max}$, $x_{\min}$, and $\hat{x}$ denote its maximum element, minimum element, and normalized form, respectively; then each element $x_k$ is normalized as

$$\hat{x}_k = \frac{x_k - x_{\min}}{x_{\max} - x_{\min}}.$$

The precision can be calculated using

$$P = \frac{N_c}{N_c + N_w} \times 100\%,$$

where $N_c$ and $N_w$ represent the number of correctly classified and wrongly classified instances, respectively.
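Both formulas translate directly into code. Returning zeros for a constant instance (where the denominator would vanish) is an assumption of this sketch, as are the function names.

```python
def normalize(instance):
    """Min-max normalize one instance to [0, 1] using its own extremes."""
    lo, hi = min(instance), max(instance)
    if hi == lo:                      # constant instance: avoid division by zero
        return [0.0 for _ in instance]
    return [(x - lo) / (hi - lo) for x in instance]


def precision(n_correct, n_wrong):
    """Percentage of correctly classified instances: N_c / (N_c + N_w) * 100."""
    return 100.0 * n_correct / (n_correct + n_wrong)
```

For example, `normalize([2, 4, 6])` gives `[0.0, 0.5, 1.0]`, and 99 correct out of 100 test instances gives a precision of 99.0 percent.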

4.1.1. Precision Validation

In the experiments, 1000 training instances and 1000 testing instances were generated. Ten mappers were employed, and the number of training instances processed by each mapper varied from 10 to 1000. Figure 4(a) shows that the accuracy of HBPNN increases with the number of training instances. It also indicates that when the number of training instances is small, HBPNN with bootstrapping outperforms the original BPNN in terms of accuracy.

Figure 4(b) shows the stability of HBPNN over five runs with a small number of training instances; this experiment focuses on algorithmic stability, that is, the consistency of results across runs. In these tests, HBPNN and the original BPNN were trained with only ten instances. Although so few training instances lead to low accuracy, the results show that HBPNN is more stable than BPNN in all five runs. Even with so few training instances, HBPNN also achieves higher accuracy than the standalone BPNN.

4.1.2. Computation Efficiency

A number of tests were conducted to evaluate the computational efficiency of HBPNN using Hadoop and HaLoop, respectively. Figure 5(a) shows that as the data size increases, the parallel HBPNN runs faster than the standalone BPNN. It is worth noting that the HaLoop-based HBPNN is slightly faster than the Hadoop-based HBPNN owing to the reduced overhead in handling iterations, as further illustrated in Figure 5(b).

4.2. HBPNN Application

HBPNN was applied to two power system cases. The first is the IEEE 68-node test system with 16 generators. The second is the real power system of the Sichuan Grid in China, which has 878 busbars, 1096 lines, and 109 generators. The details of the data samples are listed in Table 6, and the configurations of HBPNN are shown in Table 7.

In this evaluation, the precision of generator status prediction is tested. When the number of training instances is large, HBPNN achieves the same precision as the standalone BPNN. Therefore, Figure 6 lists only the precision of HBPNN, without comparison with the standalone BPNN.

Figure 6, which records the CUG prediction precision for the test systems, indicates that HBPNN achieves satisfactorily high precision in identifying the transient status of generators from postfault trajectories. The average precisions over all generators of the two test systems are 99.19% and 98.63%, respectively.

To validate the feasibility of HBPNN in these two cases, 2400 new samples, including random multiple-fault scenarios, were simulated for each test system. The details of the sample sets are shown in Table 8.

Figure 7 shows two example scenarios of the Sichuan grid, one stable and one unstable. The feature-related trajectories over 10 cycles were fed into the trained HBPNN, which quickly provides predicted values of the targets of interest. Table 9 shows that HBPNN correctly classifies both scenarios. In addition, Figure 8 illustrates the accuracy of HBPNN on the 2400 samples generated for each test system; the accuracy exceeds 90%.

Figure 9 shows that the parallel HBPNN is more efficient than the standalone BPNN in both test systems when the data size is large (Figure 9(c)). However, the parallel HBPNN is slower than the standalone BPNN when the data size is small (Figures 9(a) and 9(b)), because both Hadoop and HaLoop incur extra system overheads. Nevertheless, the HaLoop-parallelized HBPNN is always faster than the Hadoop-parallelized HBPNN owing to the reduced overhead in handling iterations.

5. Conclusion

In this paper we have presented HBPNN, a high performance distributed neural network algorithm for fast stability assessment of power systems. HBPNN uses Hadoop to process large-scale training data in parallel, which speeds up the training process. It further employs HaLoop to reduce the overhead incurred by iterations during training, and it employs ensemble techniques to maintain high accuracy in parallelized classification. The work in this paper establishes a highly scalable computing architecture that enables comprehensive transient stability awareness, including global stability prediction, stability margin estimation, and CUG detection.

Competing Interests

The authors declare that there are no competing interests regarding the publication of this paper.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China (NSFC Project, nos. 51207098 and 51437003).