#### Abstract

Locating faults accurately and clearing them quickly is the key to improving the power supply capacity of the power grid. This paper presents a fault location method based on an SVM fault branch selection algorithm and similarity matching. First, an SVM-based fault branch classifier is constructed from the positive sequence component feature matrix of each monitoring point; it selects the branch on which the current fault is located. Then, based on the positive sequence voltage distribution characteristics, the Euclidean distance and the Pearson correlation coefficient (PCC) are used to establish a similarity objective function for fault location, and the fault is located by optimizing this objective function. Finally, the proposed method is validated on an IEEE-14 node network. The results show that the proposed method is effective and accurate.

#### 1. Introduction

Fast and accurate location of distribution network faults can effectively shorten troubleshooting and blackout time, reduce economic losses, and improve power supply reliability [1–5]. However, the distribution network has many branches, lines of widely varying length, and few measuring points, so it is difficult for protection devices to locate fault points accurately. Therefore, in practice, the fault area is usually located first and the fault point is then found by manual patrol. With the improvement of automation and intelligence of the distribution network in the Jiangsu power grid, a large number of intelligent monitoring devices, such as the FTU (feeder terminal unit) and the PMU (phasor measurement unit), have been put into use. These devices can measure the voltage and current of the distribution network in real time and provide a data basis for accurate fault location [6, 7]. Therefore, the research in this paper is based on the premise that each node is equipped with monitoring instruments.

At present, scholars have done a lot of work on accurate fault location in the distribution network. The main fault location methods are divided into the traveling wave method and nontraveling wave methods (including the impedance method, node matrix method, and fitting optimization method). The traveling wave method [8–10] determines the fault distance by measuring the propagation time of the voltage and current traveling waves to the fault point. However, because of the uneven distribution of line impedance and the large number of branches in the distribution network, the accuracy of traveling wave location is greatly affected. Additionally, the monitoring devices required for traveling wave location are expensive, which makes practical engineering application difficult. The impedance method [11] calculates the impedance of the fault branch by measuring the voltage and current at the fault point, and then calculates the fault distance. As with the traveling wave method, its accuracy is affected by the many branches of the distribution network; this method is widely used in the high voltage transmission network. Reference [12] constructs a fault distance function based on the node impedance matrix by using the bus voltage drop during the fault, and uses the matching degree between the calculated and measured voltage drops to locate the fault, but the method is not robust to load changes and is susceptible to transition resistance. In reference [13], the least squares method is used to fit the fault distance distribution function, but the error between the fault distance calculated from the fitted function and the actual fault distance is large, so the accuracy of the algorithm needs to be improved.

In reference [14], a distribution function reflecting the change of fault distance is established from the post-fault node impedance matrix, and the fault distance is determined by solving the corresponding location equation, but the location results are easily affected by false fault points and measurement error. Reference [15] improves on reference [14]: considering the influence of measurement error, the probability distribution curve of the possible fault section is obtained by Monte Carlo simulation, and the fault location is determined from the peak of the probability distribution. However, this method requires the fault type and phase to be identified accurately in advance, and the computational cost of the simulation is too high for application to large-scale power grids.

In summary, most of the existing location methods have at least one of the following limitations: they fail to overcome the influence of transition resistance on the location results; they need to know the fault branch and fault type in advance; their results are greatly affected by measurement errors and false fault points; or they need to traverse all locations of the network to search for the fault point, which results in a huge amount of calculation when the system is large.

To solve these problems, a fault location method based on SVM fault branch selection and similarity model matching is proposed in this paper. The method uses the distribution characteristics of the positive sequence voltage variations at the monitoring points, and the relationships among them, to construct fault feature modes that are not affected by fault type or transition resistance. A fault branch selection database is established from simulation data and used to train an SVM-based fault branch selection classifier, which determines the branch where the current fault is located. Then, the fault location model of each branch is established with the fault distance parameter *λ* as the only variable. A similarity index is defined based on the Euclidean distance and the Pearson correlation coefficient (PCC) to measure the similarity between the current fault and the fault location model. The value of *λ* at which the similarity index reaches its optimum is the fault location result. The validity and accuracy of this method are verified by an example.

#### 2. Fault Characteristics Based on Positive Sequence Voltage Variation

The location, type, and transition resistance of a fault are the three variables that determine the voltage of each monitoring point. A schematic diagram of a fault in the system is shown in Figure 1.

In this paper, the symmetrical component method is used to extract the positive sequence component, which is then used to analyze the electrical characteristics of the fault. The positive sequence components are three phasors of equal amplitude in which phase A leads phase B by 120 degrees, phase B leads phase C by 120 degrees, and phase C leads phase A by 120 degrees. The positive sequence driving point impedance between monitoring point *M* and fault point *F* can be calculated from the positive sequence node impedance matrix, as shown in the following formula:

$$Z_{MF}^{(1)} = (1-\lambda)\,Z_{MA}^{(1)} + \lambda\,Z_{MB}^{(1)} \tag{1}$$

In formula (1), $Z_{MA}^{(1)}$ is the positive sequence driving point impedance between monitoring point *M* and node *A*, $Z_{MB}^{(1)}$ is the positive sequence driving point impedance between monitoring point *M* and node *B*, and *λ* is the distance of fault point *F* from node *A*, expressed as a fraction of the length of the line between nodes *A* and *B*. Superscript (1) indicates positive sequence.

Assume that the positive sequence voltage of monitoring point *M* before the fault is $V_{M}^{(1)0}$ and that the positive sequence component of the short-circuit current at fault point *F* is $I_{F}^{(1)}$. According to the superposition theorem, the positive sequence voltage of monitoring point *M* after the fault can be expressed as

$$V_{M}^{(1)f} = V_{M}^{(1)0} - Z_{MF}^{(1)}\,I_{F}^{(1)} \tag{2}$$

In formula (2), $V_{M}^{(1)f}$ is the positive sequence voltage of monitoring point *M* after the fault, and $Z_{MF}^{(1)}$ is the positive sequence driving point impedance between monitoring point *M* and fault point *F*.

The positive sequence voltage variation at monitoring point *M* is defined as the pre-fault positive sequence voltage $V_{M}^{(1)0}$ minus the post-fault positive sequence voltage $V_{M}^{(1)f}$:

$$\Delta V_{M}^{(1)} = V_{M}^{(1)0} - V_{M}^{(1)f} = Z_{MF}^{(1)}\,I_{F}^{(1)} \tag{3}$$

It can be seen from the formula above that the change of positive sequence voltage at the monitoring point is related only to the positive sequence impedance $Z_{MF}^{(1)}$ between the monitoring point and the fault point and to the positive sequence fault current $I_{F}^{(1)}$. The positive sequence impedance between the monitoring point and the fault point carries the information about their relative position. Each fault position on the branch corresponds one to one to a value of this impedance, which is not affected by the fault type or the transition resistance.

It is assumed that there are two monitoring points in the system and two faults take place at point F successively, as shown in Figure 2. The fault type and transition resistance of fault 1 and fault 2 are different.

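According to formula (3), it can be obtained that

$$\frac{\Delta V_{M_1,1}^{(1)}}{\Delta V_{M_2,1}^{(1)}} = \frac{Z_{M_1F}^{(1)}}{Z_{M_2F}^{(1)}} = \frac{\Delta V_{M_1,2}^{(1)}}{\Delta V_{M_2,2}^{(1)}} \tag{4}$$

In formula (4), $\Delta V_{M_1,1}^{(1)}$ is the positive sequence voltage variation at monitoring point *M*_{1} when fault 1 occurs at *F*, $\Delta V_{M_2,1}^{(1)}$ is the positive sequence voltage variation at monitoring point *M*_{2} when fault 1 occurs at *F*, $\Delta V_{M_1,2}^{(1)}$ is the positive sequence voltage variation at monitoring point *M*_{1} when fault 2 occurs at *F*, and $\Delta V_{M_2,2}^{(1)}$ is the positive sequence voltage variation at monitoring point *M*_{2} when fault 2 occurs at *F*. $Z_{M_1F}^{(1)}$ and $Z_{M_2F}^{(1)}$ are the positive sequence impedances between fault point *F* and monitoring points *M*_{1} and *M*_{2}, respectively; the fault current of each fault cancels in the ratio.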

From the formulas above, it can be seen that the ratio of the positive sequence voltage variations at monitoring points *M*_{1} and *M*_{2} is the same under the two different faults. This characteristic of the positive sequence voltage distribution can be generalized: regardless of the fault type and the transition resistance, the positive sequence voltage variations of all monitoring points vary proportionally when different faults occur at the same location. That is to say, the relative distribution of the positive sequence voltage variations is related only to the fault location.
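The proportionality argument can be checked numerically. The following is a minimal sketch, assuming that the voltage variation at a monitoring point equals the impedance to the fault point multiplied by the positive sequence fault current; the impedance and current values are illustrative assumptions, not data from the case study.

```python
# Toy check: voltage variations at two monitoring points keep the same
# ratio when only the fault current (fault type / transition resistance)
# changes and the fault position stays fixed.

# Assumed positive sequence impedances (p.u.) from M1 and M2 to fault point F.
Z_M1F = complex(0.02, 0.11)
Z_M2F = complex(0.05, 0.23)


def voltage_variations(i_fault):
    """Return (dV at M1, dV at M2) for a given positive sequence fault current."""
    return Z_M1F * i_fault, Z_M2F * i_fault


# Two faults at the same point F with different (assumed) fault currents.
dv_fault1 = voltage_variations(complex(3.0, -1.5))  # e.g. low transition resistance
dv_fault2 = voltage_variations(complex(0.8, -0.2))  # e.g. high transition resistance

ratio_fault1 = dv_fault1[0] / dv_fault1[1]
ratio_fault2 = dv_fault2[0] / dv_fault2[1]

# The fault current cancels, so both ratios equal Z_M1F / Z_M2F.
print(abs(ratio_fault1 - ratio_fault2) < 1e-12)
```

The two ratios agree up to floating point rounding because the common factor (the fault current) divides out.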

#### 3. Fault Location Method Based on SVM Fault Branch Selection and Similarity Matching

##### 3.1. Fault Distance Model of Each Branch

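Based on the above analysis, a fault location model based on the information from the network monitoring points can be established according to the change characteristics of the monitoring points. Assuming that the head and end nodes of the branch where the fault is located are *l* and *r*, the positive sequence voltage variations of all monitoring points can be obtained according to formulas (1) and (3). The sequence of positive sequence voltage variations of the *n* monitoring points is

$$\Delta V^{(1)} = \left[\Delta V_{1}^{(1)}, \Delta V_{2}^{(1)}, \ldots, \Delta V_{n}^{(1)}\right], \quad \Delta V_{i}^{(1)} = \left[(1-\lambda)\,Z_{il}^{(1)} + \lambda\,Z_{ir}^{(1)}\right] I_{F}^{(1)} \tag{5}$$

$Z_{il}^{(1)}$ and $Z_{ir}^{(1)}$ represent the positive sequence impedance from monitoring point *i* to the head node and the end node of the branch, respectively, and $I_{F}^{(1)}$ is the positive sequence short-circuit current at the fault point.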

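The positive sequence short-circuit current is calculated as the standard positive sequence voltage before the fault divided by the positive sequence driving point impedance at the fault position:

$$I_{F}^{(1)} = \frac{V_{F}^{(1)0}}{Z_{FF}^{(1)}} \tag{6}$$

In formula (6), $V_{F}^{(1)0}$ is the positive sequence voltage at the fault position before the fault, and $Z_{FF}^{(1)}$, the positive sequence driving point impedance at fault position *F*, can be expressed as follows:

$$Z_{FF}^{(1)} = (1-\lambda)^{2} Z_{ll}^{(1)} + \lambda^{2} Z_{rr}^{(1)} + 2\lambda(1-\lambda)\,Z_{lr}^{(1)} + \lambda(1-\lambda)\,z_{lr}^{(1)} \tag{7}$$

where $Z_{ll}^{(1)}$ and $Z_{rr}^{(1)}$ are the sequence driving point impedances at nodes *l* and *r*, respectively, $Z_{lr}^{(1)}$ is the sequence driving point impedance between nodes *l* and *r*, and $z_{lr}^{(1)}$ is the sequence line impedance between nodes *l* and *r*.

Given the data uploaded from the monitoring points and the network topology parameters, the fault location *λ* is the only variable in formulas (5)–(7).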

In order to avoid the influence of data amplitude, the sequence of positive sequence voltage variations is standardized in this paper. The standardization formula is

$$\Delta V^{(1)N} = \frac{\Delta V^{(1)} - E\left(\Delta V^{(1)}\right)}{\sqrt{D\left(\Delta V^{(1)}\right)}} \tag{8}$$

where $\Delta V^{(1)N}$ is the standardized result, $E(\Delta V^{(1)})$ is the expectation of the $\Delta V^{(1)}$ array, and $D(\Delta V^{(1)})$ is the variance of the $\Delta V^{(1)}$ array.

After standardizing the positive sequence voltage variation sequence of the monitoring points, the fault location model (*LM*) of the branch, with the fault distance *λ* as the only variable, can be obtained:

$$LM_{b}(\lambda) = \left[\Delta V_{1}^{(1)N}(b,\lambda), \Delta V_{2}^{(1)N}(b,\lambda), \ldots, \Delta V_{n}^{(1)N}(b,\lambda)\right] \tag{9}$$

In the above formula, *b* is the branch number, *λ* is the fault distance, and $\Delta V_{n}^{(1)N}(b,\lambda)$ is the standardized positive sequence voltage variation at the *n*-th monitoring point when a fault occurs at distance *λ* on branch *b*.
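The construction of a branch location model can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the impedance to the fault point is interpolated linearly between the head and end nodes, only magnitudes are standardized, and the impedance values and the function names `standardize` and `location_model` are hypothetical.

```python
import math


def standardize(values):
    """Z-score a list: subtract the mean and divide by the standard deviation."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var) for v in values]


def location_model(z_to_head, z_to_end, lam):
    """Standardized voltage-variation pattern for a fault at fraction lam of a branch.

    z_to_head[i] / z_to_end[i]: impedance from monitoring point i to the head /
    end node of the branch. The common fault current is omitted because a
    positive common scale factor disappears after standardization.
    """
    z_fault = [(1 - lam) * zl + lam * zr for zl, zr in zip(z_to_head, z_to_end)]
    return standardize([abs(z) for z in z_fault])


# Assumed impedances (p.u.) for four monitoring points.
z_to_head = [0.02 + 0.10j, 0.05 + 0.25j, 0.08 + 0.40j, 0.03 + 0.15j]
z_to_end = [0.06 + 0.30j, 0.04 + 0.20j, 0.02 + 0.10j, 0.07 + 0.35j]

model = location_model(z_to_head, z_to_end, 0.5)
print([round(v, 3) for v in model])
```

Because standardization removes the mean and the scale, two faults at the same position with different fault current magnitudes produce the same pattern, which is what makes *λ* the only free variable of the model.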

##### 3.2. SVM Fault Branch Selection Method

According to the characteristics of the positive sequence voltage distribution, the proportional relationship between the positive sequence voltage variations at the monitoring points is fixed for faults occurring at the same location. Therefore, each fault location on a branch corresponds to a unique $\Delta V^{(1)N}$. The closer two fault locations are, the more similar their values of $\Delta V^{(1)N}$ are, and faults occurring on the same branch have the most similar $\Delta V^{(1)N}$. Therefore, this paper introduces an SVM-based fault branch selection method for the preliminary selection of the branch where the fault is located, in order to reduce the amount of calculation needed for fault location.

The support vector machine (SVM) projects samples that are linearly nonseparable in the input space into a high-dimensional space through a nonlinear mapping, where they become linearly separable; the mapping is introduced implicitly through an inner product kernel function [16].

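Define the category tag of sample $x_i$ as $y_i$. In the case of two classes, if $x_i$ belongs to the first class, then $y_i = 1$; if $x_i$ belongs to the second class, then $y_i = -1$. The training sample set is $\{(x_i, y_i)\},\ i = 1, 2, \ldots, n$. From the classification interface $\boldsymbol{\omega} \cdot x + b = 0$, we can get

$$y_i\left(\boldsymbol{\omega} \cdot x_i + b\right) \geq 1, \quad i = 1, 2, \ldots, n \tag{10}$$

The distance from the sampling point to the classification interface is

$$d = \frac{\left|\boldsymbol{\omega} \cdot x_i + b\right|}{\lVert \boldsymbol{\omega} \rVert} \tag{11}$$

The optimal classification interface should satisfy the idea of maximum separation, i.e., minimization:

$$\min_{\boldsymbol{\omega},\, b}\ \frac{1}{2} \lVert \boldsymbol{\omega} \rVert^{2} \tag{12}$$

By using the extreme value method of inequality constraints, we can get

$$L(\boldsymbol{\omega}, b, \alpha) = \frac{1}{2} \lVert \boldsymbol{\omega} \rVert^{2} - \sum_{i=1}^{n} \alpha_i \left[ y_i \left(\boldsymbol{\omega} \cdot x_i + b\right) - 1 \right] \tag{13}$$

where $\alpha_i \geq 0$ are the Lagrange multipliers. The partial derivatives of $L$ with respect to $\boldsymbol{\omega}$ and $b$ are obtained and set to 0. The dual principle is then used to transform the problem into solving, under the conditions

$$\sum_{i=1}^{n} \alpha_i y_i = 0, \quad \alpha_i \geq 0 \tag{14}$$

the maximum value of the following function of $\alpha_i$:

$$Q(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j \left(x_i \cdot x_j\right) \tag{15}$$

If $\alpha_i^{*}$ is the optimal solution,

$$\boldsymbol{\omega}^{*} = \sum_{i=1}^{n} \alpha_i^{*} y_i x_i \tag{16}$$

The samples with $\alpha_i^{*} \neq 0$ are the support vectors. In conclusion, the optimal classification function is

$$f(x) = \operatorname{sgn}\left( \sum_{i=1}^{n} \alpha_i^{*} y_i \left(x_i \cdot x\right) + b^{*} \right) \tag{17}$$

where $b^{*}$ is the classification threshold. The inner product kernel function $K(x_i, x)$ is used to replace the inner product of the training samples and the samples to be classified. At this point, the objective function becomes

$$Q(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K\left(x_i, x_j\right) \tag{18}$$

The kernel classification function is expressed as

$$f(x) = \operatorname{sgn}\left( \sum_{i=1}^{n} \alpha_i^{*} y_i K\left(x_i, x\right) + b^{*} \right) \tag{19}$$

For the linear nonseparable case, SVM introduces the relaxation variable *ζ*_{i} and penalty factor *C*, and the objective function becomes

$$\min_{\boldsymbol{\omega},\, b,\, \zeta}\ \frac{1}{2} \lVert \boldsymbol{\omega} \rVert^{2} + C \sum_{i=1}^{n} \zeta_i, \quad \text{s.t.}\ \ y_i\left(\boldsymbol{\omega} \cdot x_i + b\right) \geq 1 - \zeta_i,\ \ \zeta_i \geq 0 \tag{20}$$

The kernel function used in this paper is the sigmoid function:

$$K\left(x_i, x_j\right) = \tanh\left( \gamma \left(x_i \cdot x_j\right) + c \right) \tag{21}$$

where $\gamma$ and $c$ are the kernel parameters.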

Based on this, this paper treats the faults on each branch as one class. Virtual fault points are set at 25%, 50%, and 75% of each branch, and fault simulations are carried out at these points. The fault data are processed according to formulas (5)–(9), and a sufficient number of fault samples are generated as training samples for the SVM-based fault branch selection algorithm. The trained SVM classifier is used to determine the branch on which a network short-circuit fault has occurred, which narrows the fault location range and lays the foundation for accurate fault location.
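To make the kernel classification step concrete, the sketch below evaluates a two-class kernel decision function with a sigmoid (hyperbolic tangent) kernel in plain Python. It does not train anything: the support vectors, multipliers, and bias are hand-picked hypothetical values, and the parameter names `gamma` and `coef0` are assumptions chosen for illustration.

```python
import math


def sigmoid_kernel(x, z, gamma=0.5, coef0=0.0):
    """Sigmoid kernel: tanh(gamma * <x, z> + coef0)."""
    dot = sum(a * b for a, b in zip(x, z))
    return math.tanh(gamma * dot + coef0)


def decide(x, support_vectors, labels, alphas, bias):
    """Return the sign of sum_i alpha_i * y_i * K(x_i, x) + bias as +1 or -1."""
    score = bias
    for sv, y, alpha in zip(support_vectors, labels, alphas):
        score += alpha * y * sigmoid_kernel(sv, x)
    return 1 if score >= 0 else -1


# Hypothetical, hand-picked values (not the result of training).
support_vectors = [[1.0, 1.0], [-1.0, -1.0]]
labels = [1, -1]
alphas = [1.0, 1.0]
bias = 0.0

print(decide([0.8, 1.2], support_vectors, labels, alphas, bias))
print(decide([-0.5, -1.5], support_vectors, labels, alphas, bias))
```

Branch selection is a multi-class problem (one class per branch), so in practice several such two-class decision functions would have to be combined, for example with a one-versus-one or one-versus-rest scheme; which scheme the paper uses is not stated in this section.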

##### 3.3. Fault Location Method Based on Similarity Matching

After determining the branch where the fault is located, this paper uses the similarity matching method to solve for the fault location parameter *λ*. A target optimization model is established from the Euclidean distance and the Pearson correlation coefficient, and the enumeration method is used to find the value of *λ* that makes the objective function optimal, which determines the position of the fault.

###### 3.3.1. Euclidean Distance

The Euclidean distance transform is useful for a variety of applications including image processing, computer vision, pattern recognition, shape analysis, and computational geometry [17–19]. Given two vectors *X* = [*x*_{1}, *x*_{2}, *x*_{3},⋯,*x*_{n}] and *Y* = [*y*_{1}, *y*_{2}, *y*_{3},⋯,*y*_{n}], their Euclidean distance is

$$d(X, Y) = \sqrt{\sum_{i=1}^{n} \left(x_i - y_i\right)^{2}} \tag{22}$$

where *x*_{i} is the *i*-th component of *X*, *y*_{i} is the *i*-th component of *Y*, and *n* is the number of elements. In this paper, *n* denotes the number of monitoring points in the network. For two faults at the same position, the Euclidean distance between their $\Delta V^{(1)N}$ is extremely small or even zero, and the similarity between them is high. Conversely, when the two faults are far apart in the network, the Euclidean distance is large and the similarity is low.

###### 3.3.2. Pearson Correlation Coefficient (PCC)

The PCC is a parameter used to measure the linear relationship between two interval variables [20, 21]. The closer the PCC is to 1 or −1, the stronger the correlation; the closer it is to 0, the weaker the correlation. The PCC is positive for positive correlation and negative for negative correlation.

The PCC between *X* and *Y* is defined as

$$\rho_{XY} = \frac{N \sum_{i=1}^{N} x_i y_i - \sum_{i=1}^{N} x_i \sum_{i=1}^{N} y_i}{\sqrt{N \sum_{i=1}^{N} x_i^{2} - \left(\sum_{i=1}^{N} x_i\right)^{2}}\ \sqrt{N \sum_{i=1}^{N} y_i^{2} - \left(\sum_{i=1}^{N} y_i\right)^{2}}} \tag{23}$$

where *N* is the dimension of *X*, and *X* and *Y* have the same meanings as in formula (22).

Combining the Euclidean distance and the PCC described above, the fault distance objective function established in this paper is shown in the following formula:

where $\Delta V_{f}^{(1)N}$ is the positioning information corresponding to the current fault, which is calculated by normalization of formula (5); $S_{b}(\lambda)$ is the similarity index between the current fault information $\Delta V_{f}^{(1)N}$ and the positioning model $LM_{b}(\lambda)$ of faulty branch *b*; $\rho$ is the PCC between $\Delta V_{f}^{(1)N}$ and $LM_{b}(\lambda)$; and $d$ is the Euclidean distance between $\Delta V_{f}^{(1)N}$ and $LM_{b}(\lambda)$. $S_{b}(\lambda)$ has a value range of [−1, 1]. The higher the linear positive correlation between the two groups, the closer $S_{b}(\lambda)$ is to 1; the higher the linear negative correlation between the two groups, the closer $S_{b}(\lambda)$ is to −1. When they are identical, $S_{b}(\lambda) = 1$; the lower the similarity between the two sets of information, the closer $S_{b}(\lambda)$ is to zero.
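The two similarity measures are easy to compute directly. The sketch below is a plain Python illustration with made-up vectors; the function names `euclidean` and `pearson` are hypothetical, and the PCC is written in the equivalent mean-centered form.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


pattern = [1.0, 2.0, 3.0]
print(euclidean(pattern, pattern))           # identical patterns: zero distance
print(pearson(pattern, [2.0, 4.0, 6.0]))     # proportional patterns: close to 1
print(pearson(pattern, [3.0, 2.0, 1.0]))     # reversed pattern: close to -1
```

The PCC captures whether two patterns have the same shape, while the Euclidean distance also penalizes differences in level, which is presumably why the two are combined in the similarity index.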

##### 3.4. Fault Location Method Execution Steps

In the fault location calculation process, short-circuit faults are first simulated at 25%, 50%, and 75% of each branch in the network to establish a fault branch selection database, and this database is used to train the SVM fault branch selection model. Then, based on the current fault information $\Delta V_{f}^{(1)N}$, the SVM fault branch selection model determines the branch where the fault is located. For the selected branch, the goal is to maximize the similarity index $S_{b}(\lambda)$, that is, to find the location model $LM_{b}(\lambda)$ that best matches $\Delta V_{f}^{(1)N}$. The optimal solution of the fault distance parameter *λ* obtained in this way determines the location of the fault on the branch. This paper uses the enumeration method to calculate the value of $S_{b}(\lambda)$; the constraint is *λ* ∈ [0, 1], and the objective function is as shown in formula (24). The execution flow of the fault location algorithm is shown in Figure 3.
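The enumeration step can be sketched as a grid search over *λ*. Everything below is an assumption made for illustration: the impedances are toy values, the "measured" pattern is simulated without noise, and the similarity index is taken as PCC divided by one plus the Euclidean distance, which is one form consistent with the stated properties (range [−1, 1], equal to 1 for identical patterns) but is not confirmed to be the paper's objective function.

```python
import math

# Assumed impedances (p.u.) from four monitoring points to the head and end
# nodes of the selected branch.
Z_TO_HEAD = [0.02 + 0.10j, 0.05 + 0.25j, 0.08 + 0.40j, 0.03 + 0.15j]
Z_TO_END = [0.06 + 0.30j, 0.04 + 0.20j, 0.02 + 0.10j, 0.07 + 0.35j]


def standardize(values):
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(var) for v in values]


def pattern(lam, i_fault=1.0):
    """Standardized voltage-variation magnitudes for a fault at fraction lam."""
    dv = [((1 - lam) * zl + lam * zr) * i_fault
          for zl, zr in zip(Z_TO_HEAD, Z_TO_END)]
    return standardize([abs(v) for v in dv])


def similarity(x, y):
    """ASSUMED similarity index: PCC / (1 + Euclidean distance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    return (cov / (sx * sy)) / (1.0 + dist)


def locate(measured, step=0.001):
    """Enumerate lam in [0, 1] and return (best lam, best similarity)."""
    steps = round(1 / step)
    best_lam, best_s = 0.0, -math.inf
    for k in range(steps + 1):
        lam = k / steps
        s = similarity(measured, pattern(lam))
        if s > best_s:
            best_lam, best_s = lam, s
    return best_lam, best_s


# Simulated fault at lam = 0.8 with an arbitrary fault current magnitude.
measured = pattern(0.8, i_fault=2.5)
best_lam, best_s = locate(measured)
print(best_lam, round(best_s, 6))
```

With a step of 0.001 the search evaluates 1001 candidate positions on one branch, which is why selecting the branch first keeps the total cost low compared with scanning every branch of the network.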

#### 4. Case Analysis

The case analysis uses the IEEE-14 node typical distribution network as the simulation model. The model topology and the parameters of each branch are shown in Figure 4 and the annex. The case analysis in this section consists of three parts. First, the effectiveness of the SVM fault branch selection algorithm is verified in Section 4.1. Then, in Section 4.2, a short-circuit fault is simulated on one branch to verify the effectiveness of the proposed algorithm. Finally, fault simulation experiments are carried out on every branch, and the performance of the proposed fault location algorithm is verified in Section 4.3.

##### 4.1. Case 1: SVM-Based Fault Branch Selection Algorithm

Fault tests were carried out at 25%, 50%, and 75% of each branch. The fault types included single-phase short-circuit fault, two-phase short-circuit fault, and three-phase short-circuit fault. Table 1 shows the data of each monitoring point when single-phase ground fault occurs at 50% of all 14 branches of the network. The gray area in Table 1 is the monitoring point where the positive sequence voltage value of the node is not affected. The fault sample data of the same branch are classified into one class.

To determine how many training samples are sufficient to reach the expected accuracy, the relationship between training set size and test set accuracy was examined before the fault location experiment. The experimental results are shown in Table 2. According to Table 2, when the number of training samples reaches 540 or more, the accuracy on the test set reaches 100%. Therefore, in preparing the fault location experiment, each type of fault was simulated three times at 25%, 50%, and 75% of each branch, generating 540 training samples in total.

Then, on each branch, three fault points (60 in total, with random position and random fault type) are randomly selected as the test data for the SVM fault branch selection. The process of generating the validation samples is as follows:

(a) A random number from 0 to 1 is generated as the fault distance *λ* to determine the fault location.

(b) The fault type is selected randomly from single-phase fault, two-phase fault, and three-phase fault.

(c) On branch 1, the fault is simulated according to the result of the random selection, and 1 verification sample is obtained.

(d) The random fault simulation is repeated three times on branch 1 and then continued on the next branch. There are 20 branches in total, so 60 verification samples are obtained.
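The sample counts quoted above can be reproduced by enumerating the simulation plan. This is a bookkeeping sketch only (no fault is simulated); the tuple layout, the seed, and the function name `validation_plan` are assumptions.

```python
import itertools
import random

BRANCHES = range(1, 21)                      # 20 branches
TRAIN_POSITIONS = (0.25, 0.50, 0.75)         # virtual fault points on each branch
FAULT_TYPES = ("single-phase", "two-phase", "three-phase")
REPEATS = 3                                  # each fault type simulated three times

# Training plan: every combination of branch, position, fault type, and repeat.
training_plan = list(
    itertools.product(BRANCHES, TRAIN_POSITIONS, FAULT_TYPES, range(REPEATS))
)


def validation_plan(per_branch=3, seed=0):
    """Random position and fault type, per_branch samples on every branch."""
    rng = random.Random(seed)
    plan = []
    for branch in BRANCHES:
        for _ in range(per_branch):
            lam = rng.random()               # fault distance in [0, 1)
            plan.append((branch, lam, rng.choice(FAULT_TYPES)))
    return plan


print(len(training_plan))        # 20 * 3 * 3 * 3 = 540
print(len(validation_plan()))    # 20 * 3 = 60
```

Each training entry carries the branch number as its class label, since all fault samples of one branch are treated as one class.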

The accuracy of the branch selection test is shown in Table 3. It can be seen from Table 3 that the SVM-based fault branch selection method can accurately determine the fault branch, which lays a foundation for subsequent fault location.

##### 4.2. Case 2: Fault Location Method Based on SVM Fault Branch Selection and Similarity Model Matching

Assume that a single-phase ground fault, denoted *F*_{1}, occurs on branch 10 at a distance *λ* = 0.8 from node 9 (1 − *λ* = 0.2 from node 10). The positive sequence voltage variation sequence $\Delta V^{(1)}$ of fault *F*_{1} is obtained from the monitoring points, and the standardized sequence $\Delta V_{f}^{(1)N}$ is calculated by formula (8). $\Delta V_{f}^{(1)N}$ is input into the SVM classifier, which returns branch 10. The calculation results are shown in Table 4.

The fault distance location model of branch 10, $LM_{10}(\lambda)$, is then established, and the similarity index $S_{10}(\lambda)$ between $\Delta V_{f}^{(1)N}$ and $LM_{10}(\lambda)$ is calculated according to formula (24). The enumeration method is used to optimize the variable *λ*, with *λ* ∈ [0, 1] as the constraint and a step size of 0.001; $S_{10}(\lambda)$ is the objective function, and the optimal value of the fault distance parameter *λ* is sought. Figure 5 shows how the similarity index changes with the fault distance parameter *λ*. It can be seen from the figure that the similarity index reaches its maximum value of 0.981583 at fault distance *λ* = 0.788. The location result is consistent with the position of *F*_{1}, which verifies the effectiveness of the proposed algorithm.

##### 4.3. Case 3: Performance Verification of Fault Locating Algorithm

On each branch, 10 fault points (200 in total, including single-phase, two-phase, and three-phase short-circuit faults) are randomly selected to verify the fault location performance. Fault location models are established for all 20 branches. The location error rate, defined in formula (25), is used to describe the performance of the algorithm:

$$e = \frac{\left|\lambda_{\mathrm{loc}} - \lambda_{\mathrm{act}}\right|}{L} \times 100\% \tag{25}$$

In formula (25), $\lambda_{\mathrm{loc}}$ is the located fault distance, $\lambda_{\mathrm{act}}$ is the actual fault distance, and the line length $L$ is 1. The smaller the error rate, the more accurate the location method is. The results of the average location error rate of each branch are shown in Table 5:

The experimental results show that the average location error rate is kept within 4%, which satisfies the accuracy requirements for fault location.
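As a worked example of the error rate, the sketch below assumes it is the absolute difference between the located and actual fault distances, relative to a line length of 1 and expressed in percent, and applies it to the Case 2 values (*λ* = 0.788 located, *λ* = 0.8 actual). The function name `location_error_rate` is hypothetical.

```python
def location_error_rate(lam_located, lam_actual, line_length=1.0):
    """Location error in percent of the line length (assumed definition)."""
    return abs(lam_located - lam_actual) / line_length * 100.0


# Case 2: fault located at 0.788, actual position 0.8.
case2_error = location_error_rate(0.788, 0.8)
print(round(case2_error, 3))
```

Under this assumed definition, the Case 2 result corresponds to an error of about 1.2% of the branch length, which lies inside the 4% bound reported for the branch averages.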

#### 5. Conclusion

The fault location algorithm proposed in this paper uses only the positive sequence voltage variation of each monitoring point when the fault occurs, and the location result is not affected by the fault type or the transition resistance. First, the faulted branch is determined by the SVM fault branch selection method, which avoids traversing all branches of the network, narrows the location range, and reduces the amount of calculation. Then, the fault location parameter *λ* is solved by using the fault location similarity model that combines the Euclidean distance and the PCC.

An IEEE-14 node network is built to verify the effectiveness of the algorithm. The case results show that the method can locate various types of faults and that the average location error of each branch is kept within 4%. The algorithm lays a theoretical foundation for the rapid handling of grid faults.

#### Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

#### Conflicts of Interest

The authors declare that they have no conflicts of interest.

#### Acknowledgments

The authors would like to thank the project “Research on Key Technologies of Compact Flexible Loop Closing Device for Medium Voltage Distribution Network” (J2020081) for its support of this research.