Accurate device parameters play a critical role in the calculation and analysis of power distribution networks (PDNs). However, device parameters are affected by the operating status and by errors in manual entry. In addition, a PDN covers a very wide area, which makes parameter identification more challenging. Developing algorithms that accurately identify PDN parameters has therefore attracted increasing attention from researchers. Most existing parameter identification algorithms are gradient-free and based on heuristic schemes. Herein, an adaptive gradient-based method is proposed for parameter identification in PDNs. The analytical expressions of the gradients of the loss function with respect to the parameters are derived, and an adaptive updating scheme is utilized. Compared with several heuristic algorithms, the proposed method yields lower errors under all three criteria, with a much smoother and more stable convergence of the loss function. By further applying a linear transformation within the loss function, the method significantly improves parameter identification performance with much lower variance in repeated experiments, indicating that it identifies PDN parameters more robustly. This work gives a practical demonstration of a gradient-based method for parameter identification in PDNs.

1. Introduction

Obtaining reliable and accurate device parameters is a top priority in power distribution networks (PDNs) for many tasks, such as security analysis, system control, state estimation, line loss calculation, power flow calculation, protection setting, and fault analysis [1]. Nevertheless, owing to the lack of in-situ measurement techniques, real-time information is hard to obtain directly for PDNs operating under security and stability constraints. Some PDN parameters, such as line resistance, line reactance, transformer resistance, transformer reactance, transformer conductance, and transformer susceptance, are generally assumed to be static in practice. This assumption usually leads to poor parameter estimates in PDNs [2]. Some new approaches focusing on numerical efficiency and error reduction have been developed in many fields of PDN, such as supervisory control and data acquisition (SCADA), phasor measurement units (PMUs), and advanced metering infrastructure. These methods include the full-scale approach [3], PSOSR [4], the normalized Lagrange multiplier (NLM) test [5], the finite-time algorithm (FTA) [6], the residual method, the sensitivity analysis method, the Lagrange multiplier method [7], and the Heffron-Phillips method [8]. Furthermore, benefiting from the development of machine learning and deep learning techniques, some smart methods have been proposed recently, such as the graph convolution network (GCN) [9], support vector machine (SVM) [10], multihead attention network [11], deep reinforcement learning [12], estimation using synchrophasor data [13], PSCAD simulation [14], and multimodal long short-term memory deep learning [15]. These methods have proved effective on simulation data. However, they also require several specialized measuring devices to be installed in the grid.

The power flow model with identification equations in a PDN provides a mathematical approach to the problem of missing data and measuring devices. It relates the PDN parameters to data that can be obtained easily, such as active power, reactive power, and voltage. In practice, some important parameters are usually unavailable, such as transformer reactance, transformer resistance, transformer conductance, transformer susceptance, line resistance, and line reactance. These parameters are typically optimized within given ranges by algorithms that use measured quantities such as active power, reactive power, and voltage on the low-voltage side. The voltage on the high-voltage side can then be calculated by the power flow model with these parameters, and the residuals between the calculated and true values are used to build the loss function for the optimization algorithm. Therefore, parameter identification problems can be solved by model-free methods, such as least squares (LS) [16] and Markov chain Monte Carlo (MCMC) [17]. Heuristic algorithms can also be used to identify PDN parameters, such as particle swarm optimization (PSO), ant colony optimization, and genetic algorithms (GA) combined with transient measurements [18]. Some algorithms based on global optimization have recently been published for parameter identification in PDN calculation [19], including the random search (RS) approach [20], the tree-structured Parzen estimator (TPE) approach [21], and simulated annealing (SA) [22], which have shown satisfactory performance and provided novel ideas for PDN analysis. Moreover, some methods based on machine learning and deep learning have been published for parameter identification, such as support vector regression [23] and convolutional neural networks (CNN) [24]. These methods are effective for simulation data with precise values of voltage and voltage phase angle. Nevertheless, only part of the parameters can be obtained.

The methods mentioned above can generally be classified as gradient-free methods. The search procedures in these gradient-free algorithms depend largely on the initialization. For example, in a genetic algorithm, the population must first be initialized by encoding the individuals before the designed fitness function can be evaluated. The initialization is usually generated randomly, and different initializations can lead to substantially different results. To tackle these problems, it is crucial to develop more robust and stable numerical methods for parameter identification. In this work, based on an analysis of the physical model of the PDN, we derive analytical expressions for the gradients of the loss function with respect to the parameters. The main contributions of this paper are as follows:
(1) The analytical expressions of the gradients of the loss function with respect to the parameters to be optimized in the PDN are derived in detail; these gradients have rarely been studied in other investigations.
(2) An adaptive scheme for parameter updating based on the loss function gradients is utilized. The error in the numerical experiments is remarkably reduced compared with several heuristic algorithms, and the decay of the loss function during optimization is much smoother and more stable.
(3) The variance in the numerical calculation is much smaller than that of several heuristic algorithms, indicating that the method proposed in this work is more robust in numerical performance.

This paper is organized as follows: Section 2 introduces the identification equations of the power flow model in PDN and proposes the gradient-based optimization method. The experimental data and calculation details are given in Section 3. The results and relevant discussions are given in Section 4. Finally, Section 5 gives a brief conclusion.

2. Materials and Methods

2.1. Power Flow Model Calculation

The basic theory of PDN analysis can be found in refs. [17, 25]. To simplify the computation, the three-phase system is assumed to be balanced when calculating the power flow in this work. The schematic diagram of the power flow calculation circuit model is shown in Figure 1.

In Figure 1, the active power, reactive power, and voltage on the high-voltage side of the transformers at each bus can be obtained directly by real-time measurements. Other parameters, namely the transformer reactance, transformer resistance, transformer conductance, transformer susceptance, line resistance, and line reactance, are in general hard to measure in a PDN and satisfy equations (1)–(3). Equation (3) involves the longitudinal and transverse components of the transformer impedance voltage drop at the bus, both in V, together with the active power, reactive power, and voltage on the low-voltage side of the transformers at bus D. The remaining intermediate quantities are given by equations (4) and (5), respectively.

The corresponding relations at the bus are given by equations (6)–(8), where equation (6) contains the longitudinal and transverse components of the transformer impedance voltage drop at the bus, in V. Once these quantities are available, the final result is calculated by equation (9).

The parameters of the line and transformer can be calculated from equations (1)–(9) with the measured power and voltage data.
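To make the role of the longitudinal and transverse voltage-drop components concrete, the sketch below uses the standard textbook relation for a single series branch. It is an illustration under assumptions, not a reproduction of equations (1)–(9): the function name `sending_end_voltage` and its arguments are hypothetical, the branch is modelled as a pure series impedance R + jX, and the shunt conductance and susceptance of the transformer are ignored.

```python
import math


def sending_end_voltage(p, q, u, r, x):
    """Voltage at the sending end of a series branch R + jX.

    Assumed inputs (illustrative, not the paper's notation):
      p, q : active (W) and reactive (var) power at the receiving end
      u    : voltage magnitude at the receiving end (V)
      r, x : series resistance and reactance of the branch (ohm)
    Returns (u_send, du, dt): sending-end voltage magnitude and the
    longitudinal and transverse components of the voltage drop.
    """
    du = (p * r + q * x) / u  # longitudinal component of the drop
    dt = (p * x - q * r) / u  # transverse component of the drop
    u_send = math.hypot(u + du, dt)  # sqrt((u + du)^2 + dt^2)
    return u_send, du, dt
```

With zero power flow both components vanish and the two ends have the same voltage, which is a quick sanity check on any implementation of this kind.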

2.2. Gradient-Based Algorithm for Parameter Identification in PDN

To apply gradient-based algorithms to PDN calculations, the objective function (also called the surrogate function) must first be defined. As mentioned above, the inputs of the PDN are the active power, reactive power, and voltage on the high-voltage side of the transformers for each sample point, while the remaining six parameters are treated as initialized parameters to be optimized. Herein, we consider a linear regression of the target value on the inputs, with a bias term and a noise term. In this work, we assume that the noise term follows a normal distribution with mean 0 and variance 1. The mean and variance of the target value are therefore

The target values thus follow a distribution whose mean is the noise-free regression output and whose variance is 1. The probability of a sample point is then

The likelihood of the whole sample set is

Taking the logarithm of the likelihood function above, we obtain

Expanding the above log-likelihood, the loss function in this work is designed in terms of the calculated value and the true value. Herein, we also introduce a linear transform with its own parameters, and the true value is the measured voltage on the high-voltage side at each sample point.
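The chain of steps described above can be written out as follows. This is a sketch in notation chosen here for illustration, not the paper's own symbols: $x_i$ is the input vector of sample $i$, $y_i$ the target, $w$ and $w_0$ the regression weights and bias, $N$ the number of samples, $\hat{U}_i(\theta)$ and $U_i$ the calculated and true high-voltage-side voltages, $\theta$ the PDN parameters, and $k$, $b$ the parameters of the linear transform. The mean-square form of the last line, with the transform applied to the calculated value, is an assumption consistent with the discussion in Section 4.

```latex
\begin{align}
y_i &= w^{\top} x_i + w_0 + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}(0, 1), \\
p(y_i \mid x_i) &= \frac{1}{\sqrt{2\pi}}
  \exp\!\left(-\frac{(y_i - w^{\top} x_i - w_0)^2}{2}\right), \\
\ln \prod_{i=1}^{N} p(y_i \mid x_i) &= -\frac{N}{2}\ln(2\pi)
  - \frac{1}{2}\sum_{i=1}^{N} \left(y_i - w^{\top} x_i - w_0\right)^2, \\
\mathcal{L}(k, b, \theta) &= \frac{1}{N}\sum_{i=1}^{N}
  \left(k\,\hat{U}_i(\theta) + b - U_i\right)^2 .
\end{align}
```

Maximizing the log-likelihood in the third line is equivalent to minimizing the sum of squared residuals, which is what motivates a squared-error loss of the kind shown in the last line.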

2.2.1. Adaptive Gradient-Based Optimization Methods

The optimization task in this work is therefore to minimize the loss function above with respect to the six parameters to be identified:

Gradient-based optimization methods require the gradient of the loss function with respect to the parameters. In addition, the historical information from each optimization step can be used to accelerate the optimization through the first-order and second-order moments, which are obtained as follows:

The parameters are then updated by a rule in which a small smoothing term prevents the denominator from being zero. In equation (20), if the second-order moment is ignored and the step size is set to a constant (often known as the learning rate), the scheme reduces to stochastic gradient descent. The convergence of stochastic gradient descent is usually slow, and it tends to oscillate near saddle points. In addition, different parameters should adopt different updating schemes: parameters that are updated infrequently during the optimization benefit from a relatively large step to speed up convergence, whereas frequently updated parameters generally need a small step for a more stable optimization process. Based on these considerations, an update scheme involving a diagonal matrix is utilized. Equation (21) indicates that, for a parameter updated frequently before the current step, the second-order moment is relatively large, which results in a small effective learning rate. The learning rate is a hyperparameter of the gradient-based optimization method and can be changed dynamically with an appropriate schedule. The influence of hyperparameters such as the learning rate on parameter identification performance is examined in Section 4. To optimize the six parameters with the gradient-based method, the gradients of the loss function with respect to these parameters are derived in the next section.

2.2.2. Gradients of the Loss Function Based on the Chain Rule

Once the loss function is defined, the next step is to calculate its gradients with respect to the six parameters to be optimized, which can be derived with the chain rule. The first two gradients are given first; both contain a common derivative, which is given as follows:

The remaining intermediate derivatives are derived as

Substituting (19)–(21) into (17) and (18), we obtain

Similar to the derivation of (22) and (23), the gradient of the loss function with respect to the other parameters is

Now, from (17)–(27), we have the six gradients of the loss function with respect to the parameters to be optimized.
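A practical way to validate analytical gradients of this kind is to compare them with central finite differences. The sketch below does this for a deliberately simplified model: a single series branch with only two parameters (R and X), a plain mean-square loss, and made-up sample values. The function names, the two-parameter model, and the data in `SAMPLES` are assumptions for illustration and are not the six-parameter expressions derived in this section.

```python
import math


def high_side_voltage(p, q, u_low, r, x):
    """Calculated high-side voltage plus the two voltage-drop components."""
    du = (p * r + q * x) / u_low  # longitudinal component
    dt = (p * x - q * r) / u_low  # transverse component
    return math.hypot(u_low + du, dt), du, dt


def loss(samples, r, x):
    """Mean square error between calculated and true high-side voltage."""
    total = 0.0
    for p, q, u_low, u_true in samples:
        u_calc, _, _ = high_side_voltage(p, q, u_low, r, x)
        total += (u_calc - u_true) ** 2
    return total / len(samples)


def analytic_grad(samples, r, x):
    """Chain rule: dL/dR = sum_i dL/dU_i * dU_i/dR, and likewise for X."""
    g_r = g_x = 0.0
    n = len(samples)
    for p, q, u_low, u_true in samples:
        u_calc, du, dt = high_side_voltage(p, q, u_low, r, x)
        outer = 2.0 * (u_calc - u_true) / n  # dL/dU_calc for this sample
        a = u_low + du
        # U_calc = sqrt(a^2 + dt^2); differentiate a and dt w.r.t. R and X.
        d_r = (a * p - dt * q) / (u_low * u_calc)
        d_x = (a * q + dt * p) / (u_low * u_calc)
        g_r += outer * d_r
        g_x += outer * d_x
    return g_r, g_x


def numeric_grad(samples, r, x, h=1e-6):
    """Central finite differences, used only as a reference."""
    g_r = (loss(samples, r + h, x) - loss(samples, r - h, x)) / (2.0 * h)
    g_x = (loss(samples, r, x + h) - loss(samples, r, x - h)) / (2.0 * h)
    return g_r, g_x


# Made-up samples: (P in W, Q in var, low-side U in V, true high-side U in V).
SAMPLES = [
    (50e3, 20e3, 400.0, 405.0),
    (30e3, 10e3, 398.0, 401.0),
    (80e3, 35e3, 395.0, 402.0),
]
```

Calling `analytic_grad(SAMPLES, 0.01, 0.02)` and `numeric_grad(SAMPLES, 0.01, 0.02)` should give closely matching pairs; a persistent mismatch usually points to a sign or index error in the hand derivation.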

Once we have the above gradients of the loss function with respect to the parameters, then the gradient-based optimization can be implemented. The pseudo-code of the optimization method of this work is shown in Algorithm 1:

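Input: initial parameters $\theta_0$
objective function $f(\theta)$ to be optimized
decay rates $\beta_1, \beta_2 \in [0, 1)$ for moment estimates
initialized first-order moment $m_0 = 0$
initialized second-order moment $v_0 = 0$
time step $t = 0$
learning rate $\alpha$
While $\theta_t$ is not converged do
  $t \leftarrow t + 1$
  $g_t \leftarrow \nabla_\theta f(\theta_{t-1})$ (gradient w.r.t. parameters at time step $t$)
  $m_t \leftarrow \beta_1 m_{t-1} + (1 - \beta_1) g_t$ (update first-order moment)
  $v_t \leftarrow \beta_2 v_{t-1} + (1 - \beta_2) g_t^2$ (update second-order moment)
  $\hat{m}_t \leftarrow m_t / (1 - \beta_1^t)$ (bias-corrected first-order moment)
  $\hat{v}_t \leftarrow v_t / (1 - \beta_2^t)$ (bias-corrected second-order moment)
  $\theta_t \leftarrow \theta_{t-1} - \alpha\, \hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon)$ (update parameters)
End while
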
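
The loop in Algorithm 1 has the structure of an Adam-style update (moment estimates with decay rates, followed by bias correction). A minimal pure-Python sketch of such a loop is given below for illustration. The function name `adam_minimize`, the default decay rates 0.9 and 0.999, the smoothing term 1e-8, and the fixed step count used in place of a convergence test are assumptions, not values confirmed by this paper; only the default learning rate 5e-4 and the 1000 steps are taken from Section 4.

```python
import math


def adam_minimize(grad_fn, theta0, lr=5e-4, beta1=0.9, beta2=0.999,
                  eps=1e-8, steps=1000):
    """Minimize a function given its gradient, using Adam-style updates.

    grad_fn : callable mapping a list of floats to a list of gradients
    theta0  : initial parameter values (list of floats)
    """
    theta = list(theta0)
    m = [0.0] * len(theta)  # first-order moment estimate
    v = [0.0] * len(theta)  # second-order moment estimate
    for t in range(1, steps + 1):
        g = grad_fn(theta)
        for i, g_i in enumerate(g):
            m[i] = beta1 * m[i] + (1.0 - beta1) * g_i
            v[i] = beta2 * v[i] + (1.0 - beta2) * g_i * g_i
            m_hat = m[i] / (1.0 - beta1 ** t)  # bias correction
            v_hat = v[i] / (1.0 - beta2 ** t)
            # Per-parameter step: large past gradients shrink the step.
            theta[i] -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta


def quadratic_grad(theta):
    """Gradient of (a - 3)^2 + (b + 1)^2, a toy objective for testing."""
    a, b = theta
    return [2.0 * (a - 3.0), 2.0 * (b + 1.0)]
```

For example, `adam_minimize(quadratic_grad, [0.0, 0.0], lr=0.05, steps=2000)` should return values close to 3 and -1. In a PDN setting, `grad_fn` would be replaced by the analytical gradients of the loss with respect to the six parameters.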
2.3. Evaluation Functions of the Algorithm

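The following three functions are employed to evaluate the performance of the algorithm proposed in this work:
(1) Mean absolute error (MAE):
$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N} \left| y_i - \hat{y}_i \right| \qquad (28)$$
(2) Root mean square error (RMSE):
$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2} \qquad (29)$$
(3) Mean absolute percentage error (MAPE):
$$\mathrm{MAPE} = \frac{100\%}{N}\sum_{i=1}^{N} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \qquad (30)$$
where $y_i$ and $\hat{y}_i$ represent the ground-truth value and the predicted value, respectively, and $N$ is the number of samples.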
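The three metrics can be computed with a few lines of standard-library Python. The sketch below uses the usual textbook definitions; the function names are illustrative, and reporting MAPE in percent is an assumption about the convention used here.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean square error."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent.

    Undefined when any true value is zero; voltages in this setting are
    assumed to be strictly positive.
    """
    ratios = (abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * sum(ratios) / len(y_true)
```

Because RMSE squares the residuals before averaging, it penalizes a few large errors more heavily than MAE does, which is one reason to report both.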

3. Data and Calculation Details

3.1. Raw Dataset Description

In this paper, the raw dataset contains 1499 samples collected by SCADA with a sampling period of 15 minutes [25, 26]. The three-phase voltages of the first section on the high-voltage side are shown in Figure 2, and those on the low-voltage side are displayed in Figure 3.

Figures 2 and 3 show that the high-voltage side in the dataset is close to three-phase balance, so the dataset satisfies the requirements of the equations in Section 2.1. In addition, the three-phase active power and reactive power on the low-voltage side are shown in Figures 4 and 5, respectively.

The trends of the active power and reactive power on the low-voltage side are consistent in Figures 4 and 5, which indicates that the samples in this dataset are stable and can be used for parameter identification.

3.2. Evaluation and Calculation

In this paper, 75% of the samples (1124) are randomly selected as a training set to identify the PDN parameters. The best parameters are used to calculate the per-unit voltage at the bus by the power flow model. The remaining 25% of the samples (375) are then used as a test set to evaluate the performance of parameter identification through the three metrics in equations (28)–(30). Instead of calculating these metrics directly, a linear regression is applied between the calculated and true voltage values. The final evaluation metrics are computed from the output of the linear regression as in equation (31), where the slope and bias of the linear regression can also be optimized like the other parameters by MCMC and GA. In the following discussion, the variants in which the linear regression parameters are optimized by MCMC and GA are denoted as MCMC-LR and GA-LR, respectively. The upper and lower bounds for MCMC and GA must be determined before parameter identification, and they are listed in Table 1:
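The evaluation procedure described above combines a random 75/25 split with a one-dimensional linear fit. A minimal sketch of both steps is shown below. The function names and the seed are hypothetical, and the closed-form least-squares fit is only one way to obtain the slope and bias; it is not necessarily how they are optimized in this work.

```python
import random


def split_indices(n_samples, train_fraction=0.75, seed=0):
    """Randomly split sample indices into training and test sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    n_train = int(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


def fit_line(x, y):
    """Ordinary least-squares fit y ~ k * x + b; returns (k, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    k = s_xy / s_xx
    b = mean_y - k * mean_x
    return k, b
```

With 1499 samples, `split_indices(1499)` yields 1124 training indices and 375 test indices, matching the sizes quoted above.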

To avoid the impact of randomness on GA and MCMC, each method is repeated 25 times to guarantee the correctness and stability of results.
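A repetition protocol of this kind can be sketched as follows. The helper name `repeat_runs`, the seeding scheme, and the use of the sample variance are assumptions for illustration; `run_once` stands for any routine that performs one identification run and returns a scalar error such as MAE.

```python
import random
import statistics


def repeat_runs(run_once, n_repeats=25, base_seed=0):
    """Run an experiment several times with different seeds.

    run_once : callable taking a random.Random instance and returning
               a scalar score (for example, the test-set MAE of one run)
    Returns (mean, sample variance) of the scores.
    """
    scores = []
    for i in range(n_repeats):
        rng = random.Random(base_seed + i)  # independent, reproducible seed
        scores.append(run_once(rng))
    return statistics.mean(scores), statistics.variance(scores)
```

Reporting the variance alongside the mean makes it possible to compare methods by their run-to-run stability as well as their average accuracy.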

4. Results and Discussion

The parameter identification results of the adaptive gradient-based optimization (AGBO) method and three sequential model-based optimization (SMBO) algorithms, using the mean square error between the calculated and true values as the loss, are discussed first in this section. For the TPE method, the prior weight and the number of startup jobs are set to 1 and 20, respectively. The rate of reduction in simulated annealing is 0.1, the default value [25]. In AGBO, the learning rate and weight decay are 5e-4 and 0, respectively. The maximum number of iteration steps is 1000. The prediction performances are shown in Table 2 and Figures 6 and 7.

Table 2 shows that AGBO has the lowest MAE, RMSE, and MAPE compared with RS, TPE, and SA. However, the differences are minor, and Figures 6 and 7 also show that the predictions of these algorithms do not differ remarkably. The reason is that computing the mean square error directly between the calculated and true values neglects the statistical relationship between them, which has a negative impact on parameter identification. Therefore, based on the previous study [26], a linear transformation can be applied to the calculated value before the loss function is computed. The parameters of the linear transformation are optimized simultaneously with the other PDN parameters, and the results are displayed in Table 3 and Figures 8 and 9.

All algorithms perform better in Table 3 than in Table 2, which indicates that a statistical property of the relationship between the calculated and true values, such as linearity, helps to identify the PDN parameters. The prediction errors in Table 3 are almost one-tenth of those in Table 2. Moreover, AGBO-LR has a significantly better prediction performance under all three metrics than RS-LR, TPE-LR, and SA-LR in Table 3, and its scatter points are closer to the 45-degree line than those of the other methods, which means that the prediction error of AGBO-LR is lower.

Learning rate and weight decay are two important hyperparameters in the optimization proposed in this work. Therefore, the performance for various values of these parameters is also investigated. The value of weight decay is set as 0 first, and the performances of AGBO with different learning rates are displayed in Table 4:

Table 4 shows that the learning rate has a large influence on the performance of the optimization, and a learning rate between 5e-4 and 5e-3 is suggested in this paper. The influence of weight decay on AGBO is investigated subsequently with the optimal learning rate; the results are listed in Table 5:

Compared with the learning rate, weight decay has only a slight influence on the MAE, RMSE, and MAPE of AGBO, and the optimal value of weight decay is 1e-6. The convergence plot of AGBO with the optimal parameters is shown in Figure 10, and the loss value converges to 0 after approximately 400 iteration steps. Compared with the three SMBO algorithms, the convergence curve of AGBO is much smoother and more stable. This result can be attributed to the fact that, in the gradient-based optimization method proposed in this work, the search direction for the parameter update is deterministic, whereas in gradient-free methods the search direction is heuristic and largely dependent on the initialization, which usually leads to oscillation in the loss function and makes convergence much more difficult.

Since RS, TPE, SA, and AGBO all have a certain degree of randomness, each algorithm is repeated 25 times and the results are averaged to guarantee the correctness and stability of the numerical performance. The results for the three evaluation functions are shown in Table 6 and Figure 11. The repeated experiments indicate that AGBO-LR not only performs better in PDN parameter identification than the gradient-free algorithms but also has better numerical stability and robustness, with the lowest variance across repeated experiments.

5. Conclusions

Parameter identification plays a key role in PDN calculation and analysis; therefore, several methods have been proposed to improve the accuracy of PDN parameters, such as LS, MCMC, and sequential model-based global optimization. However, most existing algorithms are gradient-free methods. In this work, an adaptive gradient-based optimization method is proposed for parameter identification in PDNs. The analytical expressions of the gradients of the loss function with respect to the parameters are derived, and an adaptive updating scheme is utilized. We compare the proposed method with several heuristic algorithms, namely RS, TPE, and SA. The errors of the adaptive gradient-based method are lower in all three evaluation functions, namely MAE, RMSE, and MAPE, with smooth and stable convergence of the loss function. By further applying a linear transformation within the loss function, the method significantly improves parameter identification performance with a much lower variance in 25 repeated experiments. In addition, the effects of optimizer hyperparameters such as the learning rate and weight decay are investigated, and the results indicate that the proposed method achieves more stable and robust performance in identifying PDN parameters. The gradient-based optimization method can be explored further in future work, for example through learning rate schedules that give more stable and faster convergence of the loss function and through the interpretability of the gradient-based method for parameter identification in PDNs.

Data Availability

The data used to support the findings of this study are currently under embargo while the research findings are commercialized. Requests for data, 6/12 months after publication of this article, will be considered by the corresponding author.

Conflicts of Interest

The authors declare that they have no conflicts of interest.


Acknowledgments

This work was supported by the Nanjing Institute of Technology Scientific Research Start-Up Fund for High-Level Introduced Talents, Grant no. YKJ202046, and the State Grid Jiangsu Electric Power Co., Ltd., Grant no. J2020097.