Abstract

To improve reactive power optimization in distribution networks, this paper applies a multiagent deep reinforcement learning algorithm to analyze reactive power optimization strategies and constructs an intelligent optimization model. The simulation models of the power conversion, power transmission, control, and measurement elements on the platform are described, and the program structure and interactive functions are analyzed. In addition, this paper proposes a data-driven reactive power optimization method for distribution networks. Using historical data and an artificial neural network, the method extracts electrical quantities such as load power and distributed generation output, together with environmental data such as temperature and wind speed, for multiagent analysis. Experimental verification shows that the proposed multiagent deep reinforcement learning approach achieves a good reactive power optimization effect in the distribution network.

1. Introduction

The active distribution network is a necessary stage in the development of distribution networks. It uses advanced information and communication technology and power electronics technology to actively control, coordinate, and optimize the controllable resources within its jurisdiction, such as distributed generation, energy storage, reactive power compensation devices, and controllable loads, so that these resources can actively participate in system operation [1]. Operation optimization is the core technology of the active distribution network and the key to realizing its active control. Compared with a traditional distribution network, an active distribution network has a more complex structure, and the equipment and constraints that must be considered in operation control are more diverse. This poses a great challenge for operators and calls for more targeted operation optimization strategies [2]. Reactive power optimization is an important part of active distribution network operation optimization; by optimizing the reactive power flow, it mainly aims to reduce network losses and improve power quality. In an active distribution network, the high penetration of renewable energy introduces uncertainty and bidirectional power flow, so the various reactive power controllable resources must be regulated flexibly to enable active consumption of renewable energy, improve power quality, and ensure safe, stable, and efficient operation of the distribution network [3].

Methods based on mathematical models are also widely used in distribution network operation optimization. In these methods, the operating goal is taken as the objective function, the physical characteristics and safety requirements of operation (such as power flow balance and the allowable voltage range) are taken as constraints, and the decisions of the control equipment (such as the reactive power output of distributed generation and the capacitor bank (CB) switching position) are taken as control variables. The control decision is obtained by solving the mathematical model; optimal power flow for active distribution networks belongs to this class of methods. Because the control strategy is obtained by solving the model, the optimality of the solution is guaranteed, and the strategy can be regarded as globally optimal [4]. Literature [5] designed a centralized reactive power compensation system for the planning and operation of low-voltage distribution networks, in which a centralized control center optimally controls both the capacity and the real-time compensation amount of the reactive power compensation equipment. Literature [6] proposed a centralized power system operation optimization model and solved it with the interior-point algorithm. Literature [7] used Benders decomposition to split the original problem into a two-layer optimization problem whose upper and lower layers are iterated alternately to obtain the control strategy. Literature [8] proposed a distribution network voltage control method that coordinates and optimizes reactive power and verified its effectiveness on an urban and a rural distribution network, respectively. However, traditional algorithms (such as interior-point methods) often suffer from low solution efficiency and have difficulty guaranteeing the optimality and stability of their solutions.
Artificial intelligence heuristic algorithms are widely used to solve nonlinear programming models because they are simple and easy to program [9]. Literature [10] proposed a day-ahead scheduling scheme for active distribution networks that takes the lowest operating cost of the distribution network as the objective function and the active power output of renewable energy as the control variable. Literature [11] addressed the voltage over-limit problem. However, intelligent algorithms generally tend to converge to locally optimal solutions, which limits their application to some extent. Against the background of active distribution networks, finding an effective numerical method for solving the operation optimization problem has therefore attracted considerable attention [12].

Because the optimal solution of a convex optimization problem [13] is globally optimal and can be computed efficiently, convex programming methods that convert a nonconvex problem into a convex one are increasingly applied to distribution network operation optimization. Second-order cone programming and semidefinite programming [14] are the two typical convex programming approaches applied to active distribution network operation optimization. Both transform the original nonconvex model into a convex model through a relaxation, which makes the model easy to solve. Second-order cone programming has a simpler calculation and solution process than semidefinite programming, so it is more widely used in operation optimization. Literature [15] proved that the second-order cone relaxation is exact when the objective function is strictly increasing and the node loads have no upper bound, which gives it wide applicability in operation optimization. Literature [16] constructed a reactive power optimization model of an active distribution network and verified its optimization stability and computational efficiency. Literature [17] built a dynamic optimal power flow model and evaluated the effectiveness and accuracy of the second-order cone relaxation. Literature [18] built a mixed-integer second-order cone programming model for the unit commitment problem, used the interior-point cutting-plane method to solve for the unit on/off states without considering ramping constraints, and proposed a simple correction method for the unit on/off states. The on-load tap changer (OLTC) is important adjustable equipment in an active distribution network and has a significant impact on the operating characteristics of the system, especially the voltage level.
Literature [19] constructed an optimal power flow model for a distribution network with OLTC and used piecewise linearization to solve the OLTC tap positions accurately. Literature [20] considered the operating characteristics of various devices, such as OLTC, CB, static var compensators (SVC), energy storage systems (ESS), and distributed generation (DG), and proposed a multiperiod operation optimization method that provides effective guidance for the daily scheduling plans of distribution networks. Literature [21] used second-order cone programming to convexly relax the original multiobjective model and used the constraint method to describe the Pareto frontier.

2. Reactive Power Optimization Algorithm for the Distribution Network

2.1. Distribution Network Modeling and Data Preprocessing Based on the OpenDSS Platform

In OpenDSS, the distribution network is modeled mainly through scripting. A separate DSS file is first written for each type of component, and the components are modeled in these files so that the main program stays modular and its logic stays clear. When the parameters or models of a component need to be modified, its file can be edited independently. For example, when modeling loads, only the variables describing the load characteristics need to be written; these parameters can be edited and stored as an independent DSS file that the main program calls directly, which is simple and easy to implement. The modeling and programming structure of OpenDSS is shown in Figure 1.
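The modular file structure described above can be sketched in Python as follows. The sketch writes one component file and a master script that pulls it in with the OpenDSS `Redirect` command; the helper names, the circuit name `Demo`, and the load `Example` with its ratings are hypothetical and are not taken from the paper's model.

```python
from pathlib import Path
import tempfile


def write_component_file(directory: Path, name: str, commands: list[str]) -> Path:
    """Write the DSS commands for one component type to its own file (e.g. Loads.dss)."""
    path = directory / f"{name}.dss"
    path.write_text("\n".join(commands) + "\n")
    return path


def build_master_script(circuit_command: str, component_files: list[Path]) -> str:
    """Assemble a master script that pulls each component file in with Redirect."""
    lines = ["Clear", circuit_command]
    # Each component lives in its own file, so editing loads never touches the master.
    lines += [f"Redirect {path.name}" for path in component_files]
    lines.append("Solve")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        # Hypothetical load definition, for illustration only
        loads = write_component_file(
            folder, "Loads",
            ["New Load.Example Bus1=701 Phases=3 Conn=Delta kV=4.8 kW=100 kvar=50 Model=1"],
        )
        master = build_master_script("New Circuit.Demo basekV=230", [loads])
        (folder / "Master.dss").write_text(master)
        print(master)
```

Keeping the master script to `Clear`, the circuit definition, one `Redirect` per component file, and `Solve` means a parameter change only touches the relevant component file.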

Figure 2 shows the iteration loop that OpenDSS follows when performing power flow calculation.
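To make the iteration loop concrete, the following is a simplified sketch of a fixed-point current-injection power flow, in the spirit of the iterative current-injection solution OpenDSS uses by default. It is not OpenDSS source code. It assumes bus 0 is an ideal source with fixed voltage, all other buses carry constant-power loads, and all values are in per unit; the two-bus network and its impedance are hypothetical.

```python
import numpy as np


def fixed_point_power_flow(y_bus, v_source, s_load, tol=1e-9, max_iter=100):
    """Fixed-point current-injection power flow.

    Bus 0 is an ideal source with fixed voltage v_source; buses 1..n-1 carry
    constant-power loads s_load (per unit, consumption positive).
    Returns the load-bus voltages and the number of iterations used.
    """
    y_ls = y_bus[1:, 0]    # coupling between the load buses and the source bus
    y_ll = y_bus[1:, 1:]   # admittance submatrix of the load buses
    s_load = np.asarray(s_load, dtype=complex)
    v = np.full(s_load.shape, v_source, dtype=complex)  # flat start
    for iteration in range(1, max_iter + 1):
        # Loads draw current, so their injection is the negative of conj(S / V).
        i_inj = -np.conj(s_load / v)
        # Solve Y_LL V_L = I_L - Y_LS V_S for the new load-bus voltages.
        v_new = np.linalg.solve(y_ll, i_inj - y_ls * v_source)
        if np.max(np.abs(v_new - v)) < tol:
            return v_new, iteration
        v = v_new
    raise RuntimeError("power flow did not converge")


if __name__ == "__main__":
    z = 0.01 + 0.02j                        # hypothetical line impedance (p.u.)
    y = 1 / z
    y_bus = np.array([[y, -y], [-y, y]])
    v, n_iter = fixed_point_power_flow(y_bus, 1.0 + 0j, [0.5 + 0.2j])
    print(abs(v[0]), n_iter)
```

Each pass recomputes the load currents from the latest voltages and then solves a linear system with a fixed admittance matrix, which is why the loop converges quickly when the network is lightly loaded.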

2.2. Modeling and Analysis of Distribution Network-Related Models

Three-phase loads in a distribution network may be balanced or unbalanced. Loads can be connected in star (wye) or delta, and single-phase and two-phase loads also occur. Distribution network loads are mainly divided into three basic types: constant current, constant power, and constant impedance. If the rated power of a three-phase load is known, the parameters of the corresponding constant model, namely the constant power parameters, the constant current parameters, and the constant admittance parameters, can be calculated from the voltage of the node where the load is located and the load type. A star-connected load is shown in Figure 3.

For a star-connected load with a given three-phase rated power, the following formula is obtained:

The result can then be converted into the corresponding constant model parameters according to the load type and the node voltage:
(1) Loads of the constant P, constant Q type: constant power parameters.
(2) Loads of the constant current type: constant current parameters.
(3) Loads of the constant impedance type: constant impedance parameters.
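The conversion from rated power to the three constant-model parameters can be illustrated per phase with the standard relations S = V·conj(I) and I = Y·V, which give I = conj(S/V) and Y = conj(S)/|V|². The paper's own formulas are not reproduced in the text, so this is a minimal sketch under that textbook assumption; the function name and the 4.8 kV, 100 kW + 50 kvar phase rating are hypothetical.

```python
import cmath


def zip_parameters(s_phase: complex, v_phase: complex) -> dict:
    """Constant-model parameters of one phase of a star-connected load.

    s_phase: rated complex power of the phase (VA), consumption positive.
    v_phase: phase-to-neutral voltage phasor at the load node (V).
    """
    i_const = (s_phase / v_phase).conjugate()           # from S = V * conj(I)
    y_const = s_phase.conjugate() / abs(v_phase) ** 2   # from S = |V|^2 * conj(Y)
    return {"S": s_phase, "I": i_const, "Y": y_const}


if __name__ == "__main__":
    # Hypothetical phase-a load on a 4.8 kV (line-to-line) feeder
    v_a = cmath.rect(4800 / 3 ** 0.5, 0.0)
    params = zip_parameters(100e3 + 50e3j, v_a)
    print(params)
```

At the rated voltage all three parameter sets draw the same power; they differ only in how the drawn power changes when the node voltage moves away from that value.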

In OpenDSS, a load is defined by its load characteristics (load model), its connection method (star or delta), its power values, and other parameters.
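A load definition of this kind can be generated programmatically. The sketch below maps the three basic load types to the OpenDSS `Load` model codes (1 for constant kW and kvar, 2 for constant impedance, 5 for constant current magnitude) and emits a `New Load.` command. The helper name, the bus `701`, and the ratings are hypothetical, and only the spellings `wye` and `delta` are accepted here for simplicity.

```python
# OpenDSS Load "Model" codes for the three basic load types
OPENDSS_LOAD_MODEL = {
    "constant_power": 1,       # constant kW and kvar
    "constant_impedance": 2,   # constant impedance
    "constant_current": 5,     # constant current magnitude
}


def load_command(name, bus, phases, conn, kv, kw, kvar, load_type):
    """Build one OpenDSS load definition line (hypothetical helper)."""
    if conn.lower() not in ("wye", "delta"):
        raise ValueError("conn must be 'wye' or 'delta'")
    model = OPENDSS_LOAD_MODEL[load_type]
    return (f"New Load.{name} Bus1={bus} Phases={phases} Conn={conn} "
            f"kV={kv} kW={kw} kvar={kvar} Model={model}")


if __name__ == "__main__":
    print(load_command("L1", "701", 3, "wye", 4.8, 100, 50, "constant_current"))
```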

The definition of the photovoltaic model in OpenDSS is shown in Figure 4.

Therefore, the output of the photovoltaic generation model established in OpenDSS in this paper can be approximately expressed by the following formula:

In the formula, C is the total correlation coefficient, which is a constant; the remaining quantities are the current light intensity, the rated output power of the solar panel at a specified temperature under a light intensity of 1 kW/m², and the working efficiency of the inverter.
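Because the formula itself is missing from the text, the sketch below assumes that the listed quantities combine as a simple product, P = C · (G / 1 kW/m²) · P_rated · η, which is consistent with the description of C as a constant overall coefficient. The function name and the irradiance values in the usage loop are hypothetical.

```python
STC_IRRADIANCE_KW_M2 = 1.0  # reference light intensity of 1 kW/m^2


def pv_output_kw(c: float, irradiance_kw_m2: float, p_rated_kw: float, eta_inv: float) -> float:
    """Approximate PV output P = C * (G / 1 kW/m^2) * P_rated * eta (assumed product form)."""
    if irradiance_kw_m2 < 0:
        raise ValueError("irradiance must be non-negative")
    return c * (irradiance_kw_m2 / STC_IRRADIANCE_KW_M2) * p_rated_kw * eta_inv


if __name__ == "__main__":
    # Hypothetical hourly irradiance for a sunny morning (kW/m^2)
    for g in (0.0, 0.2, 0.5, 0.8):
        print(g, pv_output_kw(1.0, g, 100.0, 0.95))
```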

Growing energy and environmental concerns are driving the development of the electric vehicle industry. However, integrating large numbers of random charging loads such as electric vehicles (EVs) into the power grid will also have adverse effects. Compared with conventional loads, the spatiotemporal distribution of EV loads is highly random and highly dispersed. These characteristics can cause voltage sags and voltage excursions at the EV connection point and destabilize the system voltage. When a user charges an EV, the charging power is given by the following formula:

In the formula, the charging state indicator is 0 when the battery is fully charged and 1 while the battery is charging.
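The charging power formula is not reproduced in the text, so the following is an illustration only. It assumes constant-power charging gated by the 0/1 charging state indicator, hourly time steps, and a reduced final-hour power so that the state of charge (SOC) does not exceed 1. The function name, battery capacity, charger rating, and efficiency are hypothetical.

```python
def ev_charging_profile(soc0, capacity_kwh, p_charge_kw, hours, efficiency=0.9):
    """Hourly charging power of one EV (assumed constant-power charging model).

    The indicator u is 1 while the battery is charging and 0 once it is full;
    in the final charging hour the power is reduced so SOC does not exceed 1.
    """
    soc = soc0
    profile = []
    for _ in range(hours):
        u = 1 if soc < 1.0 - 1e-9 else 0
        energy_to_full_kwh = (1.0 - soc) * capacity_kwh / efficiency
        p = u * min(p_charge_kw, energy_to_full_kwh)  # 1 h step, so kWh equals kW
        soc = min(1.0, soc + p * efficiency / capacity_kwh)
        profile.append(p)
    return profile


if __name__ == "__main__":
    # Hypothetical 40 kWh battery at 50% SOC on a 7 kW charger
    print(ev_charging_profile(0.5, 40.0, 7.0, 5, efficiency=1.0))
```

Aggregating many such profiles with random arrival times and initial SOC values is one way to reproduce the random, dispersed EV load behavior described above.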

2.3. Reactive Power Control Optimization of the Distribution Network Based on OpenDSS

The IEEE 37-node three-phase distribution system is selected for simulation; its original topology is shown in Figure 5. The system has a total of 37 nodes and 35 feeder branches. The upstream supply voltage is 230 kV, which the substation transformer at node 799 steps down to 4.8 kV, the voltage at which power enters the system. The transformer on the feeder section between nodes 709 and 775 further steps the voltage down to 0.48 kV. Accordingly, the rated voltage of the loads at all nodes except node 775 is 4.8 kV.

To make this study more representative of practical reactive power optimization, the system is converted into an active distribution network by adding distributed generators and random loads, namely wind turbines (WTs), photovoltaic (PV) generators, and electric vehicle (EV) charging piles. The modified topology is shown in Figure 6.

2.4. Traditional Reactive Power Optimization Mathematical Model

The traditional reactive power optimization problem is to select appropriate control variables (such as generator terminal voltages, transformer tap positions, and switched capacitor banks) when the system load parameters, the parameters of the other electrical components, and the grid topology are known. The control variables are chosen so that a given objective function of the system is optimized while all system constraints (such as power flow constraints and voltage constraints) are satisfied. The mathematical model can be expressed as follows:

$$\min_{u} f(u, x) \quad \text{s.t.} \quad g(u, x) = 0, \quad h(u, x) \le 0$$

where u represents the control variables used for reactive power optimization, x represents the state variables in the optimization process, g represents the equality constraints, and h represents the inequality constraints.

The objective function of reactive power optimization is mainly formulated from technical and economic perspectives, and its form and emphasis differ with the actual situation and the problem to be solved. This study considers both the economy and the reliability of distribution network operation: the optimization objective is to minimize the active power loss of the distribution network while keeping the system voltage deviation within the acceptable range and as small as possible. Voltage quality is therefore treated as a subobjective, and a combined objective function is constructed as shown in the following formula:
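The combined formula is not reproduced in the text. A common way to merge a main objective and a subobjective is a weighted sum, and the sketch below assumes that form: normalized active power loss plus weighted voltage deviation. The weights, the loss reference value, and the candidate tap settings with their losses and deviations are all hypothetical.

```python
def combined_objective(p_loss_kw, voltage_deviation, w_loss=0.5, w_volt=0.5, p_loss_ref_kw=1.0):
    """Weighted sum of normalized active power loss and voltage deviation (assumed form).

    The weights and the loss reference are hypothetical tuning parameters.
    """
    return w_loss * p_loss_kw / p_loss_ref_kw + w_volt * voltage_deviation


if __name__ == "__main__":
    # Hypothetical (loss kW, voltage deviation p.u.) for three candidate tap settings
    candidates = {"tap -2": (62.0, 0.031), "tap 0": (55.0, 0.024), "tap +2": (58.0, 0.018)}
    best = min(candidates, key=lambda k: combined_objective(*candidates[k], p_loss_ref_kw=100.0))
    print(best)
```

Normalizing the loss by a reference value keeps the two terms on comparable scales, so the weights express the relative importance of loss and voltage quality rather than unit differences.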

The equality constraint is the power balance condition: the power generated in the system equals the power consumed by the system, as shown in the following formula:
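Power balance is usually checked node by node through the standard nodal power flow equations, S_i = V_i · conj(Σ_j Y_ij V_j), where the net injection at each node must equal its generation minus its load. The sketch below assumes that standard form; the helper names and the two-bus test values are hypothetical. The sum of all injections equals the network loss, which is why "generated power equals consumed power" holds once losses are counted as consumption.

```python
import numpy as np


def nodal_injections(y_bus, v):
    """Complex power injected at each bus: S_i = V_i * conj(sum_j Y_ij V_j)."""
    return v * np.conj(y_bus @ v)


def power_mismatch(y_bus, v, p_gen, q_gen, p_load, q_load):
    """Active and reactive power mismatch at each bus (zero when balance holds)."""
    s = nodal_injections(y_bus, v)
    dp = p_gen - p_load - s.real
    dq = q_gen - q_load - s.imag
    return dp, dq


if __name__ == "__main__":
    z = 0.01 + 0.02j                       # hypothetical line impedance (p.u.)
    y = 1 / z
    y_bus = np.array([[y, -y], [-y, y]])
    v = np.array([1.0 + 0j, 0.98 - 0.01j])  # hypothetical bus voltages (p.u.)
    s = nodal_injections(y_bus, v)
    print("injections:", s, "network loss:", s.sum())
```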

In this paper, the transformer tap positions are mainly selected as the control variables. The inequality constraints are as follows:

State variables are as follows:

Control variables are as follows:

In the formula, the three terms represent the active power output of the generator and its upper and lower limits, respectively.
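The inequality constraints on the control variables reduce to box constraints, plus integrality for tap positions. The sketch below checks a candidate solution against them and reports which constraints are violated. The `Limits` type, the helper name, and the ±16-step tap range used in the test are hypothetical; real limits depend on the transformer and generator data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    lower: float
    upper: float

    def contains(self, value: float) -> bool:
        return self.lower <= value <= self.upper


def check_controls(tap_steps, p_gen, tap_limits, p_gen_limits):
    """Return the names of violated inequality constraints (empty list if feasible)."""
    violations = []
    for k, step in enumerate(tap_steps):
        # Tap positions are discrete, so a non-integer step is also a violation.
        if not float(step).is_integer() or not tap_limits.contains(step):
            violations.append(f"tap[{k}]")
    for k, (p, lim) in enumerate(zip(p_gen, p_gen_limits)):
        if not lim.contains(p):
            violations.append(f"P_G[{k}]")
    return violations


if __name__ == "__main__":
    print(check_controls([0, 5, 17], [0.5], Limits(-16, 16), [Limits(0.0, 1.0)]))
```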

All parameters and electrical quantities must strictly satisfy the above equality and inequality constraints. To compare the performance of the system before and after optimization, this paper defines two indicators as the basis for analyzing the optimization effect.

The system loss reduction rate and the overall voltage deviation at a given time are defined as follows:

This quantity is the system loss reduction rate at the given time. The overall voltage deviation is defined as follows:

In the formula, the deviation of each node is measured against the rated voltage of the i-th node of the system at this time, and n is the number of system nodes.

In this paper, the system voltage is considered to be within the acceptable range when the voltage deviation is within 5%.
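The two indicators and the 5% acceptance rule can be sketched as follows. Because the indicator formulas are not reproduced in the text, the sketch assumes the common forms: loss reduction rate = (loss before − loss after) / loss before × 100%, and overall voltage deviation = mean of |U_i − U_iN| / U_iN over the n nodes. The function names and the example voltages are hypothetical.

```python
def loss_reduction_rate(p_loss_before_kw, p_loss_after_kw):
    """Percentage reduction of active power loss after optimization (assumed form)."""
    if p_loss_before_kw <= 0:
        raise ValueError("loss before optimization must be positive")
    return (p_loss_before_kw - p_loss_after_kw) / p_loss_before_kw * 100.0


def voltage_deviation(u_kv, u_rated_kv):
    """Mean per-unit deviation of node voltages from their rated values (assumed form)."""
    return sum(abs(u - un) / un for u, un in zip(u_kv, u_rated_kv)) / len(u_kv)


def voltages_acceptable(u_kv, u_rated_kv, limit=0.05):
    """True if every node deviates from its rated voltage by at most 5%."""
    return all(abs(u - un) / un <= limit for u, un in zip(u_kv, u_rated_kv))


if __name__ == "__main__":
    u = [4.9, 4.7, 4.75]          # hypothetical node voltages (kV)
    u_rated = [4.8, 4.8, 4.8]
    print(loss_reduction_rate(120.0, 90.0), voltage_deviation(u, u_rated),
          voltages_acceptable(u, u_rated))
```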

Building on the operations above and on reactive power optimization with the traditional genetic algorithm, the main process of the reactive power optimization based on the improved genetic algorithm in this paper is shown in Figure 7.
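For orientation, the following is a plain genetic algorithm over integer tap positions, with binary tournament selection, uniform crossover, random-reset mutation, and elitism. It is not the improved genetic algorithm of Figure 7, whose modifications are not described in the text. The objective passed in is a hypothetical stand-in for a power-flow-based evaluation such as the combined loss and voltage objective, and all parameter values are illustrative.

```python
import random


def genetic_algorithm(objective, n_vars, low, high, pop_size=30, generations=60,
                      crossover_rate=0.8, mutation_rate=0.1, seed=0):
    """Minimize `objective` over integer vectors with entries in [low, high].

    Returns (best individual, best objective value, best value per generation).
    """
    rng = random.Random(seed)
    pop = [[rng.randint(low, high) for _ in range(n_vars)] for _ in range(pop_size)]
    best, best_f = None, float("inf")
    history = []

    def tournament(scored):
        # Binary tournament: the fitter of two random individuals becomes a parent.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] <= b[0] else b[1]

    for _ in range(generations):
        scored = [(objective(ind), ind) for ind in pop]
        gen_f, gen_ind = min(scored, key=lambda t: t[0])
        if gen_f < best_f:
            best_f, best = gen_f, list(gen_ind)
        history.append(best_f)
        children = [list(best)]  # elitism: carry the best-so-far individual over
        while len(children) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            if rng.random() < crossover_rate:  # uniform crossover
                child = [g1 if rng.random() < 0.5 else g2 for g1, g2 in zip(p1, p2)]
            else:
                child = list(p1)
            # Random-reset mutation keeps every gene inside [low, high].
            child = [rng.randint(low, high) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        pop = children
    return best, best_f, history


if __name__ == "__main__":
    target = [1, -2, 3]  # hypothetical "ideal" taps for a toy objective
    toy = lambda x: sum((xi - t) ** 2 for xi, t in zip(x, target))
    print(genetic_algorithm(toy, 3, -4, 4)[:2])
```

Elitism guarantees that the best objective value never gets worse from one generation to the next, which is a common baseline property that improved variants keep.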

The hidden layer output H is calculated as follows, where l represents the number of hidden layer nodes and f represents the activation function of the hidden layer.

The output of the output layer is then calculated, taking the hidden layer output H as input. The specific formula is as follows:

The prediction error e of the network is calculated from the network prediction O as follows:

The network connection weights are updated according to the prediction error e as follows:

In the formula, η is the learning rate.

The thresholds of the nodes in each layer of the network are updated in the same way.

The algorithm then checks whether the iteration termination condition is satisfied; if it is not, the algorithm returns to Step 2. The training flow chart is shown in Figure 8.
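The training steps above (hidden layer output H, output O, error e, weight and threshold updates, termination check) can be sketched as a three-layer BP network trained by full-batch gradient descent. The paper's exact update formulas are not reproduced in the text, so this sketch uses the standard gradient of the mean squared error, with thresholds written as additive biases, a sigmoid hidden activation f, and a linear output layer. Layer sizes, the learning rate η (`lr`), the tolerance, and the toy training data are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def train_bp(x, y, n_hidden, lr=0.3, epochs=3000, tol=1e-5, seed=0):
    """Train a 3-layer BP network by full-batch gradient descent.

    x: (n_samples, n_inputs), y: (n_samples, n_outputs).
    Returns the parameters (w1, b1, w2, b2) and the last mean squared error.
    """
    rng = np.random.default_rng(seed)
    w1 = rng.normal(0.0, 0.5, (x.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    w2 = rng.normal(0.0, 0.5, (n_hidden, y.shape[1]))
    b2 = np.zeros(y.shape[1])
    n = x.shape[0]
    mse = float("inf")
    for _ in range(epochs):
        h = sigmoid(x @ w1 + b1)      # hidden layer output H (l = n_hidden nodes)
        o = h @ w2 + b2               # output layer prediction O
        e = y - o                     # prediction error e
        mse = float(np.mean(e ** 2))
        if mse < tol:                 # iteration end condition
            break
        delta_h = (e @ w2.T) * h * (1.0 - h)  # error propagated back through f'
        w2 += lr * h.T @ e / n                # update hidden-to-output weights
        b2 += lr * e.mean(axis=0)             # update output thresholds (biases)
        w1 += lr * x.T @ delta_h / n          # update input-to-hidden weights
        b1 += lr * delta_h.mean(axis=0)       # update hidden thresholds (biases)
    return (w1, b1, w2, b2), mse


def predict(params, x):
    w1, b1, w2, b2 = params
    return sigmoid(x @ w1 + b1) @ w2 + b2


if __name__ == "__main__":
    x = np.linspace(0.0, 1.0, 20).reshape(-1, 1)   # toy normalized input
    y = 0.5 * x + 0.2                              # toy target
    params, mse = train_bp(x, y, n_hidden=5)
    print("final MSE:", mse)
```

Note that the hidden-layer error term is computed before the output weights are updated, so both layers are adjusted with gradients from the same forward pass.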

During actual operation, the system may be in different states. Given the probability of occurrence of each state, the system entropy is defined as shown in the following formula:

In the formula, C represents a constant and m represents the number of states.

Formula (19) shows that entropy has the following main properties. First, it has an extreme value: when all states of the system are equally probable, the entropy of the system reaches its maximum. Second, the order of the state probabilities does not affect the system entropy; the two are independent.
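Both properties can be checked numerically. The sketch below assumes formula (19) takes the usual Shannon form H = −C Σ p_i ln p_i over the m states, which matches the description of C as a constant and m as the number of states; the function name and the probability values are hypothetical.

```python
import math


def system_entropy(probs, c=1.0):
    """H = -C * sum(p_i * ln p_i) over the m states; 0 * ln(0) is taken as 0."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -c * sum(p * math.log(p) for p in probs if p > 0)


if __name__ == "__main__":
    uniform = [0.25] * 4
    skewed = [0.7, 0.1, 0.1, 0.1]
    # Equal probabilities give the maximum value C * ln(m).
    print(system_entropy(uniform), math.log(4))
    # Reordering the probabilities leaves the entropy unchanged.
    print(system_entropy(skewed), system_entropy(skewed[::-1]))
```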

Using neural networks for reactive power optimization control of distribution networks involves two stages, a training stage and an operation stage, corresponding to offline and online conditions, respectively. In the offline training stage, the network learns the correspondence between the characteristic features of the distribution system and the reactive power optimization strategies. This is similar to learning and imitating the research ideas and work experience that power engineers accumulate when they face reactive power optimization problems. In the online operation stage, the corresponding system features are first extracted efficiently and quickly from a large amount of measurement data and fed, as the current system state, into the offline-trained network model. The model then quickly finds the optimal reactive power optimization strategy, which power engineers can consult and implement on the corresponding equipment.

The offline training and modeling process of the neural network is shown in Figure 9.

The specific steps are as follows:
(1) Sample data collection and preprocessing. Samples of the selected original features are collected in OpenDSS at an hourly resolution, giving 8760 h of sample data per year. The free entropy method is then used to calculate the five types of free entropy indicators corresponding to the feature variables, which serve as the input of the network. Because the feature indicators have different dimensions, the resulting input entropy values also differ in scale. To reduce these differences, this paper applies min-max normalization to the input features of the samples so that every input of the neural network lies in the interval [0, 1], as shown in the following formula:

$$x_i^{*} = \frac{x_i - x_{i,\min}}{x_{i,\max} - x_{i,\min}}$$

where x_i and x_i^{*} are the values of the i-th feature quantity in a single sample before and after normalization, respectively, with the integer i ranging over [1, 5], and x_{i,\max} and x_{i,\min} are the maximum and minimum values of the i-th feature index over all sample sets.
(2) Verification of the reactive power optimization effect of the distribution network based on multiagent deep reinforcement learning.
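The min-max normalization in step (1) can be written column-wise over the sample matrix, for example 8760 hourly rows by five entropy features. A minimal sketch follows; the function name is hypothetical, and mapping a constant column to 0 is an added assumption to avoid division by zero.

```python
import numpy as np


def min_max_normalize(samples):
    """Column-wise min-max normalization of an (n_samples, n_features) array into [0, 1].

    Constant columns are mapped to 0 to avoid division by zero.
    Returns the normalized array and the per-feature minima and maxima.
    """
    x = np.asarray(samples, dtype=float)
    x_min = x.min(axis=0)
    x_max = x.max(axis=0)
    span = np.where(x_max > x_min, x_max - x_min, 1.0)
    return (x - x_min) / span, x_min, x_max


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    yearly = rng.random((8760, 5)) * [3.0, 2.0, 1.5, 1.0, 4.0]  # hypothetical entropy features
    normalized, lo, hi = min_max_normalize(yearly)
    print(normalized.min(axis=0), normalized.max(axis=0))
```

Returning the minima and maxima lets the same scaling from the offline training set be reused on online measurement data.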

This paper organizes multiple sets of historical data covering one year and extracts five types of free entropy indicators: load entropy, wind turbine entropy, PV entropy, EV entropy, and environmental entropy, as shown in Figure 10. These indicators characterize the operating characteristics of the distribution network well.

The reactive power optimization effect is evaluated through multiple sets of simulation experiments, and the results are listed in Table 1.

The above verification shows that the proposed reactive power optimization of the distribution network based on multiagent deep reinforcement learning performs well.

3. Conclusion

Operation optimization of distribution networks under deterministic conditions assumes accurate forecasts of distributed generation output and load and does not consider the impact of short-term fluctuations and randomness. In practice, however, both renewable energy output and load are uncertain. If this uncertainty is ignored, the operating strategy often cannot achieve economical and efficient system operation, and even the safety of system operation is difficult to guarantee. In this paper, a multiagent deep reinforcement learning algorithm is used to analyze the reactive power optimization strategy of the distribution network, and an intelligent optimization model is constructed to improve the reactive power optimization effect. Experimental verification shows that the proposed multiagent deep reinforcement learning approach achieves a good reactive power optimization effect in the distribution network.

Data Availability

The labeled dataset used to support the findings of this study is available from the corresponding author upon request.

Conflicts of Interest

The author declares that there are no conflicts of interest regarding publication of the paper.

Acknowledgments

This study was supported by the Hubei University of Technology.