Journal of Electrical and Computer Engineering

Volume 2019, Article ID 9406072, 16 pages

https://doi.org/10.1155/2019/9406072

## Continuous Reinforcement Algorithm and Robust Economic Dispatching-Based Spot Electricity Market Modeling considering Strategic Behaviors of Wind Power Producers and Other Participants

^{1}Beijing Key Laboratory of New Energy and Low-Carbon Development (North China Electric Power University), Changping District, Beijing 102206, China
^{2}School of Economics and Management, North China Electric Power University, Beijing 102206, China

Correspondence should be addressed to Shuguang Yuan; shuguangyuan@ncepu.edu.cn

Received 17 June 2018; Revised 24 October 2018; Accepted 2 December 2018; Published 3 March 2019

Academic Editor: Daniele Menniti

Copyright © 2019 Zhenyu Zhao et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

In a spot wholesale electricity market containing strategic bidding interactions among wind power producers and other participants, such as fossil generation companies and distribution companies, the randomly fluctuating nature of wind power hinders not only the modeling and simulation of the dynamic bidding process and equilibrium of the electricity market but also the ability of the independent system operator to maintain economy and reliability in market clearing (economic dispatch). The gradient descent continuous actor-critic algorithm has been demonstrated to be an effective method for Markov decision-making problems with continuous state and action spaces, and the robust economic dispatch model can optimize the permitted real-time wind power deviation intervals based on wind power producers’ bidding power outputs. Therefore, considering the bidding interactions among wind power producers and other participants, we propose in this paper a gradient descent continuous actor-critic algorithm-based hour-ahead electricity market modeling approach with the robust economic dispatch model embedded. Simulations are implemented on the IEEE 30-bus test system, which, to some extent, verify the market operation economy and the robustness against wind power fluctuations of our proposed modeling approach.

#### 1. Introduction

Wind power is one of the fastest-growing renewable power resources [1]. In a spot electricity market (EM) with wind power penetration, the fluctuating and random nature of this intermittent resource hinders the integration of wind power into the EM and the operation of power systems. Moreover, the strategic interactions among wind power producers (WPPs) and other market participants, such as fossil generation companies (GenCOs) and distribution companies (DisCOs), increase the complexity of EM modeling, which is a necessary tool for market analysis, design, bidding decision-making, and market modification [2].

The objective of every participant bidding in the EM is to maximize its own profit. Wind power and some other renewable power resources often participate in the spot EM as “price takers” because of their low marginal costs. Therefore, the only bidding parameter a WPP needs to determine is its production level [3]. On the one hand, the limited predictability of wind power means that WPPs usually cannot meet the production levels they bid, which increases the probability of system imbalances [4]. Regulators in many countries have designed various penalty mechanisms to financially punish WPPs for the deviations of their real-time productions from their bids. Hence, if the marginal cost of wind power is neglected [5], maximizing a WPP’s profit means simultaneously minimizing the deviation cost and maximizing the bidding revenue. On the other hand, the fluctuating and random nature of wind power forces other EM participants to bid in this stochastically fluctuating EM environment in order to maximize their own profits, which in turn affects the bidding revenues of WPPs, mainly through the locational marginal prices (LMPs) cleared by the independent system operator (ISO). Therefore, in this more complicated situation, developing fast and reliable market modeling approaches that capture the bidding interactions among all kinds of participants has become considerably more important than before. One aim of this paper is to apply a new reinforcement learning algorithm based on the gradient descent continuous actor-critic (GDCAC) algorithm to double-side hour-ahead EM modeling containing strategic bidding interactions among WPPs and other market participants such as GenCOs and DisCOs.

Generally speaking, the literature relevant to our research can be divided into two categories: optimal wind power (or other renewable power) bidding in an EM with wind penetration, and EM modeling considering (or not considering) wind and other renewable power penetration. Regarding optimal wind power bidding in an EM, methods for finding the optimal bidding strategy of a WPP have been introduced by many researchers. Vilim and Botterud [3] proposed two stochastic bidding models based on kernel density estimation (KDE) for a WPP to obtain the optimal day-ahead bidding strategy. Ravnaas et al. [6] proposed a seasonal autoregressive integrated moving average (SARIMA) algorithm for a WPP to obtain the optimal day-ahead bidding strategy. Sharma et al. [5] studied the behaviors of strategic WPPs in markets dominated by wind generators using the Cournot game model. In [7], Matevosyan et al. proposed an imbalance cost minimization bidding strategy for a WPP by forecasting the wind power probability distribution functions. Li and Shi [8] proposed a stochastic bidding model for a WPP based on the Roth–Erev reinforcement learning algorithm. Laia et al. [9] considered the uncertainty in the electricity price through a set of exogenous scenarios and solved the bidding problem of a price-taker thermal-wind power producer by using a stochastic mixed-integer linear programming approach. In [10], Chaves-Ávila et al. analyzed the impact of different balancing rules (penalty mechanisms) on wind power short-term bidding strategies through a stochastic optimization model. Based on the Stackelberg game model, Xiao et al. [11] put forward a closed-form analysis of a WPP’s optimal bidding strategy in a day-ahead EM involving large-scale wind power. Lei et al. 
[12] studied, using a stochastic bilevel model, the optimal bidding decision for a WPP participating in a day-ahead EM that employs stochastic market clearing and energy and reserve cooptimization, in which only the wind generation uncertainty is considered. Similar research on the optimal bidding strategy of a WPP can also be found in [11, 13–18].

However, the authors of [3, 5–18] only studied how to find the optimal bidding strategy for a WPP within the EM environment, and the modeling methods in those works are either static game models (Cournot and Stackelberg) or bilevel stochastic optimization models, which cannot simulate the impact of wind power on the dynamic bidding process of other participants (GenCOs and DisCOs) in a spot EM with wind power penetration.

To overcome these deficiencies, spot EM modeling methods considering (or not considering) wind and other renewable power penetration have been proposed in many studies.

In general, the main purpose of EM modeling approaches is to regard the EM as a whole system in which the interactions among all market participants are investigated and the bidding process or the equilibrium result is simulated. EM modeling approaches mostly fall into two categories [2]: game-based models and agent-based models. In [2], Salehizadeh and Soltaniyan summarized why game-based EM models are inferior to agent-based models: (1) some game-based models often result in a set of nonlinear equations which cannot be easily solved or might yield no solution; (2) some game-based models need to repeatedly solve multilevel mathematical programs to depict the dynamic bidding process in the EM, and this computational complexity limits the ability to simulate large EM systems with a game-based model; and (3) almost all game-based models assume that the probability distribution function of the market clearing price (MCP) or the competitors’ bidding strategies is common knowledge, and this assumption is no longer applicable in a realistic situation [19]. Hence, many studies applying agent-based methods to EM modeling have been proposed recently. Rahimiyan and Rajabi Mashhadi [19] modeled and simulated the EM bidding process using the multiagent Q-learning algorithm with discrete state and action sets and a game-based approach, respectively. The comparison of the agent-based model with the game-based model in [19] confirms the superiority of the agent-based model on this issue. Santos et al. [20] proposed an agent-based wholesale EM test bed (called MASCEM: multiagent simulator of competitive electricity markets) in which the variant Roth–Erev reinforcement learning (VRERL) algorithm was used to model the bidding behavior of the GenCO agents. 
Similar research on agent-based EM modeling can also be found in [21–28], but none of the studies in [19–28] considers wind or other renewable power penetration.

Shafie-khah et al. [29] proposed a multiagent EM model based on a heuristic dynamic algorithm to help analyze the market power of GenCOs in an EM considering wind power uncertainty. Dallinger and Wietschel [30], based on an agent-based EM equilibrium model, studied the impact of plug-in electric vehicles on an EM with renewable power penetration. Reeg et al. [31] studied the policy design problem of fostering the integration of renewable energy sources into the EM by using an agent-based approach. Zamani-Dehkordi et al. [32] studied the impact of a proposed wind farm project on wholesale and retail electricity prices by using EM models based on nonparametric regression algorithms. In [33], by using the Q-learning algorithm, Haring et al. proposed a multiagent EM approach to analyze the effects of renewable power uncertainty on the spot EM bidding process. Salehizadeh and Soltaniyan [2] modified the multiagent EM approach through the fuzzy Q-learning algorithm, by which the effects of renewable power uncertainty on the spot EM bidding process were also studied within a continuous market state (wind power) space but discrete action spaces. Paschen [34] analyzed the dynamic behavior of day-ahead EM prices in Germany due to structural shocks in wind and solar power by using a dynamic structural vector autoregressive model. Similar studies can also be found in [35, 36], but the studies in [29–36] regard wind power or other renewable powers as an exogenous random variable, so that the strategic bidding behaviors of wind or other renewable power producers, as well as the impact of the EM bidding process on WPPs, are neglected.

So far as we know, there is no existing research containing the following three points simultaneously:

(1) To construct a multiagent-based EM model which contains not only the impact of WPPs’ uncertain output on the strategic bidding behaviors of other market participants but also the impact of the EM bidding process on WPPs’ bidding decision-making

(2) To construct a multiagent-based EM model in which both the EM environment state space and the bidding strategy (action) spaces of all kinds of market participants (WPPs, GenCOs, and DisCOs) are continuous

(3) To construct a multiagent-based EM model in which the market clearing model of the ISO promotes the wind power accommodation capacity of the power system, which is another aim of this paper

This paper applies a new modified reinforcement learning algorithm, namely, the GDCAC algorithm, to hour-ahead EM modeling. In our proposed EM approach, all participants (WPPs, GenCOs, and DisCOs) are regarded as interactively strategic bidding agents who, during the bidding process, select their optimal bidding strategies from continuous strategy spaces based on the EM environment state they observe within a continuous state space, without suffering from the “curse of dimensionality.” The market clearing model of the ISO in our approach is a robust economic dispatch model (REDM) [37] which can optimize the permitted real-time wind power deviation intervals based on WPPs’ bidding power outputs. By using our proposed approach, the dynamic interactions among all participants as well as the Nash equilibrium (NE) results of the EM can be simulated and obtained. On the one hand, our proposed approach provides a bidding decision-making tool for WPPs, GenCOs, and DisCOs to earn more profits in the EM. On the other hand, it also provides an economic and operational analysis tool for promoting the development of renewable resources. Moreover, in our simulation, the proposed approach is implemented on the IEEE 30-bus test system. Besides testing and verifying the feasibility and rationality of our proposed approach, such as reaching NE results after enough iterations and outperforming other agent-based approaches, a comparison of our proposed market clearing model with that in [12], under the same GDCAC-based bidding approach, is also implemented, which indicates the necessity of adopting the REDM for promoting wind power accommodation in the EM.

The rest of this paper is organized as follows. Section 2 explains the multiagent double-side hour-ahead EM model containing strategic bidding interactions among WPPs, GenCOs, and DisCOs. Sections 3 and 4 describe the detailed procedure of applying the GDCAC algorithm to EM modeling. Section 5 conducts the simulations and comparisons. Section 6 concludes the paper.

#### 2. Multiagent Hour-Ahead EM Modeling

##### 2.1. Participants’ Bidding Models

In our proposed double-side hour-ahead wholesale EM model, we consider every WPP, GenCO, and DisCO an agent. An agent has the ability to learn through its bidding experiences in order to maximize its own profit. For simplicity and without loss of generality, we assume that every WPP and GenCO has only one generation unit. In each hour, every GenCO and DisCO solves its own bidding problem and sends its price-quantity bid curve for the next hour to the ISO. Moreover, every WPP, because of its “price taker” role in the EM, solves its own bidding problem and sends its bidding power output to the ISO. The ISO, after receiving all bid curves from GenCOs and DisCOs as well as all bidding power outputs from WPPs, performs robust economic dispatch and sends the scheduled power results as well as the LMPs to all market participants (WPPs, GenCOs, and DisCOs).

For WPP *i* (*i* = 1, 2, …, *N*_{w}), the only bidding parameter for hour *t* is its planned (bidding) power output . WPP *i* can adjust its bid by changing this parameter. In the power systems of many countries, wind power is given priority to be scheduled by the ISO compared with other nonrenewable resources [37], which is to say that the prior-scheduled wind power for hour *t*, namely, , is equal to . However, because of the high variability and random nature of this intermittent resource, the (predicted) real-time output power of WPP *i* for hour *t*, namely, , which is actually a random variable [12], usually tends to deviate from the scheduled one, which is harmful to the secure operation of the power system and tends to cause system imbalance. Hence, penalty mechanisms to financially punish WPPs for the deviations of their real-time productions from their bids must be involved. Taking the penalty method of [12] into consideration, the expected profit of WPP *i* for hour *t* can be described as follows:where represents the hour-ahead nodal price (LMP) for hour *t* at the bus connecting WPP *i*. is a random variable, which is used to describe the scenarios of wind power uncertainty. represents the envelope space of wind power scenarios. represents the probability of occurrence of the scenario . and represent the (predicted) real-time power output and penalty price of WPP *i* for hour *t* in scenario , respectively. In this paper, we assume that the penalty price of WPP *i* is related to the (predicted) real-time LMP at the bus connecting WPP *i* [12].
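As a minimal numerical sketch of this profit structure — revenue on the scheduled quantity minus the expected scenario-weighted deviation penalty — the following function is our own illustration; its names are not the paper’s notation, and the penalty is assumed to be linear in the absolute deviation:

```python
import numpy as np

def expected_wpp_profit(bid_mw, lmp, scenarios, probs, penalty_price):
    """Expected hour-ahead profit of a WPP under a per-MWh deviation penalty.

    bid_mw        : bidding (scheduled) power output of the WPP [MW]
    lmp           : hour-ahead LMP at the WPP's bus [$/MWh]
    scenarios     : predicted real-time outputs, one per wind scenario [MW]
    probs         : probability of each scenario (sums to 1)
    penalty_price : penalty per MWh of deviation [$/MWh] (illustrative, scalar)
    """
    scenarios = np.asarray(scenarios, dtype=float)
    probs = np.asarray(probs, dtype=float)
    revenue = lmp * bid_mw                        # paid for the scheduled quantity
    deviation = np.abs(scenarios - bid_mw)        # real-time deviation per scenario
    expected_penalty = np.sum(probs * penalty_price * deviation)
    return revenue - expected_penalty
```

For example, bidding 50 MW at an LMP of 30 $/MWh, with two equiprobable real-time scenarios of 50 MW and 40 MW and a 10 $/MWh penalty, yields an expected profit of 1500 − 50 = 1450 $.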

Moreover, there is a difference between the (predicted) real-time power output and the (predicted) natural power output (namely, ) of WPP *i* in hour *t*. WPP *i* can determine whether its (predicted) real-time power output equals the natural one by conducting pitch control or using storage equipment [37]. The functional relationship between these two random variables can be formulated as follows [37]:where represent the permitted upper and lower bounds of the power output of WPP *i* that can be accepted by the system for hour *t*. In this paper, we consider the (predicted) real-time natural wind power outputs of all WPPs as common knowledge.
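This relationship can be sketched as a simple clipping rule: the realized output tracks the natural output whenever it lies inside the permitted interval and is held at the violated bound (by pitch control or storage) otherwise. The function and argument names below are our own illustration:

```python
import numpy as np

def realized_output(natural_mw, lower_mw, upper_mw):
    # The WPP follows its natural output as long as it stays inside the
    # permitted interval [lower_mw, upper_mw]; otherwise pitch control or
    # storage holds the output at the violated bound.
    return float(np.clip(natural_mw, lower_mw, upper_mw))
```

With a permitted interval of [10, 80] MW, a natural output of 90 MW is curtailed to 80 MW, 5 MW is raised to 10 MW, and 45 MW passes through unchanged.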

For GenCO_{j} (*j* = 1, 2, …, ), the formulation of its bid curve for the next hour *t* is a supply function based on its real marginal cost function [28]:where represent the power production (MW) and bidding strategy ratio of GenCO_{j} for hour *t*, respectively. GenCO_{j} can adjust its bid curve by changing its parameter .

The marginal cost function of GenCO_{j} iswhere and represent the slope and intercept parameters of GenCO_{j}’s marginal cost function, respectively.
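Since the bid curve is a supply function built from the true linear marginal cost, one natural reading — and the assumption behind this sketch, since the exact stripped symbols are unavailable — is that the strategy ratio scales the whole marginal cost. Names are illustrative:

```python
def genco_bid_price(p_mw, slope, intercept, k):
    """Bid price of a GenCO as a scaled linear marginal cost (supply function).

    slope, intercept : parameters of the true marginal cost MC(p) = slope*p + intercept
    k                : bidding strategy ratio (k = 1 means bidding at true cost,
                       k > 1 means marking up the whole curve)
    """
    return k * (slope * p_mw + intercept)
```

At 100 MW with a marginal cost of 0.1·p + 20, truthful bidding (k = 1) gives 30 $/MWh, while k = 1.2 marks the bid up to 36 $/MWh.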

Moreover, we assume every GenCO is an AGC (automatic generation control [37]) unit which can automatically undertake the real-time power imbalance of the system in a certain proportion (namely, ). Therefore, the expected profit of GenCO_{j} can be described aswhere represents the hour-ahead nodal price (LMP) for hour *t* at the bus connecting GenCO_{j}, represents the (predicted) real-time nodal price (LMP) for hour *t* at the bus connecting GenCO_{j} in scenario , and represents GenCO_{j}’s hour-ahead scheduled output power for hour *t*.

For DisCO_{m} (*m* = 1, 2, …, *N*_{d}), the formulation of its bid curve for the next hour *t* is a demand function based on its real marginal revenue function [28]:where represent the power demand (MW) and bidding strategy ratio of DisCO_{m} for hour *t*, respectively. DisCO_{m} can adjust its bid curve by changing its parameter .

The marginal revenue function of DisCO_{m} iswhere and represent the slope and intercept parameters of DisCO_{m}’s marginal revenue function, respectively.

Profit of DisCO_{m} can be described aswhere is the hour-ahead nodal price (LMP) for hour *t* at the bus connecting DisCO_{m} and represents the DisCO_{m}’s hour-ahead scheduled power demand (load) result for hour *t*.
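Mirroring the GenCO supply function, a DisCO’s demand bid can be read as its decreasing linear marginal revenue scaled by a strategy ratio. This is our own illustrative reading of the stripped formula, with hypothetical names:

```python
def disco_bid_price(d_mw, slope, intercept, k):
    """Bid price of a DisCO as a scaled linear marginal revenue (demand function).

    Marginal revenue MR(d) = intercept - slope*d decreases in demand;
    k = 1 bids truthfully, while k < 1 understates willingness to pay.
    """
    return k * (intercept - slope * d_mw)
```

For a demand of 50 MW with marginal revenue 60 − 0.2·d, a strategy ratio of 0.9 produces a bid of 45 $/MWh instead of the truthful 50 $/MWh.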

##### 2.2. ISO’s Market Clearing Model

In the traditional dispatching mode considering wind power penetration, the ISO sends the scheduled values of wind power to WPPs, and WPPs are required to strictly follow the scheduled values within their generation capacities. This traditional mode has the following two obvious defects [37]:

(1) In the case of low-precision wind power prediction, the traditional dispatching mode is not conducive to wind power accommodation. It can lead to extreme operating conditions, which may seriously threaten system security when the wind power fluctuates violently.

(2) It may lead to frequent pitch control when wind turbines strictly track the scheduled output values, which shortens the lives of the wind turbines.

The main reason for these two defects is that the traditional dispatch mode does not take the uncertainty of wind power into account. Hence, the ISO does not know the maximum permitted wind power output fluctuation range under the premise of ensuring system security and cannot optimize the wind power accommodation capacity of the power grid. Therefore, more and more attention has been paid to the REDM [37], which aims to promote wind power accommodation while considering wind power uncertainty. According to [37], the robust hour-ahead economic dispatch model for hour *t* can be mathematically described as follows:where and in equation (8) represent the deviation penalty coefficients of the permitted upper and lower bounds of the wind power output of WPP *i*, and equations (9)–(15) represent the hour-ahead system constraints, including the power balance constraint (equation (9)), the DC power flow constraints in each transmission line *l* (equations (11)–(13)), and the load and power production limits of every DisCO and GenCO (equations (14) and (15)). The hour-ahead LMPs of the system can be calculated by using the dual variables of equations (9)–(13). Formulations for the hour-ahead LMPs are given in Appendix A. Equations (16)–(19) represent the (predicted) real-time system constraints, including the power balance constraint (equation (16)), the DC power flow constraints in each transmission line *l* (equations (17) and (18)), and the power production limits of every WPP (equation (19)).
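To make the structure of the REDM concrete, the following is a deliberately simplified single-bus linear program: one GenCO, one WPP, a fixed load, and decision variables for the GenCO schedule and the permitted wind interval bounds, with narrowing of the interval penalized. All numbers and names are illustrative assumptions, not the paper’s full network model of equations (8)–(19):

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative single-bus data (assumptions, not the paper's test case).
a_cost = 20.0            # GenCO marginal cost [$/MWh]
r_up, r_dn = 5.0, 5.0    # penalties on narrowing the permitted wind interval
demand = 100.0           # fixed load [MW]
w_bid = 30.0             # WPP's bidding (scheduled) output [MW]
g_min, g_max = 20.0, 120.0   # GenCO output limits [MW]
w_cap = 50.0             # WPP installed capacity [MW]

# Decision variables x = [g, w_up, w_lo]:
#   g    : GenCO hour-ahead schedule
#   w_up : permitted upper bound of real-time wind output
#   w_lo : permitted lower bound of real-time wind output
# Minimize generation cost plus penalties for shrinking [w_lo, w_up]
# away from the full range [0, w_cap] (constant r_up*w_cap dropped).
c = np.array([a_cost, -r_up, r_dn])

A_ub = np.array([
    [-1.0, 1.0, 0.0],   # down-regulation feasible: g - (w_up - w_bid) >= g_min
    [1.0, 0.0, -1.0],   # up-regulation feasible:   g + (w_bid - w_lo) <= g_max
    [0.0, -1.0, 0.0],   # w_up >= w_bid
    [0.0, 0.0, 1.0],    # w_lo <= w_bid
])
b_ub = np.array([w_bid - g_min, g_max - w_bid, -w_bid, w_bid])
A_eq = np.array([[1.0, 0.0, 0.0]])       # hour-ahead balance: g + w_bid = demand
b_eq = np.array([demand - w_bid])
bounds = [(g_min, g_max), (0.0, w_cap), (0.0, w_cap)]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
g_opt, w_up_opt, w_lo_opt = res.x
```

With these numbers, the GenCO is scheduled at 70 MW and the permitted wind interval opens to its full range [0, 50] MW, because the AGC headroom is large enough that no narrowing penalty needs to be incurred.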

From equations (16)–(18), it is obvious that the (predicted) real-time DC power flow in each transmission line *l* is a linear function of the (predicted) real-time power output of every WPP. From equation (2), the (predicted) real-time power output of WPP *i* (*i* = 1, 2, …, ) must satisfy its permitted bounds; that is to say, we can solve the abovementioned REDM by replacing with and , respectively (Appendix B) [37], and generating new (predicted) real-time balancing and transmission constraints as follows:

The (predicted) real-time LMPs (RTLMP_{1}s) of the system when the (predicted) real-time power output of every WPP increases to its (scheduled) permitted upper bound can be calculated by using the dual variables of equations (9) and (21)–(23), and the (predicted) real-time LMPs (RTLMP_{2}s) of the system when the (predicted) real-time power output of every WPP decreases to its (scheduled) permitted lower bound can be calculated by using the dual variables of equations (9) and (24)–(26). Therefore, RTLMP_{1}s and RTLMP_{2}s represent two extreme real-time dispatching results caused by the real-time wind power deviations of all WPPs. For simplicity and without loss of generality, we approximately take the mean value of RTLMP_{1} and RTLMP_{2} at bus *z* as the (predicted) real-time LMP at bus *z* and neglect the impact of different on the (predicted) real-time LMPs.
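The mean-value approximation described above is, concretely, a simple average of the two extreme dispatch outcomes (function names are ours):

```python
def approx_real_time_lmp(rtlmp_up, rtlmp_dn):
    """Approximate the (predicted) real-time LMP at a bus as the average of
    the two extreme dispatch results: all WPPs at their permitted upper
    bounds (RTLMP_1) and all WPPs at their permitted lower bounds (RTLMP_2)."""
    return 0.5 * (rtlmp_up + rtlmp_dn)
```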

#### 3. Agent-Learning Mechanism

For an agent in our proposed approach, all the other agents together constitute the EM environment it faces. Therefore, interactions between an agent and all the other agents are equivalent to interactions between this agent and the EM environment. An agent has the ability to learn, through repeated interactions with the EM environment, its optimal action (bidding strategy or bidding power output), which maximizes its (expected) profit whatever the EM environment state is. In this paper, in order to clearly describe our proposed approach, we use the following definitions:

(1) *Iteration*. Since the market is assumed to be cleared on an hour-ahead basis, we define each market round as an iteration.

(2) *State Variable*. For WPP_{i} in iteration *t*, the hour-ahead and (predicted) real-time LMPs at the bus connecting WPP *i* calculated in iteration *t* − 1, namely, , , are defined as the EM environment state variables; for GenCO_{j}, the hour-ahead and (predicted) real-time LMPs at the bus connecting GenCO_{j} calculated in iteration *t* − 1, namely, and , are defined as the EM environment state variables; for DisCO_{m}, the hour-ahead LMP at the bus connecting DisCO_{m} calculated in iteration *t* − 1, namely, , is defined as the EM environment state variable. Hence, the state vectors and scalar for WPP_{i}, GenCO_{j}, and DisCO_{m} can be formulated as follows [28]:where , , and are continuous, closed, and bounded state spaces for WPP_{i}, GenCO_{j}, and DisCO_{m}, respectively.

(3) *Action Variable*. For WPP_{i}, the hour-ahead bidding power output, namely, , is defined as the action variable of this agent in iteration *t*. For GenCO_{j} or DisCO_{m}, the hour-ahead bidding strategy ratio, namely, or , is defined as the action variable of GenCO_{j} or DisCO_{m} in iteration *t*. Hence, the action scalars for WPP_{i}, GenCO_{j}, and DisCO_{m} can be formulated as follows:

Obviously, from equations (28)–(30), we can see that the action spaces of WPP_{i}, GenCO_{j}, and DisCO_{m} are continuous, closed, and bounded intervals.

(4) *Reward*. In iteration *t*, similar to [28], every agent learns the state of the EM environment () and then selects its action, which in turn forms its bidding power output or bid curve sent to the ISO. After receiving all bidding outputs and curves, the ISO determines the hour-ahead LMPs, the permitted upper and lower bounds of the (predicted) real-time power outputs of WPPs, as well as the hour-ahead power supply and demand schedules, with our REDM represented by equations (8)–(19). The rewards of WPP_{i}, GenCO_{j}, and DisCO_{m} are depicted by equations (1), (5), and (8), respectively.

Based on the rewards received over enough iterations, an agent in the EM can gradually learn the corresponding optimal hour-ahead action, which brings the most profit in face of any state () of the EM environment. Hence, and (*i* = 1, 2, …, ; *j* = 1, 2, …, ; *m* = 1, 2, …, *N*_{d}) change dynamically over the iterations and may or may not become constant after enough iterations.

#### 4. Methodology

Inspired by the studies in [19–26], the dynamic bidding process in a spot EM can be realized via table-based reinforcement learning algorithms (TBRLAs) such as the Q-learning, fuzzy Q-learning, Roth–Erev learning, and SARSA algorithms. As mentioned in [28, 38], TBRLAs can only rapidly solve Markov decision-making problems with discrete state and action spaces. When either the state or the action space becomes continuous, the so-called “curse of dimensionality” arises, and the learning speed of TBRLAs becomes so slow that the agent cannot find its optimal action under any given environment state within a reasonable number of iterations.
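A quick size calculation illustrates why TBRLAs break down as the discretization is refined toward continuous spaces (the grid sizes are illustrative): each refinement multiplies the number of table entries, and every entry must be visited repeatedly before the estimates converge.

```python
# Per-agent Q-table size under discretization: one entry per (state, action) pair.
def q_table_entries(n_states: int, n_actions: int) -> int:
    return n_states * n_actions

coarse = q_table_entries(10, 10)       # a coarse 10 x 10 grid: 100 entries
fine = q_table_entries(1000, 1000)     # refining toward continuity: 1,000,000 entries
```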

As mentioned in Section 3, both the state and action spaces of every agent in the EM are actually continuous, closed, and bounded spaces or intervals, which guarantees the existence of optimal actions. Therefore, it is improper to use TBRLAs to model and simulate the dynamic bidding process in our proposed hour-ahead EM containing strategic bidding interactions among WPPs, GenCOs, and DisCOs. The method adopted in this paper is to apply a modified reinforcement learning algorithm, the GDCAC algorithm [28, 38], to model and simulate our proposed EM.

Because the mathematical principle and pseudocode of the GDCAC algorithm have been described in [28], we only present the step-by-step procedure of implementing the GDCAC algorithm for hour-ahead EM modeling containing strategic bidding interactions among WPPs, GenCOs, and DisCOs:

(1) *Input*. For the whole EM, input common knowledge such as every WPP’s reduced (predicted) real-time wind power output scenarios (WPOSs) with corresponding probabilities and all WPPs’ joint real-time WPOSs with corresponding probabilities. For WPP_{i} (), input the basic function : for formulating its value function and its optimal policy function , and time step length parameter series and , where and and and . For GenCO_{j} (), input the basic function : for formulating its value function and its optimal policy function , and time step length parameter series and , where and and and . For DisCO_{m} (), input the basic function : for formulating its value function and its optimal policy function , and time step length parameter series and , where and and and . Moreover, input the discount and standard deviation parameters as well as the maximum numbers of training and decision-making iterations, namely, , and *T*_{1} and *T*_{2}, for every WPP, GenCO, and DisCO.

(2) *t* = 0.

(3) Initialize the linear parameter vectors and for WPP_{i}, and for GenCO_{j}, and and for DisCO_{m}.

(4) If *t* < *T*_{1}, then in iteration *t*, WPP_{i} selects and implements an action () from state , GenCO_{j} selects and implements an action () from state , and DisCO_{m} selects and implements an action () from state . If *T*_{1} ≤ *t* < *T*_{1} + *T*_{2}, then in iteration *t*, WPP_{i} selects and implements an action () from state , GenCO_{j} selects and implements an action () from state , and DisCO_{m} selects and implements an action () from state . After every agent selects its action and sends it to the ISO, the ISO implements the REDM represented by equations (8)–(19), by which the EM environment state vector variables are updated from to and the immediate rewards , , and are generated.

(5) WPP *i* observes the immediate reward by using equation (1) and the new EM environment state ; GenCO_{j} observes the immediate reward by using equation (5) and the new EM environment state ; and DisCO_{m} observes the immediate reward by using equation (8) and the new EM environment state .

(6) *Learning*. In this step, and for WPP *i*, and for GenCO_{j}, and and for DisCO_{m} are updated by using the TD(0) error (namely, , , and ) and the gradient descent method.

WPP *i*:

GenCO_{j}:

DisCO_{m}:

(7) *t* = *t* + 1.

(8) If *t* < *T*_{1} + *T*_{2}, return to step (4).

(9) *Output*. For WPP_{i}, and and . For GenCO_{j}, and and . For DisCO_{m}, and and .
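One learning iteration (step (6)) can be sketched as a TD(0) actor-critic update with linear function approximation: the critic parameters move along the TD(0) error, and the actor parameters move along the Gaussian-policy score scaled by the same error. The names, the semi-gradient form, and the Gaussian exploration policy are our assumptions based on [28, 38], not the paper’s exact update equations:

```python
import numpy as np

def gdcac_step(theta_v, theta_pi, phi_s, phi_s_next, action, reward,
               gamma, alpha_v, alpha_pi, sigma_a):
    """One TD(0) actor-critic update with linear function approximation.

    theta_v, theta_pi : critic / actor linear parameter vectors
    phi_s, phi_s_next : feature vectors of the current and next EM states
    action            : the (continuous) bid actually implemented
    sigma_a           : exploration standard deviation of the Gaussian policy
    """
    delta = reward + gamma * (theta_v @ phi_s_next) - (theta_v @ phi_s)  # TD(0) error
    theta_v = theta_v + alpha_v * delta * phi_s      # critic: semi-gradient step
    mu = theta_pi @ phi_s                            # actor's policy mean
    score = (action - mu) / sigma_a**2 * phi_s       # Gaussian policy score function
    theta_pi = theta_pi + alpha_pi * delta * score   # actor: policy-gradient step
    return theta_v, theta_pi, delta
```

Each agent (WPP, GenCO, or DisCO) would run one such update per market iteration with its own reward from equations (1), (5), or (8).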

According to [28, 38], we choose the Gaussian radial basis function as , , and .
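A Gaussian radial basis feature vector over a bounded state space can be sketched as follows; the center placement and the shared width parameter are illustrative choices, not the paper’s exact configuration:

```python
import numpy as np

def gaussian_rbf(state, centers, sigma):
    """Gaussian radial-basis feature vector phi(s).

    centers : (n_features, state_dim) array of RBF centers spread over the
              bounded state space; sigma is a common width parameter.
    """
    state = np.atleast_1d(np.asarray(state, dtype=float))
    d2 = np.sum((np.asarray(centers, dtype=float) - state) ** 2, axis=1)
    return np.exp(-d2 / (2.0 * sigma ** 2))
```

A center exactly at the state contributes a feature value of 1, and the contribution decays smoothly with distance, which is what lets a linear combination of these features approximate the value and policy functions on a continuous state space.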

#### 5. Simulation Results and Discussions

##### 5.1. Data and Assumptions

In this section, our proposed approach is implemented on the IEEE 30-bus test system with 2 WPPs, 6 GenCOs, and 20 DisCOs [2]. The schematic structure of this test system is shown in Figure 1. The output power of the WPP connected to bus 7 (marked as WPP 1) lies within [0, 80] MW, and that of the WPP connected to bus 10 (marked as WPP 2) lies within [0, 50] MW. According to [39, 40], we assume the real-time wind power outputs of these two WPPs independently follow Weibull distributions. The (predicted) real-time WPOSs of the two WPPs are then generated by the Monte Carlo method, and the real-time WPOS reduction method follows [39, 40]. Table 1 shows the 10 reduced (predicted) real-time WPOSs and their corresponding probabilities for these two WPPs, which are used as exogenous parameters in our proposed approach.
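The scenario-generation step can be sketched as follows. The Weibull parameters and the quantile-based reduction (equiprobable bins of the sorted samples, each represented by its mean) are illustrative stand-ins for the exact method of [39, 40]:

```python
import numpy as np

rng = np.random.default_rng(0)

def wind_scenarios(shape, scale, cap_mw, n_samples=5000, n_scenarios=10):
    """Monte Carlo Weibull sampling of real-time wind output, reduced to a
    small scenario set by quantile aggregation (a simple stand-in for the
    reduction method of [39, 40])."""
    # Weibull(shape) samples rescaled and capped at the installed capacity
    samples = np.minimum(rng.weibull(shape, n_samples) * scale, cap_mw)
    # split the sorted samples into n_scenarios equally probable bins and
    # take each bin's mean as the representative scenario
    bins = np.array_split(np.sort(samples), n_scenarios)
    scenarios = np.array([b.mean() for b in bins])
    probs = np.array([len(b) / n_samples for b in bins])
    return scenarios, probs

# e.g. WPP 1 with an assumed shape of 2.0, scale of 30 MW, and 80 MW capacity
scen, prob = wind_scenarios(shape=2.0, scale=30.0, cap_mw=80.0)
```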