Abstract

Security against various attacks is a core topic in cyberphysical systems (CPSs). In this paper, optimal control theory, reinforcement learning (RL), and neural networks (NNs) are integrated to provide a brief overview of optimal robust control strategies for a benchmark power system. First, models of the benchmark power system under actuator and sensor attacks are considered. Second, we investigate the optimal control problem for the nominal system and review state-of-the-art RL methods along with their NN implementation. Third, we propose several robust control strategies for different types of cyberphysical attacks via optimal control design, and we derive stability proofs through Lyapunov theory. Furthermore, we study stability in the presence of NN approximation errors, an issue rarely discussed in previous works. Finally, two simulation examples demonstrate the effectiveness of the proposed methods.

1. Introduction

With the development of cloud computing, artificial intelligence, and fifth-generation (5G) communication, power systems, which are among the primary infrastructures of society, have become typical CPSs [1, 2]. Because cyberphysical power systems contain numerous physical sensors, complex interaction mechanisms, and massive signals [3], their security is inevitably threatened. For example, a large-scale blackout caused by cyberattacks in Ukraine disrupted the daily lives of many people [4, 5]. Although many advanced control strategies exist for CPSs, their security has not been sufficiently addressed. Meanwhile, power systems composed of distributed energy sources and multiple loads are multidimensional [6]. It is therefore urgent to further strengthen the security of CPSs.

Generally, the security of CPSs is threatened by attacks on the perception layer, cyberlayer, and decision layer. In particular, attacks on the perception layer and cyberlayer, known as cyberattacks, can severely disrupt the system. In recent years, many researchers have presented reliable control strategies against various cyberattacks, such as false data injection attacks, time-delay switch attacks, and denial-of-service attacks. Denial-of-service attacks, which can jam information transmission channels, pose an aggressive threat to CPS security [7–9]. A control strategy based on a game-theoretic approach was proposed to resist such attacks in discrete-time systems [7]. Following an idea similar to [7], Seo et al. [9] addressed jamming attacks on the communication between sensors and the network and presented an adaptive scheduling scheme with energy constraints. Using an evaluation function that quantifies the impact of attacks, the optimal attack strategy, which maximally degrades the stability of the system, was investigated under energy constraints in a wireless network [10]. In addition, false data injection attacks, which generally cause state estimation errors, have received widespread attention because they can cause inaccurate control signals to be sent to the actuator [11–14]. Moreover, detection techniques for unknown attacks and unpredictable attack areas have been proposed in previous works [9, 15–20]. For instance, considering the characteristics of the network topology and transmission media, a risk prediction method based on a predictive model was proposed to accurately capture the characteristics of the physical system and identify the faulty area in CPSs [16]. In [20], a dynamic attack detector was proposed for undetectable attacks.

With the integration of multiple energy sources, the control platform and information transmission have become extremely complicated [21]. Thus, other hard-to-resist attacks, namely sensor and actuator attacks, have also become a research topic. For example, a reliable control scheme with an attack compensator that can withstand sensor and actuator attacks was investigated in [22]. In [1], resilient control strategies were proposed to ensure that the state variables converge to the equilibrium point in the presence of sensor and actuator attacks.

This paper concentrates on an optimal robust control strategy, in which a unified control design makes the power system robust to actuator and sensor attacks. We use optimal control theory, reinforcement learning (RL), and neural networks (NNs) to design the controller under several assumed types of attacks. The main contributions can be summarized as follows:
(1) Optimal control theory, RL, and NNs are integrated to address the security issue of a benchmark power system.
(2) A unified approach is proposed to deal with sensor and actuator attacks via optimal control design.
(3) Stability in the presence of NN approximation errors, which is rarely discussed in previous works, is analyzed.

The rest of this paper is organized as follows. In Section 2, models of the benchmark power system with actuator and sensor attacks are formulated. In Section 3, the optimal control problem for the nominal system is investigated, and state-of-the-art RL methods along with their NN implementations are reviewed. In Sections 4 and 5, several robust control strategies are proposed for different types of cyberphysical attacks via optimal control design, and stability proofs are derived through Lyapunov theory. Section 6 presents two simulation examples that demonstrate the effectiveness of the proposed methods, and Section 7 gives a brief conclusion.

2. Problem Statement for Power System

Let us consider the following benchmark power system (1), whose three state variables are the deviations of frequency, turbine power, and governor position value, respectively. The model parameters are the time constants of the turbine, governor, and power system, together with the gain of the power system and the speed regulation coefficient, and the system is driven by the control input.

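Collecting the three deviations into a state vector, the nominal system (1) can be rewritten in the compact state-space form (2), with the corresponding system matrix and input matrix.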
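For readers who want to experiment numerically, the following minimal sketch builds the matrices of a frequently used three-state load-frequency-control form of such a benchmark. The model structure, the state ordering (frequency, turbine power, governor position), and the parameter names `T_t`, `T_g`, `T_p`, `K_p`, `R` are our assumptions; the exact form of (1) is not reproduced here.

```python
import numpy as np

def lfc_matrices(T_t, T_g, T_p, K_p, R):
    """Return (A, B) for an ASSUMED standard load-frequency-control model.

    State x = [frequency dev., turbine power dev., governor position dev.].
    Assumed dynamics (illustrative, not copied from equation (1)):
      d(df)/dt  = -df/T_p + (K_p/T_p) * dPg
      d(dPg)/dt = -dPg/T_t + dXg/T_t
      d(dXg)/dt = -df/(R*T_g) - dXg/T_g + u/T_g
    """
    A = np.array([
        [-1.0 / T_p, K_p / T_p, 0.0],
        [0.0, -1.0 / T_t, 1.0 / T_t],
        [-1.0 / (R * T_g), 0.0, -1.0 / T_g],
    ])
    B = np.array([[0.0], [0.0], [1.0 / T_g]])  # control enters through the governor
    return A, B

# Hypothetical parameter values, chosen only for illustration (not Table 1).
A, B = lfc_matrices(T_t=0.3, T_g=0.08, T_p=20.0, K_p=120.0, R=2.4)
```

Keeping the builder as a function makes it easy to swap in the actual parameter values of a given study.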

In practice, however, attacks on the system are often unavoidable and may degrade the control performance. System dynamics (2) subject to actuator and sensor attacks can be described by (3) and (4), respectively, where the control input is the robust control policy to be designed later and an additional term denotes the system uncertainties. In this paper, we consider different types of attacks.

Because the attacks are unknown, it is difficult or even impossible to analyze systems (3) and (4) directly. Inspired by classical works [23–26], we convert the robust control problem for systems (3) and (4) into the optimal control problem for the nominal system (2). The main idea is that, using system data and models, we first obtain the optimal control policy through adaptive dynamic programming (ADP) algorithms. Then, based on the form of the optimal control, we develop different robust control strategies for systems subject to various attacks.

3. Optimal Control for the Nominal System

Define the performance index function (5), whose utility function is weighted by two positive definite symmetric matrices. Given an admissible control policy, the corresponding value function is expressed as (6).

The optimal value function (7) is defined as the minimum of the value function over all admissible control policies.

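According to the stationarity condition [27], the optimal control policy (8) is obtained by setting the derivative of the Hamiltonian with respect to the control input to zero, and the optimal value function must satisfy the Hamilton–Jacobi–Bellman (HJB) equation (9).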

Thus, the key to obtaining the optimal control policy is to solve the HJB equation.
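To make the role of the HJB equation concrete, the derivation below works through the linear-quadratic special case, assuming the nominal system is linear and the utility is quadratic; the symbols A, B, Q, R, and P are our notation. In this case the HJB equation reduces to the algebraic Riccati equation, which is what a solver such as MATLAB's care computes.

```latex
\begin{aligned}
&\dot{x} = Ax + Bu, \qquad U(x,u) = x^{\top}Qx + u^{\top}Ru, \qquad
 V^{*}(x) = x^{\top}Px, \qquad \nabla V^{*}(x) = 2Px, \\
&H = U(x,u) + \nabla V^{*}(x)^{\top}(Ax + Bu), \qquad
 \frac{\partial H}{\partial u} = 2Ru + B^{\top}\nabla V^{*}(x) = 0
 \;\Rightarrow\; u^{*} = -\tfrac{1}{2}R^{-1}B^{\top}\nabla V^{*}(x) = -R^{-1}B^{\top}Px, \\
&0 = U(x,u^{*}) + \nabla V^{*}(x)^{\top}(Ax + Bu^{*})
   = x^{\top}\left(A^{\top}P + PA + Q - PBR^{-1}B^{\top}P\right)x \quad \forall x \\
&\;\Rightarrow\; A^{\top}P + PA + Q - PBR^{-1}B^{\top}P = 0 .
\end{aligned}
```

The middle term collects $x^{\top}PBR^{-1}B^{\top}Px$ once from $u^{*\top}Ru^{*}$ and twice with a negative sign from $2x^{\top}PBu^{*}$, leaving $-PBR^{-1}B^{\top}P$.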

ADP is a powerful tool for solving optimal control problems. Traditional ADP methods include two iterative algorithms: policy iteration (PI) and value iteration (VI). Subsequently, two noniterative RL methods were developed, as described below.
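As an illustration of PI, the sketch below implements Kleinman-style policy iteration for the linear-quadratic special case (a linear model with quadratic cost is assumed). It is model-based, so it shows the evaluate-then-improve structure of PI rather than the data-driven ADP algorithms reviewed here. The double-integrator example is hypothetical and is chosen because its Riccati solution, P = [[sqrt(3), 1], [1, sqrt(3)]], is known in closed form.

```python
import numpy as np

def solve_lyapunov(Ac, Qc):
    """Solve Ac^T P + P Ac + Qc = 0 for P via Kronecker vectorization."""
    n = Ac.shape[0]
    I = np.eye(n)
    # Column-major vec: vec(Ac^T P) = (I kron Ac^T) vec(P), vec(P Ac) = (Ac^T kron I) vec(P)
    M = np.kron(I, Ac.T) + np.kron(Ac.T, I)
    p = np.linalg.solve(M, -Qc.reshape(-1, order="F"))
    P = p.reshape(n, n, order="F")
    return 0.5 * (P + P.T)  # symmetrize against round-off

def policy_iteration(A, B, Q, R, K0, iters=20):
    """Kleinman PI for continuous-time LQR; K0 must be stabilizing."""
    K = K0
    for _ in range(iters):
        Ac = A - B @ K
        # Policy evaluation: cost matrix of the current policy u = -K x
        P = solve_lyapunov(Ac, Q + K.T @ R @ K)
        # Policy improvement: u = -R^{-1} B^T P x
        K = np.linalg.solve(R, B.T @ P)
    return P, K

# Hypothetical example: double integrator with Q = I, R = 1.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
K0 = np.array([[1.0, 1.0]])  # stabilizing: A - B K0 has characteristic polynomial s^2 + s + 1
P, K = policy_iteration(A, B, Q, R, K0)
```

VI differs in that it does not require a stabilizing initial policy, at the cost of slower convergence.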

3.1. Online RL Method

The aforementioned iterative ADP methods are offline learning methods because the value function and control policy are updated with the iteration index. In contrast, online RL methods [27, 28] involve no iteration process; the value function and control policy are updated in real time.

3.2. Event Trigger-Based RL Method

In online RL methods, information must be updated and transmitted continuously, which wastes communication resources. To address this, event trigger-based RL methods [29, 30] have been developed. In these methods, the value function and control policy are updated only when the error between the current state and the last sampled state reaches a threshold, which reduces the communication burden.
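The sketch below illustrates the event-triggered idea on a plain state-feedback loop: the controller uses the last transmitted state and a new transmission occurs only when the sampling error exceeds a threshold. The static triggering rule (with `sigma` and `eps`), the double-integrator plant, and the gain are all hypothetical choices for illustration; no learning is performed, so this is not an implementation of the cited event trigger-based RL methods.

```python
import numpy as np

def simulate_event_triggered(A, B, K, x0, sigma=0.05, eps=1e-4, dt=1e-3, T=20.0):
    """Euler simulation of u = -K x_hat, where x_hat is refreshed only on events.

    Hypothetical static triggering rule (illustrative, not from the paper):
    transmit the state when ||x_hat - x|| > sigma * ||x|| + eps.
    """
    x = np.asarray(x0, dtype=float).reshape(-1, 1)
    x_hat = x.copy()          # last transmitted state
    updates = 1               # the initial transmission counts as one update
    steps = int(round(T / dt))
    for _ in range(steps):
        if np.linalg.norm(x_hat - x) > sigma * np.linalg.norm(x) + eps:
            x_hat = x.copy()  # event: transmit the state and refresh the control
            updates += 1
        u = -K @ x_hat        # control held constant between events
        x = x + dt * (A @ x + B @ u)
    return x, updates, steps

# Hypothetical plant (double integrator) with a stabilizing feedback gain.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
K = np.array([[1.0, 3 ** 0.5]])
x_final, updates, steps = simulate_event_triggered(A, B, K, [1.0, 0.0])
```

Comparing `updates` with `steps` gives a rough measure of the communication saved relative to updating at every integration step.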

Using the aforementioned ADP methods, we can obtain the optimal control policy of the nominal system, which will be employed in the following sections.

To implement these algorithms, a critic NN and an actor NN are employed to approximate the iterative value function and control policy, respectively; each network is described by a vector of NN activation functions and a vector of NN weights.

Hence, the optimal value function and optimal control policy admit NN representations in terms of the ideal NN weights.
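In the linear-quadratic special case, which is assumed here, a critic with quadratic monomial activations represents the value function exactly, and the actor follows from the critic through the stationarity condition u = -1/2 R^{-1} B^T (dphi/dx)^T W. The sketch below shows this mapping; the basis choice and all function names are hypothetical.

```python
import numpy as np
from itertools import combinations_with_replacement

def quad_features(x):
    """phi(x) = [x_i * x_j for i <= j] (hypothetical critic activation vector)."""
    x = np.asarray(x, dtype=float)
    return np.array([x[i] * x[j] for i, j in combinations_with_replacement(range(len(x)), 2)])

def quad_features_jacobian(x):
    """d phi / d x, one row per feature."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    rows = []
    for i, j in combinations_with_replacement(range(n), 2):
        g = np.zeros(n)
        g[i] += x[j]  # d(x_i x_j)/d x_i
        g[j] += x[i]  # d(x_i x_j)/d x_j (gives 2 x_i when i == j)
        rows.append(g)
    return np.array(rows)

def weights_from_P(P):
    """Critic weights W with W @ phi(x) == x^T P x for a symmetric P."""
    n = P.shape[0]
    return np.array([P[i, j] if i == j else 2.0 * P[i, j]
                     for i, j in combinations_with_replacement(range(n), 2)])

def actor_from_critic(W, x, B, R):
    """u = -1/2 R^{-1} B^T (d phi/dx)^T W, i.e., the stationarity condition."""
    grad_V = quad_features_jacobian(x).T @ W
    return -0.5 * np.linalg.solve(R, B.T @ grad_V)

# Hypothetical example values.
P = np.array([[3 ** 0.5, 1.0], [1.0, 3 ** 0.5]])
B = np.array([[0.0], [1.0]])
R = np.array([[1.0]])
x = np.array([0.5, -1.0])
W = weights_from_P(P)
u = actor_from_critic(W, x, B, R)
```

For nonlinear dynamics the value function is generally not exactly quadratic, which is where the approximation error discussed next comes from.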

In previous works, NN approximation errors were rarely discussed. In this paper, we consider their effect in the stability analysis.

In Figure 1, sensor attacks, which tamper with the state values collected by sensors, occur in the sensors and the communication network. Meanwhile, actuator attacks, which generally modify the control commands sent to the actuator, occur between the decision layer and the physical layer. The effects of the altered system state and control commands can be counteracted by the robust control strategy, which is computed by RL based on the performance index function. Ultimately, the power system can operate at the scheduled operating point under sensor and actuator attacks.

4. Robust Control Strategies for Actuator Attacks

First, let us consider system (3) with a specified actuator attack term. The robust controller is designed as (14), where the weighting matrices used to generate the optimal control policy will be determined later.

Theorem 1. If the positive definite weighting matrices are selected appropriately, then system (3) is asymptotically stable under the robust controller (14).

Proof. Choose the Lyapunov function candidate (15), whose time derivative along system (3), according to (9), gives (16). Substituting (8) into (16) yields (17), in which the minimum eigenvalue of a matrix appears.
To guarantee that the time derivative of the Lyapunov function is negative, the weighting matrices should be chosen to satisfy the inequalities (18). The proof is completed.
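The scalar sketch below illustrates why "selecting the weighting matrices appropriately" matters. It assumes a hypothetical toy system x' = a x + b (u + d(x)) with |d(x)| <= beta |x|, and assumes the robust controller is the LQR-type optimal feedback u = -k x built from weights q and r; none of these are the paper's system (3) or controller (14). With V = p x^2, the derivative satisfies V' <= 2 p x^2 (-sqrt(a^2 + b^2 q / r) + |b| beta), so a larger q/r tolerates a larger attack bound.

```python
import math

def lqr_scalar(a, b, q, r):
    """Scalar LQR: (p, k) with 2 a p + q - (b**2 / r) p**2 = 0, p > 0, k = b p / r."""
    p = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
    k = b * p / r
    return p, k

def attack_margin(a, b, q, r):
    """Largest beta for which |d(x)| <= beta |x| still guarantees dV/dt < 0."""
    return math.sqrt(a * a + b * b * q / r) / abs(b)

def simulate(a, b, k, beta, x0=1.0, dt=1e-3, T=10.0):
    """Euler simulation with a hypothetical matched attack d(x) = beta * x."""
    x = x0
    for _ in range(int(round(T / dt))):
        u = -k * x
        x += dt * (a * x + b * (u + beta * x))
    return x

# Hypothetical open-loop unstable toy (a > 0) attacked with beta = 2.
a, b, beta = 0.5, 1.0, 2.0
p1, k1 = lqr_scalar(a, b, q=1.0, r=1.0)   # margin ~1.12 < beta: not robust enough
p2, k2 = lqr_scalar(a, b, q=10.0, r=1.0)  # margin ~3.20 > beta: attack tolerated
```

Increasing q relative to r raises the feedback gain and hence the tolerable attack bound, at the price of larger control effort.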

Remark 1. ADP methods yield an approximate optimal control policy. However, these methods are ultimately implemented by NNs or other universal approximators, which introduce approximation errors. Because NN approximation errors were rarely discussed in previous works, we present the corresponding error analysis here.
When the NNs finish learning, the NN weights converge. Based on (11), the NN-based approximate optimal control policy actually applied to the system is expressed as (19), where the estimated weight replaces the ideal NN weight. Let the weight approximation error be the difference between the ideal and estimated weights.
By means of (13) and (19), one obtains (20).

Corollary 1. If the positive definite weighting matrices are selected appropriately, then system (3) is asymptotically stable under the NN-based approximate optimal controller (19) as the NN weight approximation error goes to zero.

Proof. Using the Lyapunov function candidate (15) yields (21). By the result of (17), equation (21) becomes (22), with an auxiliary term defined there. To ensure that the derivative of the Lyapunov function is negative, condition (23) must be satisfied. If the NN weight approximation error goes to zero or is small enough, condition (23) can easily be met with the chosen parameters. That is, the NN-based approximate optimal control policy can stabilize system (3).
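The effect of a weight error can be seen explicitly in a hypothetical scalar linear-quadratic toy (not system (3)). Here the critic has the single activation phi(x) = x^2 with ideal weight p, the estimated weight is p - e, and the applied control is u = -(b / r)(p - e) x. The closed-loop pole is -sqrt(a^2 + b^2 q / r) + b^2 e / r, so stability is preserved as long as the weight error e stays below r sqrt(a^2 + b^2 q / r) / b^2. This mirrors the "small enough error" condition in the proof above.

```python
import math

def closed_loop_pole(a, b, q, r, weight_error):
    """Pole of x' = a x + b u with u = -(b / r) * W_hat * x, where W_hat = p - weight_error."""
    p = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)  # ideal critic weight for phi = x^2
    w_hat = p - weight_error                                   # estimated critic weight
    return a - (b * b / r) * w_hat

def max_tolerable_error(a, b, q, r):
    """Weight errors strictly below this value keep the toy closed loop stable."""
    return r * math.sqrt(a * a + b * b * q / r) / (b * b)

# Hypothetical toy values.
a, b, q, r = 0.5, 1.0, 1.0, 1.0
e_max = max_tolerable_error(a, b, q, r)  # sqrt(1.25) ~ 1.118 for these values
```

In this toy only underestimating the weight (positive e) is dangerous; a negative e merely increases the feedback gain.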

5. Robust Control Strategies for Sensor Attacks

In this section, the proposed robust control schemes are modified and extended to deal with sensor attacks [31].

5.1. Extension to Nonlinear Sensor Attacks

Consider system (4) with nonlinear sensor attacks, described by (24).

The robust controller for (24) is designed in the same form as (14).

Corollary 2. If the relevant design matrix is selected appropriately, then system (24) is asymptotically stable under the robust controller.

Proof. Choose the Lyapunov function candidate as (15). Then, one obtains (25), with an auxiliary quantity defined there.
To guarantee that the derivative of the Lyapunov function is negative, select the matrix to satisfy the inequality (26). The proof is completed.
Note that the robust load frequency control problem can be treated as a special case of sensor attacks.
Let (24) be rewritten as (27), where an additional term denotes the bounded disturbance caused by load demand changes.

Corollary 3. If the relevant design matrix is selected appropriately, then the states of system (27) are uniformly ultimately bounded under the robust controller.

Proof. Using the Lyapunov function candidate (15), one obtains (28).

Define the auxiliary matrices in terms of the identity matrix. If there exists a matrix that guarantees the resulting matrix to be positive definite, then (28) can be rewritten as (29).

From (29), it can be observed that the derivative of the Lyapunov function is negative whenever the state norm exceeds a certain bound. That is, the system states are uniformly ultimately bounded according to the Lyapunov extension theorem [27, 32, 33].
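The uniform ultimate boundedness argument can be checked on a hypothetical scalar toy (not system (27)): x' = -c x + d(t) with a bounded load-like disturbance |d(t)| <= d_bar. With V = x^2 / 2, V' = -c x^2 + x d <= -|x| (c |x| - d_bar), which is negative whenever |x| > d_bar / c, so the state eventually enters and stays in that interval. The sinusoidal disturbance below is an illustrative assumption.

```python
import math

def simulate_uub(c=2.0, d_bar=0.5, x0=3.0, dt=1e-3, T=20.0):
    """Euler simulation of x' = -c x + d(t) with a hypothetical disturbance
    d(t) = d_bar * sin(t); returns the largest |x| over the last quarter of the run."""
    x, t, tail_max = x0, 0.0, 0.0
    steps = int(round(T / dt))
    for n in range(steps):
        x += dt * (-c * x + d_bar * math.sin(t))
        t += dt
        if n >= 3 * steps // 4:
            tail_max = max(tail_max, abs(x))
    return tail_max

tail = simulate_uub()  # ultimate bound from the Lyapunov argument: d_bar / c = 0.25
```

The state does not converge to zero, which is exactly the distinction between uniform ultimate boundedness and asymptotic stability.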

5.2. Extension to Constant Sensor Attacks

Consider system (2) with constant sensor attacks, described by (30).

After a change of variables and the addition of an attack compensator, system (30) becomes (31).
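Why a compensator is needed can be seen on a hypothetical scalar toy, assuming the constant sensor attack enters as an additive bias delta on the measured state and the feedback is u = -k (y - delta_hat). The closed loop then settles at x_ss = b k (delta - delta_hat) / (a - b k): an uncompensated bias shifts the equilibrium away from zero, while an exact estimate removes the offset. The compensator's actual adaptation law in (31) is not reproduced here.

```python
def steady_state(a, b, k, delta, delta_hat=0.0):
    """Equilibrium of x' = a x - b k (x + delta - delta_hat); requires a - b k != 0."""
    return b * k * (delta - delta_hat) / (a - b * k)

def simulate(a, b, k, delta, delta_hat=0.0, x0=1.0, dt=1e-3, T=20.0):
    """Euler simulation of the toy loop with a hypothetical constant sensor bias."""
    x = x0
    for _ in range(int(round(T / dt))):
        y = x + delta             # attacked measurement (additive bias, assumed)
        u = -k * (y - delta_hat)  # feedback on the compensated measurement
        x += dt * (a * x + b * u)
    return x

# Hypothetical toy values: closed-loop pole a - b k = -1, bias 0.2.
a, b, k, delta = 0.5, 1.0, 1.5, 0.2
x_uncompensated = simulate(a, b, k, delta)                # settles near -0.3
x_compensated = simulate(a, b, k, delta, delta_hat=0.2)   # settles near 0
```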

Theorem 2. If the positive definite weighting matrices are selected appropriately, then system (31) is asymptotically stable under the optimal controller and the attack compensator.

Proof. Construct the Lyapunov function candidate (32). Then, one has (33). After some mathematical derivation, equation (33) becomes (34), with the auxiliary terms defined there.
To ensure that the derivative of (32) is negative, the parameters should be chosen to satisfy the inequalities (35). This completes the proof.
When the NNs finish learning, the approximate optimal value function is obtained as (36), where the estimated weight replaces the ideal NN weight. Let the weight approximation error be the difference between the ideal and estimated weights.
Based on (31) and (36), the NN-based robust control scheme is designed as (37), with its components given in (38).

Corollary 4. If the positive definite weighting matrices are selected appropriately, then the NN-based robust control scheme can stabilize system (37) as both NN weight approximation errors go to zero.

Proof. Employing the Lyapunov function candidate (32) yields (39). From (39), in the limit as both NN weight approximation errors go to zero, the parameters can easily be chosen to guarantee that the derivative of (32) is negative. If the NN approximation errors are not small enough or the NNs fail to approximate the optimal values, the closed-loop system may not be asymptotically stable and the robust control scheme may become invalid. Therefore, the design of the ADP learning algorithm is crucial.

6. Simulation Example

In this section, to verify the proposed robust control strategy, two simulation examples of power systems are presented for two different types of attacks, respectively.

6.1. Design against Actuator Attacks

In this case, an actuator attack affecting the controller is considered in the power system. The system parameter values for this simulation are given in Table 1, from which the system and input matrices are obtained.

Starting from the chosen initial state, the power system states become unstable when actuator attacks are injected into the system, as shown in Figure 2.

In this case, the two weighting matrices are chosen as positive definite matrices. The optimal control can be obtained via the MATLAB command care or other RL methods. With the robust controller (14), the system states under actuator attacks are stabilized within 6 seconds, as shown in Figure 2. The 2D plot of the state convergence trajectory in Figure 3 further indicates the good performance of the control design.
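For readers without MATLAB, the usual Python counterpart of care is scipy.linalg.solve_continuous_are. To keep the sketch dependency-light, the version below instead computes the stabilizing Riccati solution from the stable invariant subspace of the Hamiltonian matrix using only NumPy. It assumes the standard three-state load-frequency-control structure with hypothetical parameter values (not those of Table 1) and identity weights, so its numbers will not reproduce the figures.

```python
import numpy as np

def care_hamiltonian(A, B, Q, R):
    """Stabilizing solution of A^T P + P A + Q - P B R^{-1} B^T P = 0
    via the stable invariant subspace of the Hamiltonian matrix."""
    n = A.shape[0]
    G = B @ np.linalg.solve(R, B.T)
    H = np.block([[A, -G], [-Q, -A.T]])
    eigvals, eigvecs = np.linalg.eig(H)
    stable = eigvecs[:, eigvals.real < 0]  # n eigenvectors with Re(lambda) < 0
    X1, X2 = stable[:n, :], stable[n:, :]
    P = np.real(X2 @ np.linalg.inv(X1))
    return 0.5 * (P + P.T)                 # symmetrize against round-off

# Hypothetical LFC parameters (assumed model structure, not Table 1).
T_t, T_g, T_p, K_p, R_d = 0.3, 0.08, 20.0, 120.0, 2.4
A = np.array([
    [-1.0 / T_p, K_p / T_p, 0.0],
    [0.0, -1.0 / T_t, 1.0 / T_t],
    [-1.0 / (R_d * T_g), 0.0, -1.0 / T_g],
])
B = np.array([[0.0], [0.0], [1.0 / T_g]])
Q = np.eye(3)
R = np.array([[1.0]])

P = care_hamiltonian(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)  # optimal feedback u* = -K x
```

The eigenvector approach is compact but can lose accuracy for ill-conditioned problems; library solvers based on ordered Schur decompositions are more robust.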

6.2. Design against Constant Sensor Attacks

In this case, numerical simulation results show that the designed controller can effectively resist sensor attacks. The system parameter values for this simulation are given in Table 2.

From these parameter values, the system and input matrices are obtained. The controller parameters, the initial system state, and the constant sensor attack are then specified for the simulation.

First, Figure 4 presents the simulation result without the attack compensator, showing that the system states affected by constant attacks converge to unexpected values. Then, we employ the attack compensator-based robust control scheme; the results in Figure 5 show that the proposed scheme quickly stabilizes the system after the attacks occur. Figure 6 displays the dynamics of the attack compensator. The compensator estimates the given attack value within a short time, which indicates that it successfully removes the impact of the constant sensor attacks.

7. Conclusions

This paper has integrated optimal control theory, RL, and NNs to address the robust control problems of a benchmark power system. Optimal control theory for nominal systems and state-of-the-art RL methods along with their NN implementations have been reviewed. Multiple types of attacks on power systems, such as actuator attacks, nonlinear sensor attacks, and constant sensor attacks, have been discussed, and several robust control schemes have been designed for the respective attack types. Conditions on the control parameters have been derived through Lyapunov stability theory. Furthermore, stability in the presence of NN approximation errors, which is rarely discussed in previous works, has been analyzed. Simulation results have demonstrated the effectiveness of the proposed schemes.

Data Availability

Data are available upon request to the corresponding author.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by the Science and Technology Foundation of SGCC (SGLNDK00DWJS1900036), the Liaoning Revitalization Talents Program (XLYC1907138), the Doctoral Scientific Research Foundation of Liaoning Province (2020-BS-181), the Natural Science Foundation of Liaoning Province (2019-MS-239), the Key R&D Program of Liaoning Province (2020JH2/10300101), the Technology Innovation Talent Fund of Shenyang (RC190360), and the Science and Technology Project of State Grid Liaoning Electric Power Company Limited (SGLNSY00HLJS2002775).