Complexity

Volume 2018, Article ID 5950678, 15 pages

https://doi.org/10.1155/2018/5950678

## A Stable Distributed Neural Controller for Physically Coupled Networked Discrete-Time System via Online Reinforcement Learning

^{1}School of Electronic and Information Engineering, Southwest University, Chongqing, China

^{2}Chongqing University Key Laboratory of Networks and Cloud Computing Security, Chongqing, China

^{3}State Grid Chongqing Electric Power Co. Electric Power Research Institute, Chongqing, China

Correspondence should be addressed to Jian Sun; cq_jsun@163.com

Received 28 July 2017; Revised 21 November 2017; Accepted 21 December 2017; Published 7 February 2018

Academic Editor: Christopher P. Monterola

Copyright © 2018 Jian Sun and Jie Li. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

The large scale, time variation, and diversity of physically coupled networked infrastructures, such as power grids and transportation systems, complicate the design, implementation, and expansion of their controllers. To tackle these challenges, we propose an online distributed reinforcement learning control algorithm in which each subsystem (agent) uses one-layer neural networks to adapt to variations in the networked infrastructure. Each controller includes a critic network and an action network, which approximate the strategy utility function and the desired control law, respectively. To avoid a large number of trials and to improve stability, the training of the action network introduces a supervised learning mechanism into the reduction of the long-term cost. The stability of the control system with the learning algorithm is analyzed, and upper bounds on the tracking error and the neural network weights are estimated. Simulations illustrate the effectiveness of the proposed controller; the results also indicate stability under communication delay and disturbances.

#### 1. Introduction

The increasing interconnection of physical systems through cybernetworks or physical networks has been observed in many infrastructures, such as power grids [1, 2], transportation networks, and unmanned systems. One critical issue of these so-called cyberphysical systems is their complexity as they grow very large, especially with respect to control. Consequently, distributed schemes have been suggested to reduce the communication and computational cost compared with a centralized control scheme [3]. However, the coupling of subsystems and the nonstatic environment in both the cybernetworks and the physical networks bring many challenges, such as physical interference among subsystems, time-varying plant parameters, communication delay, and the expansibility of the cyberphysical system.

To increase the expansibility of a cyberphysical system, the multiagent concept is usually introduced. The cyberphysical system is divided into many agents, each of which has its own control policy and a unified framework for pursuing its target [4]. Expanding the cyberphysical system then reduces to duplicating agents without modifying the control policy. To deal with the physical coupling of a networked system, one common approach is to decouple the subsystems in the control design [5–8]. Each subsystem may use state information from neighboring subsystems to mitigate their physical interference, or the designer may treat that interference as a random disturbance [9, 10]. To address a nonstatic environment with time-varying plants, online supervised learning, adaptive control, and reinforcement learning algorithms have been suggested. All of them adjust their control parameters adaptively online, but the combination of neural networks and reinforcement learning usually leads to better control performance than conventional supervised learning and adaptive control schemes [11]. Reinforcement learning constructs a long-run cost-to-go function to predict the future cost, so that each control action takes the estimated future outcome into account [12]. Adaptive control, by contrast, can adapt only a limited number of time-varying parameters, whereas the number of time-varying parameters in a plant model may be very large in practice.
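As a hedged illustration of the cost-to-go idea, the block below gives one commonly used discounted form. The symbols $J(k)$, $r(k)$, and $\gamma$ are assumptions introduced here for illustration; the exact definition adopted in the controller design may differ.

```latex
% Assumed notation: r(k) is the one-step cost at time k, 0 < \gamma < 1 is a discount factor.
J(k) = \sum_{i=0}^{\infty} \gamma^{i}\, r(k+i)
\quad\Longrightarrow\quad
J(k) = r(k) + \gamma\, J(k+1).
```

The recursion on the right is what allows an approximator of $J$ to be trained online from consecutive samples instead of from complete trajectories.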

Recently, much research has focused on reinforcement learning with neural networks. This research can be classified into two categories. The first category simply uses a neural network to approximate the unknown parts of the system model or control strategy, such as the cost-to-go function and the optimal control law. Prokhorov and Wunsch discussed three families of reinforcement learning control designs, namely, heuristic dynamic programming (HDP), dual heuristic programming (DHP), and globalized dual heuristic programming (GDHP), and their application in optimal control [13]. Xu et al. focus on experimental studies of real-time online learning control for nonlinear systems using kernel-based ADP methods [14]. Lee et al. focus on a class of reinforcement learning (RL) algorithms, named integral RL (I-RL), that solve continuous-time (CT) nonlinear optimal control problems with input-affine system dynamics [15]. The second category combines the approach of the first category with a supervised learning algorithm to guarantee convergence of the learning system; supervised reinforcement learning also avoids a large number of trials by employing an error signal built from domain knowledge [16–18]. It generates instinct feedback for correcting the control actions. Xu et al. investigate a novel adaptive-critic-based neural network (NN) controller for nonlinear pure-feedback systems [19]. Liu et al. were concerned with a reinforcement learning-based adaptive tracking control technique that tolerates faults for a class of unknown multiple-input multiple-output nonlinear discrete-time systems with fewer learning parameters [20]. Besides these, researchers have tried to employ multilayer/deep neural networks to approximate the functions used in control, so that the precision of the model is enhanced and the performance improves as a consequence [21, 22]. However, it is hard to analyze the stability of such learning algorithms. Moreover, learning may be slow because the number of tuned parameters in a deep neural network is very large [23].

In this paper, we propose a distributed neural controller for physically coupled networked discrete-time systems via online reinforcement learning. We model each subsystem as an agent; each agent can obtain its own state and the state information of some physically neighboring subsystems to determine its optimal control action. A one-layer adaptive critic neural network and a one-layer action neural network are proposed for modeling the cost function and the optimal action law, respectively. Using a deterministic learning algorithm, we incorporate supervised learning into our reinforcement learning algorithm to accelerate convergence. The stability of the learning algorithm is analyzed, and the bound of each parameter is estimated. The contribution of this paper is twofold.

(1) We propose a distributed online reinforcement learning algorithm for controlling physically coupled networked discrete-time systems.

(2) Sufficient conditions for guaranteeing the stability of the learning algorithm and of the system are derived, and upper bounds on the parameters are estimated.
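To make the critic/action structure more concrete, the following is a minimal sketch of one update step for each network. It is not the update law derived in this paper: the feature map, the temporal-difference critic update, the mixing weight `beta`, the learning rates, and all function names are assumptions chosen for illustration. Only the output weights are tuned, which is one possible reading of "one-layer".

```python
import numpy as np

rng = np.random.default_rng(0)
N_FEATURES = 5
# Fixed random input weights (assumption): only output weights are trained.
V = rng.normal(size=N_FEATURES)


def features(x):
    """Bounded basis vector phi(x) for a scalar state x."""
    return np.tanh(V * x)


def critic_step(w_c, x, x_next, cost, gamma=0.9, lr=0.1):
    """Semi-gradient temporal-difference update of the critic output weights.

    Returns the updated weights and the TD error computed before the update.
    """
    phi = features(x)
    td_error = cost + gamma * (w_c @ features(x_next)) - w_c @ phi
    return w_c + lr * td_error * phi, td_error


def actor_step(w_a, w_c, x, u_supervisor, beta=0.5, lr=0.1):
    """Action-network update that mixes a supervised error with the critic output.

    beta = 1 is pure supervised learning toward u_supervisor;
    beta = 0 uses only the critic's cost estimate as the error signal.
    Returns the updated weights and the control computed before the update.
    """
    phi = features(x)
    u = w_a @ phi                 # control proposed by the action network
    j_hat = w_c @ phi             # critic's estimate of the long-term cost
    error = beta * (u - u_supervisor) + (1.0 - beta) * j_hat
    return w_a - lr * error * phi, u
```

In a closed loop, an agent would call `actor_step` with the output of some nominal controller as `u_supervisor`, apply the resulting control to the plant, and then call `critic_step` with the observed one-step cost. The supervised term gives the action network a usable error signal from the first step, before the critic has learned anything.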

The rest of the paper is organized as follows. Section 2 models the physically coupled networked system and the control system as mathematical dynamic equations and states some assumptions that simplify the analysis. Section 3 describes the control system design via the online reinforcement learning algorithm. Section 4 discusses the stability analysis in detail. Section 5 presents simulation results that illustrate the effectiveness and advantages of our algorithm. Section 6 concludes the paper.

#### 2. Physically Coupled Networked Control System and Problem Statement

In a physically coupled networked system, each subsystem may physically interfere with its neighboring subsystems and change their state trajectories or dynamics. The structure is shown in Figure 1. To improve the performance of the control system, cyberconnections over a communication infrastructure are installed for exchanging the states of neighboring subsystems [3]. The topologies of the cyberconnections and the physical connections may differ, for example because of practical constraints on cyberresources.
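The distinction between the two topologies can be sketched with a pair of adjacency matrices. The example below is purely illustrative: the four-subsystem chain, the scalar plant, the gains, and all function names are hypothetical and are not taken from the system model of this paper.

```python
import numpy as np

# physical[i, j] = 1 means subsystem j physically disturbs subsystem i.
physical = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
])

# cyber[i, j] = 1 means agent i receives the state of subsystem j.
# The communication link between subsystems 2 and 3 is assumed missing.
cyber = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
])


def unobserved_couplings(physical, cyber):
    """List (i, j) pairs where j affects i physically but i cannot read j's state."""
    missing = (physical == 1) & (cyber == 0)
    return [(int(i), int(j)) for i, j in zip(*np.nonzero(missing))]


def local_control(x, k_self=0.5, c=0.05):
    """Each agent feeds back its own state and cancels only the coupling it can measure."""
    measured = (physical * cyber) @ x
    return -k_self * x - c * measured


def step(x, u, a=0.9, c=0.05):
    """Toy coupled plant: x_i(k+1) = a*x_i(k) + c*sum_j physical[i, j]*x_j(k) + u_i(k)."""
    return a * x + c * (physical @ x) + u


x0 = np.ones(4)
x1 = step(x0, local_control(x0))
print(unobserved_couplings(physical, cyber))  # [(2, 3), (3, 2)]
print(x1)                                     # approximately [0.4, 0.4, 0.45, 0.45]
```

Subsystems 2 and 3 end the step with a slightly larger state than subsystems 0 and 1 because their mutual coupling is not communicated and therefore cannot be cancelled. A controller in this setting has to treat such interference as a disturbance.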