Discrete Dynamics in Nature and Society

Volume 2018, Article ID 4184805, 11 pages

https://doi.org/10.1155/2018/4184805

## Online Self-Organizing Network Control with Time Averaged Weighted Throughput Objective

Department of Industrial Engineering, Dongguan University of Technology, Dongguan, China

Correspondence should be addressed to Zhicong Zhang; stephen1998@gmail.com

Received 16 June 2017; Revised 9 December 2017; Accepted 6 February 2018; Published 4 March 2018

Academic Editor: Francisco R. Villatoro

Copyright © 2018 Zhicong Zhang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

We study an online multisource multisink queueing network control problem characterized by a self-organizing network structure and self-organizing job routing. We decompose the self-organizing queueing network control problem into a series of interrelated Markov Decision Processes and construct a control decision model for them based on a coupled reinforcement learning (RL) architecture. To maximize the mean time averaged weighted throughput of jobs through the network, we propose a reinforcement learning algorithm with time averaged reward to solve the control decision model and obtain a control policy that integrates the job routing selection strategy and the job sequencing strategy. Computational experiments verify the learning ability and the effectiveness of the proposed reinforcement learning algorithm when applied to the investigated self-organizing network control problem.

#### 1. Introduction

Queueing network optimization problems arise widely in manufacturing, transportation, logistics, computer science, communication, healthcare [1], and other fields. With the rapid development of the Internet of Things, large-scale logistics distribution networks, wireless sensor networks [2–4], new-generation wireless communication networks, and other network technologies, new network structures and new network optimization problems keep emerging. Optimizing network control is an important factor affecting the efficiency of network operation.

Self-organizing networks are a new kind of queueing network system. In a self-organizing network, each station or node can establish a link with its adjacent stations or nodes, receive jobs from other stations or nodes, and transfer them onward. Because the links among stations or nodes are complex, the paths that jobs take through the network and the sequence in which they are processed are also complex, and so is the control problem for this kind of network. In the literature, researchers concentrate on the control of multihop networks, which are one kind of network with the self-organizing characteristic. Research methods for multihop network control fall mainly into two categories. The first decomposes the problem into a series of single-station queueing problems or tandem queueing network problems [5]. The second simplifies the multihop network control problem into a link scheduling problem [6] or a queue management problem [7].

The main task of link scheduling is to establish links between stations and select appropriate paths for transferring jobs. He et al. [8] proposed a load-based scheduling algorithm that optimizes link scheduling between stations so as to balance the load of each station and reduce path congestion. Pinheiro et al. [9] studied link scheduling and path selection by fuzzy control. Augusto et al. [10] optimized link scheduling and routing planning simultaneously. Nandiraju et al. [11] studied the problem of restricting the length of the transmission path and improved the efficiency of long-path transmission. To enlarge the network capacity, Gupta and Shroff [12] optimized link scheduling and path selection by solving the maximum weighted matching problem subject to hop-based interference constraints.

The main task of queue management is to classify jobs into job groups and to determine the transmission order of the job groups. Fu and Agrawal [7] focused on job classification in queue management and improved efficiency by processing jobs in batches. Nieminen et al. [13] and Wang et al. [14] studied the optimization of energy management and queue management in multihop networks. Liu et al. [15] reduced the transmission delay and shortened the queue length through modeling and analysis based on Markov chains. Kim et al. [16] considered the fairness of customer service and improved network efficiency while reducing the differences in customers' waiting times. Vučević et al. [17] and Zhou et al. [18] used reinforcement learning (RL) algorithms to optimize queue management by allocating data packets to queues.

In this paper, we study an online multisource multisink queueing network control problem in which the queue length is limited. We take the inherent self-organizing characteristic of the queueing network problem into account, transform the problem into Markov Decision Processes (MDPs), and then construct an RL system to deal with them. The proposed RL system yields an optimized control strategy and a globally optimized solution. The rest of this paper is organized as follows: we introduce the self-organizing queueing network control problem in Section 2, formulate the problem as an RL model in Section 3, present the detailed RL algorithm in Section 4, conduct computational experiments in Section 5, and draw conclusions in Section 6.

#### 2. Problem Statement

The online self-organizing network control problem considered in this paper is described as follows. The network consists of a finite set of stations, and several types of jobs arrive at it. Each job is identified by its type and by its index within that type. Take the network in Figure 1 as an example. As shown in Figure 1, the self-organizing queueing network is composed of three types of stations. The first type is the arrival station. Each job type has a specific arrival station, and together these stations form the set of arrival stations. The second type is the transfer station, which receives jobs and sends them to other transfer stations or to destination stations. The transfer stations are indexed and together form the set of transfer stations. The third type is the destination station. Each job type has a specific destination station, so all jobs of the same type aim to reach the same destination station; together these stations form the set of destination stations. Once a job has been processed by its specified destination station, it has passed through the entire network.
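The structure described above can be sketched as a small data model. This is only an illustrative sketch, not the authors' formulation: the class names (`StationKind`, `Job`, `Network`), the station labels (`A1`, `T1`, `D1`, and so on), and the example topology are all hypothetical, and the rule that no job is routed into an arrival station is an assumption inferred from the description of transfer stations.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Set


class StationKind(Enum):
    ARRIVAL = "arrival"
    TRANSFER = "transfer"
    DESTINATION = "destination"


@dataclass(frozen=True)
class Job:
    job_type: int  # the type the job belongs to
    index: int     # index of the job within its type


@dataclass
class Network:
    kinds: Dict[str, StationKind]    # station label -> station kind
    arrival_of: Dict[int, str]       # job type -> its arrival station
    destination_of: Dict[int, str]   # job type -> its destination station
    links: Dict[str, Set[str]] = field(default_factory=dict)

    def add_link(self, src: str, dst: str) -> None:
        # Assumption: jobs are only forwarded to transfer or destination stations.
        if self.kinds[dst] is StationKind.ARRIVAL:
            raise ValueError("jobs cannot be sent to an arrival station")
        self.links.setdefault(src, set()).add(dst)

    def next_hops(self, job: Job, station: str) -> List[str]:
        """Stations that `job` may be sent to from `station`."""
        own_destination = self.destination_of[job.job_type]
        return sorted(
            s for s in self.links.get(station, set())
            if self.kinds[s] is StationKind.TRANSFER or s == own_destination
        )

    def is_done(self, job: Job, station: str) -> bool:
        """A job is finished once it reaches its own destination station."""
        return station == self.destination_of[job.job_type]


# Hypothetical example: two job types, two transfer stations.
net = Network(
    kinds={
        "A1": StationKind.ARRIVAL, "A2": StationKind.ARRIVAL,
        "T1": StationKind.TRANSFER, "T2": StationKind.TRANSFER,
        "D1": StationKind.DESTINATION, "D2": StationKind.DESTINATION,
    },
    arrival_of={1: "A1", 2: "A2"},
    destination_of={1: "D1", 2: "D2"},
)
for src, dst in [("A1", "T1"), ("A2", "T2"), ("T1", "T2"),
                 ("T2", "T1"), ("T1", "D1"), ("T2", "D2")]:
    net.add_link(src, dst)

job = Job(job_type=1, index=1)
print(net.next_hops(job, "T1"))
print(net.is_done(job, "D1"))
```

In this sketch a routing decision at a station amounts to picking one element of `next_hops`, which is where a learned routing policy would plug in; queue-length limits and job sequencing are deliberately left out.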