Abstract

Unmanned aerial vehicles (UAVs) increasingly enhance connectivity and accessibility in civilian and military applications. A group of UAVs with on-board cameras typically monitors or collects information about designated areas. The UAVs can form a distributed network to share and process the collected sensing data before sending it to a data processing center. Transmitting large volumes of data among them can cause latency and high energy consumption. This paper deploys artificial intelligence (AI) techniques to process the video data streamed among the UAVs, so that each UAV processes its data locally and sends only the information that matters to the others. The UAVs, formed as a connected network, communicate within a short communication range and share their data with each other. A convolutional neural network (CNN) automatically extracts features from images so that the UAVs send only the moving objects instead of whole frames. This significantly reduces redundant information for each UAV and for the whole network and saves a large amount of energy. The UAVs can also save energy for their motion in the sensing field. In addition, a flocking control algorithm is deployed to lead the group of UAVs in the working fields and to avoid obstacles when needed. Simulation and experimental results are provided to verify the proposed algorithms for both AI-based data processing and UAV control. The results indicate that the approach is promising for saving energy in such networks.

1. Introduction

Autonomous UAV networks have been deployed in many applications in both military and civil fields. With their ability to handle large amounts of data and their maneuverability, UAVs can serve a wide range of applications such as security of oil and gas facilities [1], surveillance [2], emergency response, and seaport applications [3]. They are dynamic and effective for sensing and monitoring [4], and they can be a core technology in the Internet of Things vision, in which distributed UAVs collect sensing data and exchange it with each other [5, 6].

A UAV network consists of sensing devices, control algorithms, and communications. UAVs in the network work cooperatively to complete specific missions. Each UAV obtains visual sensing data from its onboard camera. The sensing data is then exchanged throughout the network for mission purposes. There are two main structures of information sharing: centralized and distributed [7]. In centralized networks, a central processor performs all tasks, including collecting data, computing, and delivering commands to the other nodes in the network. The centralized scheme has a single point of failure at the central processor, and the other nodes must maintain a connection with the central node. In distributed networks, information is exchanged between nodes, and computation and decision-making are performed on each UAV itself. UAV networks usually operate in a distributed fashion to improve robustness and reduce the communication burden, as a UAV only needs to connect with its neighbors. Besides information sharing, another consideration is the control algorithm for multiple-UAV formations. In UAV-based surveillance systems, UAVs encounter numerous obstacles because they normally operate at low altitudes in urban environments due to policy restrictions [8].

Control algorithms should be able to drive a UAV formation to targeted areas without collisions with obstacles or with other UAVs. In [9], a control algorithm for a team of micro-UAVs based on a leader-follower approach was proposed. This method has shown good performance in terms of formation shape keeping and smooth maneuvering. However, obstacle avoidance was not considered. The artificial potential field (APF) method has been investigated to deal with obstacle avoidance problems [10, 11]. In these papers, repulsive and attractive forces generated by the potential field allow an agent to avoid collisions and maintain the desired distance in a formation. However, the APF method is limited by the local minima problem: at a local minimum, the attractive and repulsive forces sum to zero, which prevents the UAV from reaching its target. In addition, APF methods have shown poor performance in handling obstacles that have convex and concave shapes [12]. Another powerful approach for controlling swarm robots is flocking control, which was first proposed by Olfati-Saber [13]. In flocking control, agents in a group only need to keep a certain distance from their neighbors, unlike formation control algorithms, where agents maintain rigid positions relative to their neighbors. Flocking control algorithms allow the formation to change shape effectively when encountering obstacles. This feature makes flocking algorithms suitable for UAV-based surveillance systems.

UAV networks have been applied most successfully in surveillance systems. A UAV-based platform for drought mapping of agricultural crops is presented in [14]. In [15], multiple UAVs are used to monitor and detect traffic congestion. A framework for wildfire monitoring based on a multiple-UAV system is developed in [16]. Surveillance tasks often require the ability to rapidly monitor multiple points of interest. As UAVs operate in aerial environments, they have a broader view and encounter fewer obstacles than other kinds of robots. These features make UAVs well suited to surveillance systems.

The intelligent surveillance system (ISS) is a surveillance system with strong data analysis capabilities. An ISS can not only detect or track objects but also analyze data to anticipate the behavior of objects or upcoming events. These tasks are performed with minimal human intervention. Numerous applications of ISS can be found in the literature, such as traffic monitoring [17, 18] and home security [19]. The ISS is a modern technology that draws on knowledge from various technical fields such as sensing devices, communications, signal processing, and artificial intelligence (AI) [20]. However, because a large number of cameras are deployed in practical surveillance systems, the volume of sensing data collected from the cameras is also large. This leads to numerous issues in terms of system accuracy, processing time, data complexity, etc.

AI technologies have developed rapidly in recent years. In [21], motion information is combined with a convolutional neural network (CNN) to classify and track a crowd of people. Sultani et al. [22] develop specific classification models to recognize events and correctly identify various human activities. In [23], the authors propose a knowledge representation framework for describing patterns in video sequences. The proposed framework detects objects on screen more rapidly than deep learning techniques. AI techniques have also been used to manage network traffic. Ant Colony Optimization (ACO) has been applied to improve the performance of software-defined networks (SDN): the quality of experience of the SDN increased by 24.1% when ACO was applied to the weighted graph of the SDN controller. Most AI algorithms require powerful hardware to process a huge amount of data. This requirement limits the practical application of advanced AI-based signal processing algorithms.

The hardware constraints are stricter in UAV network surveillance systems. A UAV can carry only a finite amount of battery capacity. Adding more onboard processing devices increases the weight of the UAV, which reduces its operating time. Commercial UAVs can operate for 20-40 minutes per charge cycle [4]. Most of the energy is consumed by propulsion [24]; for data collection and analysis tasks in wireless sensor network applications, this can be mitigated by optimizing the total flight time on a single charge [25]. In surveillance applications, however, the UAVs often hold a certain altitude and position until the energy runs out, so optimizing the flight time may not be appropriate. The monitoring or sensing data may be images or videos that require a large amount of memory on each UAV. Transmitting these data wirelessly to servers or between UAVs also consumes a lot of energy. As mentioned in [26], the power consumption of wireless data transmission is proportional to the packet size; thus, the smaller the transmitted data, the lower the energy consumption.

As shown in Figure 1, in a surveillance application, each UAV monitors a certain area, and data can be exchanged between neighboring UAVs. Collecting data in video format consumes a large amount of memory on each UAV as well as transmission bandwidth. In addition, while performing surveillance, the UAVs often hover at a fixed position; hence, most of the scene does not change over time, and only moving objects are of interest. Transmitting redundant data such as background frames and overlapped areas is a waste of resources [27], since further analysis tasks are concerned only with moving objects.

In this work, a framework for highly energy-efficient UAV surveillance networks is proposed. A group of UAVs is deployed to cover an area that needs to be observed. A flocking algorithm drives the group of UAVs to the sensing areas. The algorithm guarantees that the UAV team can travel safely to the required locations and form an appropriate shape to cover the sensing areas. Then, an AI-based method is proposed to reduce redundant data while the UAVs collect surveillance data. The data processing algorithm is divided into three main steps: (i) background modeling, which removes all moving objects from the scenes, together with background stitching, which combines the backgrounds modeled by each UAV; (ii) noticed object extraction from each frame captured by the UAVs; and (iii) data reconstruction from the combined background of step (i) and the noticed objects of step (ii). The methodology can be regarded as a kind of compressive sensing technique, which aims to save power by removing redundant sensor data [28, 29].

The rest of this paper is organized as follows. Section 2 briefly presents the system models, which describe both the UAV network deployed in the sensing field and the AI-based methods that process the surveillance data collected from the UAVs. In Section 3, the problems are formulated, and the flocking control algorithm and the AI-based data processing method are described in detail. Section 4 presents both simulation and experimental results following the steps modeled in Section 3. Finally, conclusions and future research directions are provided in Section 5.

2. System Model

In this section, the system models are presented. The first is a model of a UAV network with the ability to travel, avoid obstacles, and collect video streaming data. Each distributed UAV can also exchange data with its neighbors to construct complete information about the sensing regions. The second is the AI-based data processing framework for improving the energy efficiency of a UAV network.

2.1. Multiple UAV Systems

Consider a team of UAVs deployed at a ground center. After receiving a mission request, the UAV team moves to a target location. The target location is defined as a virtual leader that leads the UAVs in a flocking control algorithm. The collaborative algorithm in [30] is chosen to drive the UAV team. The UAV formation can safely reconfigure its shape to avoid collisions with obstacles while migrating. When the UAV team arrives at the target location, it gradually forms a quasi-lattice formation to fully cover the sensing areas.

Assume that each UAV can obtain its global position from sensors such as GPS. A downward-facing camera is mounted on each UAV, which gives it a constant sensing range $R_S$. The UAVs are equipped with short-range wireless communication devices that allow two UAVs to communicate if the Euclidean distance between them is smaller than a constant $R_C$, referred to as the communication range. Unlike in [30], the sensing range is not required to be smaller than the communication range to ensure nonoverlapped regions. In this work, overlapped regions are acceptable in order to guarantee coverage performance. During processing, the overlapped data is handled by the AI-based data processing algorithm proposed below.

2.2. AI-Based Data Processing Method

The structure of the UAV system is given in Figure 1. Each UAV monitors a distinct area, and the areas handled by different UAVs may overlap. The UAVs form a distributed network and share their local sensing information with each other to reconstruct the global sensing data.

In the first step, background modeling is performed on the UAVs. Captured videos are processed to create backgrounds that consist of only nonmoving objects. The backgrounds are sent to the neighbors at the beginning and are updated only when the backgrounds change. The individual backgrounds are then stitched together to form a complete background of the sensing area. For overlapped images, an overlap detection algorithm is used: keypoints and local invariant descriptors are detected, the descriptors of the overlapping images are matched, and a random sample consensus (RANSAC) algorithm is applied to estimate the homography. The obtained homography matrix is then used to warp and stitch the overlapped pictures.

Secondly, the UAVs perform object extraction, in which moving objects are detected by comparing differences across a continuous sequence of frames. If motion is detected, the details of the moving objects are determined by a convolutional neural network (CNN). These useful data are also shared across the UAV network.

Finally, the reconstructed images are built from the extracted data sent by other UAVs. Reconstruction can be performed on a UAV. Because the proposed method reduces the sensing data, the burden on transmission bandwidth and computational resources is greatly diminished.

3. Problem Formulation

This section presents the approaches to the problems in multi-UAV surveillance systems. First, the flocking algorithm that drives the UAV formation to the sensing areas is presented. Next, the AI-based data processing method is given. Its three steps, shown in Figure 2, are presented in detail.

3.1. Flocking Control for Multi-UAV Systems

In this section, the flocking algorithm for controlling a formation of UAVs is presented. The network of an $N$-UAV team is modeled by a graph $G = (V, E)$, where the vertex set $V = \{1, 2, \ldots, N\}$ represents the UAV members and the edge set $E \subseteq V \times V$ represents the communication links between pairs of UAVs. Let $q_i, p_i \in \mathbb{R}^n$ denote the position and velocity vectors of the $i$th UAV. The dynamics of a UAV are described by the double-integrator model

$$\dot{q}_i = p_i, \qquad \dot{p}_i = u_i, \qquad i \in V, \tag{1}$$

where $u_i \in \mathbb{R}^n$ is the control input vector of the $i$th UAV. Equation (1) can be used to model distributed UAVs having omnidirectional motion capacity.
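
The double-integrator dynamics can be illustrated with a short simulation step. The sketch below assumes explicit Euler integration with a hypothetical step size `dt`; the array names `q`, `p`, and `u` for positions, velocities, and control inputs, and the example values, are illustrative choices rather than settings taken from the paper.

```python
import numpy as np

def step_double_integrator(q, p, u, dt):
    """Advance N UAVs by one explicit Euler step.

    q, p, u: arrays of shape (N, dim) holding positions, velocities,
    and control inputs (accelerations). dt is an assumed step size.
    """
    q_next = q + dt * p  # position changes with the current velocity
    p_next = p + dt * u  # velocity changes with the control input
    return q_next, p_next

# Two UAVs in the plane, initially at rest, with constant control inputs.
q = np.array([[0.0, 0.0], [1.0, 0.0]])
p = np.zeros_like(q)
u = np.array([[1.0, 0.0], [0.0, 2.0]])
for _ in range(10):
    q, p = step_double_integrator(q, p, u, dt=0.1)
print(q)
print(p)
```

Because the model has no heading constraint, any acceleration direction can be commanded at any time, which is what the omnidirectional motion assumption amounts to in this sketch.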

Each UAV has a communication range $R_C$. The set of neighbors of the $i$th UAV is defined by

$$N_i^{\alpha} = \{ j \in V : \| q_j - q_i \| < R_C,\ j \neq i \}, \tag{2}$$

where $\| \cdot \|$ is the Euclidean distance. The superscript $\alpha$ indicates the actual neighbors of the $i$th UAV. To provide the obstacle avoidance ability, the term “virtual neighbors” is introduced. The virtual neighbors of the $i$th UAV are defined as

$$N_i^{\beta} = \{ k \in O : \| \hat{q}_{i,k} - q_i \| < R_O \}, \tag{3}$$

where $R_O$ is the obstacle detection range, $O$ is the set of obstacles, and $\hat{q}_{i,k}$ is the projection of the position of the $i$th UAV onto the $k$th obstacle. The virtual neighbors are used to generate the repulsive forces that prevent collisions between UAVs and obstacles.
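
The two neighbor sets can be sketched as follows. This is a minimal illustration, assuming circular obstacles given as hypothetical `(center, radius)` pairs so that the projection of a UAV onto an obstacle is the closest boundary point; the paper does not specify the obstacle shape, and the function names and range values are placeholders.

```python
import numpy as np

def actual_neighbors(q, i, comm_range):
    """Indices j != i whose Euclidean distance to UAV i is below comm_range."""
    dist = np.linalg.norm(q - q[i], axis=1)
    return [j for j in range(len(q)) if j != i and dist[j] < comm_range]

def virtual_neighbors(qi, obstacles, detect_range):
    """Projections of qi onto obstacles that lie within detect_range.

    obstacles: list of (center, radius); the circular shape is an assumption.
    """
    projections = []
    for center, radius in obstacles:
        center = np.asarray(center, dtype=float)
        offset = qi - center
        # Closest boundary point to qi (assumes qi is outside the obstacle).
        proj = center + radius * offset / np.linalg.norm(offset)
        if np.linalg.norm(proj - qi) < detect_range:
            projections.append(proj)
    return projections

q = np.array([[0.0, 0.0], [3.0, 0.0], [10.0, 0.0]])
neighbors_of_0 = actual_neighbors(q, 0, comm_range=5.0)
obstacles = [((0.0, 4.0), 1.0), ((20.0, 0.0), 2.0)]
virtual_of_0 = virtual_neighbors(q[0], obstacles, detect_range=3.5)
print(neighbors_of_0, virtual_of_0)
```

Only the first obstacle produces a virtual neighbor here, since the projection onto the second one lies far outside the assumed detection range.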

A team of UAVs forms a certain formation structure to navigate a large sensing field. Each UAV must avoid collisions with other members as well as with obstacles. The distributed flocking algorithm consists of three components, namely, formation control $u_i^{\alpha}$, obstacle avoidance $u_i^{\beta}$, and navigation $u_i^{\gamma}$, and is given as

$$u_i = u_i^{\alpha} + u_i^{\beta} + u_i^{\gamma}. \tag{4}$$

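The first term $u_i^{\alpha}$ is the formation control term, which generates the attractive and repulsive forces for UAV members to form a formation. This term is also used to regulate the velocity matching of UAVs in the group. The term is designed as [21]

$$u_i^{\alpha} = c_1^{\alpha} \sum_{j \in N_i^{\alpha}} \phi_{\alpha}(\| q_j - q_i \|_{\sigma})\, n_{ij} + c_2^{\alpha} \sum_{j \in N_i^{\alpha}} a_{ij}(q)\,(p_j - p_i),$$

where $c_1^{\alpha}, c_2^{\alpha}$ are positive constants, $\phi_{\alpha}$ is the action function [19], $n_{ij} = (q_j - q_i)/\sqrt{1 + \epsilon \| q_j - q_i \|^2}$ is the vector connecting $q_i$ and $q_j$, and the $\sigma$-norm is defined as $\| z \|_{\sigma} = \frac{1}{\epsilon}\left[\sqrt{1 + \epsilon \| z \|^2} - 1\right]$ with constant $\epsilon > 0$. The $\sigma$-norm is differentiable everywhere and is utilized to construct smooth collective potentials.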
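
The smoothness property mentioned above can be checked numerically. The sketch below assumes the standard $\sigma$-norm from the flocking literature [13], $\|z\|_{\sigma} = (\sqrt{1 + \epsilon \|z\|^2} - 1)/\epsilon$, whose gradient $z/\sqrt{1 + \epsilon \|z\|^2}$ is the vector along the line connecting two agents when $z$ is their relative position; the value `eps=0.1` and the function names are illustrative assumptions.

```python
import numpy as np

def sigma_norm(z, eps=0.1):
    """Sigma-norm of a vector z; eps > 0 is an assumed constant."""
    z = np.asarray(z, dtype=float)
    return (np.sqrt(1.0 + eps * z.dot(z)) - 1.0) / eps

def sigma_gradient(z, eps=0.1):
    """Gradient of the sigma-norm with respect to z."""
    z = np.asarray(z, dtype=float)
    return z / np.sqrt(1.0 + eps * z.dot(z))

z = np.array([3.0, 4.0])
print(sigma_norm(z), sigma_gradient(z))
# Unlike the Euclidean norm, the gradient is well defined at the origin.
print(sigma_gradient(np.zeros(2)))
```

The gradient at the origin is the zero vector, whereas the gradient of the Euclidean norm is undefined there; this is why the $\sigma$-norm is convenient for building smooth potentials.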

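The second term $u_i^{\beta}$ is introduced to prevent UAVs from colliding with obstacles in sensing environments. This term is designed as

$$u_i^{\beta} = c_1^{\beta} \sum_{k \in N_i^{\beta}} \phi_{\beta}(\| \hat{q}_{i,k} - q_i \|_{\sigma})\, \hat{n}_{i,k} + c_2^{\beta} \sum_{k \in N_i^{\beta}} b_{i,k}(q)\,(\hat{p}_{i,k} - p_i),$$

where $c_1^{\beta}, c_2^{\beta}$ are positive constants, $\phi_{\beta}$ is the repulsive action function [19], $\hat{p}_{i,k}$ is the velocity of the virtual neighbor at $\hat{q}_{i,k}$, and $\hat{n}_{i,k}$ is the vector along the line connecting $q_i$ and $\hat{q}_{i,k}$. The adjacency matrices $a_{ij}(q)$ and $b_{i,k}(q)$ are defined by a graph [31].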

The UAVs may be deployed in a ground center, and they have to travel to certain places depending on sensing missions. The last component, navigation term, is introduced to provide navigating abilities for UAV formations. The component is given as where are the positive constants, is defined as , and are the neighbors of UAV th including itself. The target location is where the UAV formations have to navigate.

3.2. Background Modeling

The background modeling process has two sub-steps: background modeling and stitching of the backgrounds captured by different UAVs.

The median filter technique is used to perform background modeling. The main idea of the median filter is to run through the signal entry by entry, replacing each entry with the median of neighboring entries. In this work, following this idea, a number of frames are chosen randomly from the video captured by the UAVs, and background modeling is performed on them. The number of frames can be varied through experiments to find a value that gives good results.
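
A minimal sketch of this step is shown below. It assumes that the background is taken as the per-pixel median over the randomly chosen frames, which is a common reading of median-based background modeling; the function name, the random seed, and the synthetic frames are illustrative and do not come from the paper.

```python
import numpy as np

def model_background(frames, num_samples, seed=0):
    """Per-pixel median over randomly sampled frames (assumed approach)."""
    rng = np.random.default_rng(seed)
    count = min(num_samples, len(frames))
    chosen = rng.choice(len(frames), size=count, replace=False)
    stack = np.stack([frames[i] for i in chosen], axis=0)
    return np.median(stack, axis=0).astype(frames[0].dtype)

# Synthetic grayscale video: static background of value 100 with a bright
# 2 x 2 object that moves one column to the right in every frame.
height, width, num_frames = 6, 40, 30
frames = []
for t in range(num_frames):
    frame = np.full((height, width), 100, dtype=np.uint8)
    frame[2:4, t:t + 2] = 255
    frames.append(frame)

background = model_background(frames, num_samples=20)
print(np.unique(background))
```

Each pixel is covered by the moving object in at most two frames, so the median over 20 sampled frames recovers the static background. Slower or denser traffic would require more frames for the same effect.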

After the UAVs perform background modeling, the significant data is sent to neighbors for reconstruction. However, when the areas monitored by different UAVs overlap, the overlapped areas should not be duplicated in the stitched result. To handle this, an overlapped area detection algorithm uses keypoints and their correspondences between backgrounds to perform the background stitching. Algorithm 1 lists the overlapped area detection steps.

Input: Background images with overlapped areas.
Output: Stitched background image without overlapped areas.
1. Compute the SIFT keypoints and descriptors for the images.
2. Compute the distances between every descriptor in one image and every descriptor in the other image.
3. Select the best matches for each descriptor of an image.
4. Run RANSAC to estimate the homography.
5. Warp the images to align them for stitching.
6. Stitch the aligned images together.

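
Step 4 of the algorithm above can be sketched in plain NumPy. The example assumes that steps 1 to 3 have already produced matched point pairs `src` and `dst`, estimates the homography with the direct linear transform, and wraps it in a basic RANSAC loop. The iteration count, the pixel tolerance, and the synthetic matches are assumptions; a library routine such as OpenCV's `findHomography` would normally be used instead.

```python
import numpy as np

def fit_homography(src, dst):
    """Direct linear transform: H such that dst ~ H @ src (homogeneous)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    h = vt[-1].reshape(3, 3)  # right singular vector of the smallest singular value
    return h / h[2, 2]

def apply_homography(h, pts):
    """Map 2-D points through H and convert back from homogeneous coordinates."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))]) @ h.T
    return pts_h[:, :2] / pts_h[:, 2:3]

def ransac_homography(src, dst, iters=500, tol=2.0, seed=0):
    """Keep the 4-point model with the most inliers, then refit on them."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    with np.errstate(all="ignore"):  # degenerate samples may divide by zero
        for _ in range(iters):
            sample = rng.choice(len(src), size=4, replace=False)
            h = fit_homography(src[sample], dst[sample])
            if not np.all(np.isfinite(h)):
                continue
            err = np.linalg.norm(apply_homography(h, src) - dst, axis=1)
            inliers = err < tol
            if inliers.sum() > best.sum():
                best = inliers
    return fit_homography(src[best], dst[best]), best

# Synthetic matches: 22 correct correspondences and 8 wrong ones.
rng = np.random.default_rng(1)
h_true = np.array([[1.02, 0.01, 50.0], [-0.01, 0.98, 10.0], [1e-5, 2e-5, 1.0]])
src = rng.uniform(0, 500, size=(30, 2))
dst = apply_homography(h_true, src)
dst[22:] += 100.0
h_est, inliers = ransac_homography(src, dst)
print(int(inliers.sum()))
```

The estimated matrix would then be used in steps 5 and 6 to warp one background into the coordinate frame of the other before blending.
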
3.3. Noticed Object Extraction

Object detection in aerial images is a challenging and interesting problem. As the cost of drones and UAVs decreases, more aerial devices can be deployed, and the amount of aerial data being generated is surging. Models that can extract valuable information from aerial data are therefore very useful. However, most objects are only a few pixels wide, some objects are occluded, and objects in shade are even harder to detect. Thus, this work proposes a hybrid noticed object extraction system that combines an existing method with a custom object classification model to extract valuable information from aerial data.

First, frame differencing and thresholding are applied to each frame to estimate the moving areas, referred to as the noticed areas. In surveillance tasks with UAVs, the objects are often very small, so directly applying object detection algorithms can result in missed or incorrectly detected objects. Therefore, in this paper, the first step is to determine the motion area by comparing frame $t$ with frame $t+k$ and finding the difference between them, as shown in Algorithm 2.

Input: Two consecutive frames, t and t+k, of the input video.
Output: Motion area in each frame.
1. Segment the video into frames (each frame is treated as an image).
2. Take two images A and B: the modeled background and frame t.
3. Convert images A and B to grayscale.
4. Compute the difference between the two grayscale images.
5. If a significant difference is detected between A and B, conclude that motion has occurred.
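
The steps above can be sketched with NumPy alone. The grayscale weights are the common ITU-R BT.601 luminance coefficients, while the threshold value, the function names, and the synthetic images are assumptions made for illustration.

```python
import numpy as np

def to_gray(image):
    """Convert an H x W x 3 RGB image to grayscale (BT.601 weights)."""
    return image[..., :3].astype(float) @ np.array([0.299, 0.587, 0.114])

def motion_mask(background, frame, threshold=25.0):
    """Boolean mask of pixels that differ significantly from the background."""
    diff = np.abs(to_gray(frame) - to_gray(background))
    return diff > threshold

def bounding_box(mask):
    """Return (x, y, w, h) of the changed region, or None if nothing moved."""
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        return None
    x, y = int(xs.min()), int(ys.min())
    return x, y, int(xs.max()) - x + 1, int(ys.max()) - y + 1

# Synthetic example: a uniform background and a frame with one bright object.
background = np.full((20, 30, 3), 80, dtype=np.uint8)
frame = background.copy()
frame[5:9, 10:16] = 200
mask = motion_mask(background, frame)
box = bounding_box(mask)
print(box)
```

A single bounding box is enough for this toy frame; with several moving objects, a connected-component labeling step would be needed to separate the regions before classification.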

After that, a custom convolutional neural network (CNN) object classification model is built to classify whether each of those areas contains objects. If an area contains objects of the classes predefined during model training, object tracking is executed and the significant data is extracted.

Because the training images differ in size, all images used to train the classification model are first resized to squares of a fixed size. The image size affects the performance of the model and can be chosen through experiments. Because the training images are extracted from the dataset, they are small, so a lightweight classification model is built. The resized images have size $W \times H \times 3$, where $W$ is the width, $H$ is the height, and 3 is the number of channels. The first layer is a convolution layer with 64 feature maps, a fixed kernel size, and a stride of 1. These feature maps are then downsampled by a max-pooling layer, and stage 1 ends with a dropout layer, as shown in Table 1.
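
The effect of the first stage on the tensor size can be worked out with simple arithmetic. The sketch below assumes a 32 x 32 input, a 3 x 3 kernel without padding, and 2 x 2 non-overlapping max pooling; these values are hypothetical, since the actual settings are those of Table 1. Only the 64 feature maps, the stride of 1, and the 3 input channels come from the text.

```python
def conv_output(size, kernel, stride=1, padding=0):
    """Output length of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_output(size, pool):
    """Output length of non-overlapping max pooling along one dimension."""
    return size // pool

def conv_parameters(kernel, in_channels, feature_maps):
    """Trainable weights and biases of one convolution layer."""
    return (kernel * kernel * in_channels + 1) * feature_maps

def stage1_shape(width, height, kernel=3, pool=2, feature_maps=64):
    """Shape after convolution and max pooling; dropout keeps the shape."""
    w = pool_output(conv_output(width, kernel), pool)
    h = pool_output(conv_output(height, kernel), pool)
    return w, h, feature_maps

shape = stage1_shape(32, 32)         # assumed 32 x 32 x 3 input
params = conv_parameters(3, 3, 64)   # assumed 3 x 3 kernel, 3 channels
print(shape, params)
```

Dropout only zeroes activations at random during training, so it does not change the tensor shape; under these assumptions, the spatial reduction in stage 1 comes from the convolution border and the pooling layer.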

After the areas containing objects are classified, the object tracking algorithm tracks the objects and extracts their information, which is sent to the central UAVs or exchanged between UAVs.

3.4. Data Reconstruction

Data reconstruction can be performed between UAVs. Objects extracted by the UAVs are sent to the central UAVs or nearby UAVs along with their locations in the frame. Based on the received information, the UAVs can perform data reconstruction. The most important point is that each distributed UAV can obtain all the data collected by the network with minimal storage, which shows the effectiveness of the proposed method.
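
The reconstruction step can be sketched as pasting each received object patch onto the shared background at its reported location. The `(x, y, w, h)` box layout, the function names, and the synthetic frame are assumptions for illustration only.

```python
import numpy as np

def extract_object(frame, box):
    """Cut the object patch described by box = (x, y, w, h) out of a frame."""
    x, y, w, h = box
    return frame[y:y + h, x:x + w].copy()

def reconstruct_frame(background, objects):
    """Paste received (box, patch) pairs onto a copy of the background."""
    frame = background.copy()
    for (x, y, w, h), patch in objects:
        frame[y:y + h, x:x + w] = patch
    return frame

# Sender side: the original frame differs from the background in one region.
background = np.full((20, 30, 3), 80, dtype=np.uint8)
original = background.copy()
original[5:9, 10:16] = 200
box = (10, 5, 6, 4)
patch = extract_object(original, box)

# Receiver side: only the box and the patch are needed to rebuild the frame.
rebuilt = reconstruct_frame(background, [(box, patch)])
payload_ratio = patch.nbytes / original.nbytes
print(np.array_equal(rebuilt, original), payload_ratio)
```

In this toy case the patch is a small fraction of the full frame, which illustrates why sending only objects and their locations reduces the transmitted data once the background has been shared.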

4. Simulation and Experimental Results

In this section, the proposed flocking control algorithm is simulated with 10 UAVs. Then, experimental results are presented for all the steps described in Section 3. The experiments are performed using Python 3.6 with the TensorFlow and Keras libraries on an Intel Xeon E3 platform with 16 GB RAM running Windows 10. The dataset is taken from the Stanford Drone Dataset [32].

4.1. Simulation Results with Flocking Algorithm

In this section, a group of 10 UAVs is deployed to perform surveillance tasks in a vast area. A unit is defined to measure Euclidean distances among the UAVs and obstacles in the sensing field. Each UAV has a constant communication range $R_C$ expressed in these units. A virtual leader represents the target location for the group. The group is led by the control algorithm given in Equation (4).

In Figure 3, as the UAVs in the group encounter obstacles, they separate and move around the obstacles. Connections among members may be interrupted without affecting control performance. When the group reaches the target, the UAV members gradually reconnect to form a quasi-α-lattice shape, which covers an area around the target point. Figure 4 illustrates the quasi-α-lattice formed by the UAV group. All 10 UAVs stay around the virtual leader and start their surveillance mission. Based on the fixed communication range $R_C$, they keep their connections to their neighbors as a grid network, as shown in the figure.

4.2. Stanford Drone Dataset

The Stanford Drone Dataset is a massive dataset of aerial images collected by a drone over the Stanford campus. The dataset is ideal for object detection and tracking problems. It contains about 60 aerial videos. For each video, bounding box coordinates are provided for 6 classes: “Pedestrian,” “Biker,” “Skateboarder,” “Cart,” “Car,” and “Bus”. The dataset is very rich in pedestrians and bikers; these 2 classes cover about 85%-95% of the annotations.

4.3. Background Modeling

Using multiple frames received from UAV surveillance, the backgrounds are modeled with a median filter technique. The number of frames needed to model the background can be selected depending on the scene and location of the monitoring task. In this paper, because the UAVs monitor a campus, the traffic volume of vehicles and people is low. Therefore, 20 frames were randomly selected to model the background, as shown in Figure 5.

After the background from different scenes is captured and sent by UAVs to other UAVs, in case of overlap, the overlapping areas will be handled by the algorithm presented in the previous section as shown in Figure 6.

4.4. Noticed Object Extraction
4.4.1. Moving Area Estimation

After background modeling, moving areas can be detected by comparing the background itself with the following frames, which may contain moving objects. The areas whose difference exceeds the predefined threshold are marked as motion areas. Using the background instead of the traditional frame differencing method gives a better result, as shown in Figure 7.

4.4.2. Object Classification Training Performance

Once the motion areas are detected, the object classification model classifies those areas into different predefined classes. The object classification model, trained on the Stanford Drone Dataset, covers objects such as vehicles, pedestrians, and cyclists. The model consists of the stages and layers shown in Table 1. After many experiments to select the best hyperparameters, the model is trained with the settings that give the best performance. The training accuracy reaches 0.965; the remaining metrics are shown in Figure 8.

To evaluate the performance of the custom CNN object classification model, the F1-score, accuracy, and recall metrics are used, as shown in Table 2. Because the objects in the dataset are small, there is room to improve the performance of the proposed network in future work. Nevertheless, the proposed network shows very good computational efficiency, with a small number of operations in both the training and classification stages.

4.4.3. Moving Object Tracking and Extraction

The object classification model classifies each moving area into one of two types: object or nonobject. If the moving area is an object, the object tracking algorithm is executed; otherwise, the area is ignored. Figure 9 shows some images extracted from a classified frame, including objects and nonobjects of different sizes.

4.4.4. Data Reconstruction

After the moving objects are tracked and extracted, their position information $(x, y, w, h)$ is obtained, where $(x, y)$ is the starting location and $(w, h)$ are the width and height of the rectangular bounding region of interest. To measure the effectiveness of the proposed method, the videos in the Stanford Drone Dataset are split into frames, and the total size of these frames is compared with the total size of the background and the extracted moving objects. The videos in the Stanford Drone Dataset are recorded at 30 fps with a data rate of 50890 kbps. From one video, 300 frames with a total size of 138 MB are taken. After processing, the total remaining size is approximately 14 MB. Thus, the data volume is reduced by approximately 90% while the image quality is preserved. Indeed, the proportion of objects in each frame is extremely small compared to the whole frame, so eliminating most of the unnecessary data greatly reduces the amount of data. The results save both energy for data transmission among UAVs and storage capacity on the UAVs.
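
The reported reduction can be checked directly from the two sizes given above. The second line is an assumption rather than a measured result: it relies on the proportionality between transmission power and packet size cited from [26] in the introduction, with $E_{\text{tx}}$ and $S$ introduced here as illustrative symbols for transmission energy and data size.

```latex
\[
\text{reduction} = \frac{138\ \text{MB} - 14\ \text{MB}}{138\ \text{MB}} \approx 0.899 \approx 90\%,
\qquad
\frac{138\ \text{MB}}{300\ \text{frames}} = 0.46\ \text{MB per raw frame}.
\]
\[
E_{\text{tx}} \propto S
\;\Longrightarrow\;
\frac{E_{\text{tx}}^{\text{proposed}}}{E_{\text{tx}}^{\text{raw}}} \approx \frac{14}{138} \approx 0.10 .
\]
```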

Overall, with the AI-based data processing method, the distributed UAV network can significantly reduce the data transmitted for video surveillance. In addition, the flocking control algorithm helps the UAVs operate in the field in a way that suits their tasks. Together, these results address the energy-efficiency problem considered in this paper.

5. Conclusions and Future Developments

This paper proposes new methods both to control multiple UAVs and to process video surveillance data using AI techniques based on a CNN. The flocking control algorithm is applied to distributed UAVs to lead them across the working fields while avoiding collisions and obstacles. The proposed AI-based data processing method significantly reduces the redundant data streamed among UAVs. The method also reduces the training time and classification time compared to existing methods, such as YOLO detection. Overall, the proposed methods help reduce the required storage capacity and transmission bandwidth and improve performance in UAV surveillance applications. Indeed, the proportion of objects in each frame is extremely small, and transmitting the redundant part of each frame is not necessary. Applying the method reduces the data volume by approximately 90% while still preserving the image quality. This significantly reduces the energy consumption of the UAVs in their tasks.

Future research can enhance the proposed solution. To improve system performance, the crucial step is the object classification task, which filters out areas wrongly detected in the previous step and thereby improves the efficiency of the method. Moreover, when the method is applied to more complex applications such as traffic surveillance and agriculture, more types of objects should be considered.

Data Availability

All the data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

The authors would like to thank Thai Nguyen University of Technology (TNUT) and Ministry of Education and Training (MOET), Vietnam, for the support.