Abstract

We propose a systematic framework for Intelligence Video Surveillance System (IVSS) with a multicamera network. The proposed framework consists of low-cost static and PTZ cameras, target detection and tracking algorithms, and a low-cost PTZ camera feedback control algorithm based on target information. The target detection and tracking is realized by fixed cameras using a moving target detection and tracking algorithm; the PTZ camera is manoeuvred to actively track the target from the tracking results of the static camera. The experiments are carried out using practical surveillance system data, and the experimental results show that the systematic framework and algorithms presented in this paper are efficient.

1. Introduction

Moving target detection and tracking has a variety of applications in the field of computer vision, such as intelligent video surveillance, motion analysis, action recognition, environmental monitoring, and disaster response. Normally, it is quite easy and intuitive for humans to see and track targets and recognize their actions. However, building an automatic system that requires no human intervention is very challenging. In particular, as the size of the camera network grows with the development of safe and smart cities, it becomes infeasible for human operators to manually monitor multiple video streams and identify all events of possible interest, let alone control individual cameras to perform advanced surveillance tasks such as actively tracking a moving target of interest to capture one or more close-up snapshots. Therefore, an important task of the Intelligence Video Surveillance System (IVSS) is to design multicamera sensor networks capable of performing visual surveillance tasks automatically, or at least with minimum human intervention. The design of an autonomous visual sensor network as a problem in resource allocation and scheduling can be found in [1]. Existing camera networks generally consist of fixed cameras covering a large area. As a result, targets are often not covered at desirable resolutions or viewpoints, which makes it difficult to analyze the videos, particularly when there are special requirements on the targets, such as detection and tracking precision, target positioning, and target identification. Since the total number of cameras is usually restricted by various factors, for example, cost and placement, systems have been designed that combine a pan-tilt-zoom (PTZ) camera with multiple PTZ or fixed cameras in a master-slave manner to complete practical tasks [2–7]. A typical system contains multiple static and PTZ cameras: the static cameras cover a large area and perform moving target detection and tracking, and the PTZ cameras are manoeuvred by a central supervisor unit to actively track the target based on the tracking results of the static cameras. For this purpose, it is necessary to determine the geometrical relations between the cameras through camera calibration [8–11].

To detect moving targets in the video frames of the static cameras, one of the most widely used approaches is background subtraction [12, 13]. When the video camera is stationary, the background scene does not change, and thus it is easy to construct a background model [14, 15]. For any robust background subtraction algorithm, it is critical that background images are estimated efficiently and accurately. A model of the recent history is built for each pixel location, and new pixel values are classified by comparing each of them with the corresponding pixel model. Background modeling techniques can be divided into two categories: parametric techniques, which fit a parametric model at each pixel location, and sample-based techniques, which build their model by aggregating previously observed values at each pixel location [16]. A well-known method presented by Stauffer and Grimson [14] uses an adaptive strategy for parametric background modeling. In this technique, each pixel is modeled with a separate Gaussian mixture, which is continuously learnt by an online approximation. Target detection in the current frame is then performed at the pixel level by comparing each pixel value against the most likely background Gaussians, determined by a threshold. However, since its sensitivity cannot be accurately tuned, its ability to successfully handle both high- and low-frequency changes in the background is debatable. To overcome these shortcomings, sample-based techniques [17] circumvent part of the parameter estimation step by building their models directly from observed pixel values, which enhances their robustness to noise. They provide fast responses to high-frequency events in the background by directly including newly observed values in their pixel models. However, because they update their pixel models in a first-in-first-out manner, their ability to handle concomitant events evolving at different speeds is limited. To address this issue, a random background modeling scheme that improves sample-based algorithms can be found in [16].
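As a concrete illustration of this pipeline, the following minimal sketch builds a Stauffer-Grimson style Gaussian mixture background model with OpenCV's MOG2 implementation and extracts candidate moving targets from the foreground mask; the video path and parameter values are illustrative assumptions rather than the settings used in this paper.

import cv2

# Illustrative input path; in practice this would be a camera stream or
# recorded surveillance video.
cap = cv2.VideoCapture("surveillance.avi")

# Stauffer-Grimson style Gaussian mixture background model (OpenCV's MOG2
# variant); the history and varThreshold values here are assumptions.
subtractor = cv2.createBackgroundSubtractorMOG2(history=500, varThreshold=16,
                                                detectShadows=True)

kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))

while True:
    ok, frame = cap.read()
    if not ok:
        break
    # Per-pixel comparison against the mixture model yields a foreground mask
    # (255 = foreground, 127 = shadow, 0 = background).
    fg_mask = subtractor.apply(frame)
    # Drop shadow pixels and suppress isolated noise with a morphological opening.
    _, fg_mask = cv2.threshold(fg_mask, 200, 255, cv2.THRESH_BINARY)
    fg_mask = cv2.morphologyEx(fg_mask, cv2.MORPH_OPEN, kernel)
    # Connected foreground regions are candidate moving targets.
    contours, _ = cv2.findContours(fg_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    boxes = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 100]

cap.release()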

Moving target tracking is an important processing step in the field of computer vision and has been widely applied in practice, for example, in video surveillance [1], intelligent transportation [18], and multiagent system tracking and control [19]. The purpose of target tracking is to estimate the position and the shape of any foreground region in subsequent image frames. A track is terminated when a target can no longer be detected, because it leaves the field of view, stops and becomes static, or can no longer be distinguished from the background. The challenges in designing a robust target tracking algorithm arise from occlusion, varying viewpoints, background clutter, and illumination changes. During target tracking, a target is accurately tracked by correctly associating the target detected in subsequent image frames with the same identified track. There are various approaches to this task. Classic approaches include the multiple hypothesis tracker [20] and the joint probabilistic data association filter [21]. These methods and their variants commonly rely on the one-to-one assumption, namely, that a target can generate at most one measurement in each frame and a measurement can originate from at most one target. However, the one-to-one assumption is difficult to satisfy in practice because of splitting and merging processes and the presence of multiple targets in real scenes. In recent years, several approaches have been proposed for multiple target tracking [22–25], and some applications of multitarget tracking have been realized using distributed cameras [1–3, 7, 26–28].
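To make the one-to-one assumption concrete, the following sketch performs a simple greedy frame-to-frame association of detections to tracks based on bounding-box overlap; it is only an illustration, not the association scheme of [20, 21], and the box format and threshold are our own assumptions.

def iou(a, b):
    # Boxes are (x, y, w, h); intersection-over-union of the two rectangles.
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    iw = max(0, min(ax2, bx2) - max(ax1, bx1))
    ih = max(0, min(ay2, by2) - max(ay1, by1))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def associate(tracks, detections, threshold=0.3):
    # Greedy one-to-one assignment: each track claims at most one detection
    # and each detection is claimed by at most one track.
    pairs = sorted(((iou(t, d), ti, di)
                    for ti, t in enumerate(tracks)
                    for di, d in enumerate(detections)), reverse=True)
    used_t, used_d, matches = set(), set(), []
    for score, ti, di in pairs:
        if score < threshold or ti in used_t or di in used_d:
            continue
        used_t.add(ti); used_d.add(di); matches.append((ti, di))
    return matches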

In this paper, we focus on a real-time surveillance system with a multicamera network, which includes static and PTZ cameras, and on the control system of the active cameras. The target detection and tracking is done by fixed cameras using a moving target detection and tracking algorithm. Target coordinates are transformed into appropriate pan and tilt values using a geometrical transformation, and the camera is then moved accordingly. The contribution of this paper is the design of a real-time control strategy for active cameras based on the target information obtained by the detection and tracking algorithms.

The paper is organized as follows. The system framework is presented in Section 2. The low-cost PTZ camera control strategy based on target information is presented in Section 3. The test results of target detection and tracking with a multicamera network are detailed in Section 4. Finally, we draw conclusions and outline future work in Section 5.

2. System Framework and Problem Statement

The work presented in this paper originates from a research project on video surveillance applications in the Digital Navigation Center (DNC) at Beihang University. The primary goal of the project is the development of an IVSS platform. Intelligent video surveillance in a large or complex environment requires the use of distributed multiple cameras. Since the focal length of static cameras is fixed, they cannot be used to realize some advanced surveillance tasks, such as capturing high-quality videos of moving targets of interest, actively tracking one or more moving targets of interest, and capturing close-up images. For this reason, considerable research has been dedicated to combining a PTZ camera with multiple PTZ or fixed cameras in a master-slave manner to complete practical tasks [2–7, 25–28]. In this paper, we focus on problems confronted by a real-time surveillance system with a multicamera network in practical applications, in which the surveillance system includes low-cost static and PTZ cameras as well as the corresponding algorithms. The target detection and tracking is done by fixed cameras using a moving target detection and tracking algorithm, and the target of interest is actively tracked by a PTZ camera using a simple feedback control strategy. The overall structure is depicted in Figure 1.

3. Multicamera Target Tracking and PTZ Camera Control

3.1. Multicamera and Multitarget Tracking

In this paper, we focus on the design and application of a practical IVSS with a multicamera network that consists of low-cost static and PTZ cameras as well as the corresponding algorithms. The low-cost static cameras are placed at the perimeter and in indoor and outdoor areas, and they are used to detect and track targets with a moving target detection and tracking algorithm.

An experiment is carried out using two sets of video data. The Gaussian mixture model [14], the random background model [16], and an improved algorithm for tracking moving targets under occlusion [29] are used for multitarget detection and tracking. Video data 1 is evaluation data from the PETS database, with a video image resolution of 768 × 576 pixels and a frame rate of 25 frames/s; video data 2 is practical surveillance data from the DNC of Beihang University, with a video image resolution of 352 × 288 pixels and a frame rate of 25 frames/s.

The experimental results are shown in Figures 2, 3, 4, and 5. As can be seen from Figures 2 and 3, a tree swaying in the wind is classified as foreground motion by the Gaussian mixture model but is detected as background by the random background model. As can be seen from Figures 4 and 5, the target tracking algorithm is effective and performs well under occlusion.

3.2. Low-Cost PTZ Camera Control Strategy

For low-cost PTZ cameras, no feedback signal is available, and such cameras can only execute one instruction within a certain time interval. In addition, the relationship between the command duration and the resulting change in pan, tilt, and zoom is indeterminate. To solve this problem, we propose a PTZ control algorithm based on target information feedback. The principle of acquiring the feedback signal is illustrated in Figure 6.

The feedback signal of the PTZ control algorithm is obtained by computing the distance between the center of the image and the center of the region of the target of interest (i.e., the horizontal offset $\Delta x$ and the vertical offset $\Delta y$) as well as the corresponding orientation. Here, the area of the region of the target of interest is computed as $s = w \times h$, where $w$ and $h$ denote the width and the height of that region. The PTZ camera receives a zoom instruction when the target is smaller than a given threshold. The position of the target in the next frame is estimated by a Kalman filter, and the PTZ control instruction for the first frame can then be calculated.
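The computation of this feedback signal can be sketched as follows; the function and variable names are our own illustrative notation, and the area threshold is an assumed parameter.

def feedback_signal(target_box, frame_size, area_threshold):
    # target_box = (x, y, w, h): bounding box of the target of interest.
    # frame_size = (W, H): size of the PTZ camera image.
    x, y, w, h = target_box
    W, H = frame_size
    # Offsets between the centroid of the target (COT) and the center of the
    # image (COI); their signs encode the required pan/tilt orientation.
    dx = (x + w / 2.0) - W / 2.0
    dy = (y + h / 2.0) - H / 2.0
    # Area of the target region, s = w * h.
    s = w * h
    # A zoom instruction is issued only when the target is too small.
    need_zoom_in = s < area_threshold
    return dx, dy, s, need_zoom_in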

3.2.1. Determination of Directions

When calculating the offsets $\Delta x$ and $\Delta y$ of the centroid of the target (COT) from the center of the image (COI), in order to correct the direction with the larger offset first, we choose the direction corresponding to the larger of $|\Delta x|$ and $|\Delta y|$ as the rotational direction of the PTZ camera.

3.2.2. Determination of Velocity

Based on the rotational speed of the PTZ camera, we adopt a linear approximation to map the relationship between the speed and the central offsets $\Delta x$ and $\Delta y$. In real applications, all 16 rotational speed levels are calibrated offline.
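A minimal sketch of this mapping is given below; the calibrated speed table and the response-time parameter are hypothetical placeholders for the offline calibration described above.

# Hypothetical offline calibration: image-plane speed (pixels per control
# interval) produced by each of the 16 rotational speed levels.
CALIBRATED_SPEEDS = [5 * (k + 1) for k in range(16)]  # placeholder values

def speed_level_for_offset(offset_pixels, response_intervals=1.0):
    # Linear approximation: the required image-plane speed is the offset
    # divided by the time available to remove it.
    required = abs(offset_pixels) / response_intervals
    # Choose the calibrated level whose speed is closest to the requirement.
    level = min(range(16), key=lambda k: abs(CALIBRATED_SPEEDS[k] - required))
    return level + 1  # levels are numbered 1..16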

3.2.3. Realization of Motion-Prediction-Based PTZ Camera Control

The distance between the COT and the COI is chosen as the feedback, and the corresponding up-down, left-right, and zoom in-out control instructions are sent according to the calibrated rotational speeds.

In order to drive the target to the COI in the first frame, once the position of the target of interest is obtained, the average speed of the target is computed from its historical motion as follows:

$$\bar{v}_x(t) = (1-\alpha)\,\bar{v}_x(t-1) + \alpha\,[x(t) - x(t-1)], \qquad \bar{v}_y(t) = (1-\alpha)\,\bar{v}_y(t-1) + \alpha\,[y(t) - y(t-1)], \tag{1}$$

where $\bar{v}_x(t)$ and $\bar{v}_y(t)$ denote the average speeds in the $x$ and $y$ directions at time $t$; $\alpha$ denotes the update rate of the speed; and $x(t)$ and $y(t)$ denote the positions in the $x$ and $y$ directions at time $t$.
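In code, this update of the average speed can be sketched as follows, assuming the exponential form of (1); the default update rate is an illustrative value.

def update_average_speed(v_prev, p_curr, p_prev, alpha=0.5):
    # v_prev: previous average speed (vx, vy); p_curr, p_prev: target
    # positions (x, y) at times t and t-1; alpha: update rate of the speed.
    vx = (1 - alpha) * v_prev[0] + alpha * (p_curr[0] - p_prev[0])
    vy = (1 - alpha) * v_prev[1] + alpha * (p_curr[1] - p_prev[1])
    return vx, vy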

The position in the next frame is estimated by the Kalman filter in order to obtain the relative offset between the target and the camera as follows:

$$\Delta x(t+1) = \hat{x}(t+1) - x_c, \qquad \Delta y(t+1) = \hat{y}(t+1) - y_c, \tag{2}$$

where $(\hat{x}(t+1), \hat{y}(t+1))$ is the target position predicted by the Kalman filter and $(x_c, y_c)$ is the center of the image.

The state vector and observation vector for the Kalman filter can be represented as [29]

$$\mathbf{X} = [x, \; y, \; w, \; h, \; v_x, \; v_y]^{T}, \qquad \mathbf{Z} = [x, \; y, \; w, \; h]^{T}, \tag{3}$$

where $x$ and $y$ denote the horizontal and vertical coordinates of the centroid of the moving target; $w$ and $h$ denote the width and height of the bounding rectangle of the moving target; and $v_x$ and $v_y$ denote the speeds of the target.
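A minimal constant-velocity Kalman filter with this state and observation layout can be sketched as follows; the process and measurement noise covariances are illustrative assumptions.

import numpy as np

# State X = [x, y, w, h, vx, vy]^T, observation Z = [x, y, w, h]^T.
F = np.eye(6)
F[0, 4] = 1.0  # x <- x + vx (one frame per step)
F[1, 5] = 1.0  # y <- y + vy
H = np.zeros((4, 6))
H[0, 0] = H[1, 1] = H[2, 2] = H[3, 3] = 1.0

Q = np.eye(6) * 0.01   # process noise covariance (assumption)
R = np.eye(4) * 1.0    # measurement noise covariance (assumption)

def kalman_predict(x, P):
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    return x_pred, P_pred

def kalman_update(x_pred, P_pred, z):
    y = z - H @ x_pred                   # innovation
    S = H @ P_pred @ H.T + R             # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)  # Kalman gain
    x_new = x_pred + K @ y
    P_new = (np.eye(6) - K @ H) @ P_pred
    return x_new, P_new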

According to the result of (2), the nearest available integer value is chosen as the rotational speed level of the PTZ camera in the first frame.

Regarding the zoom control of the PTZ camera, in order to alleviate the difficulty of detection and tracking during rotation control caused by the changing size of targets, we first perform the pan/tilt rotation control of the PTZ camera and perform zoom control only when the distance between the COT and the COI is less than a predefined threshold.

During zoom control, the size of targets may change drastically if the camera zooms abruptly, which poses great challenges for target matching and tracking. To solve this problem, we adopt a gradual control strategy: the control signal is sent one minimal zoom unit at a time, and the process is repeated until the required zoom level is reached. The feedback signal is computed as $r = s / s_v$, where $s$ and $s_v$ denote the target area and the area of the field of view, respectively. If $r$ is smaller than the threshold, a zoom-in instruction is sent; if $r$ reaches the threshold, the zoom-in operation is terminated; and if $r$ is larger than the threshold, a zoom-out instruction is sent.
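The gradual zoom strategy can be sketched as the following control step; the threshold, the tolerance band, and the send_zoom_in/send_zoom_out callbacks are hypothetical placeholders for the actual PTZ command interface.

def gradual_zoom_step(target_area, view_area, threshold, tolerance=0.05,
                      send_zoom_in=lambda: None, send_zoom_out=lambda: None):
    # Feedback signal: ratio r of the target area to the field-of-view area.
    r = target_area / float(view_area)
    if r < threshold - tolerance:
        send_zoom_in()      # target too small: zoom in by one minimal step
        return "zoom_in"
    if r > threshold + tolerance:
        send_zoom_out()     # target too large: zoom out by one minimal step
        return "zoom_out"
    return "stop"           # within tolerance: terminate zoom control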

During continuous frame tracking, the PTZ camera adopts a slightly adjusted tracking plan: the target offsets $\Delta x$ and $\Delta y$ are recalculated, and the corresponding pan and tilt control values are obtained. The tolerance between the COI and the COT is set to 10 pixels, and the direction of rotation is determined by the signs of $\Delta x$ and $\Delta y$, as shown in Table 1.

Once the system sends a control instruction, the PTZ camera responds within a certain interval. A complete PTZ control cycle requires at most 3 instructions, and the response time is about 40 ms. Hence, the PTZ tracking system can run in real time.

The control algorithm is tested in this paper. The parameters of the PTZ camera are listed in Table 2. After the pan/tilt rotational control is completed, the results for zoom = 7 to zoom = 28 are shown in Figure 7, and the active tracking results for moving targets are illustrated in Figure 8.

From the experimental results, we find that the zooming is smooth and the visual effect accords with human visual perception. The PTZ control keeps the camera rotating with the moving target and keeps the target in the center of the field of view. In the control process of the PTZ camera, the performance of the detection and tracking algorithm strongly affects the result: if the detection and tracking algorithm performs unsatisfactorily, the target is lost, the PTZ camera receives no feedback from the sampled video, and its control is hindered.

4. Experimental Test

The system presented in this paper is tested, and the parameters of the PTZ cameras are shown in Table 2. All cameras, including the static and PTZ cameras, are calibrated, and the coordinates of the cameras are unified into the world coordinate system. The target detection and tracking is done by static cameras using a moving target detection and tracking algorithm; the target of interest is actively tracked by the PTZ cameras using a simple feedback control strategy.

In the surveillance area, we designate some important regions as well as entrances and exits, and we design a joint tracking system consisting of the PTZ cameras and static cameras. The entrance and exit regions are the regions through which targets arrive and depart. In order to pick up targets as soon as they appear, these regions are set as the initial regions for the PTZ camera, as illustrated in Figure 9.

The surveillance area is between office building A and the wall. The regions "a," "b," "c," "d," and "e" are covered by the static cameras (camera 1, camera 2, …, camera 5, respectively). Region "a" is the start of the road that connects the gate with the other roads and is also the region that targets must cross when entering or departing. Therefore, region "a" is set as the entrance region and as preset 1, the initial preset of the PTZ camera. Region "c" is more important than the other regions and is therefore set as the important preset, that is, preset 2. The important preset possesses a higher monitoring priority than the other presets.

When a target enters the surveillance area and appears in entrance region "a," the static camera covering region "a" detects and tracks it. Meanwhile, a "Call initial preset 1" instruction is sent to the control system of the PTZ camera. The PTZ camera turns to initial preset 1 and the target is actively tracked by the active control algorithm; the channel for the "Call initial preset 1" instruction is then cut off so that repeated preset calls do not interrupt the tracking and blur the target. When a target enters region "c," the static camera covering region "c" takes charge and a "Call preset 2" instruction is sent to the control system of the PTZ camera. The PTZ camera turns to preset 2 and the target is actively tracked.
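The preset-calling logic described above can be summarized by the following sketch; the region labels, preset numbers, and the send_instruction interface are illustrative assumptions about the control system.

class PTZScheduler:
    # Maps detection regions to PTZ presets; region "a" is the entrance
    # (initial preset 1) and region "c" is the important region (preset 2).
    REGION_PRESET = {"a": 1, "c": 2}

    def __init__(self, send_instruction):
        self.send_instruction = send_instruction  # e.g., writes to the PTZ control bus
        self.preset_enabled = {1: True, 2: True}

    def on_target_detected(self, region):
        preset = self.REGION_PRESET.get(region)
        if preset is None or not self.preset_enabled[preset]:
            return
        self.send_instruction("Call preset %d" % preset)
        if preset == 1:
            # Cut off the channel for "Call initial preset 1" so that repeated
            # calls do not interrupt the active tracking.
            self.preset_enabled[1] = False

    def on_target_left(self, region):
        if region == "a":
            self.preset_enabled[1] = True  # re-arm the entrance preset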

The relay tracking results of a single walking person in cameras 1 and 2 are illustrated in Figure 10, and those in cameras 2 and 3 are shown in Figure 11. From the experimental results, we find that the system is capable of continuously tracking targets across different camera views.

When a target enters the entrance region of the surveillance area, the static camera detects the target and the PTZ camera is switched from the patrol state to initial preset 1. When the target appears in the view of camera 3, namely, the important region, it is detected, the corresponding instructions are sent, and the PTZ camera is shifted to preset 2. The feedback instruction is formed from the target information, and the PTZ camera is controlled to track the target. The test result is shown in Figure 12.

5. Conclusion and Future Work

In this paper, the comprehensive design and implementation of an IVSS platform based on a multicamera network were presented. The system is composed of low-cost static and PTZ cameras, target detection and tracking algorithms, and a low-cost PTZ camera feedback control algorithm based on target information. The target detection and tracking is done by static cameras using a moving target detection and tracking algorithm; the PTZ camera is commanded to actively track the target based on the tracking results of the static cameras, and the target information is transformed into the appropriate pan and tilt values through a geometrical transformation, so that the camera is moved accordingly. The test results of the target detection and tracking algorithms, the active target tracking algorithm, and the multicamera target tracking system were reported. Although multiple-target active tracking with a multicamera network remains challenging when there are more targets to be monitored in the scene than PTZ cameras, we believe that the developed low-cost PTZ control algorithm and scheduling strategy can be widely applied to IVSS and extended to other visual analysis systems.

The multicamera system that realizes multitarget tracking and active target tracking was verified on a practical IVSS. In addition, the low-cost PTZ camera control algorithm and scheduling strategy were preliminarily realized. However, the algorithm for controlling and scheduling multiple PTZ cameras remains undeveloped. Further research will be required to develop and test such algorithms, and tests of these algorithms in the practical IVSS will be carried out as well.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This project is supported by the key program of the National Natural Science Foundation of China (Grant no. 61039003), the National Natural Science Foundation of China (Grant no. 41274038), the Aeronautical Science Foundation of China (Grant no. 2013ZC51027), the Aerospace Innovation Foundation of China (CASC201102), and the Fundamental Research Funds for the Central Universities.