Abstract

We propose a systematic framework for moving target positioning based on a distributed camera network. In the proposed framework, low-cost static cameras are deployed to cover a large region; moving targets are detected and then tracked using corresponding algorithms; target positions are estimated by exploiting the geometrical relationships among the calibrated cameras; and finally, for each target, the position estimates obtained from different cameras are unified into the world coordinate system. The system can function as a complementary source of positioning information for moving target positioning in indoor or outdoor environments when global navigation satellite system (GNSS) signals are unavailable. Experiments are carried out using practical indoor and outdoor environment data, and the results show that the systematic framework and its constituent algorithms are both effective and efficient.

1. Introduction

The theory of navigation and positioning has been applied in various fields, such as positioning equipment to monitor its working state and positioning a car or a person to guide them to a certain place. These applications require that targets be positioned and tracked, which can be realized using the global navigation satellite system (GNSS) or a GNSS-aided inertial navigation system (INS). However, GNSS is subject to various limitations, the most critical of which is jamming: GNSS signals are not always available due to blockage by high buildings, canyons, and forests, among others. For this reason, a number of alternative technologies, including optical [1], radio [2–4], RFID [5], and acoustic [6], have been proposed for indoor and outdoor positioning systems over the years. Most efforts have focused on WiFi-based localization, which takes advantage of existing WiFi infrastructure. By using a smartphone to measure the signal strength from multiple WiFi access points, the user's location can be constrained within a relatively small region of a large indoor environment. However, these systems rely on infrastructure installed beforehand, and their accuracy critically depends on the number of available access points, which imposes certain restrictions [5]. In addition, many results on camera-based positioning systems have been reported in the last few years [1], including simultaneous localization and mapping (SLAM) [7] and visual odometry [8]. Although SLAM is becoming a standard technique for indoor robotic applications, it is still challenging to apply SLAM in large outdoor environments.

In recent years, the size of camera networks has grown quickly with the development of safe and smart cities. These cameras can provide complementary positioning information for moving target positioning in both indoor and outdoor environments when GNSS signals are unavailable. In this paper, we deal with moving target positioning based on a distributed camera network in three-dimensional space. In existing camera networks, static cameras are generally deployed to cover a large region, and moving targets are only detected and tracked by algorithms running in a central supervisor unit; their positions are not determined [9–12]. In practical applications, the total number of cameras is usually restricted by various factors such as the cost and placement of cameras. To address this problem, multiple pan-tilt-zoom (PTZ) cameras, or a combination of PTZ and static cameras, can be deployed to fulfill certain practical tasks [13–18].

Target detection, target tracking, and camera calibration are key to the moving target positioning process. To extract moving targets from the video frames of a static camera, background subtraction is the most widely used approach [19, 20]. When the camera is stationary, the background scene is unchanging, so it is convenient to construct a background model [21, 22]. The capability of efficiently and accurately estimating background images is critical for any robust background subtraction algorithm. A well-known method presented by Stauffer and Grimson [21] uses an adaptive strategy for modeling the background: each pixel is modeled by a separate Gaussian mixture, which is continuously learned based on online approximations. Target detection in the current frame is then performed at the pixel level by comparing each pixel value against the most probable background Gaussians. However, the adaptive Gaussian mixture algorithm suffers from a low convergence speed in the learning process, especially in complicated environments. For this reason, an improved adaptive Gaussian mixture learning algorithm was introduced in [23].

Moving target tracking is an important component in the field of computer vision and has been widely used in many applications, such as video surveillance [16], intelligent transportation [24], and multiagent system tracking and control [25]. Target tracking aims to estimate the position and shape of a target or a region in subsequent frames. During tracking, a target is continuously tracked by correctly associating the target detected in subsequent frames with the same identified track. Such methods and their variations commonly rely on the one-to-one assumption: a target can generate at most one measurement in each frame, and a measurement can originate from at most one target. However, this assumption rarely holds in practical applications due to splitting and merging processes as well as multiple targets existing in a common scene. In order to overcome these shortcomings, several approaches have been proposed for multitarget tracking in recent years [26–29].

Camera calibration is an essential procedure in distributed multitarget positioning and determines the mapping between 3D world coordinates and 2D image coordinates in practical applications. The basic task of camera calibration is to compute the camera's extrinsic and intrinsic parameters, which determine the imaging model and the relationships among multiple camera coordinate frames. Depending on the application, the corresponding calibration algorithms include the direct linear transformation (DLT) algorithm [9], the Tsai algorithm [10], the vanishing point algorithm [11], and the Zhang algorithm [12]. These algorithms have respective advantages and disadvantages in various practical applications. In this paper, we focus on a fast calibration algorithm based on the vanishing point theory, which avoids the laborious point measurements required by traditional methods.

The paper is organized as follows. The system framework is presented in Section 2. Section 3 focuses on the target detection and tracking. The fast calibration algorithm is presented in Section 4. The test results of target positioning based on a distributed camera network are reported in Section 5. Finally, we draw some conclusions and shed light on future work in Section 6.

2. Systematic Framework and Problem Statement

The work presented in this paper originates from a research project on moving target tracking and positioning in the Digital Navigation Center (DNC) at Beihang University. The primary goal of the project is to develop a target positioning platform that can monitor and position targets in a large region. The systematic framework of moving target positioning based on a distributed camera network is shown in Figure 1. Due to field-of-view and cost limitations, a large number of static cameras are installed in a practical application environment. In order to realize moving target positioning in a large region, the system must support target detection and tracking. Since a target may cease to be detected because it leaves the field of view, stops and becomes static, or can no longer be distinguished from the background, target splitting and merging must be taken into account and multiple targets must be detected. The performance of the target detection, tracking, and association algorithms therefore influences the reliability of the target positioning, and the positioning results from different cameras must be fused into a common world coordinate system.

In this paper, we tackle several problems, including target detection, tracking, and association, as well as fast camera calibration and target positioning, in a moving target positioning system based on a distributed camera network in practical applications.

3. Target Detection and Tracking

3.1. Target Detection

Target detection is the basis of target tracking, target positioning, target recognition, action recognition, and so forth. Common algorithms include the optical flow algorithm [30], the frame difference algorithm [31], and the background subtraction algorithm [32]. For static cameras, the most well-known and widely used approach is background subtraction, because it is convenient to construct a background model and extract moving targets.

Background modeling techniques can be divided into two categories: parametric techniques, which use a parametric model for each pixel location, and samples-based techniques, which build their model by aggregating previously observed values for each pixel location [33]. The most popular parametric technique is based on the Gaussian mixture model (GMM) presented by Stauffer and Grimson [21]. This algorithm relies on the principle that the values observed at the same pixel location across an image sequence can be modeled by a mixture of Gaussian distributions, as illustrated in Figure 2.

While updating the background model, each pixel of a scene image is independently modeled by a mixture of at most K Gaussian distributions under an adaptive strategy, so the algorithm can deal with multimodal backgrounds in a dynamic environment (e.g., changing time of day, clouds, and swaying tree leaves). However, since its sensitivity cannot be properly tuned, its ability to handle both high- and low-frequency changes in the background is debatable. To overcome these shortcomings, samples-based techniques [34] circumvent part of the parameter estimation step by building their models directly from observed pixel values, which enhances their robustness to noise. They provide fast responses to high-frequency events in the background by directly including newly observed values in their pixel models. However, since they update their pixel models in a first-in first-out manner, their ability to handle concomitant events evolving at various speeds is limited. In order to address this issue, a random background modeling algorithm, intuitively an improved samples-based technique, was introduced in [33]. This algorithm considers the value of a pixel at a given time and imposes the constraint that the influence of a value in the polychromatic space is restricted to its local neighborhood. A set of sample values is then used as the pixel model to classify each new value as either a background or a foreground pixel value.
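To make the samples-based idea concrete, the following is a minimal sketch of a ViBe-like random background model for grayscale frames, in the spirit of [33] rather than a reproduction of the authors' implementation. All parameter names and default values (20 samples per pixel, a matching radius of 20, a minimum of 2 matches, a subsampling factor of 16) are illustrative, and the neighborhood propagation step of the full algorithm is omitted for brevity.

```python
import numpy as np

class RandomSampleBackground:
    """Illustrative ViBe-like samples-based background model (grayscale).

    A pixel is classified as background when its new value lies within
    `radius` of at least `n_min` of the `n_samples` stored sample values.
    """

    def __init__(self, first_frame, n_samples=20, radius=20, n_min=2,
                 subsample=16):
        # Seed every sample with the first frame plus small random noise.
        noise = np.random.randint(-10, 11,
                                  size=(n_samples, *first_frame.shape))
        self.samples = np.clip(first_frame.astype(int) + noise, 0, 255)
        self.radius, self.n_min, self.subsample = radius, n_min, subsample

    def apply(self, frame):
        frame = frame.astype(int)
        matches = np.abs(self.samples - frame) < self.radius
        background = matches.sum(axis=0) >= self.n_min
        # Memoryless update: at a random 1/subsample fraction of the
        # background pixels, overwrite one randomly chosen sample.
        lucky = background & (np.random.randint(self.subsample,
                                                size=frame.shape) == 0)
        k = np.random.randint(len(self.samples))
        self.samples[k][lucky] = frame[lucky]
        return np.where(background, 0, 255).astype(np.uint8)  # foreground mask
```

Feeding successive grayscale frames to `apply` yields binary foreground masks from which candidate target region blocks can be extracted by connected-component analysis.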

An experiment was carried out on video data to compare this algorithm with the Gaussian mixture model of Stauffer and Grimson [21]. Video data set 1 comes from the evaluation data of the Performance Evaluation of Tracking and Surveillance (PETS) database, with an image resolution of 768 × 576 pixels and a frame rate of 25 frames per second (f/s); video data set 2 comes from a practical surveillance system in the DNC of Beihang University, with an image resolution of 352 × 288 pixels and a frame rate of 25 f/s. The experimental results are shown in Figures 3 and 4. As can be seen, since trees were swaying in the wind, their movements were classified as foreground motion by the Gaussian mixture model, while the random background model correctly treated the trees as background. However, neither algorithm takes the shadow of the target into account, which severely degrades the reliability and robustness of target detection and tracking, as illustrated in Figure 5; the video data set 3 used there comes from a practical surveillance system in the DNC of Beihang University, with an image resolution of 352 × 288 pixels and a frame rate of 25 f/s.

In order to remove the damaging effect of shadows on target detection and tracking, we propose an algorithm that combines the random background model with the frame difference algorithm. The mathematical model is described as follows:

$$M = \delta(B) - \varepsilon(B), \tag{1}$$

where $B$ denotes the mask image of the background differencing; $\delta(\cdot)$ denotes the dilation operation on the target region block; $\varepsilon(\cdot)$ denotes the erosion operation on the target region; and $M$ denotes the mask image of the difference between the dilated and eroded images.

Suppose that $N_M$ is the number of pixels whose values equal 1 in $M$, and $N_F$ is the number of pixels that are detected as foreground in the frame-difference image and whose values at the corresponding positions in the template $M$ equal 1. If $N_F / N_M > T$, where $T$ denotes a threshold, then the target region block is a foreground target; otherwise, it is the shadow of the target.
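The shadow test itself can be sketched in a few lines. The snippet below is one reading of the criterion above, assuming binary 0/1 masks as produced by the detection stage; the structuring element size and the threshold value are illustrative, since the text leaves $T$ unspecified.

```python
import cv2
import numpy as np

def is_foreground(bg_mask, framediff_mask, threshold=0.3):
    """Decide whether a candidate target region block is a real target
    or a cast shadow.  `bg_mask` is the binary (0/1) mask of the region
    from background differencing; `framediff_mask` is the binary (0/1)
    frame-difference mask; `threshold` plays the role of T (illustrative)."""
    kernel = np.ones((3, 3), np.uint8)
    dilated = cv2.dilate(bg_mask, kernel)
    eroded = cv2.erode(bg_mask, kernel)
    edge_band = dilated - eroded                   # M: band around the contour
    n_m = int(edge_band.sum())                     # pixels equal to 1 in M
    n_f = int((edge_band & framediff_mask).sum())  # moving pixels on the band
    # A genuine moving target produces inter-frame changes along its
    # contour band, whereas a smooth cast shadow largely does not.
    return n_m > 0 and n_f / n_m > threshold
```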

An experiment is carried out using video data set 3 and compared with the GMM. The experimental results are shown in Figures 6, 7, and 8. As can be seen, the algorithm presented in this paper effectively removes the shadow of the target.

3.2. Target Tracking

Once moving targets are detected, a track initialization event is triggered, so that the moving targets can be continuously tracked by the tracking algorithm during the lifetime of a track, which spans from its initialization to its termination [35]. A track terminates when its target can no longer be detected because it leaves the field of view, stops and becomes static, or can no longer be distinguished from the background. Detected targets are not confirmed as true moving targets until they have been consistently tracked for a period of time; only then are their tracks initialized. We maintain a dynamic list of potential tracks built from all detected targets, and associations are established between targets detected in each new image frame and the potential tracks. When a potential target has been tracked over several consecutive frames, it is recognized as a true moving target and a track is initialized.
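As an illustration of this initialization logic, the sketch below maintains the list of potential tracks and promotes a candidate once it has been associated over several consecutive frames. The confirmation count, the association gate, and the nearest-neighbor association rule are illustrative choices of ours, not values given in the paper.

```python
import math
from dataclasses import dataclass

@dataclass
class PotentialTrack:
    centroid: tuple          # (x, y) centroid in image coordinates
    hits: int = 1            # consecutive frames with an associated detection

CONFIRM_HITS = 5             # frames needed to confirm; illustrative value
GATE = 30.0                  # association gate in pixels; illustrative value

def step(potential_tracks, detections):
    """Process one frame: associate detections to potential tracks by
    nearest centroid, prune broken candidates, and return the tracks
    that were just confirmed (ready for track initialization)."""
    unmatched = list(detections)
    for track in potential_tracks:
        nearest = min(unmatched,
                      key=lambda d: math.dist(track.centroid, d),
                      default=None)
        if nearest is not None and math.dist(track.centroid, nearest) < GATE:
            track.centroid, track.hits = nearest, track.hits + 1
            unmatched.remove(nearest)
        else:
            track.hits = 0                       # streak broken; prune below
    potential_tracks[:] = [t for t in potential_tracks if t.hits > 0]
    potential_tracks.extend(PotentialTrack(d) for d in unmatched)
    confirmed = [t for t in potential_tracks if t.hits >= CONFIRM_HITS]
    for t in confirmed:
        potential_tracks.remove(t)
    return confirmed
```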

Compared to single target tracking, the multitarget problem poses additional difficulties: data association needs to be solved, that is, it must be decided which observation corresponds to which target, and constraints between targets need to be taken into account. Multitarget tracking algorithms can be roughly divided into two categories: recursive and nonrecursive. Recursive algorithms base their estimate only on the state from the previous frame, as in Kalman filtering [36] and particle filtering [27]; the latter can represent ambiguous, multimodal distributions. Nonrecursive algorithms seek an optimal solution over an extended period of time [37, 38] and can thus better cope with ambiguities spanning multiple frames.

In practical applications, target tracking must take target splitting and merging into account because of factors such as illumination changes and occlusion. Since the one-to-one tracking assumption rarely holds, the multitarget tracking problem is still challenging. In order to realize multitarget tracking, we propose a solution combining the pyramid Lucas-Kanade feature tracker [39] and a Kalman filter. The mathematical model of the Kalman filter is described as follows:

$$\mathbf{X}_k = \mathbf{A}\,\mathbf{X}_{k-1} + \mathbf{W}_{k-1}, \qquad \mathbf{Z}_k = \mathbf{H}\,\mathbf{X}_k + \mathbf{V}_k, \tag{2}$$

where $\mathbf{X} = [x, y, w, h, \dot{x}, \dot{y}, \dot{w}, \dot{h}]^{T}$ denotes the state; $\mathbf{A}$ denotes the state transition matrix; $\mathbf{W}$ denotes the system noise; $\mathbf{H}$ denotes the measurement matrix; $\mathbf{Z}$ denotes the measurement value; $\mathbf{V}$ denotes the measurement noise; $x$ and $y$ denote the horizontal and vertical ordinates of the target centroid; $w$ and $h$ denote the width and height of the target envelope rectangle; and $\dot{x}$, $\dot{y}$, $\dot{w}$, and $\dot{h}$ denote the speeds of $x$, $y$, $w$, and $h$, respectively.
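A minimal sketch of such a filter, built on OpenCV's `cv2.KalmanFilter` with the eight-dimensional state of (2) and a constant-velocity transition model, is given below. The time step matches the 25 f/s video used in our experiments; the noise covariances are illustrative placeholders rather than values from the paper.

```python
import cv2
import numpy as np

def make_bbox_kalman(dt=1 / 25.0):   # time step for 25 f/s video
    """Constant-velocity Kalman filter over the state of (2):
    [x, y, w, h, vx, vy, vw, vh]; the measurement is the detected
    envelope rectangle [x, y, w, h].  Noise levels are illustrative."""
    kf = cv2.KalmanFilter(8, 4)
    A = np.eye(8, dtype=np.float32)
    A[:4, 4:] = dt * np.eye(4, dtype=np.float32)   # x <- x + dt * vx, etc.
    kf.transitionMatrix = A
    kf.measurementMatrix = np.hstack(
        [np.eye(4), np.zeros((4, 4))]).astype(np.float32)
    kf.processNoiseCov = 1e-2 * np.eye(8, dtype=np.float32)
    kf.measurementNoiseCov = 1e-1 * np.eye(4, dtype=np.float32)
    return kf

# Per frame: predict, then correct with the detection associated to the track.
# kf.predict()
# kf.correct(np.array([[cx], [cy], [w], [h]], dtype=np.float32))
```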

To realize multitarget tracking across multiple cameras, we construct a similarity function for target matching, which realizes target association and target tracking across multiple cameras. The similarity function is described as follows:

$$S(a, b) = \lambda_1 S_p + \lambda_2 S_s + \lambda_3 S_h, \tag{3}$$

where $a$ and $b$ denote the targets to be matched; $S_p$ denotes their position similarity (the larger $S_p$ is, the closer their positions are), computed from the absolute differences $|x_a - x_b|$ and $|y_a - y_b|$, where $(x_a, y_a)$ and $(x_b, y_b)$ denote the horizontal and vertical ordinates of targets $a$ and $b$ in the world coordinate system, together with the half-widths $w_a/2$ and $w_b/2$ and the half-heights $h_a/2$ and $h_b/2$ of the two targets; $S_s$ denotes the similarity of their sizes (the larger $S_s$ is, the closer their sizes are); $S_h$ denotes the similarity of their heights $H_a$ and $H_b$ in the world coordinate system (the larger $S_h$ is, the closer their heights are); and $\lambda_1$, $\lambda_2$, and $\lambda_3$ denote the weight coefficients, which satisfy $\lambda_1 + \lambda_2 + \lambda_3 = 1$. The larger $S(a, b)$ is, the higher the matching similarity, and vice versa. In practical applications, $\lambda_1$, $\lambda_2$, and $\lambda_3$ can be adjusted according to the accuracy of $S_p$, $S_s$, and $S_h$.
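To make (3) concrete, here is a hypothetical implementation. The component similarities below are illustrative monotone functions consistent with the text (a closer position, size, or height yields a larger score); the paper does not spell out their exact forms, and the default weights are placeholders.

```python
from dataclasses import dataclass

@dataclass
class WorldTarget:
    x: float        # horizontal ordinate in the world coordinate system
    y: float        # vertical ordinate in the world coordinate system
    width: float    # target width (world units)
    height: float   # target height (world units)

def match_similarity(a, b, w_p=0.5, w_s=0.3, w_h=0.2):
    """Weighted similarity of (3); larger values mean a better match.
    The component forms are illustrative assumptions, not from the paper."""
    assert abs(w_p + w_s + w_h - 1.0) < 1e-9      # weights must sum to 1
    s_p = 1.0 / (1.0 + abs(a.x - b.x) + abs(a.y - b.y))   # position similarity
    s_s = 1.0 / (1.0 + abs(a.width - b.width))            # size similarity
    s_h = 1.0 / (1.0 + abs(a.height - b.height))          # height similarity
    return w_p * s_p + w_s * s_s + w_h * s_h
```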

An experiment is carried out using video data sets 3 and 4, which come from a practical surveillance system in the DNC of Beihang University, with an image resolution of 352 × 288 pixels and a frame rate of 25 f/s. The experimental results are shown in Figures 9 and 10, in which the rectangle with dotted blue lines denotes the overlapping area between the two cameras. As can be seen, the target tracking algorithm presented in this paper is effective and can track multiple targets across multiple cameras.

4. Fast Camera Calibration and Target Positioning

4.1. Fast Camera Calibration

Camera calibration is a key technology in determining the mapping between 3D world coordinates and 2D image coordinates for various computer vision applications. A schematic diagram describing this mapping is shown in Figure 11. $p$ is the image coordinate of a world point $P$ under a perfect pinhole camera model, and $\tilde{p}$ is the actual image coordinate, which deviates from $p$ due to lens distortion; the distance between them is termed the radial distortion. Therefore, the mathematical model from 3D world coordinates to 2D image coordinates is expressed by [40]

$$u = u_0 + f_x \frac{r_{11}(X - X_c) + r_{12}(Y - Y_c) + r_{13}(Z - Z_c)}{r_{31}(X - X_c) + r_{32}(Y - Y_c) + r_{33}(Z - Z_c)} + \Delta u,$$
$$v = v_0 + f_y \frac{r_{21}(X - X_c) + r_{22}(Y - Y_c) + r_{23}(Z - Z_c)}{r_{31}(X - X_c) + r_{32}(Y - Y_c) + r_{33}(Z - Z_c)} + \Delta v, \tag{4}$$

where $r_{ij}$ denotes an element of the rotation matrix from the world coordinate frame to the camera coordinate frame; $f_x$ and $f_y$ denote the focal lengths of the camera in the $u$ and $v$ directions; $\Delta u$ and $\Delta v$ denote the photogrammetric distortions; $(X_c, Y_c, Z_c)$ denotes the coordinate of the camera in the world coordinate frame; $(X, Y, Z)$ denotes the coordinate of the target in the world coordinate frame; and $(u_0, v_0)$ denotes the principal point.
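As a quick worked example, the following sketch projects a world point to pixel coordinates via (4) with the distortion terms $\Delta u$ and $\Delta v$ set to zero; the function and argument names are illustrative.

```python
import numpy as np

def project(point_w, R, C, fx, fy, u0, v0):
    """Map a 3D world point to pixel coordinates via (4), without the
    distortion terms.  R: world-to-camera rotation matrix; C: camera
    centre in world coordinates; (fx, fy): focal lengths in pixels;
    (u0, v0): principal point."""
    Xc, Yc, Zc = R @ (np.asarray(point_w, float) - np.asarray(C, float))
    return u0 + fx * Xc / Zc, v0 + fy * Yc / Zc
```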

Traditional calibration algorithms, for example, the DLT algorithm [9] and the Tsai algorithm [10], utilize a series of mathematical transformations to obtain the parameters of the camera model. They have been widely used because of their simple mathematical models and theory. However, these algorithms require a large amount of work to record and check calibration points during calibration and are thus inefficient in practical applications. For this reason, a fast calibration algorithm based on the vanishing point theory was introduced in [11, 41], in which the photogrammetric distortions $\Delta u$ and $\Delta v$ are not considered.

Based on the mathematical model of the fast camera calibration presented in [11], we developed software that calibrates cameras quickly and promptly checks the accuracy of the camera parameters via (4). The calibration process and results for a practical camera are shown in Figure 12: the focal lengths of the camera are 3998.3535 pixels, the calibration error of the reference line segment is 3.98 mm, and the calibration also yields the rotation matrix and the translation vector (unit: mm).

4.2. Target Positioning

Once targets are continuously tracked, the space coordinates of the targets in the camera coordinate frame can be computed through the imaging model with the camera parameters as follows:

$$X_c = \frac{(u - u_0)\,d_x\,Z_c}{f}, \qquad Y_c = \frac{(v - v_0)\,d_y\,Z_c}{f}, \tag{6}$$

where $u_0$ and $v_0$ denote the principal point in the pixel frame, $d_x$ and $d_y$ denote the pixel sizes of the camera in the $u$ and $v$ directions, $f$ denotes the focal length, and $Z_c$ denotes the depth of the target along the camera's optical axis.

When the targets move across multiple cameras, the space coordinates of the targets in each camera coordinate frame are computed, respectively, by (6) and then unified into the world coordinate system, as illustrated in Figure 13. The mathematical model is described as follows:

$$\mathbf{P}_w = \mathbf{R}_i\,\mathbf{P}_{c_i} + \mathbf{T}_i, \quad i = 1, 2, \ldots, N, \tag{7}$$

where $i$ denotes the $i$th camera; $N$ denotes the number of cameras; $\mathbf{P}_{c_i}$ denotes the target coordinates in the $i$th camera coordinate frame computed by (6); $\mathbf{R}_i$ and $\mathbf{T}_i$ denote the rotation matrix and translation vector from the $i$th camera coordinate frame to the world coordinate frame, obtained by the calibration of Section 4.1; and $\mathbf{P}_w$ denotes the unified target position in the world coordinate system.
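A compact sketch of this two-step computation, back-projection by (6) followed by the transformation (7) into the world frame, is given below. The names mirror the symbols above; supplying the depth $Z_c$ as an external argument (e.g., derived from the flat-ground assumption discussed in Section 5) is an illustrative choice of ours.

```python
import numpy as np

def pixel_to_world(u, v, Zc, f, dx, dy, u0, v0, R_i, T_i):
    """Back-project a pixel observed by camera i into its camera frame
    at depth Zc via (6), then transform into the world frame via (7).
    R_i, T_i are the calibration results of camera i."""
    p_cam = np.array([(u - u0) * dx * Zc / f,    # X_c
                      (v - v0) * dy * Zc / f,    # Y_c
                      Zc])                       # Z_c
    return R_i @ p_cam + T_i                     # P_w, as in Figure 13
```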

5. Experimental Tests

The target positioning system presented in this paper was tested in indoor and outdoor environments. All the low-cost static cameras were calibrated, and their coordinates were unified into the world coordinate system. Target detection and tracking were performed by the algorithms described in Section 3. The targets were then positioned in real time via the imaging model and camera parameters, and their trajectories were displayed in a three-dimensional scene. The test results for indoor and outdoor environments are shown in Figures 14 and 15, respectively. As can be seen from Figure 14, when a target moves continuously along an indoor corridor, the positioning system consisting of six distributed cameras is able to position the target in real time and display its trajectory in a three-dimensional space model. Likewise, as can be seen from Figure 15, when a target moves continuously outdoors, the positioning system consisting of seven distributed cameras is able to position the target and display its trajectory in a three-dimensional space model in real time as well. The experimental results confirm that the systematic framework and its constituent algorithms are both effective and efficient.

In this paper, we assume that the ground is flat, which rarely holds in practice over a large region. To address this problem, it is necessary to use a digital elevation model (DEM) to describe the topographic relief of large regions.

6. Conclusion and Future Work

This paper presented the comprehensive design and implementation of a moving target positioning system based on a distributed camera network. The system is composed of low-cost static cameras, which provide complementary positioning information for moving target positioning in indoor and outdoor environments when GNSS signals are unavailable. In this system, static cameras cover a large region; moving targets are detected and then tracked using corresponding algorithms; target positions are estimated by exploiting the geometrical relationships among the calibrated cameras; and finally, for each target, the position estimates obtained from different cameras are unified into the world coordinate system. Experimental results for the target detection, tracking, and positioning system were reported based on real video data.

Target positioning and tracking with multiple static cameras were verified in both indoor and outdoor environments. However, the reliability and accuracy of target tracking and positioning suffer from several environmental factors. Hence, it is necessary to fuse information from various sensors, such as radar, infrared cameras, inertial measurement units (IMU), and wireless location systems. Regarding future work, it is worthwhile to develop and test these algorithms in practical applications.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

This project is supported by the Key Program of the National Natural Science Foundation of China (Grant no. 61039003), the National Natural Science Foundation of China (Grant no. 41274038), the Aeronautical Science Foundation of China (Grant no. 2013ZC51027), the Aerospace Innovation Foundation of China (CASC201102), and the Fundamental Research Funds for the Central Universities.