#### Abstract

This paper presents a novel surveillance system, the thermal omnidirectional vision (TOV) system, which can work in total darkness with a wide field of view. Unlike a conventional thermal vision sensor, the proposed vision system exhibits serious nonlinear distortion due to the effect of the quadratic mirror. To effectively model the inherent distortion of omnidirectional vision, an equivalent sphere projection is employed to adaptively calculate the parameterized distorted neighborhood of an object in the image plane. With the equivalent projection based adaptive neighborhood calculation, a distortion-invariant gradient coding feature is proposed for thermal catadioptric vision. For robust tracking, a rotational kinematic modeled adaptive particle filter is proposed based on the characteristics of omnidirectional vision; it can handle multiple movements effectively, including rapid motions. Finally, experiments are given to verify the performance of the proposed algorithm for human tracking in the TOV system.

#### 1. Introduction

With the development of computer vision and artificial intelligence, automatic surveillance systems have become a hot research topic in this decade. Conventionally, most surveillance systems [1, 2] adopt a traditional visible-spectrum camera for a particular monitoring purpose. However, this kind of system has limited application because it relies on proper illumination and has a narrow field of view. This paper introduces a novel TOV surveillance system. Compared to the conventional sensor, the proposed system can work in total darkness with a global field of view.

In the computer vision community, visual tracking [3–5] is an important research topic for automatic surveillance systems [6]. Many intelligent vision systems [5, 7] have been developed during this decade. However, most of them focus on conventional imaging systems. In [8], the authors adopted a support vector machine (SVM) [9] for classification and used a Kalman filter integrated with mean shift to track pedestrians in thermal imagery. Yasuno et al. presented a system for pedestrian detection and tracking in far-infrared images. They first employed the P-tile method to detect the pedestrian; the detected pedestrian then became the template for matching-based tracking [10]. In [11], the authors presented a two-stage template-based method combined with an AdaBoosted classifier for pedestrian detection in thermal images. In [12], a generalized expectation-maximization (EM) algorithm is first used to separate infrared images into background and foreground layers, and an SVM is incorporated for pedestrian classification; a graph matching-based method is then used for tracking. A vision-based approach to tracking humans from a mobile robot using thermal images is presented in [13]. The approach combines a particle filter with two alternative measurement models for tracking.

To enable surveillance with a wide field of view, a catadioptric omnidirectional sensor is adopted. As a novel imaging sensor, the omnidirectional camera has drawn considerable attention in the computer vision community in recent decades. Compared to a conventional vision sensor, an omnidirectional camera can provide a 360° view of the environment in a single image with a compact system configuration. Therefore, it holds great promise for a wide range of applications [14], especially in situations that require a wide field of view. In [15], a fisheye omnidirectional tracking system is presented. The authors used optical flow to detect the target and employed a color histogram integrated with a kernel-based particle filter to realize single-target tracking in omnidirectional vision. In [16], the authors presented a catadioptric omnidirectional surveillance system that uses multibackground modeling and dynamic thresholding to track targets in cluttered scenes in order to spot snipers on the battlefield. In addition, some algorithms integrate color information with the particle filter for tracking in omnidirectional vision [15, 17, 18].

Thermal vision presents the temperature field distribution of the surrounding environment using a single channel of gray-level intensity. Color and texture information may be too unstable to use in thermal images. However, the temperature field makes contour information salient. Therefore, contour information should be considered an important cue for distinguishing one object from another in thermal vision. It is nevertheless difficult to directly apply most conventional contour features to catadioptric omnidirectional vision because of its nonlinear distortion [19]. A common solution is to unwarp the distorted omnidirectional image into a panoramic image, or to transform the coordinates of a local area of the omnidirectional image into a rectified image, and then to apply a conventional algorithm [2]. However, the computational load of this method is heavy because interpolation is involved. Furthermore, it may introduce noise into the image, which degrades the performance of the algorithm. Moreover, underlying distortion still exists in the rectified image. It is increasingly recognized that the rectangular window and template matching commonly used in traditional images are not suited to catadioptric vision because of its serious nonlinear deformation. To solve the distortion problem, this paper adopts the equivalence theory proposed by Geyer and Daniilidis [20] to model the single-viewpoint catadioptric sensor with a two-step projection via a unitary sphere centered on the focus of the mirror. We define a spatial gradient coding template on the equivalent sphere and obtain a distortion-adaptive coding template in the image plane through model back-projection. With the modeled distortion-adaptive neighborhood, a distortion-invariant gradient coding feature is developed for TOV.

For robust tracking, we propose a rotational kinematic model based particle filter that exploits the characteristics of our system. Compared with the zero-velocity model, the proposed tracking algorithm should be able to handle more challenging situations, including rapid movement. Because a kinematic model is involved, the proposed tracker can predict the state of the target more reasonably with a small number of particles. Although occlusion is an extremely challenging situation in thermal vision, the proposed tracking algorithm can handle short-term occlusion effectively based on the distinct kinematic state of a target during the tracking process. Finally, a series of experiments is given to verify the effectiveness of the proposed algorithm. The schematic diagram of the proposed tracking approach is shown in Figure 1.

The remainder of this paper is organized as follows. Section 2 introduces the principle of the equivalent sphere projection and the proposed distortion-invariant neighborhood adaptive gradient feature. Section 3 presents the proposed rotational kinematic model based adaptive particle filter for omnidirectional vision. A series of qualitative and quantitative analyses is given in Section 4 to verify the performance of the proposed algorithm. Finally, Section 5 concludes this paper.

#### 2. Equivalent Projection Modeled Gradient Coding Feature

##### 2.1. Equivalent Projection Based Adaptive Neighborhood Definition

Adapting the neighborhood is essential to guarantee the accuracy of visual tracking. In perspective images, the conventional neighborhood of a given point is usually defined simply as the square region centered at this point. Central catadioptric omnidirectional vision exhibits serious nonlinear distortion due to the involvement of a quadratic reflection mirror. Therefore, the conventional neighborhood definition is not appropriate for catadioptric images because it does not take the distortion of the image into account.

Catadioptric omnidirectional vision can be described by the unified projection model introduced by Geyer and Daniilidis [20], who demonstrated its equivalence with a projection via a unitary sphere centered on the focus of the mirror. This two-step projection first projects a 3D point onto the sphere from the center of the sphere. The second step projects the point on the sphere onto the image plane, from a point placed on the optical axis, to obtain a pixel point (Figure 2). This equivalence is useful because it allows image processing to be performed in a new space in which deformations are taken into account. In order to deal with distortions, we suggest working in the equivalent sphere space. The sphere surface can be represented using two spherical angles, the azimuth and the elevation, so a point on the sphere is localized by these two parameters.

Consider a point on the sphere and its corresponding point in the image plane. The spherical neighborhood of the sphere point is defined as the set of spherical points contained in the surface patch centered at that point, whose ranges along the azimuth and elevation directions are two given angular extents. Correspondingly, the neighborhood of a point in the image plane is defined as the pixels that lie in the projection of the spherical neighborhood of its spherical point onto the image plane.
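The two-step projection and the adaptive neighborhood can be illustrated with a minimal sketch. It assumes the commonly used parameterization of the unified model, in which the second projection is taken from the point (0, 0, -xi) on the optical axis; the names `xi`, `f`, `u0`, `v0`, `d_theta`, and `d_phi`, and the convention that the elevation is measured from the sphere's equator, are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def sphere_point(theta, phi):
    """Unit-sphere point from azimuth theta and elevation phi (radians).
    Elevation is measured from the equator (an assumed convention)."""
    return np.array([np.cos(phi) * np.cos(theta),
                     np.cos(phi) * np.sin(theta),
                     np.sin(phi)])

def sphere_to_image(p_s, xi, f, u0, v0):
    """Second projection step: project a sphere point onto the image plane
    from the point (0, 0, -xi) on the optical axis.
    xi (mirror parameter), f (focal length in pixels) and (u0, v0)
    (principal point) are hypothetical calibration values.
    Valid only where z + xi > 0."""
    x, y, z = p_s
    denom = z + xi
    return np.array([f * x / denom + u0, f * y / denom + v0])

def adaptive_neighborhood(theta, phi, d_theta, d_phi, xi, f, u0, v0, n=9):
    """Image-plane neighborhood of the sphere point (theta, phi): an n x n
    grid sampled on the spherical patch of angular size d_theta x d_phi,
    projected to pixel coordinates. Returns an (n*n, 2) array."""
    thetas = np.linspace(theta - d_theta / 2, theta + d_theta / 2, n)
    phis = np.linspace(phi - d_phi / 2, phi + d_phi / 2, n)
    pts = [sphere_to_image(sphere_point(t, p), xi, f, u0, v0)
           for p in phis for t in thetas]
    return np.array(pts)
```

Under these assumptions, the same angular window maps to pixel regions of different size and shape depending on the elevation, which is the distortion that a fixed square window ignores.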

##### 2.2. Equivalent Projection Modeled Gradient Coding Feature

Tracking in TOV is difficult because only limited information can be utilized in thermal images, and this is coupled with the serious nonlinear distortion of catadioptric vision. A thermographic camera is a device that forms an image using infrared radiation. Unlike a conventional visible image, a thermal image reflects the temperature field distribution of the object or the environment. Only one channel of gray-level pixels represents the intensity of the temperature range; that is to say, fewer features can be employed in thermal images. However, contour information is salient in the temperature distribution image, and it is a stable cue for distinguishing one object from the others in thermal images. When coupled with a catadioptric sensor, the contour feature presented in the image is seriously deformed. This paper presents an adaptive neighborhood modeled gradient coding feature for TOV, based on the equivalent sphere theory, to realize a distortion-invariant target representation for visual tracking. Before applying the feature coding, the algorithm calculates the gradient over the image samples as in (2) to generate the gradient map. With the equivalent projected neighborhood model, the distortion-adaptive coding template can be obtained. To enable an even coding, we uniformly define a series of spatial conics in the spherical coordinate system with specific angle intervals in the azimuth and elevation directions, respectively. The intersection between the unit sphere and the spatial conics is the spatial coding template defined on the sphere, which is back-projected onto the image plane to generate the distortion-adaptive coding template (Figure 3). As shown in Figure 3, the area of the coding units varies due to the effect of distortion. To eliminate the effect of the area difference between coding units, the gradient information inside each coding unit is averaged for unit normalization. This step transforms the distorted contour feature inside the units into a normalized space. Through this distortion normalization, the proposed distortion-invariant gradient information is obtained. The normalized gradient features inside the coding units are concatenated to form the final distortion-invariant contour coding feature, which is classified by the support vector machine (SVM). For training, we can directly extract the gradient feature from conventional rectangular images. The trained classifier can then be applied to the distortion-invariant gradient feature for classification.
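The coding procedure described above can be sketched as follows. This is an illustrative approximation, not the paper's implementation: each coding unit is sampled at a few points rather than rasterized exactly, the gradient map is the gradient magnitude from `numpy.gradient`, and the projection parameters in `cam` as well as all function names are hypothetical.

```python
import numpy as np

def project(theta, phi, xi, f, u0, v0):
    """Sphere angles -> pixel coordinates under an assumed unified model
    (projection from (0, 0, -xi); elevation measured from the equator)."""
    x = np.cos(phi) * np.cos(theta)
    y = np.cos(phi) * np.sin(theta)
    z = np.sin(phi)
    return f * x / (z + xi) + u0, f * y / (z + xi) + v0

def gradient_coding_feature(image, theta, phi, d_theta, d_phi,
                            rows, cols, cam, samples=4):
    """Average gradient magnitude in each of rows x cols coding units.
    The units are uniform in (azimuth, elevation) on the sphere and are
    back-projected into the image, so their pixel area varies with the
    distortion; averaging normalizes for that area difference."""
    gy, gx = np.gradient(image.astype(float))   # derivatives along rows, columns
    grad = np.hypot(gx, gy)                     # gradient magnitude map
    h, w = grad.shape
    t_edges = np.linspace(theta - d_theta / 2, theta + d_theta / 2, cols + 1)
    p_edges = np.linspace(phi - d_phi / 2, phi + d_phi / 2, rows + 1)
    feature = np.zeros(rows * cols)
    for i in range(rows):
        for j in range(cols):
            # Sample the spherical cell, then map the samples to pixels.
            ts = np.linspace(t_edges[j], t_edges[j + 1], samples)
            ps = np.linspace(p_edges[i], p_edges[i + 1], samples)
            tt, pp = np.meshgrid(ts, ps)
            u, v = project(tt, pp, *cam)
            ui = np.clip(np.rint(u).astype(int), 0, w - 1)
            vi = np.clip(np.rint(v).astype(int), 0, h - 1)
            feature[i * cols + j] = grad[vi, ui].mean()
    return feature  # concatenated unit-normalized gradient code
```

A 16 x 8 template (16 units in height, 8 in width) would be requested with `rows=16, cols=8`; the resulting vector is what a classifier such as an SVM would consume.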

#### 3. The Adaptive Particle Filter

The particle filter [21–23], also known as the Sequential Monte Carlo (SMC) method, has been widely used in nonlinear/non-Gaussian Bayesian estimation problems. In the Bayesian framework, the aim of the particle filter is to recursively estimate the hidden state at the current time, given a noisy collection of observations up to that time. Suppose that the posterior at the previous time instant is available; the current posterior can then be obtained recursively by prediction and update. The prediction stage uses the probabilistic state transition model to predict the state distribution at the current time instant before the new observation arrives. When the observation becomes available, the state posterior is updated using the observation model. Therefore, the state transition model and the observation model are the two key components that determine the tracking performance of a particle filter.
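For reference, the prediction and update steps can be written in the standard textbook form of the Bayesian recursion. The notation here is an assumption chosen for illustration ($x_t$ for the hidden state, $z_{1:t}$ for the observations up to time $t$) and may differ from the symbols used in the original equations.

```latex
\[
\begin{aligned}
\text{prediction:}\quad p(x_t \mid z_{1:t-1}) &= \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid z_{1:t-1})\, \mathrm{d}x_{t-1}, \\
\text{update:}\quad p(x_t \mid z_{1:t}) &\propto p(z_t \mid x_t)\, p(x_t \mid z_{1:t-1}).
\end{aligned}
\]
```

Here $p(x_t \mid x_{t-1})$ plays the role of the state transition model and $p(z_t \mid x_t)$ the role of the observation model; a particle filter approximates both steps with a weighted set of samples.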

##### 3.1. Observation Model

The observation model characterizes the observation likelihood of the particle filter. It is an important component that measures the probability confidence of the observed data for state updating. In this paper, we employ the probability confidence of the classifier to calculate the observation likelihood. Accordingly, a parameter is defined to measure the similarity between a sample candidate and a standard positive sample (equation (3)). The observation model is then obtained by (4), which is parameterized by a variance. With the given observation model, the weights of the particles (equation (5)) can be calculated to effectively guide the particles during tracking.
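A weight update of this kind might look like the sketch below. The exact forms of (3) to (5) are not reproduced here, so the code makes its own assumptions: the dissimilarity is taken as one minus a classifier confidence in [0, 1], the likelihood is Gaussian-shaped, and `sigma` is a standard deviation (the square root of the variance). All names are hypothetical.

```python
import numpy as np

def observation_likelihood(distance, sigma):
    """Gaussian-shaped likelihood of a dissimilarity value (assumed form)."""
    return np.exp(-np.square(distance) / (2.0 * sigma ** 2))

def update_weights(prev_weights, scores, sigma=0.5):
    """Multiply previous particle weights by the observation likelihood
    and renormalize so that the weights sum to one."""
    # Hypothetical dissimilarity: 1 - classifier confidence in [0, 1].
    distance = 1.0 - np.asarray(scores, dtype=float)
    w = np.asarray(prev_weights, dtype=float) * observation_likelihood(distance, sigma)
    total = w.sum()
    if total <= 0.0:
        # Degenerate case: fall back to uniform weights.
        return np.full(w.shape, 1.0 / w.size)
    return w / total
```

With uniform previous weights, particles whose candidate window receives a higher classifier confidence end up with larger weights, which is the guiding effect described above.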

##### 3.2. Adaptive Rotational Kinematics Based State Transition Model

The state transition model characterizes the kinematics of the target during tracking. With a fixed system noise variance, a zero-velocity Gaussian state transition model can handle a random walk well, provided that the system variance covers the unit displacement of the target. However, its performance is limited when the system variance is smaller than the unit displacement of the target, as in rapid movement. Its performance can be improved by increasing the variance, but this may cause computational inefficiency because many more particles are needed to accommodate the large noise variance. In thermal vision in particular, a state transition model with a high noise variance easily picks up interference.

Based on the characteristics of the omnidirectional image, this paper proposes to apply a polar coordinate system in the image plane. For dynamic tracking applications, we import a rotational kinematic model [24] into the particle filter. According to rotational kinematics in polar coordinates, we decompose the kinematic model into the angular and radial directions. With the proposed adaptive particle filter, the kinematic state of the system can be effectively predicted from the motion history of the target. The rotational kinematic model is defined for each estimated state component, each with its own noise variance at a given time instant. In most practical applications, it is hard to predict the motion status of the target in advance. Therefore, a manually preset control factor can hardly achieve satisfactory performance in a compound movement. In this paper, an adaptive control factor is proposed, as shown in (6) and (7), which scales the kinematic model adaptively based on the motion history of the target; its definition depends on the noise variance of the corresponding state component at the given time instant.

Based on (9), the adaptive control factor responds to changes in the previous kinematic parameters of the target. If the system tracks a target in rapid movement, a higher value of the control factor is generated, and its acceleration component also responds according to the motion trend of the target. In other words, when the unit displacement of the target is beyond the range of the system noise variance, proportionally scaling up the kinematic model helps the tracker estimate a state close to the true solution. Therefore, the kinematic model is strongly activated during rapid movement. Conversely, when the unit displacement of the target is less than the system noise variance, the kinematic model is greatly suppressed by the control factor, because an excessively amplified kinematic model causes overestimation, which leads to oscillation of the system. We can therefore employ the control factor to scale the kinematic model properly based on the motion history of the target. With the adaptive control factor, the state transition model can steer the estimate toward the true solution while avoiding overestimation. Because of the inertia of the object, its motion state cannot change very sharply within a short unit of time. Therefore, the proposed adaptive control factor should be able to adjust the kinematic model in time to respond quickly to the changing motion status of the target.
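The qualitative behavior described above can be illustrated with a small sketch. The control factor used here is a hypothetical stand-in, not the paper's definition in (6) and (7): it is a bounded function in [0, 1) that stays near zero when the recent change is small relative to the noise standard deviation and approaches one when the change is large. The constant-acceleration update, the finite-difference estimates, and all names are likewise assumptions.

```python
import numpy as np

def control_factor(change, sigma):
    """Hypothetical bounded factor in [0, 1): close to 0 when |change| is
    well below the noise standard deviation sigma, close to 1 when it is
    far above it."""
    ratio = abs(change) / sigma
    return 1.0 - np.exp(-ratio ** 2)

def predict_component(history, sigma, rng, dt=1.0):
    """Predict one state component (angle or radius) from its last three
    estimates, given oldest first."""
    s2, s1, s0 = history                      # s0 is the most recent estimate
    v = (s0 - s1) / dt                        # finite-difference velocity
    a = (s0 - 2.0 * s1 + s2) / dt ** 2        # finite-difference acceleration
    alpha = control_factor(v * dt, sigma)     # velocity control factor
    beta = control_factor(a * dt ** 2, sigma) # acceleration control factor
    noise = rng.normal(0.0, sigma)
    return s0 + alpha * v * dt + 0.5 * beta * a * dt ** 2 + noise

def predict_particle(theta_hist, r_hist, sigma_theta, sigma_r, rng):
    """Propagate a particle in polar coordinates; the angular and radial
    directions are handled independently. theta_hist is assumed unwrapped."""
    theta = predict_component(theta_hist, sigma_theta, rng)
    r = predict_component(r_hist, sigma_r, rng)
    return theta % (2.0 * np.pi), max(r, 0.0)
```

In this sketch, a slowly moving target yields factors near zero, so the prediction reduces to a zero-velocity Gaussian model, whereas a displacement several times larger than `sigma` switches the velocity and acceleration terms on almost fully.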

With the adaptive adjustment of the control factor, the proposed rotational kinematic model based adaptive particle filter should be able to robustly handle a wider range of movements, including rapid ones. In addition, the embedded rotational kinematic model does not affect the stability of the tracking system in normal-speed movement (where the system noise variance covers the unit displacement of the target). To verify the performance of the adaptive particle filter in normal-speed movement, we present an experiment that compares its tracking accuracy with that of the zero-velocity modeled particle filter, which is known to perform well in normal movement. For a fair comparison, both trackers are implemented with identical parameter settings, such as the particle number and the noise variance. Here, the scale state follows a random Gaussian distribution. As shown in Figure 4, the adaptive particle filter performs comparably to the zero-velocity modeled particle filter in this experiment, and their RMSEs are around 1.78. This verifies that the proposed adaptive kinematic model based tracker performs stably in normal-speed movement.

#### 4. Experiments

In this section, we present a series of experiments to verify the effectiveness of the proposed algorithm for human tracking in TOV. Since no public TOV dataset is available, we built a thermal omnidirectional sensor for data collection, which consists of a FLIR Therma CAM PM 695 camera and a hyperboloid catadioptric omnidirectional mirror (Figure 5). The established TCO database contains several image sequences recorded under different ambient conditions. Each set of image sequences contains hundreds of TOV frames sampled at 20 Hz with a resolution of 320 × 240. The detailed experiments are presented as follows.

##### 4.1. Accuracy Analysis of Adaptive Neighborhood Modeled Gradient Coding Feature

Unlike conventional vision, thermal vision reflects the temperature distribution. Because of differences in temperature distribution, we can roughly distinguish one object from the others using contour information as a cue. Owing to the involvement of the catadioptric sensor, the contour distribution of an object is seriously distorted in TOV. To effectively handle the nonlinear distortion, an equivalent projection based gradient coding feature is proposed for this system. To ensure satisfactory performance of the proposed feature, a suitable sampling density for the coding template is necessary. If the sampling is too dense, it may result in data redundancy. Conversely, it may lead to undersampling if the sampling is too sparse. For that purpose, this paper selects three configurations of the coding template within a reasonable range and tests their performance. This paper sets the aspect ratio of the neighborhood of a human target to 1/2. We define three templates with 12, 16, and 20 units in the height direction, denoted EP12, EP16, and EP20 for short. Correspondingly, these templates have 6, 8, and 10 units in the width direction, respectively. In this experiment, we compare the performance of our algorithm with the local coordinate transform [2] based histogram of oriented gradients (HOG) [25]. For a baseline comparison, we use the standard zero-velocity particle filter with a Gaussian random scale distribution for the tracking test.

Figure 6 shows that the equivalent projection based trackers achieve better performance than the local coordinate transformed HOG based tracker. The RMSEs of the EPx-G trackers are less than 3.5, whereas the RMSE of LCT-HOG is around 5.3. Therefore, it can be concluded that the equivalent projection based features perform much better than the local coordinate transform based feature. In terms of coding complexity, HOG integrates the gradient information with its orientation into a single framework, which would be expected to perform better than a method that integrates only the gradient feature; a comparison verifying this has been presented in [26]. However, the EPx-G trackers achieved more stable performance than LCT-HOG because the equivalent projection effectively models the nonlinear distortion of omnidirectional vision, whereas the local coordinate transform supplies only a linear projection model, which is not suitable for catadioptric vision. In addition, EP16-G obtains the best performance (RMSE = 1.3962) when 300 particles are applied. Therefore, this paper employs the EP16 feature configuration for the adaptive particle filter in the following experiments to further discuss human tracking in thermal catadioptric vision.
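For readers who want to reproduce this kind of comparison, an RMSE over a tracked sequence can be computed as in the sketch below. The paper does not state which error quantity enters its RMSE; this example assumes the per-frame Euclidean distance, in pixels, between the estimated and ground-truth target centers. The function name is hypothetical.

```python
import numpy as np

def tracking_rmse(estimated, ground_truth):
    """Root-mean-square of the per-frame Euclidean position error.
    Both inputs are sequences of (u, v) positions, one row per frame."""
    est = np.asarray(estimated, dtype=float)
    gt = np.asarray(ground_truth, dtype=float)
    err = np.linalg.norm(est - gt, axis=1)   # per-frame error
    return float(np.sqrt(np.mean(err ** 2)))
```

Because the particle filter is stochastic, such a value would normally be averaged over several runs for each particle count before trackers are compared.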

##### 4.2. Performance Analysis of Adaptive Particle Filter

On the basis of the characteristics of the proposed system, this paper presents a rotational kinematic modeled adaptive particle filter for tracking. To verify the effectiveness of the proposed algorithm, a series of analyses and experiments is given below (Figure 7).

To analyze the performance of the proposed adaptive particle filter, we compare it with the method proposed in [27], which presented a motion estimation based adaptive particle filter for face tracking. In [27], the scaling factor of the motion model must be preset manually. In practice, a preset motion model can hardly meet the requirements of a whole experiment, especially for compound movement. If the motion model is applied excessively, it easily causes system oscillation, which reduces the tracking accuracy. Here, we give an experiment to compare the RMSE of the proposed algorithm (P-PF) with that of the method in [27] using the whole motion model (M-PF). For a fair test, both trackers are implemented with the same system parameters, such as the number of particles. As shown in Figure 7, the RMSEs of M-PF are around 2.18, and its lowest RMSE, 2.0635, is still higher than all the RMSEs of P-PF. Therefore, the tracking accuracy of M-PF is clearly lower than that of P-PF. On the other hand, if only half of the motion model is applied, the tracking accuracy of the system should improve, but it may be difficult to handle some challenging rapid movements. For comparison, we test the above trackers in a rapid movement experiment in which a target moves at a speed 6 to 7 times higher than in the normal situation. As shown in Figure 8, the method of [27] fails to track the target at the early stage of the experiment because the motion model is insufficient. Therefore, it can be concluded that a fixed preset motion model cannot flexibly accommodate multiple movements. In contrast, our proposed adaptive tracker achieves satisfactory performance because the adaptive kinematic model of the system is adjusted automatically based on the motion status of the target.

To further analyze the effectiveness of the proposed kinematic model, we present a compound movement experiment in which a single target mixes a rapid movement into a normal-speed walk. At the early stage of this experiment, a person walks slowly around the omnidirectional sensor, and the system variance just covers the unit displacement of the person. During this process, the kinematic model is adaptively suppressed by the control factors to ensure the stability of the system. As shown in Figure 9, the control factors stay small in both the angular and radial directions. Accordingly, the predicted kinematic parameters are suppressed (Figure 10). From Frame 42, the target suddenly accelerates in the angular direction and maintains the high-speed movement for a few frames. Following this change in the motion status of the target, the system responds quickly: the velocity factor in the angular direction rises to a peak near its maximum (Figure 9(a)). Accordingly, the predicted velocity in the angular direction is scaled up close to the true value at that moment (Figure 10(a)). The acceleration factor is also activated significantly (Figure 9(c)), and the predicted acceleration is amplified accordingly (Figure 10(c)). With the involvement of the velocity factor, the predicted velocity catches up with the true value effectively during the rapid movement. A few frames later, the target decelerates sharply and returns to low-speed movement. The predicted velocity then falls in time as the velocity factor returns to a small value. Because of the sharp change of velocity in the angular direction, the acceleration factor and the predicted acceleration respond significantly. The control factors and kinematic parameters in the angular direction are then suppressed during the low-speed movement. Meanwhile, the motion status of the target in the radial direction changes little during the rapid movement. Accordingly, the control factors and kinematic parameters in the radial direction show correct but not drastic responses (Figures 9(b), 9(d), 10(b), and 10(d)) at that moment. This experiment further verifies that the proposed algorithm can robustly track the target throughout the entire compound movement.

##### 4.3. Occlusion Handling

Occlusion is a challenging topic in computer vision. For thermal vision in particular, multitarget tracking is extremely challenging because very few features are usable. In this paper, we propose to employ the kinematic characteristics of the object to greatly reduce the influence of occlusion in our system. Technically, occlusion may be caused by an obstacle or by another target. In our system, occlusion by an obstacle is declared if the mean weight of the particles decays sharply while their mean radial state remains within a reasonable value range. In this case, the whole kinematic model is applied and the motion states of the particles are maintained for a few frames until the target appears again.

In the meantime, system sampling is maintained to search for the target, and the system noise variance and particle number are increased proportionally to broaden the search area. For multitarget tracking in TOV, we centrally manage the states of the targets to handle occlusion between targets. If any two targets move close to each other and the angle between them falls below a threshold, the system declares that occlusion between targets is about to happen. In this situation, the motion states of the targets are maintained for a few frames until their intersection angle exceeds the predefined threshold again. During this process, the sampling of particles is suspended to avoid interference from the indistinguishable contours caused by the overlap. The experiments verify that the proposed adaptive particle filter can effectively handle short-term occlusions in TOV (Figures 11 and 12).
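The two occlusion tests described above can be sketched as simple predicates. The threshold values are not recoverable from the text, so `decay_ratio`, `r_min`, `r_max`, and `angle_threshold` are hypothetical parameters, and interpreting "decays sharply" as a ratio against the previous frame's mean weight is an assumption.

```python
import math

def angular_gap(theta_a, theta_b):
    """Smallest absolute angle between two azimuths, in [0, pi]."""
    d = abs(theta_a - theta_b) % (2.0 * math.pi)
    return min(d, 2.0 * math.pi - d)

def obstacle_occlusion(mean_weight, prev_mean_weight, mean_radius,
                       r_min, r_max, decay_ratio=0.3):
    """Occlusion by an obstacle: the mean particle weight drops sharply
    while the mean radial state stays inside a plausible range."""
    decayed = mean_weight < decay_ratio * prev_mean_weight
    return decayed and r_min <= mean_radius <= r_max

def target_occlusion(theta_a, theta_b, angle_threshold):
    """Occlusion between targets: their angular separation in the polar
    image frame falls below a threshold."""
    return angular_gap(theta_a, theta_b) < angle_threshold
```

While either predicate holds, a tracker following the scheme above would propagate the affected states with the kinematic model alone and resume weighting once the predicate clears.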

This section presented a series of experiments to validate the effectiveness of the proposed algorithm for TOV. With the equivalent projection model, a distortion-adaptive gradient coding feature is proposed, and its performance has been demonstrated by a tracking accuracy experiment. Moreover, the experiments verified that the proposed rotational kinematic model based adaptive particle filter achieves satisfactory performance even in complex movements. Finally, our system is implemented in Matlab on a PC with a 2.7 GHz Intel Pentium processor and 2 GB of RAM, and processing takes around 0.65 seconds per frame with 200 particles without optimization. Therefore, the proposed algorithm should have great potential for real-time surveillance applications if it is implemented in C/C++ and takes advantage of GPU processing.

#### 5. Conclusion

In this paper, we introduced a novel thermal omnidirectional sensor that can work in total darkness and achieves a global field of view in a single image. Because of the distortion, conventional contour features are hard to apply directly to the proposed omnidirectional surveillance system. Based on the equivalent projection theory, an adaptive neighborhood-modeled gradient coding feature is proposed to effectively represent distorted visual information in the catadioptric image. For tracking, a rotational kinematic modeled adaptive particle filter is proposed to effectively handle multiple movements, including rapid movement and short-term target occlusion. However, since only limited information can be employed in thermal vision, long-term occlusion in the thermal omnidirectional system remains a challenging topic to be addressed in our future work. Adding a visible sensor to the thermal omnidirectional system may compensate for the drawbacks of the thermal sensor and enrich the pool of features that can be adopted, which may help to greatly reduce the effect of occlusion.

#### Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

#### Acknowledgments

This work was supported by the National Natural Science Foundation of China (Project nos. 61273286, 61233010) and City University of Hong Kong (Project no. 9680067). The authors acknowledge Xiaolong Zhou as a coauthor of the paper.