## Target Tracking via Particle Filter and Convolutional Network

^{1}College of Automation, Harbin Engineering University, Harbin, China

^{2}College of Electrical and Information Engineering, Heilongjiang Institute of Technology, Harbin, China

#### Abstract

We propose a tracking algorithm that works robustly in complex scenes involving illumination variation, appearance change, and partial occlusion. The algorithm is based on an improved particle filter with an efficiently designed observation model. Predefined convolutional filters are used to extract high-order features. A global representation is generated by combining local features without changing their structure or spatial arrangement, which increases feature invariance while maintaining specificity. The features extracted by the convolutional network are then introduced into the particle filter: the observation model fuses the color feature of the target with a set of template features extracted by a convolutional network that requires no training. During tracking, the template is updated in real time, which improves the robustness of the algorithm. Experiments show that the algorithm achieves good tracking results when the target is in a complex environment.

#### 1. Introduction

Object tracking has broad application prospects in computer vision, and it has recently been studied extensively in real-world settings [1]. Detecting and tracking a target remains difficult in practice [2], because many factors affect the performance of a tracking algorithm, including pose change, appearance variation caused by illumination changes, partial occlusion, and background noise [3, 4]. Addressing these problems requires more efficient machine learning [5] and feature extraction [6] methods to describe the target.

At present, tracking algorithms fall mainly into two types: generative models and discriminative models [5]. The particle filter is a representative generative tracking algorithm and has been widely used for tracking. It is simple and flexible, and it easily handles non-Gaussian and multimodal system models; related work is presented in [7–11]. Information from different measurement sources can be combined within the particle filter framework, which has greatly improved tracking performance. Nevertheless, there is still considerable room to improve its effectiveness in practical tracking. In particular, the classical particle filter usually adopts a dynamic model with global information: regardless of whether the target is occluded or deformed, it treats the target as a whole. The local information of the target is therefore neglected, and when the target is partially occluded or its local appearance changes, the particle filter cannot track it accurately.

Discriminative methods distinguish the target from the background by training classifiers. At present, most deep learning methods for target tracking also belong to the discriminative framework. Deep learning has achieved outstanding results in image classification and object detection and has become one of the most powerful methods for automatic feature extraction. Through learning and mapping across multiple levels, a deep network gradually builds high-level abstract features from low-level ones. These abstract features are high-dimensional and strongly discriminative, so classification and regression tasks can reach high accuracy even with a simple classifier. Several tracking methods based on learned features have been proposed, using convolutional networks trained offline [12, 13]. In these trackers, the target is localized by extracting its features from different layers of the network.

The key point in all of these methods is how to learn an effective feature extractor offline from a large amount of auxiliary data, which consumes a lot of time. These methods also ignore the similar local structure and inner geometric layout of the target over consecutive frames, information that is convenient and effective for distinguishing the target from the background in visual tracking. In addition, deep learning alone does not solve the problem of tracking drift; it needs to be combined with other methods for the deep network to be fully effective [14, 15].

In summary, this paper proposes a target tracking algorithm that combines a particle filter with a convolutional network. The features extracted by the convolutional network are introduced into the particle filter framework, and the target block is described by a sparse representation. The local and spatial information of the target is fully exploited to represent changes in the object's state, and different information is handled according to the target state. Because this local information is combined with the global information of the particle filter to determine the current target position, local appearance change and partial occlusion are handled better. During tracking, the template is updated according to the tracking results, which improves the robustness of the algorithm to a certain extent. Experiments show that the algorithm achieves good tracking results when the target is in a complex environment.

#### 2. Particle Filtering Tracking Formula

In particle filter tracking, the goal is to estimate the posterior probability density $p(X_t \mid Z_{1:t})$ of the target state $X_t$ at time $t$ given the observations $Z_{1:t}$. It is obtained in two steps [10].

*Step 1 (prediction).* Suppose that the initial probability density $p(X_0)$ is known and that the posterior probability density function $p(X_{t-1} \mid Z_{1:t-1})$ at time $t-1$ is also known. The state $X_t$ is a three-dimensional vector, $X_t = (x, y, s)$, where $x$ and $y$ give the position of the object and $s$ gives its change in size. The prior probability is then

$$p(X_t \mid Z_{1:t-1}) = \int p(X_t \mid X_{t-1})\, p(X_{t-1} \mid Z_{1:t-1})\, dX_{t-1}, \tag{1}$$

where $p(X_t \mid X_{t-1})$ is defined by the state equation of the target.

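*Step 2 (updating).* The posterior $p(X_t \mid Z_{1:t})$ is then obtained from the observation model of the system:

$$p(X_t \mid Z_{1:t}) = \frac{p(Z_t \mid X_t)\, p(X_t \mid Z_{1:t-1})}{p(Z_t \mid Z_{1:t-1})}. \tag{2}$$

The observation likelihood function $p(Z_t \mid X_t)$ is determined by the observation of the target, and $p(Z_t \mid Z_{1:t-1})$ is a normalization constant.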

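In practice, the integral in formula (1) is difficult to evaluate, so the recursive Bayesian filter is approximated by a nonparametric Monte Carlo method, which gives the particle filter. The basic formula is

$$p(X_t \mid Z_{1:t}) \approx \sum_{i=1}^{N} w_t^i\, \delta(X_t - X_t^i), \tag{3}$$

where $X_t^i$ is the $i$th of $N$ particles and $w_t^i$ is the weight of the corresponding particle. The weight of the particle is updated according to the observation value. That is,

$$w_t^i \propto w_{t-1}^i\, \frac{p(Z_t \mid X_t^i)\, p(X_t^i \mid X_{t-1}^i)}{q(X_t^i \mid X_{t-1}^i, Z_t)}, \tag{4}$$

where $q(\cdot)$ is the proposal distribution (importance density) in Bayesian importance sampling. A common choice, adopted here, is to use the prior density as the proposal, that is, $q(X_t^i \mid X_{t-1}^i, Z_t) = p(X_t^i \mid X_{t-1}^i)$. The weight then becomes

$$w_t^i \propto w_{t-1}^i\, p(Z_t \mid X_t^i). \tag{5}$$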

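Finally, the state estimate of the target is obtained from the weighted particles; that is,

$$\hat{X}_t = \sum_{i=1}^{N} w_t^i X_t^i, \tag{6}$$

where $X_t^i$ is the $i$th particle and the weights $w_t^i$ are normalized to sum to one.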
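The weight update and state estimate can be sketched in a few lines. This is a minimal illustration, assuming the prior is used as the proposal (so each weight is simply multiplied by its likelihood) and the estimate is the weighted mean of the particles. The function names, the uniform-weight fallback, and the toy particle values are assumptions of this sketch and do not come from the paper.

```python
import numpy as np

def update_weights(weights, likelihoods):
    """Multiply the previous weights by the per-particle likelihoods and renormalize."""
    w = np.asarray(weights, dtype=float) * np.asarray(likelihoods, dtype=float)
    total = w.sum()
    if total <= 0.0:
        # Degenerate case: fall back to uniform weights (an assumption of this sketch).
        return np.full(len(w), 1.0 / len(w))
    return w / total

def estimate_state(particles, weights):
    """Weighted mean of the particle states; one particle (x, y, s) per row."""
    return np.asarray(weights, dtype=float) @ np.asarray(particles, dtype=float)

# Three hypothetical particles with state (x, y, s).
particles = np.array([[10.0, 20.0, 1.0],
                      [12.0, 22.0, 1.0],
                      [14.0, 24.0, 1.0]])
weights = update_weights([1 / 3, 1 / 3, 1 / 3], [0.2, 0.6, 0.2])
state = estimate_state(particles, weights)
```

With equal prior weights, the normalized weights equal the normalized likelihoods, so the estimate is pulled toward the particle that best explains the observation.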

#### 3. Model Construction and Implementation

##### 3.1. Target Motion Model

For each new frame, each particle undergoes a state transition according to the following motion model:

$$X_t^i = X_{t-1}^i + r\, v_{t-1}, \tag{7}$$

where $v_{t-1}$ is Gaussian white noise and $r$ is the propagation radius of the particle, which is proportional to the average state change of the target at the previous moment.
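A random-walk transition of this kind can be sketched as follows. The sketch assumes that the noise is standard Gaussian and that the propagation radius may be given either as a scalar or per state dimension; the per-dimension radius values, the particle count, and the function name are hypothetical.

```python
import numpy as np

def propagate(particles, radius, rng):
    """Random-walk transition: add Gaussian white noise scaled by the propagation radius."""
    particles = np.asarray(particles, dtype=float)
    noise = rng.standard_normal(particles.shape)
    return particles + radius * noise

rng = np.random.default_rng(0)
prev = np.tile([50.0, 40.0, 1.0], (200, 1))   # 200 particles at the previous state (x, y, s)
radius = np.array([4.0, 4.0, 0.02])           # hypothetical per-dimension propagation radius
moved = propagate(prev, radius, rng)
```

A larger radius spreads the particles over a wider search region, which suits fast motion at the cost of fewer particles near the true state.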

##### 3.2. Target Observation Model

Each input image is warped to a fixed size of $n \times n$ pixels, denoted as $I \in \mathbb{R}^{n \times n}$. A set of local image blocks $Y = \{Y_1, \ldots, Y_l\}$ is obtained by dense sampling with a sliding window of size $w \times w$ ($w$ is called the receptive field size). $Y_i \in \mathbb{R}^{w \times w}$ is the $i$th image block and $l = (n-w+1) \times (n-w+1)$. Each block is preprocessed by subtracting its mean and normalizing it [11].
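Dense sampling and preprocessing can be illustrated with a short sketch. It assumes a square grayscale image and takes "normalization" to mean scaling each mean-subtracted block to unit L2 norm, which is an assumption; the function names and the toy 8×8 image are also hypothetical.

```python
import numpy as np

def extract_patches(image, w):
    """Return every w-by-w block of a square image, one flattened block per row."""
    n = image.shape[0]
    side = n - w + 1
    patches = np.empty((side * side, w * w))
    for r in range(side):
        for c in range(side):
            patches[r * side + c] = image[r:r + w, c:c + w].ravel()
    return patches

def preprocess(patches, eps=1e-8):
    """Subtract each block's mean, then scale it to unit L2 norm (assumed normalization)."""
    centered = patches - patches.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(centered, axis=1, keepdims=True)
    return centered / np.maximum(norms, eps)

image = np.arange(64, dtype=float).reshape(8, 8)   # toy 8x8 image
patches = preprocess(extract_patches(image, w=3))  # (8 - 3 + 1)^2 = 36 blocks of 9 pixels
```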

###### 3.2.1. Color Model

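*(1) Target Model.* Let $\{x_i^*\}_{i=1}^{n_p}$ be the vectorized pixel locations of the target image patch, with zero as the center, and let $m$ be the number of feature-value (color) bins. The probability of feature value $u$ in the target model is [16]

$$q_u = C_q \sum_{i=1}^{n_p} k\left(\|x_i^*\|^2\right) \delta\left[b(x_i^*) - u\right], \tag{8}$$

where $k(\cdot)$ is the kernel function used to adjust the size of the weights, $\delta$ is the delta function, $b(x_i^*)$ is the color value of the pixel at $x_i^*$, $u = 1, \ldots, m$ is the color index of the histogram, and $C_q$ is the normalization constant.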

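*(2) Target Candidate Model.* Similarly, taking $y$ as the center, the probability of the target candidate model is

$$p_u(y) = C_h \sum_{i=1}^{n_h} k\left(\left\|\frac{y - x_i}{h}\right\|^2\right) \delta\left[b(x_i) - u\right], \tag{9}$$

where $h$ is the window radius, $n_h$ is the number of pixels in the candidate window, and $C_h$ is the normalization constant.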

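*(3) Similarity Function.* In this paper, the Bhattacharyya coefficient is used to measure the similarity of the two models [17]:

$$\rho(y) = \sum_{u=1}^{m} \sqrt{p_u(y)\, q_u}. \tag{10}$$

It takes values between 0 and 1. The distance between the two target templates is then defined as

$$d(y) = \sqrt{1 - \rho(y)}. \tag{11}$$

The corresponding color observation probability of the particle is

$$p_c(Z_t \mid X_t^i) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{d^2}{2\sigma^2}\right), \tag{12}$$

where $\sigma$ is the standard deviation.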
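The color model can be sketched end to end: a kernel-weighted histogram, the Bhattacharyya coefficient, and a Gaussian likelihood of the resulting distance. This is an illustration under assumptions: the kernel weights are passed in precomputed, the distance is taken as the square root of one minus the coefficient, and `sigma=0.2` and the toy bin indices are hypothetical values, not the paper's settings.

```python
import numpy as np

def color_histogram(bin_index, kernel_weight, m):
    """Kernel-weighted histogram over m color bins, normalized to sum to one."""
    hist = np.bincount(bin_index, weights=kernel_weight, minlength=m)
    return hist / hist.sum()

def bhattacharyya(p, q):
    """Bhattacharyya coefficient of two normalized histograms (1 means identical)."""
    return float(np.sum(np.sqrt(p * q)))

def color_likelihood(p, q, sigma=0.2):
    """Gaussian likelihood of the Bhattacharyya distance d = sqrt(1 - rho)."""
    rho = min(bhattacharyya(p, q), 1.0)   # clamp floating-point overshoot
    d_squared = 1.0 - rho
    return float(np.exp(-d_squared / (2.0 * sigma ** 2)) / (np.sqrt(2.0 * np.pi) * sigma))

# Toy example: six pixels quantized into three color bins, uniform kernel weights.
q = color_histogram(np.array([0, 0, 1, 2, 2, 2]), np.ones(6), m=3)       # target model
p_far = color_histogram(np.array([0, 0, 0, 0, 0, 1]), np.ones(6), m=3)   # poor candidate
same = color_likelihood(q, q)
far = color_likelihood(p_far, q)
```

A candidate whose histogram matches the target attains the maximum likelihood, and the likelihood decays as the histograms diverge.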

###### 3.2.2. Convolutional Networks Model

To describe the target better, we use a convolutional network to learn robust representations for visual tracking without offline training on a large amount of auxiliary data, inspired by recent studies [11, 18]. First, predefined convolutional filters extract high-order features. Second, a global representation is generated by combining local features without changing their structure or spatial arrangement, which increases feature invariance while maintaining specificity.

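*Step 1 (target layer).* The $k$-means clustering method is used to cluster $d$ filters, which serve as convolution kernels, from the patches of the target. The filter bank is $F^o = \{F_1^o, \ldots, F_d^o\}$. Given the $i$th filter $F_i^o \in \mathbb{R}^{w \times w}$, its response on the input image $I$ is denoted by a feature map $S_i^o \in \mathbb{R}^{(n-w+1) \times (n-w+1)}$, where $S_i^o = F_i^o \otimes I$ and $\otimes$ is the convolution operator.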
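Filter selection by clustering can be sketched with plain Lloyd iterations of $k$-means. This is an illustration only: the initialization, the iteration count, the handling of empty clusters, and the random stand-in patches are assumptions of the sketch, and the paper does not specify them.

```python
import numpy as np

def kmeans_filters(patches, d, iters=10, seed=0):
    """Cluster flattened patches into d centers with Lloyd's algorithm."""
    rng = np.random.default_rng(seed)
    # Initialize the centers with d distinct patches chosen at random.
    centers = patches[rng.choice(len(patches), size=d, replace=False)].copy()
    for _ in range(iters):
        # Squared distance from every patch to every center, shape (num_patches, d).
        dists = ((patches[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for k in range(d):
            members = patches[labels == k]
            if len(members) > 0:          # keep the old center if a cluster is empty
                centers[k] = members.mean(axis=0)
    return centers

rng = np.random.default_rng(1)
patches = rng.standard_normal((200, 9))                  # stand-ins for 3x3 target patches
filters = kmeans_filters(patches, d=4).reshape(4, 3, 3)  # four 3x3 filters
```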

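*Step 2 (background layer).* At the same time, useful background information around the target is used to distinguish the target from the background. $m$ samples are selected from the background around the target, and $k$-means is used to select a bank of filters $F_i^b = \{F_{i,1}^b, \ldots, F_{i,d}^b\}$ from the $i$th sample. Average pooling summarizes the filters across the $m$ banks and generates the background context filter set $F^b = \{F_1^b, \ldots, F_d^b\}$ with $F_j^b = \frac{1}{m}\sum_{i=1}^{m} F_{i,j}^b$. This set is then convolved with the input image $I$. Finally, with $S_i^o = F_i^o \otimes I$ the response of the $i$th target filter and $S_i^b = F_i^b \otimes I$ the response of the $i$th background filter, the simple cell feature maps are defined as

$$S_i = S_i^o - S_i^b = (F_i^o - F_i^b) \otimes I, \quad i = 1, \ldots, d. \tag{13}$$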

*Step 3 (convolution layer).* First, the simple cell feature maps are computed from the filter set. Then the $d$ different feature maps are stacked to construct a three-dimensional tensor $C \in \mathbb{R}^{(n-w+1) \times (n-w+1) \times d}$, that is, the combination of the feature maps. This representation is shift-sensitive, which preserves its specificity. In addition, the target region is warped to $n \times n$ pixels, which makes the feature robust to changes in target scale.
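The three steps above can be sketched together: average-pool the background filter banks, apply the difference of target and background filters to the image, and stack the responses into a tensor. This sketch uses the sliding-window correlation form of the response (it differs from strict convolution only by a flip of the kernel), and random arrays stand in for the clustered filters; all names and sizes are hypothetical.

```python
import numpy as np

def response_map(image, filt):
    """Valid sliding-window response of one w-by-w filter (correlation form)."""
    n, w = image.shape[0], filt.shape[0]
    side = n - w + 1
    out = np.empty((side, side))
    for r in range(side):
        for c in range(side):
            out[r, c] = np.sum(image[r:r + w, c:c + w] * filt)
    return out

def feature_tensor(image, target_filters, background_filters):
    """Stack the d responses of (target - background) filters into a side x side x d tensor."""
    maps = [response_map(image, fo - fb)
            for fo, fb in zip(target_filters, background_filters)]
    return np.stack(maps, axis=-1)

rng = np.random.default_rng(0)
image = rng.random((8, 8))                             # toy warped target region, n = 8
target_filters = rng.standard_normal((4, 3, 3))        # stand-ins for d = 4 target filters
background_sets = rng.standard_normal((5, 4, 3, 3))    # m = 5 background banks of d filters
background_filters = background_sets.mean(axis=0)      # average pooling over the m banks
C = feature_tensor(image, target_filters, background_filters)
```

Because the response is linear in the filter, filtering once with the difference of the two filters gives the same map as subtracting the two separate responses.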

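To increase robustness to appearance change, we describe the feature with a sparse representation:

$$\hat{c} = \arg\min_{c}\; \lambda \|c\|_1 + \frac{1}{2}\left\|c - \mathrm{vec}(C)\right\|_2^2, \tag{14}$$

where $\mathrm{vec}(C)$ is the column vector formed from all elements of the feature tensor $C$.

The model has a closed-form solution given by soft shrinkage [19]:

$$\hat{c} = \mathrm{sign}(\mathrm{vec}(C)) \odot \max\left(0, |\mathrm{vec}(C)| - \lambda\right), \tag{15}$$

where $\mathrm{sign}(\cdot)$ is the sign function and $\lambda$ is set to the median value of $\mathrm{vec}(C)$.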
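Soft shrinkage is a one-line element-wise operation, sketched below. The threshold `lam = 1.0` and the input vector are hypothetical values chosen for illustration; the text derives the threshold from the median of the features.

```python
import numpy as np

def soft_shrink(v, lam):
    """Element-wise soft shrinkage: shrink magnitudes by lam and zero out the rest."""
    v = np.asarray(v, dtype=float)
    return np.sign(v) * np.maximum(0.0, np.abs(v) - lam)

v = np.array([-3.0, -0.5, 0.0, 1.0, 4.0])   # toy vectorized feature
lam = 1.0                                    # hypothetical threshold
c_hat = soft_shrink(v, lam)
```

Entries whose magnitude does not exceed the threshold become zero, which is what makes the resulting representation sparse.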

*Step 4 (model update).* The template is updated by low-pass filtering:

$$c_t = (1 - \eta)\, c_{t-1} + \eta\, \hat{c}_{t-1}, \tag{16}$$

where $c_t$ is the target template at frame $t$, $c_{t-1}$ is the template of the previous frame, $\hat{c}_{t-1}$ is the sparse representation of $\mathrm{vec}(C_{t-1})$, and $\eta$ is a learning parameter.
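A low-pass template update of this form can be sketched as an exponential moving average. The parameter name `rate`, its default of 0.05, and the toy vectors are assumptions for illustration.

```python
import numpy as np

def update_template(template, new_feature, rate=0.05):
    """Blend the previous template with the newest sparse feature (low-pass filter)."""
    template = np.asarray(template, dtype=float)
    new_feature = np.asarray(new_feature, dtype=float)
    return (1.0 - rate) * template + rate * new_feature

template = np.array([1.0, 0.0, 2.0])
new_feature = np.array([0.0, 0.0, 4.0])
blended = update_template(template, new_feature, rate=0.5)
```

A small rate keeps the template stable against short occlusions, while a large rate adapts faster to genuine appearance change but is more prone to drift.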

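The observation model of the convolutional network is defined by (17):

$$p_n(Z_t \mid X_t^i) \propto \exp\left(-\left\|c_t - V_t^i\right\|_2\right), \tag{17}$$

where $c_t$ is the target template at frame $t$ and $V_t^i = \mathrm{vec}(C_t^i) \odot \theta$.

$V_t^i$ is the representation of the $i$th candidate sample at frame $t$ based on the complex cell features, $\odot$ denotes the element-wise product, and $\theta$ is an indicator function whose $j$th element is defined as

$$\theta_j = \begin{cases} 1, & c_t(j) \neq 0, \\ 0, & \text{otherwise}. \end{cases} \tag{18}$$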
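The masked comparison between the template and a candidate can be sketched as follows. The sketch assumes the likelihood is the exponential of the negative Euclidean distance after the candidate is masked to the nonzero support of the template; the function name and toy vectors are hypothetical.

```python
import numpy as np

def conv_likelihood(template, candidate):
    """Unnormalized likelihood of a candidate, compared only on the template's nonzero entries."""
    template = np.asarray(template, dtype=float)
    candidate = np.asarray(candidate, dtype=float)
    mask = (template != 0).astype(float)      # indicator of the template's support
    distance = np.linalg.norm(template - candidate * mask)
    return float(np.exp(-distance))

template = np.array([0.0, 2.0, 0.0, 1.0])
good = conv_likelihood(template, [5.0, 2.0, -3.0, 1.0])   # differs only off the support
poor = conv_likelihood(template, [0.0, 0.0, 0.0, 0.0])
```

Entries where the sparse template is zero are ignored, so clutter in those positions does not penalize an otherwise matching candidate.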

###### 3.2.3. System Observation Model

The system observation probability density function of each particle is

$$p(Z_t \mid X_t^i) = \alpha\, p_c(Z_t \mid X_t^i) + (1 - \alpha)\, p_n(Z_t \mid X_t^i), \tag{19}$$

where $p_c$ is the color observation probability and $p_n$ is the observation probability of the convolutional network model.

The parameter $\alpha$ regulates the proportion of each feature's observation probability in the total observation probability. When the background is complex and the target is partially occluded, the global positioning advantage of the color distribution should be fully exploited, so $\alpha$ should be increased. When the color of the target differs from the background color, $\alpha$ should be reduced so that the localization advantage of the convolutional feature is fully exploited. Under normal circumstances, $\alpha$ is held at a fixed default value.
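One way to realize such a fusion is a convex combination of the two likelihoods, sketched below. The weighted-sum form, the function name, and the default weight of 0.5 are assumptions of this sketch; only the roles of the two cues come from the text.

```python
def fuse_likelihoods(p_color, p_conv, alpha=0.5):
    """Convex combination of the color and convolutional likelihoods (assumed form)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * p_color + (1.0 - alpha) * p_conv

balanced = fuse_likelihoods(0.8, 0.2)                 # equal trust in both cues
color_only = fuse_likelihoods(0.8, 0.2, alpha=1.0)    # rely entirely on color
conv_only = fuse_likelihoods(0.8, 0.2, alpha=0.0)     # rely entirely on the convolutional cue
```

Raising the weight shifts trust toward the color cue, and lowering it shifts trust toward the convolutional feature, matching the two regimes described above.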

The flow chart of particle filter algorithm based on feature fusion is displayed in Figure 1.