Special Issue: Computational Neuroscience
Research Article | Open Access
The Research and Application of Visual Saliency and Adaptive Support Vector Machine in Target Tracking Field
Efficient target tracking algorithms have become a major research focus in intelligent robotics. The main problem a mobile robot faces during target tracking is environmental uncertainty. The target state is difficult to estimate, and illumination changes, changes in target shape, complex backgrounds, occlusion, and other factors all affect tracking robustness. To further improve the accuracy and reliability of target tracking, we present a novel target tracking algorithm that uses visual saliency and an adaptive support vector machine (ASVM). The algorithm is based on a mixed saliency of image features, namely color, brightness, and motion features. During execution, the visual saliency features and the characteristics these features have in common are used to represent the target's saliency. Numerous experiments on video sequences in which the target objects undergo large changes in pose, scale, and illumination demonstrate the effectiveness and timeliness of the proposed algorithm.
Target tracking has attracted considerable attention in computer vision because of its fundamental importance for many vision applications, such as visual surveillance, traffic safety monitoring, and abnormal activity detection, and many successful target tracking techniques have been proposed over the last several decades. However, their applicability in general scenarios is still limited by practical difficulties: appearance variations (e.g., illumination, viewpoint, and background changes), occlusions, complex backgrounds, and so forth. These difficulties are unavoidable in practical applications and make the problem noticeably harder. To overcome them, researchers have proposed numerous target tracking methods.
Many traditional approaches use various kinds of low-level observation models for object tracking, such as feature points, lines or templates, moving areas, and color appearance models. Common frameworks for target tracking mainly include the mean-shift method, the Kalman filter, and the particle filter. The particle filter is a filtering algorithm based on Bayesian inference that uses nonparametric sequential Monte Carlo methods. Because it is not restricted to linear systems with Gaussian noise, it is widely used in navigation, machine vision, target tracking, and other fields. A particle filter can use a variety of features; color and edge contour are the two most commonly used target representations. The color feature is insensitive to noise and partial occlusion but sensitive to illumination changes, and when the color distribution of the target is similar to that of the background, it cannot correctly distinguish the object from the background. The edge contour feature is robust to illumination changes, but in complex backgrounds its computational complexity is high and real-time performance cannot be guaranteed.
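To make the Bayesian sequential Monte Carlo idea concrete, here is a minimal sketch of a generic bootstrap (sampling importance resampling) particle filter for a one-dimensional position. It assumes a random-walk motion model and a Gaussian observation likelihood; the function name `bootstrap_pf` and all parameter values are hypothetical and illustrate the general technique, not the tracker proposed in this paper.

```python
import numpy as np

def bootstrap_pf(observations, n_particles=500, motion_std=1.0, obs_std=2.0, seed=0):
    """Track a scalar state with a bootstrap (SIR) particle filter.

    Assumed model: x_t = x_{t-1} + N(0, motion_std^2), z_t = x_t + N(0, obs_std^2).
    Returns the posterior-mean estimate for each observation.
    """
    rng = np.random.default_rng(seed)
    # Initial particles are spread around the first observation.
    particles = observations[0] + rng.normal(0.0, obs_std, n_particles)
    estimates = []
    for z in observations:
        # Predict: propagate every particle through the motion model.
        particles = particles + rng.normal(0.0, motion_std, n_particles)
        # Update: weight particles by the Gaussian likelihood of the observation.
        log_w = -0.5 * ((z - particles) / obs_std) ** 2
        weights = np.exp(log_w - log_w.max())
        weights /= weights.sum()
        # Estimate: weighted mean of the particle cloud.
        estimates.append(float(np.sum(weights * particles)))
        # Resample: draw particles in proportion to their weights.
        idx = rng.choice(n_particles, size=n_particles, p=weights)
        particles = particles[idx]
    return np.array(estimates)
```

In an image tracker the state would typically hold position and scale, and the likelihood would come from a feature similarity such as the color histogram model of Section 4.1.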
A visual attention mechanism selects among the input visual information, filtering a small amount of useful information out of the mass of visual data for the tracking algorithm. This selective processing reduces the amount of computation and improves processing efficiency. Therefore, researchers have gradually begun to study object tracking methods based on visual saliency. Some researchers treat target tracking as a shift of human visual attention and build a computational model of visual attention. Target detection and tracking have been performed by shifting the focus of visual attention [8, 9], but the tracked target is not always the most salient object, so relying fully on saliency information can make the tracking result drift. Moreover, the work in  considers only static saliency and ignores dynamic saliency, while  studies only target tracking against a stationary background.
We propose a novel moving-target tracking algorithm that combines an adaptive support vector machine (ASVM), built on the support vector machine (SVM), with visual saliency feature extraction. The algorithm uses ASVM as its tracking framework and uses a visual saliency measurement model to compute the target's saliency features. These features, together with several common characteristics, form the target representation model, which overcomes the tracking instability caused by relying on a single color feature. Numerous experiments show that the algorithm effectively handles the difficulties caused by target deformation, illumination changes, and backgrounds whose color distribution is similar to the target's, and thus achieves robust target tracking.
2. Adaptive Support Vector Machine
2.1. Sample Selection Algorithm
During image preprocessing and feature extraction, we first need to label the selected samples according to the following definition.
Definition 1. For each training sample pixel, the associated value is called the tag of that sample pixel. Its value is assigned according to the following conditions.
The tag is used mainly in SVM incremental learning to decide whether a labeled sample should be discarded. It records the number of times the sample has occurred as a non-support vector: each time this happens, the tag is incremented by one, and when the tag reaches a specified threshold, the sample pixel can be removed from the entire training set.
As training proceeds, some samples may "oscillate", repeatedly switching between being support vectors and non-support vectors. To solve this problem, a threshold is introduced that specifies the maximum number of times a sample point in the training set may be a non-support vector; choosing its value requires balancing training time against training precision. When a sample's tag reaches this threshold, the pixel sample is eliminated from the training set. This reduces the impact of the oscillation ("hunting") phenomenon on the classifier during preprocessing and feature extraction.
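The tag bookkeeping of Definition 1 can be sketched as follows. The sketch assumes that each sample's tag counts how often it has been a non-support vector, that a support vector's tag is left unchanged, and that a sample is eliminated once its tag reaches the threshold. The function name `update_tags` and the dictionary representation are hypothetical illustrations, not the paper's notation.

```python
def update_tags(tags, support_ids, threshold):
    """Update non-support-vector counters and drop oscillating samples.

    tags:        dict mapping sample id -> times it has been a non-support vector
    support_ids: set of sample ids that are support vectors after the latest training
    threshold:   count at which a sample is removed from the training set
    Returns (kept_tags, removed_ids).
    """
    kept, removed = {}, []
    for sid, count in tags.items():
        if sid not in support_ids:
            count += 1  # one more round as a non-support vector
        if count >= threshold:
            removed.append(sid)  # eliminated from the training set
        else:
            kept[sid] = count  # support vectors keep their current count
    return kept, removed
```

Whether a sample's count should be reset when it becomes a support vector again is not specified in the text; this sketch simply leaves it unchanged.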
2.2. Increment and Decrement of ASVM Learning Algorithm
Many existing incremental learning algorithms lack a mechanism for selectively eliminating data to shrink the training set, so processing time strongly affects the accuracy of the SVM. Most incremental learning algorithms discard non-support-vector pixel samples during the incremental process, but as incremental training continues, a non-support vector that was discarded might later have become a support vector. Moreover, a single training sample may carry important information, and discarding every non-support-vector pixel outright reduces classification accuracy. To solve this problem, we use the tags defined in Section 2.1 to mark samples, and we realize this treatment with the increment and decrement ASVM learning algorithm.
The idea of the increment and decrement learning algorithm is as follows. First, a threshold is introduced that specifies the maximum number of times a pixel sample in the training set may be a non-support vector. For each pixel sample in the training set, the algorithm records how many times it has been a non-support vector; when this count reaches the predetermined threshold, the sample is dropped from the training set. The increment and decrement ASVM learning algorithm proceeds in the following steps.
Algorithm 2 (increment and decrement adaptive support vector machine learning).
Step 1 (initialization). Set the threshold value. Acquire sample points until both SVM support vectors and non-support-vector pixel samples appear, and take the acquired pixels as the current pixel sample set. Train on this set to obtain the labeled samples and initialize their tags.
Step 2 (incremental learning). Add the newly acquired pixel sample to the current pixel sample set and retrain on the enlarged training set.
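Step 3 (tag update and decrement). For each sample in the training set: (3.2.1) if the sample is not a support vector, increment its tag by one; otherwise its tag remains unchanged; (3.2.2) if its tag has reached the threshold, mark the sample for removal; (3.3) if any samples are marked, remove them from the training set and perform the decremental learning process.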
Step 4. When a new sample arrives, return to Step 2 and continue with incremental learning (Step 2) and the decremental process (Step 3).
Algorithm 2 describes the adaptive learning procedure based on increment and decrement ASVM learning. It differs from existing incremental and decremental learning algorithms in Step 3: before discarding a sample pixel, Algorithm 2 uses the sample's tag to check whether the number of times it has been a non-support vector exceeds the predetermined threshold. If it does, the sample pixel is dropped from the sample set; otherwise it is retained. An adaptive strategy can also be introduced for the threshold, with the aim of improving training accuracy and reducing training time.
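The following is a minimal, self-contained sketch of the increment and decrement loop of Algorithm 2. It rests on several assumptions that are not from the paper: a linear SVM trained with Pegasos subgradient steps stands in for the SVM solver, samples with $y f(x) \le 1 + \text{tol}$ are treated as support vectors, and each arriving sample triggers a full retrain. The names `train_linear_svm`, `decision`, `asvm_stream`, `lam`, and `tol` are hypothetical.

```python
import numpy as np

def train_linear_svm(X, y, lam=0.1, epochs=100, seed=0):
    """Approximate linear SVM via Pegasos subgradient steps (stand-in solver).

    Labels y must be +1 / -1; the bias is learned through an appended constant feature.
    """
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(Xb)):
            t += 1
            eta = 1.0 / (lam * t)
            if y[i] * (Xb[i] @ w) < 1:  # hinge loss is active for this sample
                w = (1 - eta * lam) * w + eta * y[i] * Xb[i]
            else:
                w = (1 - eta * lam) * w
    return w

def decision(w, X):
    """Linear decision values f(x) = w . [x, 1]."""
    return np.hstack([X, np.ones((len(X), 1))]) @ w

def asvm_stream(X_stream, y_stream, n_init=10, threshold=3, tol=0.05):
    """Sketch of Algorithm 2: incremental addition plus tag-based decrement.

    threshold must be >= 2 so a newly added sample is never dropped immediately.
    """
    X, y = X_stream[:n_init], y_stream[:n_init]            # Step 1: initial set
    tags = np.zeros(n_init, dtype=int)
    w = train_linear_svm(X, y)
    for xi, yi in zip(X_stream[n_init:], y_stream[n_init:]):
        X, y = np.vstack([X, xi]), np.append(y, yi)         # Step 2: increment
        tags = np.append(tags, 0)
        w = train_linear_svm(X, y)                           # retrain
        is_sv = y * decision(w, X) <= 1 + tol
        tags = tags + (~is_sv)                               # (3.2.1) count non-SV rounds
        keep = tags < threshold                              # (3.2.2) mark for removal
        X, y, tags = X[keep], y[keep], tags[keep]            # (3.3) decrement
    return w, X, y
```

Removing only non-support vectors leaves the current solution essentially unchanged, which is why this sketch defers retraining until the next sample arrives.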
3. Visual Saliency Feature Extraction in ASVM Tracking Algorithm
Figure 1 shows the flow diagram of the target tracking algorithm based on visual saliency feature extraction and adaptive SVM.
The object tracking algorithm proceeds as follows.
(1) Initialization. At the initial time, acquire an image of the scene, select the relevant target in the scene image, and set the target's initial state: the coordinates of the center of the target and the width and height of the target area. Using the methods of Section 4, compute the saliency features and similarity values of the various parts of the scene image. For the detected target area, compute the color space histogram and the visual saliency feature histogram, which together form the target feature representation model; the calculation of both histograms is described in Section 4. Sample the initial distribution at each pixel point and assign each sample an importance weight.
(2) For each subsequent frame, acquire the scene image and compute its saliency. The target state is propagated with the autoregressive model of formula (2), which combines a state transition function with process noise. Apply the adaptive increment and decrement learning method to the target state transition model to process the support vector samples, and compute the color space histogram and the visual saliency histogram of each support vector.
(3) Using the similarity equations, compute the distance between each support vector and the target area to obtain the support vector similarity metric.
(4) Using the visual saliency measurement, extract the salient regions of the scene image. For each region, compute its saliency-related distance to the target model, and replace a number of support vectors with the salient regions that have the smallest distances; these are called the adjusted support vectors.
(5) Calculate and normalize the weight of each support vector, as described in Section 4.
(6) Apply the adaptive incremental and decremental learning method of Section 2 to obtain the optimal target state from the samples.
(7) Based on the support vector weights, retain the support vectors with high weights and discard those with low weights to update the support vector set. Finally, return to step (2) and repeat the process.
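One iteration of this loop can be sketched as below, under stated assumptions. Formula (2) is taken to be a first-order autoregressive (random-walk) model. The similarity computations of Section 4 are abstracted into a caller-supplied function `score_fn`. The "retain high weights" step keeps the top fraction of candidates and refills the rest by weighted copying. The names `track_step`, `score_fn`, `noise_std`, and `keep_ratio` are hypothetical.

```python
import numpy as np

def track_step(states, score_fn, noise_std=(4.0, 4.0, 1.0, 1.0), keep_ratio=0.5, rng=None):
    """One iteration of a candidate-based tracking loop (illustrative sketch).

    states:   (N, 4) array of candidate states [cx, cy, width, height]
    score_fn: hypothetical callable returning a positive similarity to the target
              model for each candidate (e.g., from color and saliency histograms)
    Returns (new_states, estimate).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Step (2): first-order autoregressive propagation x_t = x_{t-1} + noise (assumed form).
    states = states + rng.normal(0.0, noise_std, states.shape)
    # Steps (3)-(5): score every candidate and normalize the weights.
    weights = np.asarray(score_fn(states), dtype=float)
    weights = weights / weights.sum()
    # Step (6): state estimate as the weighted mean of the candidates.
    estimate = weights @ states
    # Step (7): keep the high-weight candidates, discard the rest, refill by weighted copying.
    n_keep = max(1, int(keep_ratio * len(states)))
    top = np.argsort(weights)[::-1][:n_keep]
    refill = rng.choice(top, size=len(states) - n_keep, p=weights[top] / weights[top].sum())
    states = np.vstack([states[top], states[refill]])
    return states, estimate
```

Usage: start all candidates at the initial state from step (1), then call `track_step` once per frame with a scoring function built from that frame's features.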
4. Proposed Approach Description
4.1. Color Model and Similarity Measure
Adaptive SVM can use a variety of relevant features to represent and detect target objects. Color features are insensitive to noise and partial occlusion and are simple to compute, so they are widely used. The HSI color space is close to the characteristics of the human visual system. Formula (3) converts the scene image from the RGB color space to the HSI color space.
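As a sketch of the conversion step, the code below assumes the standard geometric RGB-to-HSI conversion found in common image-processing textbooks; the paper's formula (3) may differ in detail (for example, in how hue is scaled). The function name `rgb_to_hsi` is hypothetical.

```python
import numpy as np

def rgb_to_hsi(rgb):
    """Convert RGB values in [0, 1] to HSI with the standard geometric formulas.

    Returns an array of the same shape whose last-axis channels are
    H (radians, in [0, 2*pi)), S (in [0, 1]) and I (in [0, 1]).
    """
    rgb = np.asarray(rgb, dtype=float)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    eps = 1e-12
    total = r + g + b
    intensity = total / 3.0
    min_c = np.minimum(np.minimum(r, g), b)
    saturation = np.where(total > eps, 1.0 - 3.0 * min_c / np.maximum(total, eps), 0.0)
    num = 0.5 * ((r - g) + (r - b))
    den = np.sqrt(np.maximum((r - g) ** 2 + (r - b) * (g - b), 0.0))
    theta = np.arccos(np.clip(num / np.maximum(den, eps), -1.0, 1.0))
    hue = np.where(b <= g, theta, 2.0 * np.pi - theta)
    hue = np.where(den > eps, hue, 0.0)  # hue is undefined for gray pixels; use 0
    return np.stack([hue, saturation, intensity], axis=-1)
```

For example, pure red maps to H = 0, S = 1, I = 1/3, and pure green to H = 2π/3.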
The algorithm in Section 3 uses a color histogram as the color model of the target area. Suppose the whole color space is divided into a number of subregions (bins). Computing the color vector of each pixel in the scene image and counting the number of pixels that fall into each subregion yields the color space histogram, with one column per subregion. Because the position of a sample pixel within the target area affects its contribution to the color distribution, a kernel function is added to integrate spatial information, as in formula (4).
In formula (4), the kernel is evaluated on the distance between a pixel and the center of the target area.
Using this kernel, the color distribution model of the pixels in the central region of the target can be expressed as in formula (5).
In (5), the sum runs over the pixels of the region, whose count is the number of pixels in the central region. Each pixel's color feature assigns it to the corresponding bin of the color histogram, and the Dirac delta function selects the pixels that fall into a given bin. The area size scales the kernel, and a normalization factor normalizes the histogram.
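A kernel-weighted histogram in the spirit of formulas (4) and (5) can be sketched as follows. The sketch assumes an Epanechnikov profile for the kernel and uses the Bhattacharyya coefficient as the similarity measure; both are common choices in mean-shift tracking, and neither is confirmed as the paper's exact formula. The names `kernel_histogram` and `bhattacharyya` are hypothetical.

```python
import numpy as np

def kernel_histogram(feature, n_bins=16):
    """Kernel-weighted histogram of a feature patch with values in [0, 1].

    Pixels near the patch center get larger weights through the Epanechnikov
    profile k(r) = 1 - r for r < 1 (r = squared normalized distance to the center);
    the result is normalized so that the bins sum to 1.
    """
    feature = np.clip(np.asarray(feature, dtype=float), 0.0, 1.0)
    h, w = feature.shape
    ys, xs = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    # Squared distance to the center, normalized by the half-size of the patch.
    r = ((ys - cy) / (h / 2.0)) ** 2 + ((xs - cx) / (w / 2.0)) ** 2
    k = np.clip(1.0 - r, 0.0, None)
    # Bin index of each pixel; the Dirac delta selection becomes a weighted bincount.
    bins = np.minimum((feature * n_bins).astype(int), n_bins - 1)
    hist = np.bincount(bins.ravel(), weights=k.ravel(), minlength=n_bins)
    return hist / hist.sum()

def bhattacharyya(p, q):
    """Bhattacharyya coefficient of two normalized histograms (1.0 means identical)."""
    return float(np.sum(np.sqrt(p * q)))
```

For a color model, `feature` could be, for instance, the hue channel of the target patch rescaled to [0, 1].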
4.2. Visual Saliency and Similarity Measurement
Color is very sensitive to illumination changes, and it also becomes unreliable when the color range of the detected target is close to that of the background. When color alone is used as the target representation model, tracking often fails to achieve the desired results. The proposed algorithm therefore fuses visual saliency features with the target's color to form the representation model.
Some small parts of a scene image attract a human observer's attention more than the rest of the content; these parts are said to have high visual saliency. The visual saliency measure combines the color, brightness, and motion features of the scene image, so it is more robust and more resistant to noise than simple color features. The overall visual saliency calculation process is shown in Figure 2.
(1) Feature Extraction. The scene image is converted from RGB space to HSI space using (3). The I (intensity) channel is used as the luminance feature, and the H (hue) and S (saturation) channels are used as the color features. The motion feature is expressed by formula (6).
In (6), the motion feature of a pixel is computed from its value at the current time and its value at a given earlier time.
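A minimal sketch of such a motion feature follows. It assumes that formula (6) is the absolute difference of the intensity channel I = (R + G + B)/3 between the current frame and an earlier frame; the frame gap `tau` and the function name `motion_feature` are hypothetical.

```python
import numpy as np

def motion_feature(frames, t, tau=1):
    """Motion feature M_t = |I_t - I_(t-tau)| on the intensity channel.

    frames: sequence of RGB frames (H, W, 3) with values in [0, 1]
    """
    if t - tau < 0:
        raise ValueError("need an earlier frame at index t - tau")
    cur = np.asarray(frames[t], dtype=float).mean(axis=-1)        # I_t
    prev = np.asarray(frames[t - tau], dtype=float).mean(axis=-1)  # I_(t-tau)
    return np.abs(cur - prev)
```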
(2) Visual Saliency Calculation. Visual saliency arises from abrupt changes between each area of the scene image and its surroundings: the greater the change, the higher the visual saliency. Saliency can therefore be characterized by comparing the local features of each region of the image with those of its surroundings to calculate a saliency value.
First, each feature image is transformed from the spatial domain into the frequency domain, which gives its amplitude spectrum and phase spectrum, as expressed in formula (7).
In (7), the summand is the feature value at each pixel, and the transform runs over the full image size.
The amplitude spectrum and the phase spectrum of each feature image carry different information about the image: the amplitude spectrum describes how strongly each frequency is present, while the phase spectrum indicates where in the image the changes occur. Computing visual saliency means measuring the saliency of each pixel and locating the pixels with larger saliency. If a feature image is reconstructed from its phase spectrum alone, the locations with large output values correspond to the positions in the original image where the feature values change strongly, and these positions form the "visual saliency" area. Therefore, applying the inverse Fourier transform to the phase spectrum of each feature image alone yields a map that reflects the saliency of the various parts of the image; this saliency map is given by formula (8).
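The phase-only reconstruction described above can be sketched with NumPy's FFT routines. The sketch follows the general phase-spectrum idea: transform, keep only the phase, inverse-transform, and square the magnitude. The Gaussian smoothing width `sigma` and the final scaling to [0, 1] are assumptions rather than details from the paper, and the name `phase_saliency` is hypothetical.

```python
import numpy as np

def phase_saliency(feature, sigma=2.0):
    """Saliency map reconstructed from the phase spectrum of a feature image.

    1. FFT of the feature image; 2. keep only the phase (unit magnitude);
    3. inverse FFT and squared magnitude; 4. Gaussian smoothing; 5. scale to [0, 1].
    """
    f = np.fft.fft2(np.asarray(feature, dtype=float))
    mag = np.abs(f)
    # Unit-magnitude spectrum; components with numerically zero magnitude are dropped.
    mask = mag > 1e-12 * mag.max()
    phase_only = np.divide(f, mag, out=np.zeros_like(f), where=mask)
    sal = np.abs(np.fft.ifft2(phase_only)) ** 2
    # Gaussian smoothing applied in the frequency domain (circular boundary).
    fy = np.fft.fftfreq(sal.shape[0])[:, None]
    fx = np.fft.fftfreq(sal.shape[1])[None, :]
    transfer = np.exp(-2.0 * (np.pi * sigma) ** 2 * (fx ** 2 + fy ** 2))
    sal = np.real(np.fft.ifft2(np.fft.fft2(sal) * transfer))
    sal = np.clip(sal, 0.0, None)  # remove tiny negative ringing from the filter
    peak = sal.max()
    return sal / peak if peak > 0 else sal
```

A uniform feature image has only a DC component, so its phase-only reconstruction, and hence its saliency map, is flat; any structure in the image produces a non-uniform map.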
(3) Feature Fusion. The saliency maps of the color, brightness, and motion features computed by the above method are fused to obtain the final visual saliency map, given by formula (9).
In (9), the three terms are the saliency maps of the color, luminance, and motion features, respectively, each obtained with formula (8), and the three coefficients are the corresponding color, brightness, and motion weights. Formula (9) thus fuses the three feature saliency maps by a weighted average.
The integrated visual saliency map has exactly the same size as the grayscale scene image, and each pixel's value is the visual saliency value of the corresponding pixel in the scene image. Proceeding in the same way as for the color model, we obtain the visual saliency distribution model. The similarity between the saliency model of each support vector and the target model is then computed from the visual saliency map.
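The weighted-average fusion of formula (9) can be sketched as below, assuming the three maps share the same shape and that the weights are normalized to sum to 1, so equal weights give a plain average. The function name `fuse_saliency` and the default equal weights are hypothetical.

```python
import numpy as np

def fuse_saliency(s_color, s_lum, s_motion, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted fusion of the color, luminance, and motion saliency maps."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize the weights so they sum to 1
    maps = np.stack([s_color, s_lum, s_motion]).astype(float)
    return np.tensordot(w, maps, axes=1)  # sum_k w[k] * maps[k]
```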
(4) Pixel Weight Calculation. The weight of each pixel is calculated from two similarity degrees: between its color model and the target color model, and between its visual saliency model and the target visual saliency model. The weight is given by formula (10).
In (10), the first similarity term is the degree of similarity between the pixel's color model and the target color model, and the second is the degree of similarity between the pixel's visual saliency model and the target visual saliency model. The two normalizing terms are the averages of these similarity values over all pixels.
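Because the exact form of formula (10) is not given here, the following sketch uses one hypothetical combination rule consistent with the description. Each similarity is divided by its average over all pixels, the two ratios are averaged, and the weights are normalized to sum to 1. The rule and the name `pixel_weights` are illustrative assumptions, not the paper's formula.

```python
import numpy as np

def pixel_weights(rho_color, rho_saliency):
    """Hypothetical weight rule: mean-normalized color and saliency similarities,
    averaged per pixel, then normalized so that all weights sum to 1."""
    rc = np.asarray(rho_color, dtype=float)
    rs = np.asarray(rho_saliency, dtype=float)
    w = 0.5 * (rc / rc.mean() + rs / rs.mean())
    return w / w.sum()
```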
5. The Experimental Results and Analysis
5.1. Adaptive SVM Numerical Experiments
To evaluate the performance of the proposed algorithm, we compare ASVM with the online incremental learning algorithm of  on three measures: training accuracy, testing accuracy, and CPU execution time. Below, "online" denotes the online incremental learning algorithm proposed in , and "ASVM" denotes the proposed adaptive increment and decrement SVM learning algorithm. Numerical experiments were carried out for both the linear and the nonlinear case.
For the adaptive SVM numerical experiments, we first select data sets from the UCI machine learning repository. The samples of the training set are added one at a time to simulate online learning. The penalty parameter is chosen during training as the value that performs best on a tuning set drawn from the training set. The threshold value was selected through repeated adjustment and testing on a variety of UCI data sets, and the final threshold was chosen to give the best combination of training accuracy, testing accuracy, and CPU execution time.
Table 1 shows the results for the linear case. ASVM achieves higher classification accuracy than the online algorithm and a significantly shorter CPU execution time. For example, on the higher-dimensional Pima-diabetes data set, ASVM takes 1.85 seconds, while the online algorithm takes 15.4012 seconds.
For the nonlinear case, we use the RBF kernel function. Table 2 shows the results for the nonlinear case, where the listed parameter is the kernel parameter. As Table 2 shows, the CPU execution time of ASVM is significantly smaller than that of the online algorithm, and its training and testing accuracies are significantly higher.
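For reference, a Gaussian RBF kernel matrix can be computed as follows. The sketch assumes the parameterization $K(x, y) = \exp(-\lVert x - y\rVert^2 / (2\sigma^2))$ with kernel parameter $\sigma$; some texts instead write $\exp(-\gamma \lVert x - y\rVert^2)$, and the paper's exact form is not shown. The name `rbf_kernel` is hypothetical here.

```python
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    """Gaussian RBF kernel matrix K[i, j] = exp(-||X[i] - Y[j]||^2 / (2 * sigma^2))."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    Y = np.atleast_2d(np.asarray(Y, dtype=float))
    # Squared Euclidean distances via ||x||^2 + ||y||^2 - 2 x.y
    sq = np.sum(X ** 2, axis=1)[:, None] + np.sum(Y ** 2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))
```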
5.2. Calculating Visual Saliency Map Experiment
In Figures 3 and 4, the first row shows the original frames of the video. The second row shows the visual saliency maps obtained by considering only the color characteristics. The third row shows the visual saliency maps obtained by the proposed algorithm with ASVM. The fourth row shows the saliency maps obtained using only the salient features.
Comparing the saliency maps in Figures 3 and 4, the maps produced with the ASVM algorithm best reflect the characteristics of the original images. A good visual saliency map also provides strong support for the subsequent tracking process.
5.3. The Testing Experiments of Object Tracking Algorithm
To verify the accuracy and effectiveness of the proposed algorithm, experiments were run on an Intel Pentium 2.6 GHz machine with 4 GB of RAM, using MATLAB 2012b. The tests focus on tracking effectiveness under illumination changes, target deformation, and occlusion, and the results are compared with those of a support vector machine tracking algorithm that does not use visual saliency.
The object tracking experiment tracks a head target in a video sequence in which the target rotates and changes in shape and size. The test sequence has 500 frames captured with a moving camera, and the target's size, shape, and pose change significantly. Figure 5 shows the detailed tracking results of the three algorithms on frames 1, 63, 83, 96, 105, 156, 178, and 244. As Figure 5 shows, when the target's motion changes its size but its color characteristics do not change significantly, the first two algorithms track it correctly. When the target turns and its color distribution changes greatly, tracking with a single-feature representation model fails and loses the target. Because the proposed algorithm adds target saliency features to the color characteristics, it has higher stability and anti-interference capability than the single-feature algorithms. Even when the target's orientation, shape, and color change, the proposed algorithm still tracks it effectively.
(a) Tracking results using only the color feature
(b) Tracking results using only the visual saliency feature
(c) Tracking results using ASVM and the visual saliency feature
The proposed algorithm is a novel target tracking algorithm that uses a co-feature model integrating visual saliency features and color as the target representation model. While improving tracking effectiveness, the algorithm must keep the increase in space and time complexity small enough to preserve real-time performance.
This paper proposes a target tracking algorithm that uses visual saliency features and ASVM. The algorithm is based on the color, brightness, and motion features of the image. Through the visual saliency measurement process, the visual saliency features and the color features are jointly used to represent the target. Experiments on video sequences show that the proposed tracking algorithm achieves good effectiveness and timeliness.
Compared with existing target tracking algorithms, the proposed algorithm avoids the tracking instability caused by using a single color feature, and it tracks the target correctly under large changes in pose, illumination, and shape and under occlusion. However, prolonged occlusion and dramatic lighting changes may still cause tracking to fail. Future work will therefore consider other effective features in addition to color and visual saliency features and combine them appropriately to make the model more robust.
This work is supported by the Scientific Research Fund of Hunan Provincial Education Department (no. 12B005), the Hunan Province Science and Technology Planning Project (no. 2012FJ3005 and no. 2012SK4046), the Research Foundation of the Ministry of Education of China (no. 208098), and the Hunan Province Undergraduates Innovating Experimentation Project (no. 191-501).
- F. F. Du, Particle Filter Object Tracking Algorithm Based on Vision and Its Application on Mobile Robot, Hangzhou Dianzi University, Hangzhou, China, 2009.
- G. Zhang, Z. Yuan, N. Zheng, X. Sheng, and T. Liu, “Visual saliency based object tracking,” in Computer Vision – ACCV 2009, H. Zha, R.-I. Taniguchi, and S. Maybank, Eds., vol. 5995 of Lecture Notes in Computer Science, pp. 193–203, Springer, 2009.
- P.-H. Li, “A novel color based particle filter algorithm for object tracking,” Chinese Journal of Computers, vol. 32, no. 12, pp. 2454–2463, 2009.
- B. Pu, F. Zhou, and X. Bai, “Particle filter based on color feature with contour information adaptively integrated for object tracking,” in Proceedings of the 4th International Symposium on Computational Intelligence and Design (ISCID '11), pp. 359–362, Zhejiang University, Hangzhou, China, October 2011.
- Y. Xia, X. J. Wu, and Z. H. Feng, “Mean shift algorithm for visual tracking based on feature contribution,” Control and Decision, vol. 27, no. 7, pp. 1021–1026, 2012.
- Z. Jiang, Z. Lin, and L. S. Davis, “Label consistent K-SVD: learning a discriminative dictionary for recognition,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 11, pp. 2651–2664, 2013.
- V. Badrinarayanan, I. Budvytis, and R. Cipolla, “Semi-supervised video segmentation using tree structured graphical models,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 11, pp. 2751–2764, 2013.
- Z. H. Zeng, C. L. Zhou, K. H. Lin et al., “Visual attention computational model based on tracking target,” Computer Engineering, vol. 34, no. 23, pp. 241–243, 2008.
- G. Yang and H. Liu, “Visual attention & multi-cue fusion based human motion tracking method,” in Proceedings of the 6th International Conference on Natural Computation (ICNC '10), pp. 2044–2054, Yantai University, Yantai, China, August 2010.
- D. Sidibé, D. Fofi, and F. Mériaudeau, “Using visual saliency for object tracking with particle filters,” in Proceedings of the 18th European Signal Processing Conference (EUSIPCO '10), Aalborg University, Aalborg, Denmark, August 2010.
- Y. Zhang, Z.-L. Zhang, Z.-K. Shen, and X.-Y. Lu, “The images tracking algorithm using particle filter based on dynamic salient features of targets,” Acta Electronica Sinica, vol. 36, no. 12, pp. 2306–2311, 2008.
- G. Cauwenberghs and T. Poggio, “Incremental and decremental support vector machine learning,” in Advances in Neural Information Processing Systems 13, MIT Press, 2001.
- S. He, Q. Yang, R. W. H. Lau et al., “Visual tracking via locality sensitive histograms,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '13), pp. 2427–2434, 2013.
- B. Babenko, M.-H. Yang, and S. Belongie, “Robust object tracking with online multiple instance learning,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 8, pp. 1619–1632, 2011.
- J. Yan, M. Zhu, H. Liu, and Y. Liu, “Visual saliency detection via sparsity pursuit,” IEEE Signal Processing Letters, vol. 34, no. 9, pp. 739–742, 2010.
- L. Xu, J. Du, and Q. Li, “Image fusion based on nonsubsampled contourlet transform and saliency-motivated pulse coupled neural networks,” Mathematical Problems in Engineering, vol. 2013, Article ID 135182, 10 pages, 2013.
- K. Madani, D. M. Ramik, and C. Sabourin, “Multilevel cognitive machine-learning-based concept for artificial awareness: application to Humanoid robot awareness using visual saliency,” Applied Computational Intelligence and Soft Computing, vol. 2012, Article ID 354785, 11 pages, 2012.
Copyright © 2013 Yuantao Chen et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.