Research Article  Open Access
Jianbing Yang, Zhiyong Tang, Zhongcai Pei, Xiao Song, "A Novel Motion-Intelligence-Based Control Algorithm for Object Tracking by Controlling PAN-Tilt Automatically", Mathematical Problems in Engineering, vol. 2019, Article ID 9602460, 11 pages, 2019. https://doi.org/10.1155/2019/9602460
A Novel Motion-Intelligence-Based Control Algorithm for Object Tracking by Controlling PAN-Tilt Automatically
Abstract
As computer vision develops, visual systems mounted on a pan-tilt platform offer an advantage over static camera systems in tracking moving targets. In this paper, a novel motion-intelligence-based control algorithm for object tracking by controlling a pan-tilt platform is proposed. The algorithm includes a motion control model based on angular speed and an intelligent control algorithm based on reinforcement learning (RL). The motion control model converts the deviation between the center point of the tracked target and the center point of the frame into an angular speed of the pan-tilt platform, so that the tracked object is kept at the center of the frame automatically. The intelligent control algorithm based on reinforcement learning reduces the error between the ideal value and the actual value when the pan-tilt platform moves. The two blocks work together to make the pan-tilt platform track a dynamic object more stably, and the experimental results show that both the tracking accuracy and the robustness are improved.
1. Introduction
In recent years, visual object tracking has been used in many fields such as video surveillance [1], unmanned assistance systems [2], and autonomous robotics [3]. Visual object tracking with a pan-tilt platform can be employed for active surveillance, trajectory measurement, and so on. However, making the pan-tilt platform move the camera smoothly and accurately to track an object is a challenge. Reference [4] proposed a PID control algorithm for the pan-tilt platform based on the displacement between the tracked object in the image and the center of the image. Reference [5] proposed a control algorithm with two ADRC controllers based on the rotation angles in the horizontal and vertical directions. Reference [6] proposed two control methods, namely, a lead-lag compensator and PID control, using a black-box model. Although the control algorithms mentioned above can control the pan-tilt platform to track the target, they still suffer from overshooting, lag, and vibration.
In fact, the camera installed on the pan-tilt platform follows the target, which makes the azimuth axis and the pitching axis of the pan-tilt platform follow the target's angular speed when the target rotates around the pan-tilt platform. Therefore, we propose a motion control model based on angular speed for the pan-tilt platform. In order to eliminate the overshooting, lag, and vibration, we still need a control algorithm for the motion control model.
Since the advent of AlphaGo, reinforcement learning has gradually become a focus of research and attracted more and more scholars. In robotic applications, [7] presents an implementation of RL enabling the learning process for multiple robotic tasks with minimal per-task tuning. Reference [8] makes robotic vehicles capable of socially aware motion planning with deep reinforcement learning. Reference [9] performs mobile robot navigation with inverse reinforcement learning. Reference [10] uses reinforcement learning to design a UAV controller. The pan-tilt platform motion can be formulated as a sequential decision process based on a Markov stochastic model, and reinforcement learning (RL) is a class of machine learning methods for solving sequential decision-making problems with unknown state-transition dynamics [11]. As such, we propose an intelligent control algorithm based on reinforcement learning.
The outline of this paper is organized as follows. Section 2 discusses the composition of visual tracking system. Section 3 presents the design details of motion control model based on angular speed. Section 4 presents the intelligent control algorithm based on reinforcement learning. Section 5 draws the conclusion.
2. Composition of Visual Tracking System
The visual tracking system consists of a visual pan-tilt platform, an axis control system, and a graphics processing computer, as shown in Figure 1. The visual pan-tilt platform consists of the azimuth axis, the pitching axis, the tracking camera, and the high-speed recording camera (in this case, the visual pan-tilt platform is used to track a parachute and we need to record the parachute posture; thus there is a high-speed recording camera on the pan-tilt platform). The axis control system consists of two motor drivers and a programmable multi-axis controller. The tracking camera captures the target and sends frames to the graphics processing computer. The graphics processing computer processes each frame to generate the control instructions and then sends them to the axis control system, which controls the azimuth axis and the pitching axis to yaw and pitch, respectively. The target is kept at the center of the frame by the yawing and pitching of the pan-tilt platform in order to achieve real-time tracking.
Hardware configuration:
(1) Tracking camera: PointGray (GS3U341C6CC), resolution: , Pixel Size: 
(2) Lens: NIKON (focal length = 50 mm)
(3) Motor: Kollmorgen (KBMS17H01A00 and KBMS25H01A00)
(4) Motor driver: Kollmorgen (AKDP00606NBEC0000 and AKDP01206NBEC0000)
(5) Programmable multi-axis controller: Beckhoff (CX51300125)
(6) Graphics processing computer: INTEL 7700 K CPU, 32 GB RAM, two Gigabyte GVN1080Ti GPUs, Linux system
3. Motion Control Model Based on Angular Speed for Pan-Tilt Platform
In this section, we first explain the relationship between the deviation and the rotation angle of the pan-tilt platform. Then we present a mathematical motion control model based on angular speed for the pan-tilt platform.
3.1. Relationship between Deviation and Rotation Angle
In this article, particle tracking algorithm [12–15] is used to track the moving target on the frame. The particle tracking algorithm is not the focus of this paper, so it will not be described in detail.
When the moving target is captured by the tracking camera, the tracking algorithm generates the tracking box of the moving target in the frame, as shown in Figure 2. If the moving target is not in the center of the frame, there will be two deviations ($\Delta x$ and $\Delta y$) between the center point of the target and the center point of the frame along the $x$ axis and the $y$ axis, respectively. $\Delta x$ and $\Delta y$ are measured in pixels. Since the pixel size of the tracking camera is known (see Section 2), we can calculate the distance between the center of the target and the center of the CMOS plane of the camera. The equations are shown in (1) and (2).
$$d_x = s \, \Delta x \tag{1}$$
$$d_y = s \, \Delta y \tag{2}$$
where $d_x$ and $d_y$ represent the distance deviations along the $x$ axis and the $y$ axis on the camera CMOS plane, respectively, and $s$ represents the pixel size.
Through the image-forming principle, we can convert $d_x$ and $d_y$ to the deviation angles $\varphi_x$ and $\varphi_y$, as shown in Figure 3. A point $P$ in space is projected onto the CMOS plane of the camera as point $P'$. Through the geometric relationship, we can get the deviation angle $\varphi_x$ between the line connecting point $P'$ and the focal point and the camera's central axis (the optical axis) in the x-axis direction. The deviation angle $\varphi_y$ between the same line and the camera's central axis in the y-axis direction can be obtained in the same way. The equations are shown as (3) and (4).
$$\varphi_x = \arctan\left(\frac{d_x}{f}\right) \tag{3}$$
$$\varphi_y = \arctan\left(\frac{d_y}{f}\right) \tag{4}$$
where $f$ is the focal length. Because we are using a fixed-focus lens, $f$ is a known constant and $f = 50$ mm.
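As a minimal sketch of equations (1) to (4), the conversion from a pixel deviation to a deviation angle can be written as below. The function name, the parameter names, and the 0.0055 mm pixel size in the usage lines are illustrative assumptions, not values taken from the paper; only the 50 mm focal length comes from the text.

```python
import math


def deviation_angle(delta_px: float, pixel_size_mm: float,
                    focal_length_mm: float = 50.0) -> float:
    """Return the deviation angle in radians for a pixel offset from the frame center."""
    # Equations (1)/(2): distance on the sensor plane, in millimetres.
    distance_mm = delta_px * pixel_size_mm
    # Equations (3)/(4): angle between the line of sight and the optical axis.
    return math.atan(distance_mm / focal_length_mm)


# Usage with a hypothetical pixel size of 0.0055 mm (an assumption).
angle_x = deviation_angle(100, 0.0055)    # target 100 pixels right of center
angle_y = deviation_angle(-40, 0.0055)    # target 40 pixels on the negative side of center
```

The same function serves both axes because the azimuth and pitch deviations are computed independently.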
3.2. Mathematical Model of Motion Control
In order to drive the pan-tilt platform so that the tracked object stays at the center of the frame automatically, the most commonly used control method is to calculate the deviation angles $\varphi_x$ and $\varphi_y$ and then let the azimuth axis and the pitch axis of the pan-tilt platform rotate by $\varphi_x$ and $\varphi_y$, respectively.
In fact, the camera installed on the pan-tilt platform follows the target, which makes the azimuth axis and the pitching axis of the pan-tilt platform follow the target's angular speed when the target rotates around the pan-tilt platform. If we can calculate the angular speed of the target and make the angular speed of the pan-tilt platform follow it, we can also keep the tracked object at the center of the frame. Therefore, we propose a motion control model based on angular speed for the pan-tilt platform.
For ease of analysis, we separate the control of the pan-tilt platform into the control of the azimuth axis and the control of the pitch axis, because the motions of the two axes are independent. We introduce the mathematical model of motion control for the azimuth axis; the mathematical model for the pitch axis is the same.
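The movement of the target relative to the pan-tilt platform is the movement of the target on the frame. We can get the previous deviation angle $\varphi(k-1)$ and the current deviation angle $\varphi(k)$ of the target on the frame, as shown in Figure 4. Then, the current angular speed of the relative motion, $\omega_r(k)$, can be obtained. The equation is shown as
$$\omega_r(k) = \frac{\varphi(k) - \varphi(k-1)}{T} \tag{5}$$
where $T$ is the cycle of the image processing algorithm.
$\omega_r(k)$ can also be expressed as
$$\omega_r(k) = \omega_t(k) - \omega_a(k) \tag{6}$$
where $\omega_t(k)$ is the current angular speed of the target rotating around the azimuth axis and $\omega_a(k)$ is the current angular speed of the azimuth axis. $\omega_a(k)$ can be read from the motor driver of the azimuth motor, so it is a known quantity.
According to (6), we can get the current angular speed of the target. The equation is shown as
$$\omega_t(k) = \omega_r(k) + \omega_a(k) \tag{7}$$
Our goal is to calculate the next angular speed instruction for the azimuth axis so that the tracked object is brought to the center of the frame. Therefore, the ideal position of the target at the next moment is the center of the image. In Figure 4, this next position is point $O$. When the target is at point $O$, the angle between the line connecting point $O$ and the focal point and the camera's central axis in the x-axis direction is 0. The next angular speed of the target relative to the azimuth axis is shown as
$$\omega_r(k+1) = \frac{0 - \varphi(k)}{T} = -\frac{\varphi(k)}{T} \tag{8}$$
$\omega_r(k+1)$ can also be expressed as
$$\omega_r(k+1) = \omega_t(k+1) - \omega_a(k+1) \tag{9}$$
So, we can get $\omega_a(k+1)$; the equation is shown as
$$\omega_a(k+1) = \omega_t(k+1) - \omega_r(k+1) \tag{10}$$
where $\omega_t(k+1)$ is the angular speed of the target at the next moment. We cannot accurately predict $\omega_t(k+1)$, but the cycle $T$ is short enough; hence we consider $\omega_t(k+1) \approx \omega_t(k)$. So, (10) can be reformulated as
$$\omega_a(k+1) = \omega_t(k) - \omega_r(k+1) \tag{11}$$
Finally, substituting (7) and (8) into (11), $\omega_a(k+1)$ can be expressed as
$$\omega_a(k+1) = \omega_r(k) + \omega_a(k) + \frac{\varphi(k)}{T} \tag{12}$$
Equation (12) is the motion control model based on angular speed which is proposed in this paper. The algorithm is shown as Algorithm 1.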
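Algorithm 1: Motion control model based on angular speed (azimuth axis).
Input: two adjacent frames captured from the camera, $F(k-1)$ and $F(k)$
Output: angular speed instruction $\omega_a(k+1)$ for the azimuth axis
Initialize the particle tracking algorithm with the target
While true do
    Find the position of the target on frames $F(k-1)$ and $F(k)$ with the particle tracking algorithm
    Get the deviation angles $\varphi(k-1)$ and $\varphi(k)$ with equation (3)
    Get the relative angular speeds $\omega_r(k)$ and $\omega_r(k+1)$ with equations (5) and (8), respectively
    Get the current axis speed $\omega_a(k)$ from the motor driver of the azimuth motor
    Get $\omega_a(k+1)$ with equation (12)
    Send the instruction $\omega_a(k+1)$ to the motor driver of the azimuth motor
End While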
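The azimuth-axis update of Algorithm 1 can be sketched as a single function. This is an illustration under the sign convention that the relative angular speed equals the target's angular speed minus the axis's angular speed; the function and argument names are hypothetical, and the 0.02 s cycle in the example is an assumed value, not one reported in the paper.

```python
def next_axis_speed(angle_prev: float, angle_curr: float,
                    axis_speed: float, cycle: float) -> float:
    """Angular speed instruction for the next cycle, in rad/s."""
    relative_speed = (angle_curr - angle_prev) / cycle   # equation (5)
    target_speed = relative_speed + axis_speed           # equation (7)
    relative_speed_next = (0.0 - angle_curr) / cycle     # equation (8): reach the center
    return target_speed - relative_speed_next            # equations (11) and (12)


# Example: the target drifts away from the center while the axis is at rest.
command = next_axis_speed(angle_prev=0.010, angle_curr=0.012,
                          axis_speed=0.0, cycle=0.02)
```

When the target is already centered and not moving relative to the camera, the function returns the current axis speed unchanged, which matches the idea of following the target's angular speed.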
In order to verify motion control model based on angular speed, we conducted related experiments. The experiment scenarios are shown in Table 1.

With Algorithm 1, the pan-tilt platform keeps the pedestrian's face at the center of the image in Experiment 1, as shown in Figure 5. This indicates that the proposed motion control model based on angular speed is correct. However, Figure 5 also shows that the pan-tilt platform vibrates back and forth around the center of the frame.
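In order to tackle this problem, we multiply (8) by a coefficient $\gamma$ between 0 and 1, as shown in (13). The reason is that this reduces the angular speed of the target relative to the pan-tilt platform. As we can see from Figure 5(a), the speed at which the pedestrian moves toward the center of the image will then be reduced. Finally, (12) can be expressed as (14).
$$\omega_r(k+1) = -\gamma \, \frac{\varphi(k)}{T} \tag{13}$$
$$\omega_a(k+1) = \omega_r(k) + \omega_a(k) + \gamma \, \frac{\varphi(k)}{T} \tag{14}$$
Here $\varphi(k)$ is the current deviation angle, $T$ is the cycle of the image processing algorithm, $\omega_r$ is the angular speed of the target relative to the azimuth axis, and $\omega_a$ is the angular speed of the azimuth axis.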
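To illustrate how the coefficient scales the correction, the following sketch runs equation (14) in a closed loop. It assumes an idealized axis that reaches the commanded speed instantly and a target moving at constant angular speed, which the real platform does not satisfy; the names and the 0.02 s cycle are hypothetical. Under these assumptions the deviation of a stationary target shrinks by a factor of (1 - γ) per cycle, so a smaller γ gives a gentler approach to the center.

```python
def simulate(angle0: float, gamma: float, cycle: float, steps: int,
             target_speed: float = 0.0) -> list:
    """Return the deviation angle after each cycle for an idealized axis."""
    angle_prev = angle0
    angle = angle0
    axis_speed = 0.0
    history = [angle]
    for _ in range(steps):
        relative_speed = (angle - angle_prev) / cycle
        # Equation (14): damped angular speed instruction.
        command = relative_speed + axis_speed + gamma * angle / cycle
        angle_prev = angle
        # Idealized plant: the axis moves at the commanded speed for one cycle.
        angle = angle + (target_speed - command) * cycle
        axis_speed = command
        history.append(angle)
    return history


# Stationary target starting 0.1 rad off center, gamma = 0.5.
trace = simulate(angle0=0.1, gamma=0.5, cycle=0.02, steps=3)
```

With `gamma=1.0` the same loop removes the whole deviation in one cycle, which is the undamped behavior of equation (12); the vibration observed on the real platform is not reproduced by this idealized model.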
We conducted the second experiment to verify whether adding the coefficient $\gamma$ can solve the vibration problem. In Experiment 2, $\gamma$ is set to a fixed value. The result of Experiment 2 is shown in Figure 6. Comparing Figure 6 with Figure 5(b), we find that after adding the coefficient $\gamma$, the amplitude of the deviation is significantly reduced and is almost equal to 0. The vibration has been greatly reduced.
Experiment 2 also verified that the pan-tilt platform can effectively keep a stationary target at the center of the image with the proposed motion control model based on angular speed. Next, we use the pan-tilt platform to track a moving object in Experiment 3.
In Experiment 3, we use the same $\gamma$ as in Experiment 2. The result is shown in Figure 7. As the target begins to move, the pan-tilt platform shows a significant lag in tracking the UAV, and its response is not fast enough, causing the target to fly out of the camera's field of view. The reason is that γ is too small, so the commanded angular speed of the UAV relative to the pan-tilt platform is too small and the target cannot be tracked in time.
In Experiment 4, we selected a series of γ values and conducted an experiment with each. The results of Experiment 4 are shown in Figure 8. We found that increasing γ significantly reduces the hysteresis. When γ reached 0.4, a good result was achieved. However, when γ exceeded 0.4, the pan-tilt platform shook again.
In Experiment 5, we increased the speed of the UAV and conducted the experiment with the optimal γ = 0.4 obtained in Experiment 4. We found that there was still a hysteresis. Therefore, we increased γ and repeated the experiments. The results of Experiment 5 are shown in Figure 9. When γ reached 0.6, a good result was achieved. However, when γ exceeded 0.6, the pan-tilt platform shook again.
Through the above experiments, we find that under different motion states of the target, different values of γ are needed to keep the target at the center of the frame and make the pan-tilt platform move smoothly. Therefore, we propose an intelligent control algorithm based on reinforcement learning for γ.
4. Intelligent Control Algorithm Based on Reinforcement Learning
As time goes by, the moving target will be in different states, and the pan-tilt platform needs an optimal γ in each state to keep the target at the center of the frame and move smoothly. The task of continuously obtaining an optimal γ can be formulated as a sequential decision process based on a Markov stochastic model. Meanwhile, reinforcement learning (RL) is a class of machine learning methods for solving sequential decision-making problems with unknown state-transition dynamics [11]. Normally, a Markov decision process (MDP) is defined by a tuple of five elements: the state space s, the action space a, the state-transition model P, the reward function R, and a discount factor.
When the pan-tilt platform is in a certain state, it obtains the optimal γ by trial and error through the reinforcement learning algorithm. However, it is infeasible to let the pan-tilt platform explore online continuously to learn an optimal γ, because with a bad γ the target leaves the camera's field of view and the pan-tilt platform loses it. To tackle this problem, we apply data-driven reinforcement learning: real-life data are collected by tracking an object under different moving conditions with the pan-tilt platform and are then used to train the proposed Q-network so that it outputs the optimal γ.
In this section, we first design the variables of reinforcement learning and then design experiments and collect data according to the requirements of s and a. Finally, we show how to use the collected data to train the proposed Q-network. The proposed intelligent control algorithm based on reinforcement learning is described below for the control of the azimuth axis. The method for the pitch axis is the same.
4.1. Variable Design of Reinforcement Learning
4.1.1. Design of State Space
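Since the feedback for controlling the pan-tilt platform is derived directly from the frame, we consider that the deviation angle $\varphi(k)$ (see equation (3)), the relative angular speed $\omega_r(k)$ (see equation (5)), and the relative angular acceleration $\alpha_r(k)$ (see equation (15)) are the most direct factors of γ.
$$\alpha_r(k) = \frac{\omega_r(k) - \omega_r(k-1)}{T} \tag{15}$$
where $\alpha_r(k)$ is the current angular acceleration of the target relative to the pan-tilt platform and $T$ is the cycle of the image processing algorithm.
Therefore, the state is designed as follows.
$$s = \left(\varphi(k),\ \omega_r(k),\ \alpha_r(k)\right) \tag{16}$$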
4.1.2. Design of Action Space
Our goal is to output the optimal value of γ, so the action space is the range of γ. Ideally, the action space would cover all possible values of γ. However, a continuous variable cannot be enumerated by the computer. In (13), γ lies between 0 and 1. Therefore, for the action space of γ, we divide the range from 0 to 1 into 10 values at a step of 0.1, so the action space of γ contains 10 discrete values.
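A discretized action space of this kind can be sketched as follows. The exact endpoints are an assumption here: the sketch uses the values 0.1, 0.2, ..., 1.0, which is one way to obtain 10 values at a step of 0.1; the helper name `action_index` is hypothetical.

```python
# Assumed action space: ten candidate values of the coefficient gamma.
ACTIONS = [round(0.1 * i, 1) for i in range(1, 11)]


def action_index(gamma: float) -> int:
    """Map a gamma value used in an experiment to its index in the action space."""
    return ACTIONS.index(round(gamma, 1))


index_of_04 = action_index(0.4)
```

Rounding to one decimal place avoids floating-point mismatches such as `0.1 * 3` not being exactly `0.3`.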
4.1.3. Design Experiments and Collect Data
In order to output the optimal γ, we need to collect as many states as possible for each action so that the Q-network is trained well. According to this principle, we conduct 10 sets of experiments for each action, and the experimental scenarios are as follows.
The distance between the UAV and the pan-tilt platform is 100 m. The UAV is selected to initialize the particle tracking algorithm and is then made to move at 10 different speeds, while the motion control model based on angular speed uses the same γ. Note that a nominal speed does not mean that the UAV moves at this speed constantly. It means that the UAV first accelerates to this speed, keeps moving at this speed for a while, and then decelerates to 0 m/s. The purpose is to obtain more motion states of the target. Because our action space has 10 actions with 10 sets of experiments under each action, we conducted a total of 100 sets of experiments.
Since the state consists of the deviation angle, the relative angular speed, and the relative angular acceleration, we store these three quantities in chronological order in a ".txt" file. When all the experiments are finished, we obtain the ".txt" files that store the experimental data, as shown in Table 2. The content format in the ".txt" file is as follows:

4.1.4. Design of Reward Function and Q-Value Function
To compute the Q-value of the RL network, we first need to design the reward function for γ. When the angular speed instruction $\omega_a(k+1)$ is sent to the motor driver of the azimuth motor, the azimuth axis moves and brings the target to the deviation angle $\varphi(k+1)$ during the cycle $T$. To evaluate the reward of γ, the distance between the center of the target and the center of the frame is an important factor. When the target is very close to the center of the frame, we consider γ to be good and give it a higher reward; otherwise it receives a lower reward. The reward function is shown in
$$R = \begin{cases} 100, & |\varphi(k+1)| \le \varphi_{th} \\ -10\,|\varphi(k+1)|, & \text{otherwise} \end{cases} \tag{18}$$
where $R$ is the reward of γ, $\varphi_{th}$ is the threshold of the deviation angle, and $\varphi(k+1)$ is the deviation angle at the next moment. If $\varphi(k+1)$ is in the range $[-\varphi_{th}, \varphi_{th}]$, γ can be regarded as an optimal action and a positive reward value of 100 is given. Otherwise, the reward value is negative. Meanwhile, the greater $|\varphi(k+1)|$ is, the smaller the reward value is, and the magnitude of $R$ is 10 times $|\varphi(k+1)|$. In this paper, $\varphi_{th}$ is chosen so that, when $|\varphi(k+1)| \le \varphi_{th}$, the target is almost at the center of the image.
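The reward rule described above can be sketched as a small function. This follows one reading of the text: a reward of 100 inside the threshold and a penalty of 10 times the absolute deviation angle outside it. The function name and the threshold value in the usage lines are hypothetical; the paper's threshold is not reproduced here.

```python
def reward(next_deviation_angle: float, threshold: float) -> float:
    """Reward for the gamma that led to the given deviation angle at the next moment."""
    magnitude = abs(next_deviation_angle)
    if magnitude <= threshold:
        return 100.0             # target is almost at the center of the frame
    return -10.0 * magnitude     # penalty grows with the remaining deviation


# Usage with an assumed threshold of 0.01 (illustrative only).
good = reward(0.002, threshold=0.01)
bad = reward(0.5, threshold=0.01)
```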
Moreover, we use one-step Q-Learning for the proposed intelligent control algorithm based on reinforcement learning. The transition rule of Q-Learning is shown in
$$Q(s, a) = R + \lambda \max_{a'} Q(s', a') \tag{19}$$
where $s'$ is the next state after carrying out action $a$ in state $s$, $a'$ is an action in state $s'$, and $\lambda$ is the discount factor, whose value is fixed in this paper.
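As a minimal sketch, the one-step Q-Learning target can be computed from a reward and the Q-values of the next state as below. The function name and the discount factor of 0.9 in the example are assumptions for illustration; the value used in the paper is not reproduced here.

```python
def q_target(reward_value: float, next_q_values: list, discount: float) -> float:
    """One-step Q-Learning target: reward plus discounted best next-state Q-value."""
    return reward_value + discount * max(next_q_values)


# Example with three hypothetical next-state Q-values and an assumed discount of 0.9.
target = q_target(100.0, [1.0, 5.0, 3.0], discount=0.9)
```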
4.2. Design and Training of Q-Value Network
4.2.1. Design of Q-Value Network
We design a neural network to learn the Q-values of the 10 actions. A fully connected neural network with ReLU nonlinearities is employed to parametrize the Q-value function. The structure of the Q-value network is shown in Figure 10.
The Q-value network has 3 hidden layers with 100 neurons each. The input layer has 3 neurons, corresponding to the three elements of the state. The output layer has 10 neurons, corresponding to the Q-values of the 10 actions. The input and output of the Q-value network are shown in Tables 3 and 4.
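The layer sizes above can be illustrated with a framework-free forward pass. This is only a sketch of the 3-100-100-100-10 structure with ReLU on the hidden layers; the uniform weight initialization, the seed, and the example state values are assumptions, and a practical implementation would normally use a deep learning library.

```python
import random


def init_layer(n_in: int, n_out: int, rng: random.Random):
    """Create one fully connected layer with small random weights and zero biases."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def forward(layers, state):
    """Propagate a state through the network; ReLU on hidden layers, linear output."""
    x = list(state)
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    return x


rng = random.Random(0)
sizes = [3, 100, 100, 100, 10]   # input, three hidden layers, output
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

# Hypothetical state: (deviation angle, relative angular speed, relative angular acceleration).
q_values = forward(layers, (0.01, -0.2, 0.0))
best_action = max(range(len(q_values)), key=q_values.__getitem__)
```

The index `best_action` selects one of the 10 discrete γ values; with untrained weights the choice is arbitrary.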
4.2.2. Training Method for Q-Value Network
(1) State-Transition and Getting Experience. Unlike online exploration learning, we do not select a random action to transfer to the next state; instead, the state of the agent (the pan-tilt platform) transfers under a specified action. Therefore, the state-transition model is that the current state transfers to the next state under the same γ that was chosen for the experiment.
Before training the Q-value network, we first need to obtain the experience of the agent. From the collected experimental data, we can get the current state $s$, the next state $s'$, the reward $R$ computed from (18), and the current action $a$ ($a$ is the γ chosen for the experiment). We store $(s, a, R, s')$ as a tuple, which forms one experience of the agent. We put all experiences into the memory pool D.
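Turning one logged experiment into experiences can be sketched as follows. The sketch assumes that each log is a chronological list of states recorded under a single γ and that the reward depends only on the deviation angle of the next state; the type aliases, function name, sample values, and the inline reward rule are hypothetical.

```python
from typing import Callable, List, Tuple

State = Tuple[float, float, float]           # (angle, relative speed, relative acceleration)
Experience = Tuple[State, int, float, State]


def build_experiences(states: List[State], action_index: int,
                      reward_fn: Callable[[float], float]) -> List[Experience]:
    """Pair consecutive states of one experiment into (s, a, R, s') tuples."""
    experiences = []
    for current, nxt in zip(states, states[1:]):
        reward_value = reward_fn(nxt[0])     # reward from the next deviation angle
        experiences.append((current, action_index, reward_value, nxt))
    return experiences


# Three hypothetical logged states from an experiment run with the action at index 3.
log = [(0.020, -0.10, 0.0), (0.012, -0.40, -15.0), (0.004, -0.40, 0.0)]
memory_pool = build_experiences(
    log, action_index=3,
    reward_fn=lambda angle: 100.0 if abs(angle) <= 0.005 else -10.0 * abs(angle))
```

Experiences from all experiments would be concatenated into one pool before sampling.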
There are about 900,000 experiences in the memory pool D.
(2) Memory Replay for Q-Value Computation. The key idea for training our Q-value network is experience replay [16]. During learning, we sample the memory pool randomly to get a batch of experiences. We use $s$ as the input of the Q-value network shown in Figure 10 and get the current-state Q-value $Q(s, a; \theta)$. We clone the Q-value network every 100 steps to obtain the target Q-value network, whose parameters are denoted $\theta^{-}$. We use $s'$ as the input of the target Q-value network and get the next-state Q-values $Q(s', a'; \theta^{-})$. Then we apply Q-Learning (see (19)) to update the current-state Q-value, which gives the target value
$$y = R + \lambda \max_{a'} Q(s', a'; \theta^{-}) \tag{20}$$
where $\lambda$ is the discount factor. The loss function is as follows:
$$L(\theta) = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - Q(s_i, a_i; \theta) \right)^2 \tag{21}$$
where $N$ is the batch size and θ are the parameters (weights and biases) of the network.
Since the samples collected by the agent form a time series, consecutive samples are correlated. We therefore use a random sampling strategy to draw each training batch from the memory pool. The algorithm for training the Q-value network is presented in Algorithm 2.
Algorithm 2: Training of the Q-value network.
Get the memory pool D
Initialize the Q-value network with random parameters θ
For step = 1 to 1,000,000 do
    Sample a random batch of 50 experiences from the memory pool D
    Perform a gradient descent step on equation (21) with respect to the network parameters θ
    Every 100 steps, clone the Q-value network to obtain the target Q-value network
End For
(3) Network Training Details. The main learning method of our Q-value network is backpropagation (BP) [17]. The procedure is repeated until a predefined number of steps is reached. Other key training parameters and functions are as follows.
(1) Optimization algorithm: stochastic gradient descent.
(2) Batch size and number of training steps: 50 and 1,000,000.
(3) Learning rate: 0.00001. To give gradient descent a good tradeoff between accuracy and performance, the learning rate should be set to a suitable value.
(4) Activation function: Rectified Linear Unit (ReLU), $f(x) = \max(0, x)$.
(5) Loss function: mean-square error (MSE), i.e., $L(\theta) = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - Q(s_i, a_i; \theta) \right)^2$, where $y_i$ is the target Q-value of the $i$-th sample. Here θ is the network's parameter set, i.e., $\theta = \{w, b\}$, where $w$ is the weight and $b$ is the bias.
4.3. Experiment
The UAV is selected to initialize the particle tracking algorithm and is then made to move at different speeds (1 m/s, ⋯, 10 m/s). The experimental results are shown in Figure 11.
From Figure 11, we find that, for the tracked target under different motion states, the deviation angle stays within a small range, which means that the tracked target is almost at the center of the frame. This shows that our proposed algorithm has better accuracy and robustness.
5. Conclusion
This paper presents a novel pan-tilt platform control method which consists of two parts. One is a motion control model based on angular speed and the other is an intelligent control algorithm based on reinforcement learning. To the best of our knowledge, this is the first work to use angular speed to control a visual pan-tilt platform to track a target. Meanwhile, in order to reduce the error between the ideal value and the actual value when the pan-tilt platform moves, we implement an intelligent control algorithm based on reinforcement learning that learns in a data-driven manner, which is different from traditional knowledge-driven methods such as PID. The experimental results show that the proposed methods can be applied to track a dynamic target in real time. Moreover, the tracking accuracy and robustness are improved and the stability of the system is ensured.
Due to the limitations of our experimental conditions, the experimental data we collected do not contain all the motion states of the target. In future work, we will focus on improving the response frequency of the pan-tilt platform, doing more experiments to track high-speed targets, collecting more data, and designing deeper Q-value networks to output the optimal γ.
Data Availability
The .txt data used to support the findings of this study have been deposited in the "Partially data collected by experiments_datas" repository (https://github.com/blueskyM01/experimentdataoftracking/tree/master/Partially%20data%20collected%20by%20experiments_datas). The .jpg data used to support the findings of this study have been deposited in the "Partially data collected by experiments_photos" repository (https://github.com/blueskyM01/experimentdataoftracking/tree/master/Partially%20data%20collected%20by%20experiments_photos). Because the project is funded by AVIC AEROSPACE LIFESUPPORT INDUSTRIES, LTD and has been used for commercial purposes, only partial data are provided in both repositories, to prevent other people from using the complete data for other commercial purposes. The figure data used to support the findings of this study are included within the article. The code used to support the findings of this study is currently under embargo while the research findings are commercialized. Requests for data, 3 years after publication of this article, will be considered by the corresponding author.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was supported by a grant from the Major State Basic Research Development Program of China (973 Program) (no. 2016YFC0802703).
References
[1] K. Mehmood, M. Mrak, J. Calic, and A. Kondoz, “Object tracking in surveillance videos using compressed domain features from scalable bitstreams,” Signal Processing: Image Communication, vol. 24, no. 10, pp. 814–824, 2009.
[2] C. Yu, J. Cai, and Q. Chen, “Multiresolution visual fiducial and assistant navigation system for unmanned aerial vehicle landing,” Aerospace Science and Technology, vol. 67, pp. 249–256, 2017.
[3] Y. Motai, S. Kumar Jha, and D. Kruse, “Human tracking from a mobile agent: optical flow and Kalman filter arbitration,” Signal Processing: Image Communication, vol. 27, no. 1, pp. 83–95, 2012.
[4] B. Zhang, J. Huang, and J. Lin, “A novel control algorithm for object tracking by controlling PAN/TILT automatically,” in Proceedings of the 2010 2nd International Conference on Education Technology and Computer (ICETC), pp. V1596–V1602, Shanghai, China, June 2010.
[5] H. Chen, X. Zhao, and M. Tan, “A novel pan-tilt camera control approach for visual tracking,” in Proceedings of the 2014 11th World Congress on Intelligent Control and Automation, WCICA 2014, pp. 2860–2865, China, July 2014.
[6] S. R. Yosafat, C. Machbub, and E. M. I. Hidayat, “Design and implementation of Pan-Tilt control for face tracking,” in Proceedings of the 7th IEEE International Conference on System Engineering and Technology, ICSET 2017, pp. 217–222, Malaysia, October 2017.
[7] A. Martínez-Tenor, J. A. Fernández-Madrigal, A. Cruz-Martín, and J. González-Jiménez, “Towards a common implementation of reinforcement learning for multiple robotic tasks,” Expert Systems with Applications, vol. 100, no. 15, pp. 246–259, 2018.
[8] Y. F. Chen, M. Everett, M. Liu, and J. P. How, “Socially aware motion planning with deep reinforcement learning,” in Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 1343–1350, Vancouver, BC, September 2017.
[9] H. Kretzschmar, M. Spies, C. Sprunk, and W. Burgard, “Socially compliant mobile robot navigation via inverse reinforcement learning,” International Journal of Robotics Research, vol. 35, no. 11, pp. 1352–1370, 2016.
[10] J. A. Bagnell and J. G. Schneider, “Autonomous helicopter control using reinforcement learning policy search methods,” Proceedings - IEEE International Conference on Robotics and Automation, vol. 2, pp. 1615–1620, 2001.
[11] R. S. Sutton and A. G. Barto, Introduction to Reinforcement Learning, MIT Press, Cambridge, MA, USA, 1st edition, 1998.
[12] M. Firouznia, K. Faez, H. Amindavar, and J. A. Koupaei, “Chaotic particle filter for visual object tracking,” Journal of Visual Communication and Image Representation, vol. 53, pp. 1–12, 2018.
[13] X. Qian, L. Han, Y. Wang, and M. Ding, “Deep learning assisted robust visual tracking with adaptive particle filtering,” Signal Processing: Image Communication, vol. 60, pp. 183–192, 2018.
[14] W. Li, P. Wang, and H. Qiao, “Top-down visual attention integrated particle filter for robust object tracking,” Signal Processing: Image Communication, vol. 43, pp. 28–41, 2016.
[15] I. A. Iswanto and B. Li, “Visual Object Tracking Based on Mean-shift and Particle-Kalman Filter,” in Proceedings of the 2nd International Conference on Computer Science and Computational Intelligence, ICCSCI 2017, pp. 587–595, Indonesia, October 2017.
[16] V. Mnih, K. Kavukcuoglu, D. Silver et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, 2015.
[17] X. Song and J. Sun, “OpenPTDS,” in Proceedings of AsiaSim, 2018.
Copyright
Copyright © 2019 Jianbing Yang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.