#### Abstract

As computer vision develops, pan-tilt platform visual systems have gained an advantage over static camera systems in tracking moving targets. In this paper, a novel motion-intelligence-based control algorithm for object tracking with a pan-tilt platform is proposed. The algorithm comprises a motion control model based on angular speed and an intelligent control algorithm based on reinforcement learning (RL). The motion control model converts the deviation between the center point of the tracked target and the center point of the frame into an angular speed of the pan-tilt platform, so that the tracked object is automatically kept at the center of the frame. The intelligent control algorithm based on reinforcement learning reduces the error between the ideal value and the actual value when the pan-tilt platform moves. The two blocks work together to make the pan-tilt platform track a dynamic object more stably, and the experimental results show that both the tracking accuracy and the robustness are improved.

#### 1. Introduction

In recent years, visual object tracking has been used in many fields such as video surveillance [1], unmanned assistance systems [2], and autonomous robotics [3]. Visual object tracking with a pan-tilt platform can be employed for active surveillance, trajectory measurement, and so on. However, making the pan-tilt platform move the camera smoothly and accurately to track an object remains a challenge. Reference [4] proposed a PID control algorithm that controls the pan-tilt platform based on the displacement between the tracked object in the image and the center of the image. Reference [5] proposed a control algorithm with two ADRC controllers based on the rotation angles in the horizontal and vertical directions. Reference [6] proposed two control methods, namely, a lead-lag compensator and PID control, both using a black-box model. Although the control algorithms mentioned above can control the pan-tilt platform to track the target, overshoot, lag, and vibration still occur.

In fact, when the target rotates around the pan-tilt platform, the camera installed on the platform follows the target, so the azimuth axis and the pitching axis of the pan-tilt platform follow the target’s angular speed. Therefore, we propose a motion control model based on angular speed for the pan-tilt platform. To eliminate the overshoot, lag, and vibration, the motion control model still needs a control algorithm.

Since the advent of AlphaGo, reinforcement learning has gradually become a research focus and has attracted more and more scholars. In robotic applications, [7] presents an implementation of RL that enables the learning process for multiple robotic tasks with minimal per-task tuning. Reference [8] makes robotic vehicles capable of socially aware motion planning with deep reinforcement learning. Reference [9] performs mobile robot navigation with inverse reinforcement learning. Reference [10] uses reinforcement learning to design a UAV controller. The pan-tilt platform motion can be formulated as a sequential decision process based on a Markov stochastic model, and reinforcement learning (RL) is a class of machine learning methods for solving sequential decision-making problems with unknown state-transition dynamics [11]. As such, we propose an intelligent control algorithm based on reinforcement learning.

The remainder of this paper is organized as follows. Section 2 discusses the composition of the visual tracking system. Section 3 presents the design details of the motion control model based on angular speed. Section 4 presents the intelligent control algorithm based on reinforcement learning. Section 5 draws the conclusion.

#### 2. Composition of Visual Tracking System

The visual tracking system consists of a visual pan-tilt platform, an axis control system, and a graphics processing computer, as shown in Figure 1. The visual pan-tilt platform consists of the azimuth axis, the pitching axis, the tracking camera, and the high-speed recording camera (in this case, the visual pan-tilt platform is used to track a parachute and the parachute posture must be recorded; thus a high-speed recording camera is mounted on the pan-tilt platform). The axis control system consists of two motor drivers and a programmable multiaxis controller. The tracking camera captures the target and sends each frame to the graphics processing computer. The graphics processing computer processes the frame to generate the control instructions and sends them to the axis control system, which then drives the azimuth axis and the pitching axis to yaw and pitch, respectively. As the pan-tilt platform yaws and pitches, the target is kept at the center of the frame, which achieves real-time tracking.

Hardware configuration:

(1) Tracking camera: PointGray (GS3-U3-41C6C-C), resolution: , pixel size: 

(2) Lens: NIKON (focal length = 50 mm)

(3) Motor: Kollmorgen (KBMS-17H01-A00 and KBMS-25H01-A00)

(4) Motor driver: Kollmorgen (AKD-P00606-NBEC-0000 and AKD-P01206-NBEC-0000)

(5) Programmable multiaxis controller: Beckhoff (CX5130-0125)

(6) Graphics processing computer: Intel 7700K CPU, 32 GB RAM, two Gigabyte GV-N1080Ti GPUs, Linux system

#### 3. Motion Control Model Based on Angular Speed for Pan-Tilt Platform

In this section, we first explain the relationship between the deviation and the rotation angle of the pan-tilt platform. Then we present a mathematical motion control model based on angular speed for the pan-tilt platform.

##### 3.1. Relationship between Deviation and Rotation Angle

In this article, a particle tracking algorithm [12–15] is used to track the moving target in the frame. The particle tracking algorithm is not the focus of this paper, so it is not described in detail.

When the moving target is captured by the tracking camera, the tracking algorithm generates the tracking box of the moving target in the frame, as shown in Figure 2. If the moving target is not at the center of the frame, there are two deviations, $\Delta u$ and $\Delta v$, between the center point of the target and the center point of the frame along the *x*-axis and *y*-axis, respectively. $\Delta u$ and $\Delta v$ are numbers of pixels. Since the pixel size of the tracking camera is known (see Section 2), we can calculate the distance between the center of the target and the center of the mos plane on the camera mos plane. The equations are shown in (1) and (2).

$$\Delta x = s \, \Delta u \tag{1}$$

$$\Delta y = s \, \Delta v \tag{2}$$

where $\Delta x$ and $\Delta y$ represent the distance deviations along the *x*-axis and *y*-axis on the camera mos plane, respectively, and *s* represents the pixel size.

Through the image-forming principle, we can convert $\Delta x$ and $\Delta y$ to the deviation angles $\theta_x$ and $\theta_y$, as shown in Figure 3. A point *P* in space is projected onto the mos plane of the camera as point *P′*. Through the geometric relationship, we can get the deviation angle $\theta_x$ between the line connecting point *P* and the focal point and the camera’s central axis (the *z*-axis) in the *x*-axis direction. The deviation angle $\theta_y$ between the same line and the camera’s central axis in the *y*-axis direction can be obtained in the same way. The equations are shown as (3) and (4).

$$\theta_x = \arctan\left(\frac{\Delta x}{f}\right) \tag{3}$$

$$\theta_y = \arctan\left(\frac{\Delta y}{f}\right) \tag{4}$$

where *f* is the focal length. Because we are using a fixed-focus lens, *f* is a known constant and *f* = 50 mm.
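The conversion from a pixel deviation to a deviation angle can be illustrated with a minimal sketch. The function name `deviation_angle`, its parameter names, and the pixel size of 0.005 mm used in the usage note are assumptions made for illustration only; the focal length of 50 mm is the value given in the text.

```python
import math


def deviation_angle(pixel_offset, pixel_size_mm, focal_length_mm=50.0):
    """Return the deviation angle (radians) for a pixel offset from the frame center.

    pixel_offset    -- signed number of pixels between target center and frame center
    pixel_size_mm   -- physical size of one pixel on the sensor plane (assumed value)
    focal_length_mm -- focal length of the fixed-focus lens (50 mm in the text)
    """
    if focal_length_mm <= 0:
        raise ValueError("focal_length_mm must be positive")
    # Distance deviation on the sensor plane: pixel count times pixel size.
    distance_mm = pixel_offset * pixel_size_mm
    # Pinhole geometry: the angle whose tangent is distance / focal length.
    return math.atan(distance_mm / focal_length_mm)
```

For example, with a hypothetical pixel size of 0.005 mm, `deviation_angle(100, 0.005)` evaluates `atan(0.5 / 50)`, which is slightly less than 0.01 rad. The same function serves both axes, since only the sign and size of the pixel offset differ.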

##### 3.2. Mathematical Model of Motion Control

To drive the pan-tilt platform so that the tracked object is automatically kept at the center of the frame, the most commonly used control methods calculate the deviation angles in the *x*-axis and *y*-axis directions (Section 3.1) and then let the azimuth axis and the pitch axis of the pan-tilt platform rotate by these angles, respectively.

In fact, when the target rotates around the pan-tilt platform, the camera installed on the platform follows the target, so the azimuth axis and the pitching axis follow the target’s angular speed. If we can calculate the angular speed of the target and make the angular speed of the pan-tilt platform follow it, we can also keep the tracked object at the center of the frame. Therefore, we propose a motion control model based on angular speed for the pan-tilt platform.

For ease of analysis, we separate the control of pan-tilt platform into the control of the azimuth axis and the pitch axis, because the motion of the azimuth axis and the pitch axis is independent. We will introduce the mathematical model of motion control for azimuth axis; the mathematical model for pitch axis is the same.

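The movement of the target relative to the pan-tilt platform is the movement of the target in the frame. We can get the previous angle $\theta_{t-1}$ and the current angle $\theta_t$ of the target in the frame (the deviation angles in the *x*-axis direction of two adjacent frames), as shown in Figure 4. Then, the current angular speed $\omega^{\mathrm{rel}}_{t}$ of the relative motion can be obtained. The equation is shown as

$$\omega^{\mathrm{rel}}_{t} = \frac{\theta_t - \theta_{t-1}}{T} \tag{5}$$

where $T$ is the cycle of the image processing algorithm.

$\omega^{\mathrm{rel}}_{t}$ can also be expressed as

$$\omega^{\mathrm{rel}}_{t} = \omega^{\mathrm{tar}}_{t} - \omega^{\mathrm{az}}_{t} \tag{6}$$

where $\omega^{\mathrm{tar}}_{t}$ is the current angular speed of the target rotating around the azimuth axis and $\omega^{\mathrm{az}}_{t}$ is the current angular speed of the azimuth axis. $\omega^{\mathrm{az}}_{t}$ can be obtained from the motor driver of the azimuth motor, so it is a known quantity.

According to (6), we can get the current angular speed of the target. The equation is shown as

$$\omega^{\mathrm{tar}}_{t} = \omega^{\mathrm{rel}}_{t} + \omega^{\mathrm{az}}_{t} \tag{7}$$

Our goal is to calculate the next angular speed instruction for the azimuth axis so that the tracked object is brought to the center of the frame. Therefore, the ideal position of the target at the next moment is the center of the image. In Figure 4, the next position is point *O*. When the target is at point *O*, the angle between the line connecting point *O* and the focal point and the camera’s central axis in the *x*-axis direction is 0. The next angular speed of the target relative to the azimuth axis is shown as

$$\omega^{\mathrm{rel}}_{t+1} = \frac{0 - \theta_t}{T} \tag{8}$$

$\omega^{\mathrm{rel}}_{t+1}$ can also be expressed as

$$\omega^{\mathrm{rel}}_{t+1} = \omega^{\mathrm{tar}}_{t+1} - \omega^{\mathrm{az}}_{t+1} \tag{9}$$

So, we can get $\omega^{\mathrm{az}}_{t+1}$; the equation is shown as

$$\omega^{\mathrm{az}}_{t+1} = \omega^{\mathrm{tar}}_{t+1} - \omega^{\mathrm{rel}}_{t+1} \tag{10}$$

where $\omega^{\mathrm{tar}}_{t+1}$ is the angular speed of the target at the next moment. We cannot accurately predict $\omega^{\mathrm{tar}}_{t+1}$, but $T$ is short enough that we consider $\omega^{\mathrm{tar}}_{t+1} \approx \omega^{\mathrm{tar}}_{t}$. So, (10) can be reformulated as

$$\omega^{\mathrm{az}}_{t+1} = \omega^{\mathrm{tar}}_{t} - \omega^{\mathrm{rel}}_{t+1} \tag{11}$$

Finally, substituting (7) and (8) into (11), $\omega^{\mathrm{az}}_{t+1}$ can be expressed as

$$\omega^{\mathrm{az}}_{t+1} = \omega^{\mathrm{rel}}_{t} + \omega^{\mathrm{az}}_{t} + \frac{\theta_t}{T} \tag{12}$$

Equation (12) is the motion control model based on angular speed proposed in this paper. The algorithm is shown as Algorithm 1.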

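Input: two adjacent frames captured from the camera

Output: the next angular speed instruction for the azimuth axis

Initialize the particle tracking algorithm with the target

While true do

Find the position of the target in each of the two frames with the particle tracking algorithm

Get the previous and the current deviation angles with equation (3)

Get the current and the next relative angular speeds with equations (5) and (8), respectively

Get the current angular speed of the azimuth axis from the motor driver of the azimuth motor

Get the next angular speed of the azimuth axis with equation (12)

Send the instruction to the motor driver of the azimuth motor

End While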
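One pass through the loop of Algorithm 1 can be sketched as a single function for one axis. This is an illustrative sketch rather than the authors' implementation: the function and parameter names are hypothetical, angles are assumed to be in radians, and a positive deviation angle is assumed to require a positive axis rotation. The optional `gamma` argument corresponds to the coefficient *γ* that the paper introduces after *Experiment 1*; `gamma=1.0` gives the model without that coefficient.

```python
def next_axis_speed(theta_prev, theta_cur, omega_axis, period, gamma=1.0):
    """Return the next angular-speed command for one axis.

    theta_prev, theta_cur -- deviation angles of the target in two adjacent frames
    omega_axis            -- current angular speed reported by the motor driver
    period                -- cycle of the image processing algorithm (seconds)
    gamma                 -- scaling of the correction term (1.0 = no scaling)
    """
    if period <= 0:
        raise ValueError("period must be positive")
    # Current angular speed of the target relative to the platform.
    omega_rel = (theta_cur - theta_prev) / period
    # Angular speed of the target itself: relative speed plus axis speed.
    omega_target = omega_rel + omega_axis
    # Relative speed that would bring the target to angle 0 in one cycle,
    # scaled by gamma.
    omega_rel_next = gamma * (0.0 - theta_cur) / period
    # Assume the target keeps its speed over one short cycle.
    return omega_target - omega_rel_next
```

With `theta_prev=0.01`, `theta_cur=0.02`, `omega_axis=0.1`, and `period=0.02`, the relative speed is 0.5 and the target speed is 0.6, so the command is 1.6 for `gamma=1.0` and 1.0 for `gamma=0.4`. A smaller `gamma` therefore asks for a gentler correction toward the frame center.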

To verify the motion control model based on angular speed, we conducted a series of experiments. The experiment scenarios are shown in Table 1.

With Algorithm 1, the pan-tilt platform keeps the pedestrian’s face at the center of the image in *Experiment 1*, as shown in Figure 5. This indicates that the proposed motion control model based on angular speed is valid. However, Figure 5 also shows that the pan-tilt platform vibrates back and forth around the center of the frame.


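To tackle this problem, we multiply (8) by a coefficient *γ*, as shown in (13). The coefficient reduces the angular speed of the target relative to the pan-tilt platform; as can be seen from Figure 5(a), the speed at which the pedestrian moves toward the center of the image is then reduced. Finally, (12) can be expressed as (14).

$$\omega^{\mathrm{rel}}_{t+1} = \gamma \, \frac{0 - \theta_t}{T} \tag{13}$$

$$\omega^{\mathrm{az}}_{t+1} = \omega^{\mathrm{rel}}_{t} + \omega^{\mathrm{az}}_{t} + \gamma \, \frac{\theta_t}{T} \tag{14}$$

where $\theta_t$ is the current deviation angle, $T$ is the cycle of the image processing algorithm, $\omega^{\mathrm{rel}}$ is the angular speed of the target relative to the azimuth axis, and $\omega^{\mathrm{az}}$ is the angular speed of the azimuth axis.

We conducted the second experiment to verify whether adding the coefficient *γ* can solve the vibration problem. In *Experiment 2*, *γ* is set to a fixed value. The result of *Experiment 2* is shown in Figure 6. Comparing Figure 6 with Figure 5(b), we find that after adding the coefficient *γ*, the amplitude of the deviation is significantly reduced and is almost equal to 0. The vibration has been greatly reduced.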

*Experiment 2* also verified that the pan-tilt platform can effectively keep the stationary target at the center of the image with the motion control model based on angular speed which we proposed. Next, we will use the pan-tilt platform to track the moving object in* Experiment 3*.

In *Experiment 3*, we use the same *γ* as in *Experiment 2*. The result is shown in Figure 7. As the target begins to move, the pan-tilt platform lags significantly in tracking the UAV and its response is not fast enough, causing the target to fly out of the camera’s field of view. The reason is that *γ* is too small, which makes the angular speed of the UAV relative to the pan-tilt platform too small, so that the target cannot be tracked in time.


In *Experiment 4*, we selected a series of *γ* values and conducted an experiment with each of them. The results of *Experiment 4* are shown in Figure 8. We found that increasing *γ* significantly reduces the lag. When *γ* reached 0.4, a good result was achieved. However, when *γ* exceeded 0.4, the pan-tilt platform shook again.


In *Experiment 5*, we increased the speed of the UAV and conducted the experiment with the optimal *γ* = 0.4 obtained in *Experiment 4*. We found that a lag was still present. Therefore, we increased *γ* and repeated the experiment. The results of *Experiment 5* are shown in Figure 9. When *γ* reached 0.6, a good result was achieved. However, when *γ* exceeded 0.6, the pan-tilt platform shook again.

Through the above experiments, we find that under different motion states of the target, we need different *γ* to keep the target at the center of the frame and make the pan-tilt platform move smoothly. Therefore, we propose an intelligent control algorithm based on reinforcement learning for *γ*.

#### 4. Intelligent Control Algorithm Based on Reinforcement Learning

As time goes by, the moving target passes through different states, and the pan-tilt platform needs the optimal *γ* in each state to keep the target at the center of the frame and to move smoothly. The process by which the pan-tilt platform continuously obtains the optimal *γ* can be formulated as a sequential decision process based on a Markov stochastic model. Meanwhile, reinforcement learning (RL) is a class of machine learning methods for solving sequential decision-making problems with unknown state-transition dynamics [11]. Normally, a Markov decision process (MDP) is defined by a five-element tuple consisting of the state space, the action space *a*, the state-transition model *P*, the reward function *R*, and a discount factor.

When the pan-tilt platform is in a certain state, it obtains the optimal *γ* through trial and error with the reinforcement learning algorithm. However, it is infeasible to let the pan-tilt platform explore online continuously to learn an optimal *γ*, because with a bad *γ* the target leaves the camera’s field of view and the pan-tilt platform loses it. To tackle this problem, we apply data-driven reinforcement learning: real-life data are collected by tracking an object under different motion conditions with the pan-tilt platform and are then used to train the proposed Q-network to output the optimal *γ*.

In this section, we first design the variables of reinforcement learning and then design experiments and collect data according to the requirements of the state space and the action space *a*. Finally, we show how to use the collected data to train the proposed Q-network. The proposed intelligent control algorithm based on reinforcement learning is described below for the control of the azimuth axis; the method for the pitch axis is the same.

##### 4.1. Variable Design of Reinforcement Learning

###### 4.1.1. Design of State Space

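Since the feedback for controlling the pan-tilt platform is derived directly from the frame, we consider the current deviation angle $\theta_t$ (see equation (3)), the current relative angular speed $\omega^{\mathrm{rel}}_{t}$ (see equation (5)), and the current relative angular acceleration $\alpha^{\mathrm{rel}}_{t}$ (see equation (15)) to be the most direct factors of *γ*.

$$\alpha^{\mathrm{rel}}_{t} = \frac{\omega^{\mathrm{rel}}_{t} - \omega^{\mathrm{rel}}_{t-1}}{T} \tag{15}$$

where $\alpha^{\mathrm{rel}}_{t}$ is the current angular acceleration of the target relative to the pan-tilt platform and $T$ is the cycle of the image processing algorithm.

Therefore, the state is designed as follows.

$$s_t = \left(\theta_t,\ \omega^{\mathrm{rel}}_{t},\ \alpha^{\mathrm{rel}}_{t}\right) \tag{16}$$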
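As an illustration of how such states could be derived from a logged sequence of deviation angles, the following sketch computes the relative angular speed and acceleration by finite differences over the processing cycle. The function name `build_states` and the tuple layout (angle, relative speed, relative acceleration) are assumptions made for this example.

```python
def build_states(angles, period):
    """Return one (angle, relative speed, relative acceleration) tuple per frame.

    angles -- deviation angles of the target in consecutive frames
    period -- cycle of the image processing algorithm (seconds)

    The first two frames yield no state, because the acceleration needs two
    consecutive speed values, i.e. three consecutive angles.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    # speeds[k] is the relative angular speed measured at frame k + 1.
    speeds = [(angles[i] - angles[i - 1]) / period for i in range(1, len(angles))]
    states = []
    for i in range(2, len(angles)):
        omega = speeds[i - 1]        # speed at frame i
        omega_prev = speeds[i - 2]   # speed at frame i - 1
        accel = (omega - omega_prev) / period
        states.append((angles[i], omega, accel))
    return states
```

For the angle sequence `[0.0, 0.5, 1.5, 3.0]` with `period=0.5`, the speeds are 1.0, 2.0, and 3.0, so the function returns `[(1.5, 2.0, 2.0), (3.0, 3.0, 2.0)]`.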

###### 4.1.2. Design of Action Space

Our goal is to output the optimal value of *γ*, so the action space is the range of *γ*. Ideally, the action space would cover every possible value of *γ*. However, a continuous variable cannot be enumerated by the computer, so the range must be discretized. Since *γ* is bounded in (13), we divide its range into 10 values with a step of 0.1, and these 10 values form the action space of *γ*.

###### 4.1.3. Design Experiments and Collect Data

To output the optimal *γ*, we need to collect as many states as possible for each action so that the Q-network can be trained well. Following this principle, we conduct 10 sets of experiments for each action. The experimental scenarios are as follows.

The distance between the UAV and the pan-tilt platform is 100 m. The UAV is selected to initialize the particle tracking algorithm and is then flown at 10 different speeds, while the motion control model based on angular speed uses the same *γ*. Note that a nominal speed does not mean that the UAV moves at that speed throughout the run: the UAV first accelerates to this speed, keeps moving at constant speed for a while, and then decelerates to 0 m/s. The purpose is to obtain more target motion states. Since our action space has 10 actions with 10 sets of experiments under each action, we conducted a total of 100 sets of experiments.

Since the state consists of three elements, we store the three values obtained in each cycle in chronological order in a “.txt” file. When all the experiments are finished, we obtain the “.txt” files that store the experimental data shown in Table 2. The content format in the “.txt” file is as follows:

###### 4.1.4. Design of Reward Function and Q-Value Function

To compute the Q-value of the RL network, we first need to design the reward function for *γ*. When the angular speed instruction is sent to the motor driver of the azimuth motor, the azimuth axis moves and the target reaches a new deviation angle $\theta_{t+1}$ after one cycle. To evaluate the reward of *γ*, the distance between the center of the target and the center of the frame is an important factor. When the target is very close to the center of the frame, we consider *γ* to be good and give it a higher reward; otherwise it receives a lower reward. The reward function is shown in

$$R = \begin{cases} 100, & |\theta_{t+1}| \le \theta_{\mathrm{th}} \\ -10\,|\theta_{t+1}|, & \text{otherwise} \end{cases} \tag{18}$$

where *R* is the reward of *γ*, $\theta_{\mathrm{th}}$ is the threshold of the deviation angle, and $\theta_{t+1}$ is the deviation angle at the next moment. If $\theta_{t+1}$ is within the threshold, *γ* can be regarded as an optimal action and a positive reward of 100 is given. Otherwise, the reward is negative: the greater $|\theta_{t+1}|$ is, the smaller the reward, which equals -10 times $|\theta_{t+1}|$. In this paper, the threshold is chosen so that, when the deviation angle is within it, the target is almost at the center of the image.
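The reward rule described above can be written as a short function. This is a minimal sketch: the function name `reward` and its parameter names are hypothetical, and the threshold is passed as an argument because its numerical value is not reproduced in the text here.

```python
def reward(next_angle, threshold):
    """Return the reward for the gamma that led to `next_angle`.

    next_angle -- deviation angle of the target at the next moment
    threshold  -- non-negative bound below which the target counts as centered
    """
    if threshold < 0:
        raise ValueError("threshold must be non-negative")
    deviation = abs(next_angle)
    if deviation <= threshold:
        # Target is (almost) at the frame center: fixed positive reward.
        return 100.0
    # Otherwise the penalty grows linearly with the remaining deviation.
    return -10.0 * deviation
```

With an illustrative threshold of 0.01, `reward(0.005, 0.01)` returns `100.0` and `reward(-0.5, 0.01)` returns `-5.0`; the sign of the deviation does not matter, only its magnitude.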

Moreover, we use one-step Q-Learning for the proposed intelligent control algorithm based on reinforcement learning. The transition rule of Q-Learning is shown in

$$Q(s, a) = R + \lambda \max_{a'} Q(s', a') \tag{19}$$

where *s′* is the next state reached after carrying out action *a* in state *s*, *a′* is an action in state *s′*, and $\lambda$ is the discount factor, which takes a fixed value in this paper.
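A one-step Q-Learning target of this kind can be computed as follows. The sketch assumes the common form "reward plus discounted maximum next-state Q-value"; the function name `q_target` and the discount value used in the usage note are illustrative assumptions, not values taken from the paper.

```python
def q_target(reward_value, next_q_values, discount):
    """Return the one-step Q-Learning target for a single transition.

    reward_value  -- reward received for the action taken in the current state
    next_q_values -- Q-values of all actions in the next state
    discount      -- discount factor, expected to lie in [0, 1]
    """
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must lie in [0, 1]")
    if not next_q_values:
        raise ValueError("next_q_values must not be empty")
    # Bootstrapped value of the best action available in the next state.
    return reward_value + discount * max(next_q_values)
```

For instance, `q_target(1.0, [0.5, 2.0, -1.0], 0.5)` returns `2.0`, since the best next-state value 2.0 is discounted to 1.0 and added to the reward.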

##### 4.2. Design and Training of Q-Value Network

###### 4.2.1. Design of Q-Value Network

We design a neural network to learn the Q-Values of the 10 actions; a fully connected neural network with ReLU nonlinearities is employed to parametrize the Q-Value function. The structure of the Q-Value network is shown in Figure 10.

The Q-Value network has 3 hidden layers, each with 100 neurons. The input layer has 3 neurons corresponding to the three state elements. The output layer has 10 neurons corresponding to the Q-Values of the 10 actions. The input and output of the Q-Value network are shown in Tables 3 and 4.
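The layer sizes described above (3 inputs, three hidden layers of 100 ReLU units, 10 outputs) can be illustrated with a small forward pass written in plain Python. This is a sketch only: the initialization scheme, the helper names, and the mapping of the 10 outputs to the values 0.1, 0.2, ..., 1.0 in `GAMMA_ACTIONS` are assumptions for illustration, and a real implementation would normally use a deep learning framework.

```python
import random

# Assumed mapping from output index to gamma value (illustrative only).
GAMMA_ACTIONS = [round(0.1 * k, 1) for k in range(1, 11)]


def init_mlp(sizes, seed=0):
    """Create random weights and zero biases for a fully connected network."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = (2.0 / n_in) ** 0.5  # a common choice for ReLU layers
        weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def forward(layers, state):
    """Return the Q-values of all actions for one state."""
    x = list(state)
    for index, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]
        if index < len(layers) - 1:
            x = [max(0.0, v) for v in x]  # ReLU on hidden layers only
    return x


def greedy_gamma(layers, state):
    """Pick the gamma whose Q-value is largest for the given state."""
    q = forward(layers, state)
    return GAMMA_ACTIONS[max(range(len(q)), key=q.__getitem__)]
```

A network matching the description is created with `init_mlp([3, 100, 100, 100, 10])`; `forward` then returns 10 Q-values for a three-element state, and `greedy_gamma` selects the corresponding action.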

###### 4.2.2. Training Method for Q-Value Network

*(1) State-Transition and Getting Experience*. Unlike online exploration learning, we do not select an action at random to move to the next state; instead, the state of the agent (the pan-tilt platform) evolves under a specified action. The state-transition model is therefore that the current state transfers to the next state under the same *γ* chosen for that experiment.

Before training the Q-Value network, we first need to obtain the experience of the agent. From the collected experimental data, we can get the current state *s*, the next state *s′*, the reward *R* obtained from (18), and the current action *a* (the *γ* chosen for that experiment). We store these as a tuple, so one experience of the agent is expressed as $e = (s, a, R, s')$. We put all experiences *e* into the memory pool *D*.

There are about 900,000 experiences in the memory pool* D*.
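Turning one logged experiment into experiences can be sketched as follows. The sketch assumes that each logged state is a tuple whose first element is the deviation angle, that the whole log was recorded under a single *γ* (identified here by a hypothetical `action_index`), and that the reward follows the rule of Section 4.1.4 with the threshold passed in as a parameter.

```python
def reward(next_angle, threshold):
    """Reward rule: +100 inside the threshold, -10 * |angle| outside."""
    deviation = abs(next_angle)
    return 100.0 if deviation <= threshold else -10.0 * deviation


def build_experiences(states, action_index, threshold):
    """Pair consecutive logged states into (s, a, R, s') tuples.

    states       -- chronological list of state tuples from one experiment
    action_index -- index of the gamma used throughout that experiment
    threshold    -- deviation-angle threshold used by the reward rule
    """
    experiences = []
    for current, following in zip(states[:-1], states[1:]):
        # The reward is judged on the deviation angle reached at the next moment.
        r = reward(following[0], threshold)
        experiences.append((current, action_index, r, following))
    return experiences
```

A memory pool could then be assembled by concatenating the lists returned for every logged experiment; a log with *n* states contributes *n* - 1 experiences.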

*(2) Memory Replay for Q-Value Computation*. The key idea for training our Q-Value network is experience replay [16]. During learning, we sample the memory pool randomly to get a batch of experiences. We use *s* as the input of the Q-Value network shown in Figure 10 and get the current-state Q-Value $Q(s, a; \theta)$. Every 100 steps, we clone the Q-Value network to obtain the target Q-Value network. We use *s′* as the input of the target Q-Value network and get the next-state Q-Value $Q(s', a'; \theta^{-})$. Then we apply Q-Learning (see (19)) to update the current-state Q-Value, which gives the training target $y = R + \lambda \max_{a'} Q(s', a'; \theta^{-})$. The loss function is as follows:

$$L(\theta) = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - Q(s_i, a_i; \theta) \right)^2 \tag{21}$$

where *θ* denotes the parameters (weights and biases) of the network, $\theta^{-}$ denotes the parameters of the target network, $\lambda$ is the discount factor, and *N* is the batch size.

Since the samples collected by the agent while exploring the environment form a time series, consecutive samples are correlated. We therefore use a random sampling strategy to draw each training batch from the memory pool. The algorithm for training the Q-value network is presented in Algorithm 2.

Get the memory pool *D*

Initialize the Q-value network with random parameters

For step = 1 to 1,000,000 do

Sample randomly from the memory pool to get 50 samples

Perform a gradient descent step on equation (21) with respect to the network parameters

Every 100 steps, clone the Q-value network to obtain the target Q-value network

End For
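The structure of Algorithm 2 (random batch sampling, a gradient step on the squared error, and periodic cloning of the target network) can be illustrated with a compact sketch. To keep it self-contained, a linear Q-function stands in for the three-hidden-layer network, so this is not the network described in the paper; the function names, the default discount of 0.9, and the initialization range are assumptions for illustration.

```python
import copy
import random


def q_values(model, state):
    """Linear stand-in for the Q-network: one weight row and bias per action."""
    weights, biases = model
    return [sum(w * x for w, x in zip(row, state)) + b for row, b in zip(weights, biases)]


def train(memory, n_actions, steps, batch_size=50, sync_every=100,
          learning_rate=1e-5, discount=0.9, seed=0):
    """Fit the stand-in Q-function on (s, a, R, s') tuples; return (model, syncs)."""
    rng = random.Random(seed)
    n_inputs = len(memory[0][0])
    model = ([[rng.uniform(-0.01, 0.01) for _ in range(n_inputs)]
              for _ in range(n_actions)], [0.0] * n_actions)
    target_model = copy.deepcopy(model)
    syncs = 0
    for step in range(1, steps + 1):
        # Random sampling breaks the temporal correlation of the logged data.
        batch = rng.sample(memory, min(batch_size, len(memory)))
        grad_w = [[0.0] * n_inputs for _ in range(n_actions)]
        grad_b = [0.0] * n_actions
        for state, action, r, next_state in batch:
            # Target from the periodically cloned network.
            y = r + discount * max(q_values(target_model, next_state))
            error = q_values(model, state)[action] - y
            # Gradient of the mean squared error for the action actually taken.
            for j, x in enumerate(state):
                grad_w[action][j] += 2.0 * error * x / len(batch)
            grad_b[action] += 2.0 * error / len(batch)
        weights, biases = model
        for a in range(n_actions):
            for j in range(n_inputs):
                weights[a][j] -= learning_rate * grad_w[a][j]
            biases[a] -= learning_rate * grad_b[a]
        if step % sync_every == 0:
            target_model = copy.deepcopy(model)  # clone into the target network
            syncs += 1
    return model, syncs
```

Calling `train(memory, n_actions=10, steps=1000)` on a list of experience tuples runs 1000 updates and clones the target model 10 times. Replacing the linear stand-in with the fully connected network changes only `q_values` and the gradient computation; the sampling and cloning schedule stay the same.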

*(3) Network Training Details*. The main learning method of our Q-value network is backpropagation (BP) [17]. The procedure is repeated until the predefined upper limit on the number of steps is reached. Other key training parameters and functions are as follows.

(1) Optimization algorithm: stochastic gradient descent.

(2) Batch size and number of training steps: 50 and 1,000,000.

(3) Learning rate: 0.00001. The learning rate is set to a value that gives gradient descent a good trade-off between accuracy and performance.

(4) Activation function: Rectified Linear Unit (ReLU), $f(x) = \max(0, x)$.

(5) Loss function: the mean-square error (MSE), as in (21). Here *θ* is the parameter set of the network, i.e., $\theta = \{w, b\}$, where *w* is the weight and *b* is the bias.

##### 4.3. Experiment

The UAV is selected to initialize the particle tracking algorithm and is then flown at different speeds (1 m/s, ⋯, 10 m/s). The experimental results are shown in Figure 11.

From Figure 11, we find that, for the tracked target under different motion states, the deviation angle stays within a narrow range, which means that the tracked target remains almost at the center of the frame. This shows that the proposed algorithm improves accuracy and robustness.

#### 5. Conclusion

This paper presents a novel pan-tilt platform control method which consists of two parts. One is a motion control model based on angular speed and the other is an intelligent control algorithm based on reinforcement learning. To the best of our knowledge, this is the first work to use the angular speed to control a visual pan-tilt platform to track a target. Meanwhile, in order to reduce the error between the ideal value and the actual value when the pan-tilt platform moves, we implement an intelligent control algorithm based on reinforcement learning in a data-driven manner, which differs from traditional knowledge-driven methods such as PID. The experimental results show that the proposed methods can be applied to track a dynamic target in real time. Moreover, the tracking accuracy and robustness are improved and the stability of the system is ensured.

Due to the limitations of our experimental conditions, the experimental data we collected do not contain all the motion states of the target. In future work, we will focus on improving the response frequency of the pan-tilt platform, doing more experiments to track high-speed targets, collecting more data, and designing deeper Q-Value networks to output the optimal *γ*.

#### Data Availability

The [.txt] data used to support the findings of this study have been deposited in the [Partially data collected by experiments_datas] repository (https://github.com/blueskyM01/experiment-data-of-tracking/tree/master/Partially%20data%20collected%20by%20experiments_datas). The [.jpg] data used to support the findings of this study have been deposited in the [Partially data collected by experiments_photos] repository (https://github.com/blueskyM01/experiment-data-of-tracking/tree/master/Partially%20data%20collected%20by%20experiments_photos). Because the project is funded by AVIC AEROSPACE LIFE-SUPPORT INDUSTRIES, LTD and has been used for commercial purposes, both repositories provide only partial data, to prevent others from using the complete data for other commercial purposes. The [Fig.] data used to support the findings of this study are included within the article. The [Code] data used to support the findings of this study are currently under embargo while the research findings are commercialized. Requests for data, [3 years] after publication of this article, will be considered by the corresponding author.

#### Conflicts of Interest

The authors declare that they have no conflicts of interest.

#### Acknowledgments

This work was supported by a grant from the Major State Basic Research Development Program of China (973 Program) (no. 2016YFC0802703).