Mobile Target Tracking Based on Hybrid Open-Loop Monocular Vision Motion Control Strategy
This paper proposes a new real-time target tracking method based on the open-loop monocular vision motion control. It uses the particle filter technique to predict the moving target’s position in an image. Due to the properties of the particle filter, the method can effectively master the motion behaviors of the linear and nonlinear. In addition, the method uses the simple mathematical operation to transfer the image information in the mobile target to its real coordinate information. Therefore, it requires few operating resources. Moreover, the method adopts the monocular vision approach, which is a single camera, to achieve its objective by using few hardware resources. Firstly, the method evaluates the next time’s position and size of the target in an image. Later, the real position of the objective corresponding to the obtained information is predicted. At last, the mobile robot should be controlled in the center of the camera’s vision. The paper conducts the tracking test to the L-type and the S-type and compares with the Kalman filtering method. The experimental results show that the method achieves a better tracking effect in the L-shape experiment, and its effect is superior to the Kalman filter technique in the L-type or S-type tracking experiment.
The target tracking is always a popular field of study, and the motion state of its tracking objects is usually nonlinear and non-Gaussian. Therefore, when the motion state of the object cannot be clearly defined, how to accurately predict the next time’s state of the object is the direction studied by many researchers. Aiming at the problems of the linear and Gaussian’s target tracking, the Kalman filtering method can obtain a better tracking effect. Another tracking method, called sequential Monte Carlo or particle filtering, can be suitable for object’s motion behavior which possesses the nonlinear and non-Gaussian properties. The particle filtering has been proved an effective target tracking method and many related references have conducted the practice and application for the particle filtering [1–10].
Bayesian filtering is the earliest state prediction model. Fox and other scholars proposed method whose state estimation was conducted by the Bayesian filtering in 2003. The method can be applied in large parts of the sensors for achieving voice recognition, image tracking, and other techniques. The Kalman filtering obtained by improving the Bayesian filtering was introduced. Brock and Schmidt applied the Kalman filtering in the navigation system in 1970. Gutman and Velger used the Kalman filtering in the autonomous tracking in 1990. Lee and other scholars used the Kalman filtering in the object tracking of the monocular vision robot. In order to solve more complicated problems, the particle filtering was introduced. Shan and other scholars issued an article about how to use the particle filtering to real-time track the palming movement in 2004. Kwolek also proposed a method about the hand tracking in the same year, whose judging basis is the complexion and ellipse fitting combined by the face.
The image processing technique is adopted to obtain the real distance in the object. The technique is an important study topic in the machine vision field and the robot navigation field. The method for calculating the object’s distance on the basis of the image can be divided into stereoscopic vision method and the monocular vision method. The stereoscopic vision method is based on the human’s eyes by adopting two cameras to simulate the human’s eyes. In this way, the camera can obtain the image in the same time and the object’s real distance can be obtained by some complicated calculations. The distance obtained through the stereoscopic vision method is more accurate. However, it requires the processing of two images so that it consumes more time. In addition, the application of two cameras can consume higher cost. Compared with the stereoscopic vision method, the monocular vision method just requires a camera so that it costs very little. Yamagut and other scholars proposed the calculation of the object distance on the basis of the image scaling value proportional to the camera movement. In addition, the method requires processing many images to calculate its object distance. The calculating method for the object distance proposed by Mirzabaki includes two steps. Obtain the interpolating function based on the camera’s height and horizontal angle. Adopt the interpolating function to calculate the object distance in the image. The limitation of the method is that it relies on the camera’s height and horizontal angle. The method proposed by Murphey and other scholars also requires the processing of many images to obtain the object’s distance. Therefore, its limitation is the application in the real-time situation. The method for calculating the object’s distance proposed by Takatsuka must match the auxiliary equipment besides the camera, such as laser rotational sensor and LED. The method proposed by Rahman and other scholars just requires the processing of a single image to obtain the object’s distance, while the method is restricted to some simple object shapes, such as rectangle, cylinder, triangle, and circle [11–14].
Combining with the particle filtering technique and the monocular vision, the paper introduces a new object tracking method. The method estimates the target’s next time position and its size in the image, and then the object’s real position corresponding to the obtained information can be predicted. At last, according to the target’s position in the image and the actual position in the real world, the object obtained by controlling the mobile robots should be kept in the center of the camera.
2. The Hybrid Real-Time Target Tracking Rule for the Monocular Vision Mobile Robot
The whole tracking rule can be shown in Figure 1. Firstly, users must select it or set the target’s position and size in the image. Later, all possible states in the next time should be predicted, and the similarity both in all possible states and in the original states should be measured so that the similarity in all states can obtain a group of state values again. In this way, the most similar state can be adopted to calculate the actual depth, the horizontal distance and angle of the targets can be calculated, and then the movement of the mobile robot can be controlled. At last, a group of the new state can be obtained and the next time state should be estimated [15–17].
The above steps should not be ended until the nonimage data appears. In addition, the image’s target tracking, the image’s target location, and the control of the mobile robot belong to a signal procedure, as shown in Figure 1; the method proposed in the paper is the mobile target tracking rule based on the open-loop .
2.1. The Target Tracking Based on the Image
The paper adopts the particle filtering to conduct the procedure based on the image’s dynamic target tracking, as shown in Figure 2. Each particle possesses the information, including the target’s position and size in the image and its weighted value in the particle filtering .
Above all, the position and the size of the target must be obtained so that the object’s HSV reference histogram in the image can be obtained. In this way, the HSV reference histogram can be compared with the histogram whose predicted target may show the position [20–23]. The initial particle set is produced by the position and the size’s information. The particle includes the current and coordinates, the previous and coordinates, and the original and coordinates. In addition, the particle contains the current target’s amplification and shrinking rate and the previous target’s amplification and shrinking rate. The paper makes the users select the needed tracking targets by using the camera to seize the first image. In this way, the target’s position and size can be obtained; namely, the current sate can be obtained.
After the initial particle set is obtained, the second image can be received for the predicted movement. After receiving the image, the previous particles should conduct the predicted movement. The predicted movement is the most possible state found by the previous state in terms of the predicting equation. After the initial particle set conducts the predicting movement, a group of new particle sets can be obtained, and this step is called the prediction stage [24, 25].
The measuring stage should be conducted; that is, the position of the HSV reference histogram represented by each new particle and the HSV reference histogram should be calculated so that the weighted value can be obtained. The larger the weighted value is, the higher the similarity is and the more possible it is for the tracking target to be obtained.
The last step is the sampling stage again. This stage is to produce a group of new particles so that the next predicting step can be conducted. How to fetch out a group of new particles depends on the weighted values to distribute. First of all, the weighted value should be conducted using the standardizing movement and the movement is adopted to have an understanding of the weight of each particle. If a particle’s value after being weighted standardization is 0.1, the weight of the particle is 10%. If the number of the particles is 100, there are 10 particles in the new produced particles. In addition, the particle should be ordered again in terms of the size of the weighted value from large to small. The largest particle in the weighted value is regarded as the tracking target’s current real state.
2.2. The Target Location Based on the Image
The target location based on the image is to obtain the relative distance between the tracking target and the camera. In addition, the operation resources in the embedded computer are inferior to the general desktop computer. If the stereoscopic vision method is adopted, the computing burden is increased. Therefore, the monocular vision method can be adopted to reduce the operating number and the cost. The following steps describe how to use the information of signal camera to obtain the actual relative position in the tracking target in detail [12–17].
Calculating the object’s real distance can be done by the simple triangle theorem. The triangle theorem can be obtained by the depth of the target and the horizontal distance of the target, as shown in Figure 3. It can be written as the following equation:
The actual distance can be obtained by acquiring the target’s depth and the horizontal distance. The following steps describe how to obtain the image information of the target in detail. The image information is transformed to the target’s depth and the horizontal distance for obtaining the target’s actual distance. The assumption of the applications scenarios is limited to the horizontal movement in the paper. Therefore, the depth mapping formula and the horizontal distance mapping formula introduced in the paper can be obtained in the restricted condition [7–11].
Calculating the target’s actual distance in the image must be through the depth mapping and the horizontal distance mapping. Figure 4 is the image obtained by the binarization so that the target’s height of the resolution and the width of the resolution can be obtained. The horizontal resolution width is defined as the shortest straight line distance both in the objects of peripheral and in its central line. Figure 5 shows that the height of the resolution in the target has 14 pixels and the width of the resolution in the target has 13 pixels. The experimental data with many groups of the height of the resolution, the width of the resolution, the depth, and the horizontal distance can obtain the mathematical relation among these physical quantities. The procedure of transforming the target’s image to the target’s actual position with the mathematical relation is called the mapping.
2.2.1. Depth Mapping
The depth mapping is derived from a concept. When a target is smaller, the distance is farther. If the image is larger, its distance is closer, as shown in Figure 6. The experiments show that the target’s level in the same target depth can be changed and its pixel in the image cannot be changed with the changing of the horizontal position. Therefore, the target’s height of the resolution recorded in different target depths is as shown in Table 1. The quadratic equation which can represent the mathematical relation between the target’s depth and the target’s pixel obtained through the curve fitting is as follows:
Formula (2) and its data are obtained by the curve fitting method with the quadratic polynomial regression, where is the depth and the is the height of resolution. Therefore, formula (2) can obtain a pixel corresponding to its actual depth.
2.2.2. The Horizontal Distance Mapping
The experiments show that the rate between the horizontal width of resolution and the horizontal distance remains unchanged when the depth is fixed. Therefore, the width of resolution in the different horizontal distance and the fixed depth should be measured. Figure 7 shows the horizontal width of resolution obtained in the 5 cm, 10 cm, and 15 cm position close to the central line whose depth is 50 cm. The transformation rate between the horizontal pixel and the actual horizontal distance is defined as follows:
Table 2 shows the complete results of the experiment. After the data results are processed by the curve fitting method with the quadratic polynomial regression, the quadratic polynomial equation is as follows:Formula (4) can find out the corresponding transforming rate, and then the actual horizontal distance can be obtained with measuring horizontal pixel and formula (3).
2.3. The Movement Controlling Rule for the Mobile Robot
How to control the robot successfully to catch up with the prey requires some information, such as the speed at straight and the angle at turning. All the information should be judged by the image. Therefore, after being processed and judged by the controlling rule, the information in the image can be transformed into the operation instruction and then it is transmitted to the hunters who make it do the specified movement.
Section 2.2 shows that a signal image can calculate its actual distance. Therefore, the intersection angle between the target and the camera can be calculated with the use of the depth and the horizontal distance. The intersection angle is the needed rotated angle by the hunters so that the prey can remain in the horizontal central position in the camera. Its calculating formula is as follows:
In addition, Figure 8 divides the picture into three blocks to make the prey remain in the vertical central location, and the block from 80 pixels to 83 pixels is defined as the center. If the prey is in the above center block, it should be sped up. Otherwise, it should be static. The speed controlling method relies on ’s position to the prey, as shown in Figure 6. If the position is in the above 83 pixels, it is static. If the position is in the below 80 pixels and the speed is not 0, it should speed up 10%. If it is not in the below 80 pixels and the speed is 0 cm/s, the speed is 5 cm/s. Consider
The above descriptions aim at the speed control rules for the hunters and robots, and their results are obtained by the trial-and-error process. Combining with the target tracking, the target location, and the movement controlling rule for the movement robot, the actual distance between the tracking target and camera can be obtained through the position and the size of the target in the image and the set tracking target. Finally, the motion controlling rule controls the mobile robots to obtain the tracking objective.
3. Software and Hardware Specification
3.1. System Framework
The experiment tracks the prey robots to other prey robots, as shown in Figure 9. The environment we use is the region covered by the wireless network. Each system uses the wireless network as the communication bridge. The sensing system is the network camera, and its main function is to grasp the image. The image can be regressed to the controlling system through the wireless network to calculate the target’s position, the needed rotated angle, or moving angle finally transmitted to the mobile platform for achieving the autocontrol ability, as shown in Figure 10. The sensing system and the controlling system are installed in the prey robot, and the needed power in both systems is provided by the mobile platform’s internal power.
3.2. Sensing System
The sensing system we use is AXIS 207 MW Network Cameras, as shown in Figure 11. Its size is 8.5 cm × 5.5 cm × 4.0 cm and its weight is 190 g. It includes the wire and wireless networks and its input voltage is 12 V which is supplied by the battery built in the mobile platform. The 30 pixels can be grasped in the image grasping aspect in each second; the 1280 × 1024 pixel is supported in each second and its transmitting method can be used in the wire network or the wireless network.
3.3. Mobile Platform
The mobile platform used in the paper is Amigo Bot produced by the Mobile Robots company, as shown in Figure 12. The mobile platform is used as the basis for designing the whole system, as shown in Figure 10, that is, the whole hunter robot.
Amigo Bot has the server procedure which can receive the external transmitted controlling instructions. The controlling chip releases the mobile instruction to the Amigo Bot and then the Amigo Bot can be moved by controlling the motor.
3.4. Mobile Platform
How to operate the whole system depends on controlling the system to do the operations and release its movement order, and its used hardware is VIA ARTIGO A1100, as shown in Figure 13. Its size is 14.6 cm × 9.9 cm × 5.2 cm and its weight is 0.6 kilos. Therefore, its input is 12 V which is supplied by the battery built in the mobile platform. Due to the small volume, the complete function, and the simple operation, it is used as the controlling system. Its objective is to transmit the signals for the sensing system. After the signals are processed by the controlling system, the order should be released to the driving system.
3.5. Mobile Platform
Aiming at the VIA ARTIGO A1100, the used operating system is Windows XP. VMware Workstation virtual system should be installed in the Windows XP, and the virtual system can be installed in the same computer and the two different operating systems can run at the same time. Therefore, Ubuntu is installed in the virtual system and it is a kind of operating system issued by Linux. Open CV (Open Source Computer Vision) and it is installed in the bottom of the Ubuntu. In order to seize the real-time image in the webcam and process the appropriate image, the C function library should be built. ARIA (Advanced Robot Interface for Applications) function library developed by Mobile Robot should be installed, and the function library is a C function library which is adopted to do the real-time moving state control to the Amigo Bot.
4. Experimental Results
4.1. L-Type Tracking Experiment
The experiment adopts the particle filtering and makes the hunter walk the approximate L-type route. The hunter’s initial position is in the back 100 cm of the prey, as shown in Figure 14. The hunter must control prey in the center of the camera vision and keep a certain distance (100 cm) with prey .
In order to discuss the tracking effect in the experiment, the hunter’s ideal route satisfied with the above two conditions can be obtained by the following method. If the hunter’s position in the moment needed to be obtained, the prey’s position in the moment and the hunter’s position in the moment must be obtained.
Then a straight line should be drawn to combine with the two positions, and the prey’s position in the moment found in the straight line should be regarded as the point whose fixed length is , that is, the hunter’s position in the moment. The hunter’s position in all moments obtained by the method combines the hunter’s ideal route.
In Figure 15, A point represents the prey’s coordinate positions in different moments, and B point is the hunter’s initial position when is 0. The distance between the initial position and the prey’s position is . When the prey moves to A point whose is 1, the straight line between the A point whose is 1 and the B point whose is 0 represents hunter’s forwarding direction whose is 1. The hunter’s correct position is the C point whose distance to the A point is . The dotted line in Figure 16 is the hunter’s ideal route obtained by simulating the procedures in terms of the above method. Figure 16 shows that the hunter’s forwarding route is different from the prey’s forwarding route; it is an arc curve.
The experiment adopts the tracking rule proposed in the paper, making the hunter robot and the prey robot record their coordinate values, respectively, during the process of marching. Compared with the ideal hunter route, the experimental results in Figure 17 show that the real route is identical with the ideal route so that the tracking method can obtain a better tracking effect.
4.2. Kalman Filtering Tracking Experiment
Kalman filtering is one of the methods used for the target tracking. The experiment mainly compares the accuracy between the Kalman filtering and particle filtering. Its experimental method adopts the Kalman filtering to do the L-type tracking experiment and then compare its results.
A better tracking effect in the paper can be obtained by improving the basic Kalman filtering. The basic Kalman filtering is adopted to predict the new state vector, and the used prediction method is as follows: where , , , and are, respectively, the coordinate, coordinate, displacement, and displacement, that is, the best state in the moment. , , , and can predict the state in the moment. Formula (7) is as follows:
Formula (8) can obtain the predicted coordinate value, that is, the best state in the previous moment, and the displacement in the unit interval. The displacement in the paper is defined as the differential value between the measuring coordinate in the previous moment and the measuring coordinate in the current moment. The following adopts the method to do the L-type tracking experiment for comparing the difference between the method and the particle filtering method.
The experiment also records the hunter’s and the prey’s coordinates the same as the previous experiment and then compares them with the hunter’s ideal forwarding route, as shown in Figure 18. The figure shows that the route obtained in the Kalman filtering is the same as the predicted route, while its tracking procedure is not all the linear tracking. Therefore, the tracking effect in the turning is better.
4.3. S-Type Tracking Experiment
The experiment makes the prey march along the S-type route and discusses its tracking effect. First of all, aiming at the prey’s forwarding route, the corresponding hunter’s ideal forwarding route can be drawn with the above method.
Figures 19 and 20 are, respectively, the tracking results of the particle filtering and the L-type tracking. The area combined the hunter’s forwarding route, and ideal forwarding route is regarded as the basis for comparing the tracking effects. After being estimated, the areas combined by the two routes are, respectively, 1.71 m2 and 2.16 m2. The tracking effect of the particle filtering is superior to the Kalman filtering in the S-type tracking experiment. However, the two routes are not approximating to the ideal route. If a better tracking effect should be obtained, it must be discussed or improved by the movement control or tracking rule.
The paper proposes a real-time target tracking method based on the open-loop monocular vision mobile control. The method adopts the particle filtering to predict the mobile target’s position in the image. The method can effectively grasp the target’s linear and nonlinear mobile behavior in terms of the characteristics of the particle filtering. In addition, the simple mathematical operation can be adopted to transmit the mobile target’s image information to its real coordinate information which just requires a few operating resources. Moreover, the method adopts the monocular vision to reduce the hardware resources.
The method can estimate the target’s position and size in the next moment and then predict the target’s real position in the obtained information. The rotated angle and the forwarding speed of the mobile robot should be controlled to make the target stay in the vision center of the camera. The paper conducts the L-type and the S-type tracking test and compares them with the Kalman filtering. The experimental results show that the method proposed in the paper can obtain a better tracking effect in the L-type experiment and is superior to the Kalman filtering in the two kinds of the experiments.
The future work includes the following points: widening the limitation of the application scenarios, adopting the method suitable for the scientific principle to improve or establish the mobile controlling rule for the robots, and adopting the Haar filter, AdaBoost classification method, online update method, and other methods to avoid the appearance change when the target is in the turning or confronts the shield. In this way, the problems which cannot be overcome by the HSV reference histogram can be solved and the tracking effect can be improved.
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
This study was supported by the National Natural Science Foundation of China (Grant nos. 61202163 and 61373100); Natural Science Foundation of Shanxi Province (Grant no. 2012011015-1), and Programs for Science and Technology Development of Shanxi Province (Grant no. 20120313032-3).
G. W. Ng, Intelligent Systems: Fusion, Tracking and Control, Institute of Physics, 2003.
A. Doucet, N. D. Freitas, and N. Gordon, Sequential Monte Carlo Methods in Practice, Springer, 2001.
B. Ristic, S. Arulampalam, and N. Gordon, Beyond the Kalman Filter: Particle Filters for Tracking Applications, Artech House, 2004.
D. Fox, J. Hightower, H. Kauz, L. Liao, and D. Patterson, “Bayesian techniques for location estimation,” in Proceedings of the Workshop on Location-Aware Computing, pp. 16–18, 2003.View at: Google Scholar
L. D. Brock and G. T. Schmidt, “General questions on Kalman filtering in navigation systems,” in Kalman Filtering: Theory and Application, C. T. Leondes, Ed., NATO, AGARD, 1970.View at: Google Scholar
B. Kwolek, “Stereovision-based head tracking using color and ellipse fitting in a particle filter,” in Computer Vision—ECCV 2004: 8th European Conference on Computer Vision, Prague, Czech Republic, May 11–14, 2004. Proceedings, Part IV, T. Pajdla and J. Matas, Eds., pp. 192–204, Springer, Berlin, Germany, 2004.View at: Publisher Site | Google Scholar
D. A. Forsyth and J. Ponce, Computer Vision: A Modern Approach, Prentice Hall, New York, NY, USA, 2002.
N. Yamaguti, S. Oe, and K. Terada, “Method of distance measurement by using monocular camera,” in Proceedings of the 36th SICE Annual Conference (SICE '97), International Session Papers, pp. 1255–1260, July 1997.View at: Google Scholar
M. Mirzabaki, “A new method for height estimation in depth,” in Proceedings of the 8th Conference on Signal, Speech and Image Processing (WSEAS '08), pp. 29–33, Santander, Spain, September 2008.View at: Google Scholar
Y. L. Murphey, J. Chen, J. Crossman, J. Zhang, P. Richardson, and L. Sieh, “DepthFinder, a real-time depth detection system for aided driving,” in Proceedings of the IEEE Intelligent Vehicles Symposium, pp. 122–127, 2000.View at: Google Scholar
M. Takatsuka, G. A. W. West, S. Venkatesh, and T. M. Caelli, “Low-cost interactive active monocular range finder,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR ’99), pp. 444–449, Fort Collins, Colo, USA, June 1999.View at: Publisher Site | Google Scholar
AXIS 207MW Network Camera, 2012, http://www.axis.com/products/cam_207mw/.
AmogoBot Robot for Education and Research, 2012, http://www.mobilerobots.com/ResearchRobots/AmigoBot.aspx.
VMware Workstation, 2012, http://www.vmware.com/tw/support/product-support/workstation.html.
Ubuntu for you, 2012, http://www.ubuntu.com/ubuntu.