Target Recognition and Localization of Mobile Robot with Monocular PTZ Camera

Wang, Hongxing; Li, Ruifeng; Gao, Yunfeng; Cao, Chuqing; Ge, Lianzheng; Xie, Xiong

doi:https://doi.org/10.1155/2019/8789725

Journal of Robotics

Research Article | Open Access

Volume 2019 | Article ID 8789725 | https://doi.org/10.1155/2019/8789725

Hongxing Wang (1, 2), Ruifeng Li (1), Yunfeng Gao (1), Chuqing Cao (3, 4), Lianzheng Ge (1), and Xiong Xie (2)

Academic Editor: Gordon R. Pennock

Received 15 Jan 2019

Accepted 11 Feb 2019

Published 19 Mar 2019

Abstract

Target recognition and localization based on a vision sensor is an intuitive and effective approach. This paper designs a humanoid mobile robot with a monocular PTZ camera mounted on its differential-drive platform for target recognition and localization. The camera parameters are calibrated with Zhang Zhengyou's calibration method. Target recognition combines color and edge detection. First, objects of the same color are extracted by color recognition, by setting an appropriate threshold in the HSV color space of the acquired image. The target is then further extracted by edge detection and the Hough circle transform. Target localization is based on the similar triangle principle: because the PTZ camera can pan, tilt, and zoom, the monocular vision sensor is used at different pitch angles to measure the distance between the robot and the target. Using these characteristics of the monocular PTZ camera, the mobile robot achieves target localization and tracking. The simulation and experimental results show that the mobile robot achieves good target recognition and localization while tracking a moving target, which demonstrates the effectiveness of the method.

1. Introduction

Target recognition and localization based on monocular vision has great practical value, and in recent years monocular vision-based target detection algorithms have attracted the attention of researchers in China and abroad. In [1], a monocular distance measurement method based on image processing with a single camera is presented to reduce the effects of mismatched corresponding feature points and of extraction errors of single feature points; however, the maximum relative error still reaches 1.68% after data correction. In [2], the author assumes that a mobile robot is equipped with a single camera and a map marking the positions of landmarks in its environment. In [3], two images of the same target at different positions are chosen from an image sequence, and feature points are extracted from both images and matched with the scale-invariant feature transform (SIFT) algorithm. By analyzing the positions of the same feature point in the different images together with the motion parameters of the aircraft, the distance information between the aircraft and the obstacle is obtained. Reference [4] proposes a monocular vision measurement algorithm based on the color and texture of the ground and improves ant colony optimization by taking into account the car model and the actual running environment; the optimized algorithm runs faster than the other algorithms compared. To reduce the deviation caused by feature point matching and optical axis inclination, [5] proposes a target distance measurement method for wheeled mobile robots based on monocular vision, which extends planar targets to three-dimensional ones and achieves high measurement accuracy without adjustment.
The experimental results in [5] show that the comprehensive error ratios of that method are all below 0.7%, which satisfies the real-time and reliability requirements of monocular distance measurement for wheeled mobile robots. In [6], a sensor system is developed to measure the position and orientation of planar robots. It integrates monocular vision with a detection method that extracts scale- and orientation-invariant image features. Instead of multiple cameras, a single camera is the only sensing device, which reduces the computation cost, and the scale- and orientation-invariant method ensures robust detection and description of the extracted features. Experiments with a free-moving monocular camera verify the performance of the system. Reference [7] studies a real-time localization algorithm based on monocular vision. According to the pinhole imaging principle, the mapping between image points and target points is obtained and a pinhole model is established; the depth information of the image is then derived from the geometric relationship between the image points and the target points. A ranging algorithm based on monocular vision is proposed in [8], which uses a captured image to relate three-dimensional and two-dimensional coordinates and obtain depth information; it can perform target tracking and ranging in a dynamic environment. Reference [9] uses panoramic vision, which observes a 360-degree field of view without blind spots and collects visual information in all spatial directions; however, its robustness in extracting environmental features is poor, and feature matching is difficult.
However, none of these studies provides real-time tracking information over the entire motion of a redundant humanoid mobile robot, and none uses a PTZ camera with pan and tilt functions.

Considering the drawbacks of previous studies, we propose a target recognition method based on color and geometry using a monocular PTZ camera. The target distance is measured with the similar right triangle principle at different pitch angles of the monocular vision sensor, together with the relative coordinates. The simulation and experimental results demonstrate the effectiveness and real-time performance of the method.

2. Target Recognition of the Dual-Arm Mobile Robot Based on Multifeature Fusion

This paper performs target recognition by combining the color and the geometric shape of the target. The experimental results show that the method has good real-time performance, effectiveness, and practicality, which lays a foundation for follow-up studies in which the humanoid robot fully tracks and manipulates the object. Target recognition for a mobile robot with greater redundancy therefore has broad theoretical and practical value.

2.1. Target Recognition Based on Color

A color target is extracted according to its color characteristics, and the rest of the image is removed. Subsequent processing then searches only this region instead of the entire image, which shortens the visual processing time. In this paper, the target recognition system based on the monocular PTZ camera consists of the dual-arm mobile robot shown in Figure 1(a), the monocular PTZ camera shown in Figure 1(b), and the differential driving platform shown in Figure 1(c). The main product parameters and the internal parameter matrix of the monocular PTZ camera are given in Table 1 and in Equation (3), respectively. The internal parameters of the PTZ camera were obtained with Zhang Zhengyou's calibration method, as described in Section 3.1.

Color-based target recognition consists of image capture, a series of image processing steps, and image display. This paper uses the 2-Mode (bimodal) threshold segmentation method in the HSV color space model to detect and identify the target.

2.1.1. Image Processing Methods

The image processing methods include gray-level conversion, histogram analysis, threshold segmentation, image binarization, edge detection, and color component extraction.

Gray-Scale Histogram. The histogram is a simple and practical image processing tool for understanding the features of an image. A gray-scale image usually has 256 gray levels, and the gray histogram gives the number of pixels at each gray level. The left side of the histogram represents the dark part of the image, the right side the bright part, and the middle the midtones. The histogram therefore directly shows how the gray levels are distributed. Figure 2 shows an original image (Figure 2(a)), its gray-scale image (Figure 2(b)), and the corresponding histogram (Figure 2(c)).
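The gray-level conversion and histogram described above can be sketched in a few lines of plain Python, assuming an RGB image stored as nested lists of 0 to 255 tuples. The BT.601 luma weights (0.299, 0.587, 0.114) are an assumption; the paper does not say which conversion it uses.

```python
def to_gray(rgb_image):
    """Convert rows of (R, G, B) tuples (0-255) to integer gray levels.

    The BT.601 luma weights below are an assumed choice, not taken from the paper.
    """
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_image]


def gray_histogram(gray_image, levels=256):
    """Count the pixels at each gray level 0..levels-1."""
    hist = [0] * levels
    for row in gray_image:
        for value in row:
            hist[value] += 1
    return hist


# A 2x3 image: three dark pixels (left of the histogram), three bright ones (right).
image = [[(10, 10, 10), (12, 12, 12), (10, 10, 10)],
         [(240, 240, 240), (250, 250, 250), (240, 240, 240)]]
hist = gray_histogram(to_gray(image))
print(hist[10], hist[12], hist[240], hist[250])  # -> 2 1 2 1
```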

Threshold Segmentation and Image Binarization. Threshold segmentation is a region-based image segmentation technique that divides the image pixels into several categories. It is one of the most commonly used traditional segmentation methods and has become the most basic and widely used segmentation technique because it is simple to implement, computationally cheap, and stable. It is especially suitable for images in which the target and the background occupy different gray-level ranges. It not only greatly compresses the amount of data but also greatly simplifies the subsequent analysis and processing steps. Therefore, in many cases it is a necessary preprocessing step before image analysis, feature extraction, and pattern recognition. Threshold segmentation first determines a suitable threshold and then binarizes the image by comparing each pixel with it: if the pixel value is larger than the threshold, the pixel belongs to the target (black); otherwise it belongs to the background (white). The gray threshold segmentation transformation is

$$g(x,y)=\begin{cases}0 \ (\text{target, black}), & f(x,y) > T,\\ 255 \ (\text{background, white}), & f(x,y) \le T,\end{cases} \tag{1}$$

where f(x, y) and g(x, y) are the gray values of pixel (x, y) before and after segmentation, and T represents the threshold of the binary image.
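The thresholding rule can be written directly as a short function. This sketch assumes 8-bit gray values in nested lists and encodes "target (black)" as 0 and "background (white)" as 255; pixels exactly equal to T are treated as background, a choice the text leaves open.

```python
TARGET, BACKGROUND = 0, 255  # black target, white background (8-bit values)


def binarize(gray_image, threshold):
    """Pixels brighter than the threshold T become target (0); the rest background (255)."""
    return [[TARGET if value > threshold else BACKGROUND for value in row]
            for row in gray_image]


gray = [[10, 200, 150],
        [151, 90, 255]]
print(binarize(gray, 150))  # -> [[255, 0, 255], [0, 255, 0]]
```

With OpenCV and NumPy images, `cv2.threshold(gray, T, 255, cv2.THRESH_BINARY_INV)` applies the same rule and returns the threshold together with the binary image.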

There are many image segmentation algorithms. Because the scenes in this paper have simple backgrounds, the 2-Mode (bimodal) segmentation method can be used. Threshold selection with the 2-Mode method assumes that each region is made up of many pixels with similar gray levels. If the objects and the background differ greatly, the gray-scale histogram of the image is bimodal, and an appropriate threshold can be selected for segmentation: when the histogram has two distinct peaks, the valley between them is chosen as the optimal threshold.

Figure 3 shows a histogram with two peaks. The peak heights are Hmax1 and Hmax2, and the corresponding gray values are T1 and T2, respectively. Bimodal segmentation then looks for the lowest valley between the two peaks, that is, the gray value T in the range [T1, T2] with the smallest number of pixels, which is the lowest point of the histogram in that range. T is then used to segment (binarize) the image. Figure 4(a) shows Figure 2(a) converted into a binary image with the threshold T, and Figure 4(b) shows the binary image obtained with a threshold of 150.
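The valley search can be sketched as follows, assuming a 256-bin histogram stored as a list. Locating the two peaks with a small moving-average smoothing and a local-maximum test is an assumed implementation detail; the paper only states that T is the lowest valley between the peaks at T1 and T2.

```python
def smooth(hist, radius=2):
    """Moving-average smoothing so small fluctuations do not count as peaks."""
    n = len(hist)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(hist[lo:hi]) / (hi - lo))
    return out


def bimodal_threshold(hist, radius=2):
    """Return T: the lowest point of the smoothed histogram between its two tallest peaks."""
    h = smooth(hist, radius)
    peaks = [i for i in range(1, len(h) - 1)
             if h[i] > h[i - 1] and h[i] >= h[i + 1]]
    if len(peaks) < 2:
        raise ValueError("histogram is not bimodal")
    # Gray values T1 < T2 of the two tallest peaks (heights Hmax1, Hmax2).
    t1, t2 = sorted(sorted(peaks, key=lambda i: h[i], reverse=True)[:2])
    # The valley: the gray level in [T1, T2] with the fewest pixels.
    return min(range(t1, t2 + 1), key=lambda i: h[i])


# Synthetic bimodal histogram: a dark peak at 50 and a bright peak at 200.
hist = [max(0, 100 - 2 * abs(g - 50)) + max(0, 80 - 2 * abs(g - 200)) for g in range(256)]
print(bimodal_threshold(hist))  # -> 102
```

When the valley is a flat run of empty bins, `min` returns its first gray level; any value in that run separates the two modes equally well.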

2.1.2. Target Recognition Based on S Component of HSV Color Space Model

The target to be located in this paper is a green tennis ball, as shown in Figure 5(a). The target images and color histograms in the three channels R, G, and B are shown in Figures 5(b)–5(g), and those in the three channels H, S, and V in Figures 6(a)–6(f).

The histogram values of the R, G, and B components of the target in the RGB color space are scattered (Figures 5(c), 5(e), and 5(g)), whereas those of the H, S, and V components in the HSV color space are relatively concentrated (Figures 6(b), 6(d), and 6(f)). This property is key to the performance of color-based target detection and recognition, so this paper uses the HSV color space model to detect and identify the target.

Because the H and V component images contain more noise interference (Figures 6(b) and 6(f)), which hinders edge-based target detection and recognition, this paper further uses the S component of the HSV color space model to detect and identify the target. Figures 7(a)–7(d) show the target images obtained with different minimum threshold values of the S component. When the minimum threshold of the S component is above 150, color-based detection and recognition works best, as shown in Figure 7(c). The paper therefore chooses 150 as the threshold T of the 2-Mode segmentation method.
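A sketch of the S-component threshold using Python's standard `colorsys` module, which returns saturation in [0, 1]. Rescaling it to 0 to 255 so that the threshold 150 applies (the scale used by 8-bit HSV images such as OpenCV's) is an assumption. Saturation alone keeps any strongly colored pixel, not only green ones; in the paper, edge and circle detection then isolate the target.

```python
import colorsys

S_MIN = 150  # minimum saturation on a 0-255 scale (the threshold chosen in the text)


def saturation_mask(rgb_image, s_min=S_MIN):
    """True where a pixel's HSV saturation, rescaled to 0-255, is at least s_min."""
    mask = []
    for row in rgb_image:
        mask_row = []
        for r, g, b in row:
            _, s, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            mask_row.append(s * 255 >= s_min)
        mask.append(mask_row)
    return mask


pixels = [[(150, 200, 40), (120, 120, 120)],    # tennis-ball green, neutral gray
          [(200, 230, 200), (255, 255, 255)]]   # pale green, white
print(saturation_mask(pixels))  # -> [[True, False], [False, False]]
```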

2.1.3. Target Recognition Experiment Based on Color

The target in this paper is a green tennis ball. When objects of different colors are present (Figure 8(a)), color recognition identifies the green objects well (Figure 8(b)).

2.2. Target Recognition Based on Edge Detection

Color segmentation yields the region in which the target may lie. Edge-based target recognition is then performed on this region, which reduces the amount of original image data to process, increases the recognition success rate, and improves detection efficiency. Edge detection is a basic and practical tool in vision research. It is usually performed after image enhancement, color threshold segmentation, and filtering, each of which is tuned by adjusting its threshold value.
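One simple way to restrict later processing to the color region, assumed here rather than taken from the paper, is to crop the image to the bounding box of the color mask, enlarged by a small margin, before running edge detection. The function names and the margin are illustrative.

```python
def mask_bounding_box(mask):
    """(top, left, bottom, right), inclusive, of the True pixels in a mask; None if empty."""
    rows = [i for i, row in enumerate(mask) if any(row)]
    if not rows:
        return None
    cols = [j for j in range(len(mask[0])) if any(row[j] for row in mask)]
    return rows[0], cols[0], rows[-1], cols[-1]


def crop_to_box(image, box, margin=2):
    """Crop the image to the box, enlarged by margin pixels and clipped to the image."""
    top, left, bottom, right = box
    top, left = max(0, top - margin), max(0, left - margin)
    bottom = min(len(image) - 1, bottom + margin)
    right = min(len(image[0]) - 1, right + margin)
    return [row[left:right + 1] for row in image[top:bottom + 1]]


image = [[10 * i + j for j in range(5)] for i in range(4)]
mask = [[False] * 5 for _ in range(4)]
mask[1][2] = mask[2][3] = True            # pixels that passed the color threshold
box = mask_bounding_box(mask)             # -> (1, 2, 2, 3)
print(crop_to_box(image, box, margin=0))  # -> [[12, 13], [22, 23]]
```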

2.2.1. Canny Edge Detection Method

The purpose of image edge detection is to enhance the edge information of objects in the image and to reduce the interference of irrelevant information, so as to serve later image processing. This paper adopts the Canny edge detection algorithm. Canny edge detection first applies a Gaussian blur with a 5×5 convolution kernel (this Gaussian blur is part of the Canny algorithm and is separate from the filtering in the previous section). It then uses a pair of convolution kernels to compute the gradient magnitude and direction, and applies non-maximum suppression so that only local maxima of the gradient magnitude remain as candidate edges. Finally, the object edges are determined by hysteresis (lag) thresholding with a low and a high threshold: a pixel whose gradient magnitude exceeds the high threshold is an edge pixel, and a pixel whose gradient magnitude is below the low threshold is a nonedge pixel. A pixel whose value lies between the two thresholds is kept as an edge pixel only if it is connected to an edge pixel. The original image and the Canny edge detection result are shown in Figures 7(b) and 7(c), respectively.

Figure 7(c) shows that, with properly selected parameters, Canny edge detection detects the objects in the image well, including the target.
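The hysteresis rule is the easiest part of Canny to misread, so here is a minimal sketch of that step alone. It assumes the gradient magnitudes after non-maximum suppression are already available as a list of lists; Gaussian smoothing, gradient computation, and non-maximum suppression are omitted, and the threshold values are arbitrary. In practice, OpenCV's `cv2.Canny(image, threshold1, threshold2)` runs the whole pipeline.

```python
from collections import deque


def hysteresis(magnitude, low, high):
    """Canny's final step on a gradient-magnitude map (list of lists).

    Pixels above `high` are edges; pixels in [low, high] become edges only if
    8-connected to an edge pixel; everything below `low` is discarded.
    """
    rows, cols = len(magnitude), len(magnitude[0])
    edges = [[False] * cols for _ in range(rows)]
    queue = deque()
    for i in range(rows):
        for j in range(cols):
            if magnitude[i][j] > high:       # strong pixel: always an edge
                edges[i][j] = True
                queue.append((i, j))
    while queue:                             # grow edges into connected weak pixels
        i, j = queue.popleft()
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                ni, nj = i + di, j + dj
                if (0 <= ni < rows and 0 <= nj < cols
                        and not edges[ni][nj] and magnitude[ni][nj] >= low):
                    edges[ni][nj] = True
                    queue.append((ni, nj))
    return edges


mag = [[0, 50, 0, 0, 0],
       [0, 120, 60, 0, 50],
       [0, 0, 0, 0, 0]]
for row in hysteresis(mag, low=40, high=100):
    print("".join("#" if e else "." for e in row))
# .#...
# .##..   (the weak 50 at the far right is not connected, so it is dropped)
# .....
```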

2.2.2. Hough Transform for Circle Detection

The Hough transform is one of the basic methods for recognizing geometric shapes in images. Its basic principle is to use the duality between points and curves to map a given curve in the original image space to a point in the parameter space. The problem of detecting the curve in the original image thus becomes the problem of finding a peak in the parameter space; that is, a global detection problem is turned into a local one. Shapes such as straight lines, ellipses, circles, and arcs can be detected in this way.

To make target recognition more precise, the Hough transform for circle detection is applied after color-based and edge-based recognition. Hough circle detection accumulates votes in the three-dimensional parameter space (a, b, r) of the circle equation

$$(x-a)^2+(y-b)^2=r^2, \tag{2}$$

where (x, y) represents pixel coordinates in the image, (a, b) is the circle center, and r is the radius. By (2), each edge point in the image corresponds to a cone surface in the parameter space. After all edge points are transformed, the cones of points lying on the same circle intersect at one point, and the number of cones passing through each point is accumulated. If this count exceeds a set threshold, the corresponding circle parameters are accepted. The Hough transform can accurately select geometric shapes that meet specific requirements in images containing multiple shapes.
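A minimal pure-Python accumulator for the circle equation (x − a)² + (y − b)² = r² makes the voting concrete. The radius range, the 72 sampled directions, and the vote threshold are assumptions for illustration. OpenCV's `cv2.HoughCircles` implements a gradient-based variant rather than this full three-dimensional accumulator.

```python
import math
from collections import Counter


def hough_circles(edge_points, radii, min_votes, n_angles=72):
    """Accumulate votes in (a, b, r) space and return cells with >= min_votes.

    For each edge point (x, y) and radius r, every center (a, b) satisfying
    (x - a)^2 + (y - b)^2 = r^2 receives one vote; the circle of centers is
    sampled at n_angles directions and rounded to integer pixels.
    """
    votes = Counter()
    for x, y in edge_points:
        for r in radii:
            centers = {(round(x - r * math.cos(2 * math.pi * k / n_angles)),
                        round(y - r * math.sin(2 * math.pi * k / n_angles)))
                       for k in range(n_angles)}  # a set: one vote per cell per point
            for a, b in centers:
                votes[(a, b, r)] += 1
    return [(a, b, r, n) for (a, b, r), n in votes.most_common() if n >= min_votes]


# Edge points of a circle with center (20, 30) and radius 10, sampled every 10 degrees.
points = {(round(20 + 10 * math.cos(math.radians(d))),
           round(30 + 10 * math.sin(math.radians(d)))) for d in range(0, 360, 10)}
print(hough_circles(points, radii=range(8, 13), min_votes=30)[0])  # -> (20, 30, 10, 36)
```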

2.2.3. Edge Detection Experiment Based on Canny Edge Detection and Hough Circle Detection

Using the method in Section 2.1.2, we recognize the green objects, as shown in Figure 7(b). Using the method in Section 2.2.1 on targets with different geometric shapes, the contour of every target is well represented in Figure 7(c), compared with Figure 7(b). Using the method in Section 2.2.2, Hough circle detection then recognizes the circular target in Figure 7(d), compared with Figure 7(c). Similarly, when the target is in a complex environment (Figure 9(a)), Canny edge detection represents the contour of every object well (Figure 9(b)), and Hough circle detection recognizes the circular target (Figure 9(c)). The experiments show that this method recognizes the target well and meets the requirements of target recognition.

3. Target Location of the Dual-Arm Mobile Robot Based on Similar Triangle Theory

Compared with binocular distance measurement, monocular vision ranging is cheaper, has a simpler structure and a simpler algorithm, and is more practical. Because the PTZ (Pan/Tilt/Zoom) camera can also speed up target tracking, this paper uses a monocular PTZ camera to locate the target.

3.1. Zhang Zhengyou Calibration Method

In this method, a planar calibration board is first made and photographed by the camera from different orientations, usually in ten to twenty images. The images are then processed with the camera calibration tool of MATLAB to compute the camera's internal parameter matrix. Each feature point on the calibration board (a corner point extracted by the Harris algorithm) corresponds one-to-one to a point in its image, and this correspondence can be expressed by a homography matrix. One homography is determined for each image, and each provides constraints for solving the internal parameters. The algorithm follows a two-step approach: first, initial values of some parameters are obtained by a linear method; then the linear results are refined by nonlinear optimization that accounts for radial distortion under the maximum likelihood criterion. Finally, the external parameters are obtained from the computed internal parameters and the homography matrices. The spatial layout of Zhang Zhengyou's planar calibration is shown in Figure 9.
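Zhang's method starts from one board-to-image homography per calibration image. The sketch below estimates a homography from exactly four correspondences by fixing H[2][2] = 1 and solving the resulting 8 × 8 linear system. This is a simplification assumed for illustration: Zhang's method uses all detected corners in a least-squares estimate and then refines it, and the paper itself uses the MATLAB calibration tool. The board coordinates and `H_true` are made up for the demonstration.

```python
def solve_linear(A, b):
    """Solve A x = b (square system) by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("degenerate configuration (e.g. three collinear points)")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def homography_from_4(board_pts, image_pts):
    """3x3 homography H (with H[2][2] = 1) mapping board (X, Y) to image (u, v)."""
    A, b = [], []
    for (X, Y), (u, v) in zip(board_pts, image_pts):
        # From u = (h11 X + h12 Y + h13) / (h31 X + h32 Y + 1), and likewise for v.
        A.append([X, Y, 1, 0, 0, 0, -u * X, -u * Y])
        b.append(u)
        A.append([0, 0, 0, X, Y, 1, -v * X, -v * Y])
        b.append(v)
    h = solve_linear(A, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def project(H, X, Y):
    """Map a board point through H, including the division by the third coordinate."""
    w = H[2][0] * X + H[2][1] * Y + H[2][2]
    return ((H[0][0] * X + H[0][1] * Y + H[0][2]) / w,
            (H[1][0] * X + H[1][1] * Y + H[1][2]) / w)


# Four board corners (mm) and where a hypothetical camera sees them (pixels).
H_true = [[2.0, 0.1, 5.0], [0.2, 1.5, 3.0], [0.001, 0.002, 1.0]]
board = [(0, 0), (100, 0), (100, 100), (0, 100)]
image = [project(H_true, X, Y) for X, Y in board]
H = homography_from_4(board, image)
print(project(H, 50, 50), project(H_true, 50, 50))  # the two should agree
```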

The internal parameter matrix of the monocular PTZ camera can be obtained by Zhang Zhengyou calibration method as follows:

This paper adopts the monocular vision location method with PTZ camera. The schematic of monocular vision-based distance measurement is presented in Figure 10.

is the centre point of the lens; is the cross point between the optic axis and the imaginary plane and the origin of the imaginary plane; is the projection of the detected point in the imaginary plane, as shown in Figure 10.

The relevant relationships can be described as follows:

The distance between the mobile robot and the target can be derived by (4), (5), and (6):

where and are known; is established; the unit of is the pixel; is the frame memory coordinate of the in the camera optical axis and the image plane intersection; the right-arm represents the velocity of the fixed dual-arm robot; is the frame memory coordinate of the ; the physical dimensions of the image plane in the X axis and the Y axis corresponding to one pixel in frame memory are and , respectively.

The transformation from in image coordinates to in pixel coordinates can be described by the following homogenous transformation matrix:

When and are established, (11) is deduced by (10):

where are separately the intrinsic parameters of the monocular PTZ camera obtained by Zhang Zhengyou calibration method. In the end the distance between the measured point and the monocular PTZ camera can be derived by (9), (10), and (11), as shown in (12).
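The symbols and equations (4) to (12) did not survive extraction, so the following is not the authors' derivation. It is a sketch of one common similar-triangle formulation, assumed here: the camera's optical centre is at height h above the floor, the optical axis is tilted down by the pitch angle α, and the target rests on the floor and is imaged at pixel row v. With the principal-point row v0 and the focal length fy in pixels taken from the intrinsic matrix, the ray to the target lies α + arctan((v − v0)/fy) below the horizontal, and the right triangle formed with the camera height gives the horizontal distance d = h / tan(α + arctan((v − v0)/fy)). The sign convention (positive pitch = downward) is also an assumption; the paper reports angles such as α = −25°.

```python
import math


def ground_distance(v, v0, fy, cam_height, pitch_deg):
    """Horizontal distance from the camera to a floor point imaged at pixel row v.

    v, v0      : image row of the target and of the principal point (pixels)
    fy         : focal length in pixels (from the intrinsic matrix)
    cam_height : height of the optical centre above the floor (e.g. mm)
    pitch_deg  : downward tilt of the optical axis in degrees (positive = down)
    """
    # Angle between the optical axis and the ray to the target; rows grow downward.
    beta = math.atan2(v - v0, fy)
    depression = math.radians(pitch_deg) + beta  # ray angle below the horizontal
    if depression <= 0:
        raise ValueError("the ray does not reach the floor in front of the camera")
    # Right triangle: camera height is the opposite side, ground distance the adjacent.
    return cam_height / math.tan(depression)
```

For example, `ground_distance(300, 240, 800, 600, 25)` evaluates a target 60 rows below the principal point for a camera 600 mm above the floor pitched 25° down; targets lower in the image come out closer.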

4. Realization of Target Recognition and Location Based on Monocular PTZ Camera of the Mobile Robot

Considering the characteristics of the mobile robot and of the target motion, images are captured and processed at a rate of 10 frames per second to meet the real-time requirement of the dual-arm mobile robot system. The specific process is as follows.

Step 1. Build the dual-arm mobile robot and mount the vision system on the robot platform.

Step 2. Calibrate the PTZ camera with Zhang Zhengyou's calibration method to obtain its internal parameters.

Step 3. Perform image processing: select the camera, read the image data frame by frame, correct distortion, equalize the histogram of the acquired color image, apply morphological filtering, and so on.

Step 4. Recognize the target with the combined method based on its color and geometric shape.

Step 5. Measure the distance between the target and the mobile robot using the similar right triangle principle.

Step 6. Determine the next motion state of the dual-arm mobile robot, then return to Step 3 and repeat the loop.

Step 7. End the program.
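Steps 3 to 6 form a loop paced at 10 frames per second. The skeleton below only shows that control flow; the callback names (`read_frame`, `preprocess`, `recognize`, `measure`, `decide`) are hypothetical placeholders, not functions from the paper, and a fixed frame count stands in for whatever stop condition ends the program in Step 7.

```python
import time

FRAME_PERIOD = 1.0 / 10  # 10 frames per second, the rate stated in the text


def tracking_loop(read_frame, preprocess, recognize, measure, decide, n_frames):
    """Run Steps 3-6 for n_frames frames, pacing the loop at FRAME_PERIOD.

    All five callbacks are hypothetical stand-ins for the paper's modules.
    """
    actions = []
    for _ in range(n_frames):
        start = time.monotonic()
        frame = preprocess(read_frame())                            # Step 3
        target = recognize(frame)                                   # Step 4
        distance = measure(target) if target is not None else None  # Step 5
        actions.append(decide(target, distance))                    # Step 6
        elapsed = time.monotonic() - start
        time.sleep(max(0.0, FRAME_PERIOD - elapsed))  # wait out the rest of the 100 ms budget
    return actions                                                  # Step 7: stop
```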

5. Experiment and Analysis

To verify the validity of this method, we carried out several experiments.

5.1. Experiment I: Target Recognition Experiment

In addition to the simple scenes in Figure 7, an experiment was carried out with the target placed in a complex real-world scene, shown in Figure 11(a). The Canny edge detection result is shown in Figure 11(b), and the Hough circle detection result in Figure 11(c) (the principles and procedures are described in Sections 2.1 and 2.2). The results show that the circular target is detected well.

5.2. Experiment II: Target Location Experiment

The target was placed at increasing distances from the robot, from near to far, as shown in Figure 12. Table 2 lists the actual distance between the mobile robot and the target, the distance measured by vision, and the error between the two.

As Table 2 shows, the relative error is larger when the distance is less than 600 mm (α = −25°). The values in italics are those whose distance measurement error is below 1%, and they indicate the better tilt angle for each measurement distance. Within this range, the ranging method based on the PTZ vision sensor works well for the mobile robot. The larger error at short distances can be addressed by moving the mobile robot, by panning and tilting the PTZ camera, and by the corresponding algorithm. The recognition and localization method based on the PTZ vision sensor of the mobile robot therefore shows good practicality and reliability.
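Table 2 is not reproduced here, but the selection rule the paragraph describes (compute the relative error for each distance and tilt angle, then keep the angle whose error is below 1%) can be stated in code. This is a sketch; the row format (actual distance, tilt angle, measured distance) is an assumption about how such a table might be organized.

```python
def relative_error(measured, actual):
    """Relative ranging error |measured - actual| / actual."""
    return abs(measured - actual) / actual


def best_tilt_per_distance(rows, tolerance=0.01):
    """rows: (actual_mm, tilt_deg, measured_mm) tuples.

    For each actual distance, keep the tilt angle with the smallest relative
    error, but only if that error is below tolerance (1% by default).
    Returns {actual_mm: (tilt_deg, error)}.
    """
    best = {}
    for actual, tilt, measured in rows:
        err = relative_error(measured, actual)
        if err < tolerance and (actual not in best or err < best[actual][1]):
            best[actual] = (tilt, err)
    return best
```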

6. Conclusions and Discussion

This paper builds a visual recognition system for the mobile robot based on color and geometric shape (Hough circle transform) and a visual localization system based on the similar right triangle principle. Using the characteristics of the monocular PTZ vision sensor, the mobile robot achieves target localization and tracking in real experiments. The experimental results demonstrate the effectiveness and practicality of the two methods. In future work, we will combine the monocular PTZ camera with a laser sensor to improve the accuracy of target recognition and localization of the mobile robot. The related algorithms will also be optimized in further experiments, including studies of multisensor recognition and high-precision localization for redundant dual-arm mobile robots.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This work was supported by NSFC Grant (no. 61673136), IPSKL Grant (no. SKLRS201609B), NSFC Grant (no. 61763032), and EDJXPGRP Grant (no. 20161BBH80040).

References

[1] Y.-X. Han, Z.-S. Zhang, and M. Dai, “Monocular vision system for distance measurement based on feature points,” Optics and Precision Engineering, vol. 19, no. 5, pp. 1110–1117, 2011.
[2] E. Krotkov, “Mobile robot localization using a single image,” in Proceedings of the International Conference on Robotics & Automation, pp. 978–983, IEEE, 1989.
[3] S. Xiao and T. Jianwu, “Location method of static object based on monocular vision,” Acta Photonica Sinica, vol. 45, no. 10, pp. 1012003-1–1012003-8, 2016.
[4] X. Juntao, X. Shaopeng, and W. Ye, “Path planning of intelligent car based on computer vision,” Computer Engineering and Applications, vol. 52, no. 7, pp. 236–241, 2016.
[5] X. Dawei and Z. Junyong, “Target distance measurement method with monocular vision for wheeled mobile robot,” Computer Engineering, vol. 43, no. 4, pp. 287–291, 2017.
[6] Y. Wang, K. Chen, P. Li, and C. Chi, Planar Robot Position and Orientation Measurement Using a Monocular Vision, vol. 212, Springer FIRA, Berlin, Heidelberg, Germany, 2011.
[7] Z. Song, S. Haijun, and W. Huaqiang, “Real-time location algorithm based on monocular vision,” Journal of Suzhou University, vol. 31, no. 8, pp. 114–117, 2016.
[8] H. Firouzi and H. Najjaran, “Real-time monocular vision-based object tracking with object distance and motion estimation,” in Advanced Mechatronics (AIM), pp. 987–992, IEEE/ASME, 2013.
[9] J. Xu, J. Wang, and W. Chen, “Omni-vision-based simultaneous localization and mapping of mobile robots,” Robot, vol. 30, no. 4, pp. 289–297, 2008.

Copyright

Copyright © 2019 Hongxing Wang et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
