Journal of Robotics


Research Article | Open Access

Volume 2016 | Article ID 2379685

Yaguang Zhu, Baomin Yi, Tong Guo, "A Simple Outdoor Environment Obstacle Detection Method Based on Information Fusion of Depth and Infrared", Journal of Robotics, vol. 2016, Article ID 2379685, 10 pages, 2016.

A Simple Outdoor Environment Obstacle Detection Method Based on Information Fusion of Depth and Infrared

Academic Editor: Shahram Payandeh
Received 18 Jul 2016
Revised 10 Oct 2016
Accepted 01 Nov 2016
Published 04 Dec 2016


To address the low recognition rate and poor robustness of existing obstacle detection methods, a simple but effective obstacle detection algorithm based on the fusion of depth and infrared information is put forward. The scene is segmented by the mean-shift algorithm, and the pixel gradient of the foreground is calculated. After preprocessing by edge detection and morphological operations, the depth information and infrared information are fused. The proposed method exploits the complementary edge-detection characteristics of the depth map and the infrared image, which reduces the false detection rate and improves detection precision. Because the depth map and the infrared image are not affected by natural sunlight, the influence of factors such as light intensity and shadow on obstacle recognition is effectively reduced, and the robustness of the algorithm is improved. Experiments indicate that the fusion-based detection algorithm can accurately identify small obstacles in the field of view and that its recognition accuracy is not affected by lighting. Hence, the method is of practical significance for obstacle detection by mobile robots and intelligent vehicles in outdoor environments.

1. Introduction

Obstacle detection is widely applied in smart robots, intelligent vehicles, and autonomous agricultural vehicles. Among these applications, obstacle detection based on visual image information [1, 2] is mainly used in the vision systems of autonomous vehicles and robots. Such a system performs functions similar to those of the human eye and provides reliable and accurate environmental information for path planning. Hence, a visual navigation system must detect obstacles quickly and accurately and must be robust.

The sensors traditionally used for obstacle detection include laser radar sensors, ultrasonic sensors, infrared sensors, and visual equipment [3]. A laser radar sensor obtains distance by measuring the time interval between the emission and the reception of the laser [4]. The Stanford University intelligent vehicle Stanley [5] carries several laser radar sensors on its roof; it builds a 2D grid based on a Markov model and uses the height difference between adjacent points to judge whether each area is an obstacle or traversable. However, laser radar sensors are relatively large and expensive, and their installation and use are complex. For obstacle detection with ultrasonic sensors, Figueroa and Lamancusa [6] proposed a method that combines the relationship between the phase delay and the peak value of the echo with the echo time, achieving higher detection accuracy. Furthermore, Hua et al. [7] proposed a phase-detection method based on double-sideband amplitude modulation, in which the ultrasonic pulse excitation signal is modulated by a low-frequency signal with appropriately chosen modulation parameters. This method not only preserves the resolution of phase-based obstacle detection but also extends the detection range to some extent. However, ultrasonic sensors have poor directivity, and they do not perform well in moving or complex conditions such as field environments. Infrared sensors are based on infrared thermal imaging [8], a wavelength conversion technology that transforms infrared radiation into visible images. Infrared thermal imaging systems have a certain penetrating ability, can distinguish true targets from false ones, and are highly resistant to interference.
Owing to these imaging characteristics, infrared sensors adapt well to different conditions and have strong directivity. However, they are easily affected by the color of the object, and they cannot provide distance information. In recent years, with the development of computer technology, binocular cameras have become increasingly popular for obtaining the depth information of objects and scenes. The disadvantage of this approach is that it is easily affected by light [9, 10]. Zhang and Zhu [11] proposed an image segmentation method based on binocular vision, but it is also easily affected by light and has low robustness. Although using multiple sensors can compensate for the shortcomings of a single sensor, it is difficult to integrate their information and to keep the sensors synchronized.

To address the problems of poor robustness and low detection accuracy, a fast and accurate detection method based on depth and infrared information is proposed. This paper uses the Kinect V2.0 sensor, which obtains depth information and infrared information simultaneously. The main steps of the proposed obstacle detection are as follows. First, the area to be detected is segmented by the mean-shift algorithm. Second, gradients are calculated to highlight the edge information of objects. Finally, obstacles are detected by edge detection and morphological operations, and the infrared and depth results are fused. The proposed method addresses the accuracy and robustness of obstacle detection for mobile robots and intelligent vehicles in complex environments.

2. Problem Descriptions

Obstacles usually need to be detected in both indoor and outdoor environments; the difference is that outdoor obstacles may be more complex and diverse. Two different scenes captured by Kinect are shown in Figures 1 and 2. In the depth maps, the edge where the obstacle contacts the ground is not very clear, whereas the edge of the upper part of the obstacle is more obvious. The opposite holds in the infrared images, so the characteristics of the infrared image can be used to compensate for the weaknesses of the depth map. Therefore, if depth and infrared information are fused effectively, obstacle boundaries can be segmented from the scene accurately. In addition, because neither depth nor infrared information is affected by natural light, the fusion improves robustness and helps guarantee a high recognition rate.

Recently, most obstacle detection algorithms based on a single visual source have relied on motion information across different images [12]. However, lacking additional information, these methods can produce abnormal values for obstacles. This problem can be solved by fusing the infrared image and the depth map. In the depth map, most obstacle information is uniform, because depth values on the same obstacle are similar and the depth differences at the junction between an obstacle and the surface supporting it are small. Hence, depth information can be used to remove abnormal values from the detection area, such as the background or areas beyond the detection range. If only the depth image is used and a small obstacle lies at the edge of the identified range, its depth value is similar to that of the ground, so the ground and the obstacle are easily identified as one object, causing errors. Moreover, even when objects lie beyond the set detection distance, the Kinect camera can still obtain their depth if they reflect well, so they are detected too, and the identified obstacles exceed the required range. On the other hand, if only the infrared image is used, many wrong areas are detected because of the characteristics of infrared information: the areas vary in color, most of these errors lie relatively close to the camera, and more distant obstacles are ignored. The detection results of the infrared image alone are shown in Figure 3(a).

From the above analysis, the wrongly detected areas of the depth map and those of the infrared image do not coincide, so the infrared image can be used to remove the wrongly detected areas of the depth map. The detection result after fusing the depth map and the infrared image is shown in Figure 3(b), in which the obstacle is a can. Therefore, to improve precision and performance, the depth map and the infrared image should be fused to detect obstacles.

3. Fusion Method

This paper adopts the Kinect V2.0 sensor, which collects depth maps and infrared images at a resolution of 512 × 424 and 30 fps, as well as color images at a resolution of 1920 × 1080 and 30 fps. Its effective range is 0.5 m–8 m [13]. Compared with other sensors, Kinect V2.0 collects depth, infrared, and color information simultaneously, which avoids the synchronization problems caused by using separate sensors. It also has other advantages: it is small, it obtains accurate image information, and, most importantly, it adapts well to complex environments. Here, Zhang's calibration method [14, 15] is adopted to calibrate the Kinect camera before obstacle detection. The thick red line in Figure 4 represents a vertical wall; compared with the image before calibration, the distortion caused by the camera has been clearly corrected.

Building on these advantages of Kinect V2.0, an obstacle detection algorithm based on the fusion of depth and infrared information is proposed. The foreground of the depth map is first extracted by the mean-shift algorithm, and then edge information is extracted and processed by fusing the depth map and the infrared image, as shown in Figure 5. The detection range is 0.5 m–4 m. Edge detection based on the depth image alone and on the infrared image alone each has its own characteristics. With the depth image alone, the gradient features of the depth information can be used to detect obstacles; however, because the boundaries of the region segmented by the mean-shift algorithm are also prominent, they are detected as edges too. Similarly, with the infrared image alone, the edges of objects in front of the Kinect camera can be extracted according to the characteristics of the infrared image, while the edges of objects farther back cannot be detected. Combining these two complementary features yields an accurate detection result.

In the depth image obtained by Kinect, the mean-shift algorithm is an effective method for segmenting the foreground of the detection area. The mean-shift theory proposed by Fukunaga and Hostetler [16] received little attention until Cheng [17] introduced kernel functions into it, which broadened the scope of the algorithm and drew more attention to it. Comaniciu et al. [18, 19] successfully applied the mean-shift algorithm to image processing and obtained good results in image smoothing and segmentation. Cheng's kernel definition is as follows. Let $X$ be the $d$-dimensional space in which a point $x$ is represented by a column vector, and let $\mathbb{R}$ denote the field of real numbers. A function $K: X \to \mathbb{R}$ is a kernel function if there exists a profile function $k: [0, \infty) \to \mathbb{R}$ such that

$$K(x) = k\left(\|x\|^2\right),$$

and $k$ satisfies the following conditions: (1) $k$ is nonnegative; (2) $k$ is nonincreasing, that is, if $a < b$, then $k(a) \ge k(b)$; (3) $k$ is piecewise continuous and $\int_0^\infty k(r)\,dr < \infty$.
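To make the profile conditions concrete, the following Python sketch builds a kernel $K(x) = k(\|x\|^2)$ from two common profiles (normal and Epanechnikov, standard examples from the mean-shift literature, not choices stated in this paper). The helper names are hypothetical, and the numerical check only spot-tests nonnegativity and monotonicity on a finite grid; it cannot verify the integrability condition.

```python
import math
from typing import Callable, Sequence

Profile = Callable[[float], float]


def gaussian_profile(r: float) -> float:
    """Normal profile k(r) = exp(-r / 2) for r >= 0."""
    return math.exp(-r / 2.0)


def epanechnikov_profile(r: float) -> float:
    """Epanechnikov profile k(r) = 1 - r on [0, 1], and 0 beyond."""
    return 1.0 - r if r <= 1.0 else 0.0


def kernel(profile: Profile, x: Sequence[float]) -> float:
    """Radially symmetric kernel K(x) = k(||x||^2) for a point x in R^d."""
    squared_norm = sum(xi * xi for xi in x)
    return profile(squared_norm)


def looks_like_profile(profile: Profile, r_max: float = 10.0, steps: int = 1000) -> bool:
    """Spot-check conditions (1) and (2) on a grid over [0, r_max].

    Condition (3), piecewise continuity with a finite integral, is not checked.
    """
    grid = [r_max * i / steps for i in range(steps + 1)]
    values = [profile(r) for r in grid]
    nonnegative = all(v >= 0.0 for v in values)
    nonincreasing = all(a >= b for a, b in zip(values, values[1:]))
    return nonnegative and nonincreasing


if __name__ == "__main__":
    print(kernel(gaussian_profile, [0.0, 0.0]))   # K(0) = k(0)
    print(kernel(epanechnikov_profile, [0.5]))     # k(0.25)
    print(looks_like_profile(gaussian_profile), looks_like_profile(lambda r: r))
```

An increasing function such as $k(r) = r$ fails the monotonicity check, which is why it cannot serve as a profile.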

Based on the kernel function, the mean-shift formula can be described as follows [16]:

$$m_H(x) = \frac{\sum_{i=1}^{n} G_H(x_i - x)\, w(x_i)\, x_i}{\sum_{i=1}^{n} G_H(x_i - x)\, w(x_i)}, \qquad G_H(x) = |H|^{-1/2}\, G\left(H^{-1/2} x\right),$$

in which $x$ is the depth of the pixel to be processed, $x_i$ ($i = 1, \dots, n$) is the depth of a sampled pixel point, $G$ is a unit kernel function, $w(x_i) \ge 0$ is the weight of sampling point $x_i$, and $H$ is the bandwidth matrix, usually a positive definite symmetric matrix with $H = h^2 I$. Before the algorithm is applied, an error tolerance $\varepsilon$ is set. The mean-shift algorithm is an iterative loop; starting from the pixel to be processed:

Step 1. Calculate $m_H(x)$.

Step 2. Substitute the value of $m_H(x)$ for $x$.

Step 3. If $\|m_H(x) - x\| < \varepsilon$, the iteration terminates, and the convergence point is recorded as the location identifier $y_k$ of the pixel point $x$. At that time, for any pixel point $x_j$ that converges to $y_k$, we have

$$(u_j, v_j) \in S_k,$$

in which $S_k$ is the region segmented at this time and $(u_j, v_j)$ is the pixel coordinate of $x_j$ in the image. Otherwise, the shifted point is taken as the new starting point, and Step 1 is repeated.
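The iteration in Steps 1–3 can be sketched in Python for scalar depth values. This is a minimal illustration under assumptions that do not come from the paper: a Gaussian unit kernel, an assumed scalar bandwidth `h`, unit weights, and a hypothetical `merge_tol` rule that treats modes closer than `h / 2` as the same region. It costs $O(n^2)$ per iteration and is meant to show the logic, not to run on a full 512 × 424 frame.

```python
import math
from typing import List, Optional, Sequence, Tuple


def weighted_mean(x: float, samples: Sequence[float], h: float,
                  weights: Optional[Sequence[float]] = None) -> float:
    """m_h(x): Gaussian-kernel-weighted mean of the sampled depths around x."""
    if weights is None:
        weights = [1.0] * len(samples)
    num = den = 0.0
    for xi, wi in zip(samples, weights):
        g = math.exp(-0.5 * ((xi - x) / h) ** 2)  # normal profile at ((x_i - x) / h)^2
        num += g * wi * xi
        den += g * wi
    return num / den if den > 0.0 else x


def find_mode(x0: float, samples: Sequence[float], h: float,
              eps: float = 1e-4, max_iter: int = 200) -> float:
    """Steps 1-3: repeat x <- m_h(x) until the shift |m_h(x) - x| drops below eps."""
    x = x0
    for _ in range(max_iter):
        m = weighted_mean(x, samples, h)
        if abs(m - x) < eps:
            return m
        x = m
    return x


def segment_depths(depths: Sequence[float], h: float, eps: float = 1e-4,
                   merge_tol: Optional[float] = None) -> Tuple[List[int], List[float]]:
    """Assign each depth the index k of the mode y_k it converges to (its region S_k)."""
    if merge_tol is None:
        merge_tol = h / 2.0  # hypothetical rule for merging nearby modes
    modes: List[float] = []
    labels: List[int] = []
    for d in depths:
        y = find_mode(d, depths, h, eps)
        for k, existing in enumerate(modes):
            if abs(existing - y) < merge_tol:
                labels.append(k)
                break
        else:
            modes.append(y)
            labels.append(len(modes) - 1)
    return labels, modes


if __name__ == "__main__":
    depths = [1.0, 1.02, 0.98, 1.01, 3.0, 3.05, 2.97]  # two hypothetical surfaces (m)
    labels, modes = segment_depths(depths, h=0.2)
    print(labels, [round(m, 3) for m in modes])
```

With two well-separated depth clusters, every pixel in a cluster converges to the same mode, so the labels split the samples into two regions.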

Combining the segmented regions $S_1, S_2, \dots, S_m$ that satisfy the requirement yields the expected foreground:

$$F = \bigcup_{k=1}^{m} S_k,$$

in which $F$ is the foreground area. The foreground obtained by the mean-shift algorithm is shown in Figure 6(c). The foreground is the region near the camera, and obstacles in front are segmented into the foreground as a whole. The gradient of each pixel in the foreground is calculated to highlight the edges of obstacles: in regions of continuous depth, the gradient is small and changes smoothly, whereas at object edges the gradient changes sharply, making the edges more obvious. In the depth image $D$, for a pixel point $(u, v)$ with depth value $D(u, v)$, the convolution gradient operators in the $x$ and $y$ directions are as follows [20]:

$$G_x(u, v) = D(u + 1, v) - D(u - 1, v), \qquad G_y(u, v) = D(u, v + 1) - D(u, v - 1).$$

So, the gradient magnitude of pixel point $(u, v)$ is

$$G(u, v) = \sqrt{G_x(u, v)^2 + G_y(u, v)^2}.$$
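A minimal Python sketch of the gradient step, assuming centered $[-1, 0, 1]$ difference masks (the masks used in [20]), $u$ as the column index, $v$ as the row index, and a boolean mask marking the mean-shift foreground. Border pixels and background pixels are simply set to zero; the function name is hypothetical.

```python
import math
from typing import List

Grid = List[List[float]]


def gradient_magnitude(depth: Grid, foreground: List[List[bool]]) -> Grid:
    """Centered-difference gradient magnitude, computed only on foreground pixels.

    G_x(u, v) = D(u+1, v) - D(u-1, v) and G_y(u, v) = D(u, v+1) - D(u, v-1),
    where depth[v][u] holds D(u, v). Border and background pixels stay 0.
    """
    rows, cols = len(depth), len(depth[0])
    out = [[0.0] * cols for _ in range(rows)]
    for v in range(1, rows - 1):          # v: row index
        for u in range(1, cols - 1):      # u: column index
            if not foreground[v][u]:
                continue
            gx = depth[v][u + 1] - depth[v][u - 1]
            gy = depth[v + 1][u] - depth[v - 1][u]
            out[v][u] = math.hypot(gx, gy)
    return out


if __name__ == "__main__":
    # Hypothetical scene: ground at 2 m on the left, an obstacle face at 1 m on the right.
    depth = [[2.0, 2.0, 1.0, 1.0, 1.0] for _ in range(5)]
    mask = [[True] * 5 for _ in range(5)]
    for row in gradient_magnitude(depth, mask):
        print(row)
```

The magnitude is nonzero only in the two columns that straddle the depth jump, which is the edge that the subsequent Canny step is meant to pick up.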

The new gradient image obtained from the foreground is shown in Figure 6(d). An edge detection algorithm is then applied to this gradient image; the Canny edge detector is used because it readily improves edge detection performance. The edges of the infrared image are detected in the same way. Since most areas of the infrared image are uniform, the detection results show that this step avoids the effects of light, shadow, and similar factors. After obstacle detection in the two images, morphological operations are performed: dilation first, then erosion, and finally merging adjacent points and removing isolated points. Let $A$ denote the foreground after gradient calculation and edge detection, $B$ the structuring element for dilation, $C$ the structuring element for erosion, and $z$ the location of a pixel point. Dilating $A$ by $B$ gives

$$(A \oplus B)(z) = \max_{b \in D_B} A(z - b),$$

in which $D_B$ is the domain of $B$. Eroding by $C$ gives

$$(A \ominus C)(z) = \min_{c \in D_C} A(z + c),$$

in which $D_C$ is the domain of $C$; the erosion is applied to the dilated image.
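The dilation-then-erosion step (a morphological closing when the two structuring elements match) can be sketched in Python on a binary edge map. The square structuring elements, their radii, and the choice to treat pixels outside the image as 0 during erosion are assumptions for illustration; the paper's own parameters are those in Table 1, and the final removal of isolated points is not included.

```python
from typing import List, Tuple

Binary = List[List[int]]
Offsets = List[Tuple[int, int]]


def square_element(radius: int) -> Offsets:
    """Flat square structuring element as (dy, dx) offsets, i.e. its domain."""
    return [(dy, dx) for dy in range(-radius, radius + 1)
            for dx in range(-radius, radius + 1)]


def dilate(img: Binary, element: Offsets) -> Binary:
    """(A (+) B)(z) = max over b in D_B of A(z - b)."""
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            for dy, dx in element:
                yy, xx = y - dy, x - dx
                if 0 <= yy < rows and 0 <= xx < cols and img[yy][xx]:
                    out[y][x] = 1
                    break
    return out


def erode(img: Binary, element: Offsets) -> Binary:
    """(A (-) C)(z) = min over c in D_C of A(z + c); outside pixels count as 0."""
    rows, cols = len(img), len(img[0])
    out = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            if all(0 <= y + dy < rows and 0 <= x + dx < cols and img[y + dy][x + dx]
                   for dy, dx in element):
                out[y][x] = 1
    return out


def dilate_then_erode(img: Binary, dilate_radius: int, erode_radius: int) -> Binary:
    """Dilation by B followed by erosion by C, in the order described in the text."""
    return erode(dilate(img, square_element(dilate_radius)), square_element(erode_radius))


if __name__ == "__main__":
    edges = [[0] * 7 for _ in range(7)]
    edges[3][2] = edges[3][4] = 1  # two edge fragments with a one-pixel gap
    for row in dilate_then_erode(edges, 1, 1):
        print(row)
```

In the example, the one-pixel gap between two edge fragments is filled, which is the "merging adjacent points" effect the text describes.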

Then, the results of the morphological operations are labeled, and the detection results of the two kinds of images are combined by intersection, which removes the wrong detection results and completes the obstacle detection process. The depth value at the center point of each obstacle region is taken as the depth of the obstacle. Let $P_D(u, v)$ and $P_I(u, v)$ denote the labeled pixel values of the depth image and the infrared image at location $(u, v)$, and let $P(u, v)$ denote the pixel value after fusion:

$$P(u, v) = P_D(u, v) \wedge P_I(u, v).$$

Where an obstacle is detected, $P_D(u, v)$ and $P_I(u, v)$ equal 1; otherwise, they equal 0. $P(u, v)$ equals 1 only when $P_D(u, v)$ and $P_I(u, v)$ are both 1. Finally, the identified obstacles are marked with blue boxes.
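The intersection rule, followed by labeling and the center-point depth lookup, can be sketched in Python as follows. The 4-connected labeling, the bounding-box representation, and the use of the box center as the "center point" are assumptions for illustration (for an irregular region the box center may fall outside the region); the function names are hypothetical.

```python
from collections import deque
from typing import List, Tuple

Mask = List[List[int]]
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def fuse(depth_mask: Mask, ir_mask: Mask) -> Mask:
    """1 only where both the labeled depth mask and the labeled infrared mask are 1."""
    return [[1 if d == 1 and i == 1 else 0 for d, i in zip(d_row, i_row)]
            for d_row, i_row in zip(depth_mask, ir_mask)]


def label_regions(mask: Mask) -> List[Box]:
    """4-connected component labeling; returns one bounding box per region."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes: List[Box] = []
    for y in range(rows):
        for x in range(cols):
            if not mask[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            top, left, bottom, right = y, x, y, x
            while queue:
                cy, cx = queue.popleft()
                top, bottom = min(top, cy), max(bottom, cy)
                left, right = min(left, cx), max(right, cx)
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes


def obstacle_depth(depth: List[List[float]], box: Box) -> float:
    """Depth value at the center point of the region's bounding box."""
    top, left, bottom, right = box
    return depth[(top + bottom) // 2][(left + right) // 2]
```

A region found only in the depth mask (or only in the infrared mask) disappears after `fuse`, which is how the intersection removes each modality's false detections.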

4. Experiments and Results

4.1. Single Image Identification

Figure 7 shows the results of obstacle detection on grass using single images from the Kinect sensor. The carton is 25 cm × 45 cm and is 1 m from the sensor; the paper basket is 16 cm × 25 cm and is 1.5 m from the sensor. The angle of inclination of the Kinect sensor is . Because the yellow flowers are more than 8 m from the sensor, they appear only in the color image; their information cannot be collected in the depth and infrared images. The figure clearly shows that no single image can identify the obstacles accurately. In the color image, the abundant color information and color variation make edges very distinct: as shown in Figure 7(b), edges in both the foreground and the background are detected, but only obstacles in the foreground are needed. The color image is therefore unsuitable for this task; in addition, it is easily affected by illumination. When the depth map is used alone and an obstacle is close to the edge of the segmented foreground, the region edge and the obstacle are easily identified as one object, as shown in Figure 7(f). Meanwhile, no edges are detected in front regions without obstacles, and there is little erroneous information in rear regions, as shown in Figure 7(e). When the infrared image is used alone, the edges of front obstacles are detected effectively, but the edges of more distant grass are also detected and rear objects are ignored, as shown in Figure 7(h). In short, detection from a single image produces many errors and achieves neither accurate obstacle identification nor robustness.

4.2. Experiments and Results Analysis

The method of this paper instead fuses images (e) and (h) in Figure 7, which not only locates obstacles accurately but also effectively avoids the effect of illumination. The experimental result of obstacle detection using the fusion of depth and infrared information is shown in Figure 8. The parameters of edge detection and morphological operations are listed in Table 1, including the Canny operator values for the infrared image and the depth image. A smaller value detects more edges, but if the value is too small, rough road surfaces are also treated as obstacles; if it is too large, obstacle edges cannot be detected, so the value must be chosen carefully. In our tests, the best value is 3. The dilation radii of the three kinds of images (infrared, depth, and fusion) control how far edges expand: a larger radius expands edges more, but if it is too large, the detected obstacle area becomes larger than the actual area; if it is too small, one obstacle is recognized as several. The values in Table 1 were selected after several tests. The dilation neighborhood parameters of the three kinds of images have little effect on the results, so all are set to 4 in this experiment. The experimental results show that these parameters allow obstacles to be identified accurately.


Table 1: Experimental parameters of edge detection and morphological operations.
Canny operator
Dilation factors of infrared image
Dilation factors of depth image
Dilation factors of fusion image

After foreground segmentation, gradient calculation, edge detection, and morphological operations on the depth image, the general outline of the obstacle is obtained, but sometimes the contact surface between the ground and the obstacle cannot be segmented, and the foreground edge segmented by the mean-shift algorithm coincides with the obstacle edges. Some other spurious detection information also remains. In the infrared image, abundant edge information is detected from front objects, and after morphological operations most areas are identified as obstacles. However, intersecting Figures 8(c) and 8(e) yields Figure 8(f), in which the obstacle locations can easily be identified and marked; the result is shown in Figure 8(g). In experiments on 50 groups of images, each containing 1–3 obstacles, the accuracy rate of the method reached 98%, with an over-detection rate of 1.1% and a missed-detection rate of 0.6%. The accuracy of obstacle identification is not affected by illumination. The obstacle detection method based on the fusion of depth and infrared information is therefore superior to methods using single-image information and is highly practical.

The results of obstacle detection are shown in Table 2. The distance between the obstacle and the camera is the main factor affecting the measurement error. Taking the can as an example, the error between the measured and actual distance is 0.011 m at an actual distance of 1 m and 0.029 m at 3 m. Because obstacle surfaces are not smooth and an obstacle is sometimes a cluster of objects, the depth value is not constant; the limitations of the camera also contribute to the error. Nevertheless, this accuracy is relatively high for outdoor robots, smart cars, and similar applications. We also find that smaller obstacles yield higher measurement accuracy in road scenes, because the edge information of small obstacles is preserved better when the depth and infrared images are fused. For large obstacles, the different colors of different parts affect the edge information of the image and hence the measurement accuracy. The data in the table indicate that the proposed method detects obstacles accurately and can be applied to practical problems.
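As a worked check using only the two errors quoted for the can, the corresponding relative errors are:

```latex
\frac{0.011\ \text{m}}{1\ \text{m}} = 1.1\%, \qquad
\frac{0.029\ \text{m}}{3\ \text{m}} \approx 0.97\%
```

That is, the absolute error grows with distance, while the relative error stays close to 1% over this range.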

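Table 2: Results of obstacle detection.
Columns: type of scene; obstacle; actual distance/m; actual size (width × height)/cm; actual geometric center; measured distance/m; measured size (width × height)/cm; measured geometric center.
Tree and flowers: 3; 2.965
Carton and paper basket: 1; 1.016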
Carton and paper basket11.016

5. Conclusions

A simple obstacle detection method based on the fusion of depth and infrared images is proposed in this paper. The features of the scene and the image information are analyzed. Based on the behavior of depth and infrared images in traditional edge detection, an outdoor obstacle detection algorithm based on visual information fusion is proposed for detecting obstacles in the area near the sensor (the foreground described above). The method overcomes the susceptibility of color images to factors such as light and shadow as well as the limitations of single-image algorithms, and it effectively removes the erroneous detection areas that single-image algorithms produce. The experiments show that the method identifies obstacle areas accurately and is insensitive to lighting conditions and intensity, achieving better robustness. Hence, it can provide reliable data for intelligent robots.

Competing Interests

The authors declare that they have no competing interests.


Acknowledgments

The authors acknowledge the National Natural Science Foundation of China (no. 51605039), the Natural Science Basic Research Plan in Shaanxi Province of China (no. 2016JQ6066), the China Postdoctoral Science Foundation (no. 2016M592728), and Fundamental Research Funds for the Central Universities (no. 310825151041).


References

  1. G. B. Vitor, D. A. Lima, A. C. Victorino, and J. V. Ferreira, “A 2D/3D vision based approach applied to road detection in urban environments,” in Proceedings of the IEEE Intelligent Vehicles Symposium (IEEE IV '13), pp. 952–957, Gold Coast, Australia, June 2013.
  2. A. Benzerrouk, L. Adouane, and P. Martinet, “Obstacle avoidance controller generating attainable set-points for the navigation of multi-robot system,” in Proceedings of the IEEE Intelligent Vehicles Symposium (IV '13), pp. 487–492, Gold Coast, Australia, June 2013.
  3. A. Discant, A. Rogozan, C. Rusu, and A. Bensrhair, “Sensors for obstacle detection: a survey,” in Proceedings of the 30th International Spring Seminar on Electronics Technology, pp. 100–105, Cluj-Napoca, Romania, May 2007.
  4. S. Wang and Z.-Y. Xiang, “Detecting obstacles in vegetation by multi-spectral fusion,” Journal of Zhejiang University, vol. 49, no. 11, pp. 2223–2229, 2015.
  5. S. Thrun, M. Montemerlo, H. Dahlkamp et al., “Stanley: the robot that won the DARPA Grand Challenge,” Journal of Field Robotics, vol. 23, no. 9, pp. 661–692, 2006.
  6. J. F. Figueroa and J. S. Lamancusa, “A method for accurate detection of time of arrival: analysis and design of an ultrasonic ranging system,” Journal of the Acoustical Society of America, vol. 91, no. 1, pp. 486–494, 1992.
  7. H. Hua, Y. Wang, and D. Yan, “A low-cost dynamic range-finding device based on amplitude-modulated continuous ultrasonic wave,” IEEE Transactions on Instrumentation and Measurement, vol. 51, no. 2, pp. 362–367, 2002.
  8. K. Oki and K. Omasa, “A technique for mapping thermal infrared radiation variation within land cover,” IEEE Transactions on Geoscience and Remote Sensing, vol. 41, no. 6, pp. 1521–1524, 2003.
  9. X.-M. Xiao, H.-M. Hu, Z.-X. Cai, and L. Wang, “Fast obstacle detection using adaptive segmentation and stereo vision,” Application Research of Computers, vol. 24, no. 9, pp. 182–184, 2007.
  10. S. He, Z. Liu, and J. Shi, “Obstacle detection of indoor robots based on monocular vision,” Journal of Computer Applications, vol. 32, no. 9, pp. 2556–2559, 2012.
  11. H. Zhang and J. Zhu, “Research strategy based on binocular vision obstacle avoidance algorithm,” Hangzhou University of Electronic Science and Technology, vol. 33, no. 4, pp. 31–34, 2013.
  12. J. Choi, D. Kim, H. Yoo, and K. Sohn, “Rear obstacle detection system based on depth from kinect,” in Proceedings of the 15th International IEEE Conference on Intelligent Transportation Systems (ITSC '12), pp. 98–101, Anchorage, Alaska, USA, September 2012.
  13. Microsoft, Kinect for Windows [OL].
  14. Z. Zhang, “A flexible new technique for camera calibration,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 11, pp. 1330–1334, 2000.
  15. D. Chi, Y. Wang, L. Ning, J. Yi, and L. S. University, “Experimental research of camera calibration based on ZHANG's method,” Journal of Chinese Agricultural Mechanization, vol. 36, no. 2, pp. 287–289, 2015.
  16. K. Fukunaga and L. D. Hostetler, “The estimation of the gradient of a density function, with applications in pattern recognition,” IEEE Transactions on Information Theory, vol. 21, no. 1, pp. 32–40, 1975.
  17. Y. Cheng, “Mean shift, mode seeking, and clustering,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 17, no. 8, pp. 790–799, 1995.
  18. D. Comaniciu and P. Meer, “Mean shift: a robust approach toward feature space analysis,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 5, pp. 603–619, 2002.
  19. D. Comaniciu, V. Ramesh, and P. Meer, “Real-time tracking of non-rigid objects using mean shift,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '00), vol. 2, pp. 142–149, June 2000.
  20. N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '05), vol. 1, pp. 886–893, 2005.

Copyright © 2016 Yaguang Zhu et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
