Any mobility aid for visually impaired people should be able to accurately detect and warn about nearby obstacles. In this paper, we present a method for an assistive system that detects obstacles in indoor environments based on the Kinect sensor and 3D image processing. Color-depth data of the scene in front of the user are collected using the Kinect with the support of OpenNI, the standard framework for 3D sensing, and processed with the PCL library to extract accurate 3D information about the obstacles. The experiments were performed on a dataset covering multiple indoor scenarios and different lighting conditions. The results showed that our system is able to accurately detect four types of obstacle: walls, doors, stairs, and a residual class that covers loose obstacles on the floor. Specifically, walls and loose obstacles on the floor are detected in practically all cases, whereas doors are detected in 90.69% of 43 positive image samples. For step detection, upstairs are correctly detected in 97.33% of 75 positive images, while the correct detection rate for downstairs is lower, at 89.47% of 38 positive images. Our method further allows the computation of the distance between the user and the obstacles.

1. Introduction

In 2014, the World Health Organization estimated that 285 million people worldwide were visually impaired: 39 million were blind and 246 million had low vision [1]. Furthermore, about 90% of the world’s visually impaired live in low-income settings, and 82% of people living with blindness are aged 50 and above. Generally, these individuals face significant difficulties with independent mobility, which relies on sensing the near-field environment, including obstacles and potential paths in the vicinity, for the purpose of moving through it [2]. Recent advances in computer science now allow the development of innovative solutions to assist visually impaired people. Various types of assistive devices have been developed to provide blind users with means of learning about or getting to know the environment. A recent literature review of existing electronic aids for visually impaired individuals identified more than 140 products, systems, and assistive devices while providing details on 21 commercially available systems [3]. A large number of these systems are based on the Global Positioning System (GPS), which unfortunately prevents them from being used effectively and efficiently in indoor environments. Indeed, these systems are not able to provide local information on the obstacles that are encountered, because the GPS signal is inaccurate and susceptible to loss. Other types of mobility and navigational aids are based on sonar and provide information about the surroundings by means of auditory cues [4–6]. They use short pulses of ultrasound to detect objects, but this approach has some disadvantages: different surfaces differ in how well they reflect ultrasound, and ultrasonic aids are subject to interference from other sources of ultrasound. Finally, another type of assistive device for blind and visually impaired people has been developed based on stereo vision techniques, such as [7].

With the advances in computer vision algorithms, intelligent vision systems have received growing interest. Computer vision-based assistive technology for visually impaired people has been studied and developed extensively. These systems can improve the mobility of a person with impaired vision by reducing risks and avoiding dangers. As imaging techniques advance, such as the RGB-D cameras of Microsoft Kinect [8] and ASUS Xtion Pro Live [9], it has become practical to capture RGB sequences as well as depth maps in real time. Compared to traditional RGB cameras, depth maps provide additional information on object shape and distance. Some existing systems use an RGB-D camera and translate visual images into corresponding sounds through stereo headphones [10, 11]. However, these systems can occupy the blind user’s sense of hearing, which could limit their efficient use in daily life.

In this paper, we present a Microsoft Kinect-based method specifically dedicated to the detection of obstacles in indoor environments based on 3D image processing with color-depth information (Figure 5). Specifically, our system was designed to obtain reliable and accurate data from the surrounding environment and to detect and warn about nearby obstacles such as walls, doors, stairs, and undefined obstacles on the floor, with the ultimate goal of assisting visually impaired people in their mobility. This paper is part of our long-term research on low vision assistance devices. Within this research, the main objective is to design and evaluate a complete prototype of an assistive device that can help the visually impaired in their mobility (Figure 2). To achieve this goal, we rely on various themes explored in the literature, including obstacle detection using computer vision, embedded system design, and sensory substitution technology. The main novelty of our work is unifying these themes into a single prototype; this paper presents the results of the image processing module.

2. Related Work

In recent decades, obstacle detection has received great interest. Interestingly, the majority of the existing systems have been developed for mobile robots [12, 13]. In this section, we focus only on works related to assistive technology for visually impaired people. Wearable systems have been developed based on various technologies such as laser, sonar, or stereo camera vision for environment sensing, using audio or tactile stimuli for user feedback. For instance, Benjamin et al. [14] have developed a laser cane for the blind called the C-5 Laser Cane. This device is based on optical triangulation to detect obstacles up to a range of 3.5 m ahead. It requires environment scanning and provides information on the single nearest obstacle at a time by means of acoustic feedback. Molton et al. [15] have used a stereo-based system for the detection of the ground plane and the obstacles.

With RGB-D sensor-based computer vision technologies, researchers are finding new uses for these devices that have already led to advances in the medical field. For instance, Costa et al. [16] used low-cost RGB-D sensors to reconstruct the human body. Other systems based on RGB-D devices (e.g., Microsoft Kinect or ASUS Xtion Pro) are able to detect and recognize human activities [17, 18]. Other researchers have developed methods for detecting falls in the homes of older adults using the Microsoft Kinect. For instance, Mundher and Jiaofei [19] have developed a real-time fall detection system using a mobile robot and a Kinect sensor. The Kinect sensor enables the mobile robot to follow a person and detect when the target person has fallen. This system can also send an SMS notification and make an emergency call when a fall is detected. Stone and Skubic [20] have also presented a method for detecting falls in the homes of older adults using an environmentally mounted depth-imaging sensor.

RGB-D sensor-based assistive technology can help blind and visually impaired people to travel independently. Numerous electronic mobility or navigation assistant devices have been developed based on converting RGB-D information into an audible signal or into tactile stimuli for visually impaired persons. For instance, Khan et al. [21] have developed a real-time human and obstacle detection system for a blind or visually impaired user using an Xtion Pro Live RGB-D sensor. The prototype system includes an Xtion Pro Live sensor, a laptop for processing and transducing the data, and a set of headphones for providing feedback to the user. Tang et al. [22] presented an RGB-D sensor-based computer vision device to improve the performance of visual prostheses. First, a patch-based method is employed to generate a dense depth map with region-based representations. The patch-based method generates a surface-based RGB and depth (RGB-D) segmentation instead of just 3D point clouds. It therefore carries more meaningful information, which is easier to convey to the visually impaired person. Then, they applied a smart sampling method to transduce the important or highlighted information and/or remove background information before presenting it to visually impaired people. Lee and Medioni [23] have conceived a wearable navigation aid for the visually impaired, which includes an RGB-D camera and a tactile vest interface device. Park and Howard [24] presented a real-time haptic telepresence robotic system that allows the visually impaired to reach specific objects using an RGB-D sensor. In addition, Tamjidi et al. [25] developed a smart cane with an SR4000 3D camera for camera pose estimation and obstacle detection in an indoor environment. More recently, Yus et al. [26] have proposed a new stair detection and modelling method that provides information about the location, orientation, and number of steps of the staircase. Aladren et al. [27] have also developed a robust system for visually impaired people based on visual and range information. This system is able to detect and classify the main structural elements of the scene.

3. The Proposed System

3.1. Overview of the Proposed System

Figure 1 illustrates the overall structure of our proposed system. The proposed system uses a personal computer (PC) to process color-depth images captured by an RGB-D camera. An obstacle detection method determines the presence of obstacles and warns visually impaired users through feedback devices such as auditory, tactile, and vibration devices.

In this paper, we focus on obstacle detection using information coming from RGB-D cameras. To warn visually impaired users, we use a sensory substitution device called the Tongue Display Unit (TDU) [32]. In [33], we presented a complete system for obstacle detection and warning for visually impaired people based on an electrode matrix and a mobile Kinect. However, that system detects only loose obstacles on the floor. We extend this work by proposing a new approach for detecting different types of obstacles for visually impaired people.

3.2. User Requirements Analysis

In order to define the obstacles, we conducted a survey with ten blind students at the Nguyen Dinh Chieu school in Vietnam. The results of this preliminary study indicated that there are many obstacles in an indoor environment, such as moving objects, walls, doors, stairs, pillars, rubbish bins, and flower pots, that blind students have to avoid. In this study, we have defined four frequent types of obstacle that the blind students face in the typical indoor environments of their school: (1) doors, (2) stairs, (3) walls, and (4) a residual class that covers loose obstacles (see some examples in Figure 3).

3.3. Obstacles Detection

In this section, we describe the obstacle detection process illustrated in Figure 4. The process is divided into five consecutive steps, of which the first is dedicated to data acquisition. In this step, color-depth and accelerometer data of the scene in front of the user are acquired using an RGB-D camera. These data are then used to reconstruct a point cloud in the second step. The third step filters the obtained point cloud, which is then fed to the segmentation step. The main goal of the segmentation step is to identify the floor plane. Finally, in the obstacle detection step, we identify the types of obstacle based on their characteristics.

Step 1 (data acquisition). Various types of RGB-D camera, such as Microsoft Kinect and ASUS Xtion Pro, can be used in our work. In this work, we use the Kinect sensor (Figure 5). Kinect is a low-cost 3D camera that is able to work with various hardware models. It is also supported by various frameworks and drivers. It should be noted, however, that the fundamental part of our system does not need to be changed if we want to use another type of RGB-D camera in the future.
The Kinect RGB-D camera captures both RGB images and depth maps at a resolution of 640 × 480 pixels and a rate of 30 frames per second (fps) [34]. Its effective depth range is from 0.4 to 3.5 m. Figure 6 shows an illustration of the viewable range of the Kinect camera.
In this step, we use OpenNI, the standard framework for 3D sensing, to capture the color and depth information of the scene.

Step 2 (reconstruction). Color and depth are combined to create a Point Cloud, a data structure that represents a collection of multidimensional points and is commonly used to represent three-dimensional data. In a 3D Point Cloud, the points usually represent the $x$, $y$, and $z$ geometric coordinates of an underlying sampled surface. We use the parameters provided by Burrus [35] to calibrate the color and depth data. Once the Point Cloud is created, it is defined in the reference system of the Kinect (see Figure 7).
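As a rough illustration of the reconstruction step, a depth pixel can be back-projected to a 3D point with the pinhole camera model. The sketch below is only an assumption about how this may be done: the intrinsics `FX`, `FY`, `CX`, and `CY` are placeholder values chosen for illustration, not the calibration parameters of [35], and `depth_to_point` and `depth_map_to_cloud` are hypothetical helper names.

```python
# Hypothetical pinhole intrinsics for a 640 x 480 depth image (illustrative only,
# not the calibration values referenced in the text).
FX, FY = 525.0, 525.0   # focal lengths in pixels
CX, CY = 319.5, 239.5   # principal point in pixels


def depth_to_point(u, v, depth_m):
    """Back-project pixel (u, v) with depth in metres to camera coordinates."""
    x = (u - CX) * depth_m / FX
    y = (v - CY) * depth_m / FY
    return (x, y, depth_m)


def depth_map_to_cloud(depth_rows):
    """Convert a 2D list of depths (metres, 0 = no reading) into a list of 3D points."""
    cloud = []
    for v, row in enumerate(depth_rows):
        for u, d in enumerate(row):
            if d > 0:  # skip pixels without a valid depth measurement
                cloud.append(depth_to_point(u, v, d))
    return cloud
```

A pixel at the principal point maps to a point on the optical axis, which gives a quick sanity check of the chosen intrinsics.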
This represents a disadvantage because we have to determine the location of obstacles in the reference system centered at the user’s feet. Therefore, we apply geometric transformations, including translation, rotation, and reflection, to bring the Point Cloud from the reference system of the Kinect to the user-centered reference system. The orientation information computed from the accelerometer data and the Point Cloud data is used to perform this task.
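The change of reference system can be sketched as a rotation, a reflection, and a translation applied to each point. The axis conventions, the single pitch angle, and the function name `kinect_to_user` below are assumptions made for this illustration; the actual conventions are those of Figure 7, and the real system derives the orientation from the accelerometer and Point Cloud data.

```python
import math


def kinect_to_user(point, pitch_rad, camera_height_m):
    """Map a point from the camera frame to a frame centred at the user's feet.

    Assumed conventions (not taken from the paper): the camera frame has
    x right, y down, z forward; the user frame has x right, y up, z forward;
    the camera is tilted downward by pitch_rad about its x-axis and sits
    camera_height_m above the floor.
    """
    x, y, z = point
    c, s = math.cos(pitch_rad), math.sin(pitch_rad)
    # Rotation about the x-axis to undo the downward tilt of the camera.
    y_level = c * y + s * z
    z_level = -s * y + c * z
    # Reflection of the y-axis (down -> up), then translation to floor level.
    return (x, camera_height_m - y_level, z_level)
```

For example, with a camera mounted 1 m above the floor and tilted 30 degrees downward, a point 2 m along the optical axis lands on the floor (height 0) in the user frame.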

Step 3 (filtering). The execution time of the program depends on the number of points in the Point Cloud. We thus need to reduce the number of points to ensure that the system can respond fast enough. We use a Voxel Grid filter to downsample the Point Cloud; a Pass Through filter then removes all points located more than 75 cm along the filtered axis (see Figure 8).
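The two filters can be sketched in a few lines. This is a simplified stand-in for the corresponding PCL filters, not the PCL implementation itself: the leaf size and the filtered axis are assumptions left as parameters, and only the 75 cm limit comes from the text.

```python
from collections import defaultdict


def voxel_grid_filter(points, leaf_size):
    """Downsample by replacing all points in each cubic voxel with their centroid."""
    voxels = defaultdict(list)
    for p in points:
        key = tuple(int(c // leaf_size) for c in p)  # integer voxel index per axis
        voxels[key].append(p)
    return [tuple(sum(c) / len(pts) for c in zip(*pts)) for pts in voxels.values()]


def pass_through_filter(points, axis, lower, upper):
    """Keep only points whose coordinate on the given axis lies in [lower, upper]."""
    return [p for p in points if lower <= p[axis] <= upper]


# Usage sketch: 5 cm voxels (assumed), then keep points up to 0.75 m on axis 1 (assumed).
cloud = [(0.01, 0.01, 0.01), (0.03, 0.03, 0.03), (1.0, 1.0, 1.0)]
filtered = pass_through_filter(voxel_grid_filter(cloud, 0.05), 1, 0.0, 0.75)
```

Downsampling first keeps the later steps cheap, since segmentation and clustering costs grow with the number of points.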

Step 4 (segmentation). The next step is plane segmentation. The Random Sample Consensus (RANSAC) algorithm [36] is used for plane detection in the Point Cloud data. RANSAC is an iterative method for estimating the parameters of a model from data that contain outliers. In the present work, RANSAC selects a set of points that satisfy the plane equation $ax + by + cz + d = 0$, combined with a parallel condition between the floor plane and the corresponding coordinate plane of the reference system (see Figure 7). An example of a floor image is illustrated in Figure 9 and the detected floor plane is shown in Figure 10.
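A minimal sketch of RANSAC plane fitting with an orientation constraint is given below. It fits the plane model $ax + by + cz + d = 0$ to random triples of points and keeps the candidate with the most inliers. The `up` axis, the distance tolerance, the angle tolerance, and the iteration count are all assumptions made for this illustration, and the function names are hypothetical; the real system uses the PCL implementation.

```python
import math
import random


def plane_from_points(p1, p2, p3):
    """Return (a, b, c, d) with a unit normal for the plane through three points, or None."""
    u = [p2[i] - p1[i] for i in range(3)]
    v = [p3[i] - p1[i] for i in range(3)]
    n = (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])
    norm = math.sqrt(sum(c * c for c in n))
    if norm < 1e-12:
        return None  # the three sampled points are collinear
    a, b, c = (comp / norm for comp in n)
    return (a, b, c, -(a * p1[0] + b * p1[1] + c * p1[2]))


def ransac_floor(points, up=(0.0, 1.0, 0.0), dist_tol=0.02,
                 max_angle_deg=10.0, iterations=200, seed=0):
    """Find the plane with the most inliers whose normal is close to the assumed up axis."""
    rng = random.Random(seed)
    cos_min = math.cos(math.radians(max_angle_deg))
    best_plane, best_inliers = None, []
    for _ in range(iterations):
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:
            continue
        a, b, c, d = plane
        # Parallel condition: reject planes that are tilted too far from horizontal.
        if abs(a * up[0] + b * up[1] + c * up[2]) < cos_min:
            continue
        inliers = [p for p in points if abs(a * p[0] + b * p[1] + c * p[2] + d) <= dist_tol]
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = plane, inliers
    return best_plane, best_inliers
```

The orientation test is what distinguishes floor detection from generic plane detection: a large wall may have more inliers than the floor, but its normal fails the parallel condition.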

Step 5 (obstacles detection). Consider the following:
(a) Detection of Obstacles on the Floor. After the floor has been detected, Euclidean Cluster Extraction is used to determine the clusters on the floor plane. Each cluster is a set of points in the Point Cloud. Within a cluster, every point lies within a distance threshold of another point of the same cluster, and each cluster represents one obstacle [37]. An example of loose obstacle detection is illustrated in Figure 11.
In addition, some classes in the PCL library help us to obtain the obstacle’s size. Each obstacle can be approximated with a different structure. In our study, each obstacle is approximated by a rectangular parallelepiped (see Figure 12).
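The clustering and box approximation can be sketched as follows. This is a brute-force illustration of the idea, assuming an axis-aligned box; PCL's Euclidean Cluster Extraction uses a k-d tree for the neighbor search, and the tolerance and minimum cluster size below are hypothetical parameters.

```python
from collections import deque


def euclidean_clusters(points, tolerance, min_size=1):
    """Group points so that each point is within `tolerance` of another point in its cluster."""
    tol2 = tolerance * tolerance
    unvisited = set(range(len(points)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        queue, cluster = deque([seed]), [seed]
        while queue:  # breadth-first region growing from the seed point
            i = queue.popleft()
            near = [j for j in unvisited
                    if sum((points[i][k] - points[j][k]) ** 2 for k in range(3)) <= tol2]
            for j in near:
                unvisited.remove(j)
                queue.append(j)
                cluster.append(j)
        if len(cluster) >= min_size:
            clusters.append([points[i] for i in cluster])
    return clusters


def bounding_box(cluster):
    """Axis-aligned box (min corner, max corner) approximating one obstacle."""
    mins = tuple(min(p[k] for p in cluster) for k in range(3))
    maxs = tuple(max(p[k] for p in cluster) for k in range(3))
    return mins, maxs
```

The box corners give the obstacle's extent along each axis, and the corner closest to the origin of the user-centered frame gives a simple estimate of the distance to the obstacle.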

(b) Door Detection. The detection of doors is based on certain conditions on the door width, as described in [37]. The algorithm can be summarized as follows.

Algorithm 1 (door detection). Consider the following:
(1) Input is a Point Cloud $P$.
(2) Estimate the local surface normal of each point belonging to $P$.
(3) Segment the set of planes $\{\Pi_i\}$ of potential doors using both the surface normals and the color information.
(4) Use RANSAC to estimate the parameters of each plane $\Pi_i$ as in the following equation: $a_i x + b_i y + c_i z + d_i = 0$. Herein, $\mathbf{n}_i = (a_i, b_i, c_i)$ is the normal of plane $\Pi_i$.
(5) Determine the angle between $\mathbf{n}_i$ and the normal of the ground plane. This angle should be approximately 90 degrees since doors are perpendicular to the floor.
(6) Determine the dimensions of each plane.
(7) Check for each plane $\Pi_i$ whether its width satisfies the door-width conditions. Remove $\Pi_i$ if this is not the case.
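The geometric tests at the end of the algorithm (the angle with the ground normal and the width condition) can be sketched as below. The width limits and the angle tolerance are hypothetical values chosen for illustration, since the actual conditions are those described in [37], and the function names are not from the original implementation.

```python
import math


def angle_between_deg(n1, n2):
    """Unsigned angle in degrees between two non-zero 3D vectors."""
    dot = sum(a * b for a, b in zip(n1, n2))
    norm = math.sqrt(sum(a * a for a in n1)) * math.sqrt(sum(b * b for b in n2))
    cos_angle = max(-1.0, min(1.0, dot / norm))  # clamp against rounding errors
    return math.degrees(math.acos(cos_angle))


def is_door_candidate(plane_normal, floor_normal, width_m,
                      min_width_m=0.6, max_width_m=1.2, angle_tol_deg=10.0):
    """Keep a plane if it is roughly perpendicular to the floor and has a door-like width.

    The width limits and tolerance are assumed values for illustration only.
    """
    angle = angle_between_deg(plane_normal, floor_normal)
    perpendicular = abs(angle - 90.0) <= angle_tol_deg
    return perpendicular and min_width_m <= width_m <= max_width_m
```

The width condition is what separates a door from a wall in this sketch: both are perpendicular to the floor, but only the door falls inside the expected width range.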

(c) Staircase Detection. A staircase consists of at least 3 equally spaced steps, as shown in Figure 13.

Researchers at Monash University [38] have developed a high-performance algorithm to detect the steps of a staircase. This algorithm provides the end user with information such as the presence of a staircase (both upstairs and downstairs) and the number of steps of the staircase in the field of view. The staircase detection algorithm can be summarized as follows.

Algorithm 2. The staircase detection algorithm is as follows:
(1) Input is a Point Cloud $P$.
(2) Estimate the local surface normals of the points belonging to $P$.
(3) Filter $P$ based on the normals, keeping the points whose local surface is parallel to the ground plane.
(4) Initialize $n = 1$. Determine $P_n$, the set of points that are located at height $h_n$. Herein, $h_n = n H_{step} \pm H_{tol}$, where $H_{step}$ is an estimate of the height of a step and $H_{tol}$ is the tolerance measure.
(5) Estimate a plane using the set of points $P_n$ in a RANSAC-based manner.
(6) Determine the number of inliers of the estimated plane. If this number is too small, no step is found and the procedure quits.
(7) Otherwise, increment $n$ by 1 and restart the procedure from step (4).
The parameter values can be changed; in our program, they are chosen to make sure that no step is missed.
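The iterative search over step heights can be sketched as follows. To keep the example short, the RANSAC plane fit is replaced here by a simple count of points inside each height band, which is a simplification and not the method of [38]. The step height, tolerance, inlier threshold, and vertical axis index are hypothetical parameters.

```python
def count_steps(horizontal_points, step_height, tolerance, min_inliers,
                up_axis=1, max_steps=50):
    """Count consecutive steps found at heights n * step_height (+/- tolerance).

    horizontal_points: points already filtered to lie on horizontal surfaces.
    A negative step_height searches downward (downstairs) in this sketch.
    """
    n = 1
    while n <= max_steps:
        target = n * step_height
        band = [p for p in horizontal_points if abs(p[up_axis] - target) <= tolerance]
        if len(band) < min_inliers:
            break  # no step at this height: stop the search
        n += 1
    return n - 1


def is_staircase(horizontal_points, step_height, tolerance, min_inliers):
    """A staircase requires at least 3 equally spaced steps."""
    return count_steps(horizontal_points, step_height, tolerance, min_inliers) >= 3


# Usage sketch with three synthetic treads, 17 cm apart (illustrative values).
treads = [(0.1 * i, 0.17 * s, 1.0 + 0.3 * s) for s in (1, 2, 3) for i in range(10)]
found = count_steps(treads, 0.17, 0.03, 5)
```

A larger tolerance makes the search less likely to miss a step whose height deviates from the estimate, at the cost of accepting more spurious points.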

(d) Wall Detection. With the support of the RANSAC algorithm, we can determine the planes in the scene and calculate the local surface normal of each point in the Point Cloud. Walls are detected based on their perpendicularity to the floor plane (see Figure 17). In other words, a wall consists of points whose normals form an angle of approximately 90 degrees with the vertical axis of the reference system (see Figure 7), and the number of these points must lie within a fixed range.
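The wall test can be combined with the distance computation in one small sketch. Given plane coefficients $(a, b, c, d)$ expressed in the user-centered frame, the distance from the user (the origin) to the plane is $|d| / \sqrt{a^2 + b^2 + c^2}$. The `up` axis, the angle tolerance, and the point-count range below are assumptions made for this illustration, and `wall_distance` is a hypothetical name.

```python
import math


def wall_distance(plane, num_points, up=(0.0, 1.0, 0.0),
                  angle_tol_deg=10.0, min_points=500, max_points=200000):
    """Return the distance from the origin to the plane if it looks like a wall, else None.

    plane: coefficients (a, b, c, d) of a*x + b*y + c*z + d = 0 in the user frame.
    up: assumed unit vector of the vertical axis.
    """
    a, b, c, d = plane
    norm = math.sqrt(a * a + b * b + c * c)
    cos_angle = (a * up[0] + b * up[1] + c * up[2]) / norm
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
    if abs(angle - 90.0) > angle_tol_deg:
        return None  # the plane is not perpendicular to the floor
    if not (min_points <= num_points <= max_points):
        return None  # point count outside the accepted range
    return abs(d) / norm
```

For instance, the vertical plane $z = 2$ supported by enough points is reported as a wall 2 m in front of the user.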

4. Results and Discussion

We have developed the program in C++ using Visual Studio 2010. The implementation makes extensive use of the Point Cloud Library (PCL), an open-source library for 2D/3D image and Point Cloud processing. PCL natively supports the OpenNI 3D interfaces and can thus acquire and process data from the Microsoft Kinect sensor. Besides PCL, we also work with some external libraries, including Boost, Eigen, OpenCV, FLANN, QHull, and VTK. Our image processing program was tested on a 2.4 GHz computer. The experiments have been performed in multiple indoor environments with different lighting conditions.

In the obstacle detection process, floor detection is very important, and the performance of our system depends on it. Indeed, if the floor cannot be detected accurately, the system will not be able to detect the obstacles. The program has been tested with a dataset covering multiple indoor scenarios. The results showed that the ground plane was detected in most cases in the indoor environment. Figure 14 shows some results of ground plane detection. However, there are still some situations in which it can fail, for example, when the lighting is too strong for the Kinect camera.

In another case, when an obstacle contains a large horizontal plane, this plane could be wrongly identified as the ground plane (see Figure 15).

With regard to the detection of walls and loose obstacles on the floor plane (see Figure 16), the results showed that both are detected in practically all cases. We further measured the real distance between the user and the obstacles and compared it with the distance computed by the obstacle detection program; the error was negligible (<2%).

In order to evaluate the performance of the door detection algorithm, we use a standard measure widely used for classification or identification evaluation, namely, Precision [39]. It is defined as follows:
$$\mathrm{Precision} = \frac{TP}{TP + FP},$$
where TP and FP denote the number of true positives and false positives, respectively. The doors are detected in 90.69% of 43 positive images (Figures 18 and 19). The results are summarized in Table 1. We can further compare our results with the performance of some other systems (see Table 2).
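As a small worked example of this measure, the sketch below computes Precision from counts of true and false positives. The counts are purely illustrative and are not taken from Table 1; the function name is hypothetical.

```python
def precision(true_positives, false_positives):
    """Precision = TP / (TP + FP); returns None when nothing was predicted positive."""
    predicted = true_positives + false_positives
    if predicted == 0:
        return None
    return true_positives / predicted


# Illustrative counts only (not the values reported in Table 1).
example = precision(18, 2)
print(f"{example:.2%}")
```

Returning `None` for an empty set of positive predictions avoids a division by zero and makes the undefined case explicit.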

With our data, we see that the program also operates well when the camera is approaching the door at an angle of approximately 45 degrees. Figures 20 and 21 show some examples in this case.

In another experiment, we tested our approach on a dataset of 75 upstairs images. We also used the standard measures proposed in [39] to evaluate the upstairs algorithm. The results of this experiment are presented in Table 3. Some examples of upstairs images are shown in Figure 22 and the detected upstairs are shown in Figure 23.

The program was also capable of detecting downstairs, but with lower performance. The downstairs are detected in 89.47% of 38 positive image samples. Table 4 shows the performance measures of downstairs detection using our approach. Representative downstairs images and the detected downstairs are shown in Figures 24 and 25. In addition, there are some cases in which the program cannot detect downstairs; see Figures 26 and 27.

The results show that our approach can be used in low-light environments. This feature can overcome the limitations of monocular or stereo vision techniques [7, 8, 16].

The execution time of the intermediate processing steps is negligible (about 0.04 s for floor segmentation and 0.009 s for normal estimation). The detection time per image (see Tables 2 and 3) includes all the steps such as reconstruction, filtering, and segmentation. The total execution time of the algorithms is short enough for developing real-time applications for visually impaired people; the experiments performed in different indoor environments have produced positive results in terms of accuracy and response time.

5. Conclusions

In this paper, we have presented a Microsoft Kinect-based method specifically dedicated to the detection of obstacles in indoor environments based on 3D image processing with color-depth data. The proposed solution was designed to provide information about the space in front of the user in real time by using color-depth data acquired by the Microsoft Kinect sensor. Our main contribution is the obstacle detection module, which combines different algorithms to detect obstacles such as walls, doors, stairs, and bumpy parts on the floor.

Our proposed system shows good results on our dataset. However, a number of caveats and limitations still have to be taken into account. Firstly, the system is not reliable in all kinds of scenarios or data. For example, when the lighting is too strong, the Kinect cannot capture the depth information. In the future, we plan to combine the color and depth information in order to build a reliable approach for obstacle detection. Secondly, our obstacle detection relies on floor detection. If the floor cannot be detected, our system will fail. To address this, we have to use different cues for detecting the obstacle candidates. Thirdly, the performance of downstairs detection is still low, and we need a new approach for detecting downstairs from a far distance. Finally, our approach performs obstacle detection independently in every frame and therefore does not take into account the results of the previous frame. In the future, we can reduce the program execution time by combining the four algorithms and applying probabilistic models that estimate the presence of obstacles in a given frame based on the results of previous ones.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.


Acknowledgments

This work was supported in part by funding from the French national program “Programme d’Investissements d’Avenir IRT Nanoelec” ANR-10-AIRT-05, Institut Universitaire de France, and the Vietnam National Foundation for Science and Technology Development (NAFOSTED) under Grant no. FWO.102.2013.08. The authors would like to thank the members of AGIM laboratory and International Research Institute MICA for helpful comments and suggestions.