Abstract

Computers and computerized machines have tremendously penetrated all aspects of our lives. This raises the importance of the Human-Computer Interface (HCI). Common HCI techniques still rely on simple devices such as keyboards, mice, and joysticks, which cannot keep pace with the latest technology. Hand gestures have become one of the most attractive alternatives to traditional HCI techniques. This paper proposes a new hand gesture detection system for Human-Computer Interaction using real-time video streaming. This is achieved by removing the background using the average background algorithm and applying the $1 algorithm for hand template matching. Each detected hand gesture is then translated into commands that can be used to control robot movements. The simulation results show that the proposed algorithm achieves a high detection rate and low recognition time under different light changes, scales, rotations, and backgrounds.

1. Introduction

Computers and computerized machines have tremendously penetrated all aspects of our lives. This raises the importance of the Human-Computer Interface (HCI). Common HCI techniques still rely on simple devices such as keyboards, mice, and joysticks, which cannot keep pace with the latest technology [1]. Hand gestures have become one of the most attractive alternatives to traditional HCI techniques [2]. Gestures are the physical movements of fingers, hands, arms, or body that carry special meaning and can be translated into interaction with the environment. Many devices can sense body position, hand gestures, voice, facial expressions, and other aspects of human action, all of which can be used as powerful HCI modalities. Gesture recognition has many applications, such as communication for the hearing impaired, virtual reality, and medical applications.

Hand gesture recognition techniques can be divided into two main categories: appearance based approaches and three-dimensional hand model based approaches [3, 4]. Appearance based approaches depend on features extracted from the model image to model the hand appearance; all input frames from the video stream are then compared with the extracted features to detect the correct gesture [5, 6]. Three-dimensional hand model based approaches convert the 3D model to 2D images by projection; the hand features are then estimated by comparing the projected 2D images with the input images to detect the current hand gesture [7, 8]. Generally, appearance based approaches perform better than three-dimensional hand model based approaches in real-time detection, but the latter offer a rich description that potentially allows a wide class of hand gestures [9].

Both hand shape and skin colour are important features that are commonly used in hand gesture detection and tracking: reducing the area of interest in the image to the hand gesture alone increases accuracy and at the same time decreases processing time. In [10], a real-time skin colour model is developed to extract the region of interest (ROI) of the hand gesture. The algorithm is based on a Haar-wavelet representation and recognizes hand gestures using a database that contains all template gestures. During the recognition process, a measurement metric is used to measure the similarity between the features of a test image and those in the database. The simulation results show improvements in detection rate, recognition time, and database size. Another real-time hand gesture recognition algorithm is introduced in [11]. The algorithm is a hybrid of hand segmentation, hand tracking, and multiscale feature extraction. Hand segmentation takes advantage of motion and colour cues during tracking; a multiscale feature extraction process is then executed to decompose the hand into palm and fingers, and this palm-finger decomposition is used for gesture recognition. Experimental results show that the algorithm has a good detection rate for hand gestures with different aspect ratios and complicated backgrounds.

In [12], an automatic hand gesture detection and recognition algorithm is proposed. The detection process is based on the Viola-Jones method [13]. Then, feature vectors of Hu invariant moments [14] are extracted from the detected hand gesture. Finally, the extracted features are used with the support vector machine (SVM) algorithm for hand gesture classification. Haar-like features and an AdaBoost classifier are used in [15] to recognize hand gestures. A hand gesture recognition algorithm is developed in [16], in which hand gesture features are extracted based on the normalized moment of inertia and Hu invariant moments of gestures; as in [12], SVM is used as the classifier. A neural network model is used in [17] to recognize a hand posture in a given frame, with a space discretization based on face location and body anthropometry used to segment hand postures. The authors in [18] have proposed a new system for detecting and tracking a bare hand in a cluttered background, using a multiclass support vector machine (SVM) for classification and K-means for clustering.

All of the aforementioned techniques are based on skin colour and face the problem of extracting the region of interest from the entire frame, because objects such as the human arm and face have colour similar to the hand. To solve this problem, a modified $1 algorithm [19] is used in this paper to extract hand gestures with high accuracy. Most shape descriptors are pixel based, and their computational complexity is too high for real-time performance. The $1 algorithm, in contrast, is straightforward and has low computational complexity, so it can be used in real-time detection. Its training time is also much smaller than that of the well-known Viola-Jones method [13].

Background subtraction has a direct effect on the accuracy and computational complexity of the hand gesture extraction algorithm [20, 21]. There are many challenges facing the design of a background subtraction algorithm, such as light changes, shadows, overlapping of objects in the visual area, and noise from camera movement [22]. A robust subtraction algorithm is one that addresses all of these challenges with high accuracy and reasonable time complexity. Therefore, many approaches have been proposed over the years to meet the need for robust background subtraction algorithms. Background subtraction algorithms can be classified into three groups: Mixture of Gaussians (MoG) [23, 24], Kernel Density Estimation (KDE) [25–28], and Codebook (CB) [29, 30].

This paper proposes a system for hand gesture detection using the average background algorithm [31] for background subtraction and the $1 algorithm [19] for hand template matching. Five hand gestures are detected and translated into commands that can be used to control robot movements. The first contribution of the paper is the use of the $1 algorithm in hand gesture detection. To the best of the authors' knowledge, this is the first time the $1 algorithm has been used for hand gesture detection; in the literature it has been used for handwriting recognition. The second contribution is the use of the average background algorithm for background subtraction in combination with the $1 algorithm. The simulation results show that the proposed system achieves a hand gesture detection rate of 98.6% with improved computational time complexity.

The rest of the paper is organized as follows. Section 2 discusses the proposed system components. The details of the background subtraction algorithm used in this paper are given in Section 3. Section 4 explains the contour extraction algorithm. The modified template matching algorithm is discussed in Section 5. Section 6 provides simulation results and the performance of the proposed system. Section 7 concludes the paper.

2. System Overview

Figure 1 shows the block diagram of the proposed system, which consists of two main stages. The first stage is the background subtraction algorithm, which removes all static objects that reside in the background and then extracts the region of interest that contains the hand gesture. The second stage compares the current hand gesture with the trained data using the $1 algorithm and translates the detected gesture into a command. This command can be used to control robot movements in the virtual world through the Webots simulator [32].

3. Background Subtraction Algorithm

In this paper, the average background algorithm [31] is used to remove the background of the input frame. Moving objects are isolated from the whole image, while static objects are considered part of the background of the frame. The technique creates a model of the background and updates it continuously to take into account light changes, shadows, overlapping of objects in the visual area, and newly added static objects. This model serves as a reference frame that is subtracted from the current frame to detect moving objects. There are many algorithms [2, 33, 34] that can be used for background subtraction. Such an algorithm must be able to cope with multiple illumination levels, detect all moving objects at different speeds, and absorb any resident object into the background as soon as possible. Background subtraction algorithms can be divided into four main stages [35]: preprocessing, background modelling, foreground detection (referred to as background subtraction), and data validation (also known as postprocessing). The second stage, background modelling (also known as background maintenance), is the main stage of any background subtraction algorithm.

In [36], background modelling algorithms are classified into two main types: recursive and nonrecursive models. Nonrecursive models use a buffer to store previous frames and estimate the background from the temporal variation of each pixel in the buffer. Recursive models do not keep a buffer of previous frames; instead, they recursively update a single background model from each input frame. Recursive models therefore require less storage than nonrecursive models; however, any error occurring in a recursive model can persist for a longer time than in nonrecursive models. On one hand, the best-known nonrecursive techniques are frame differencing [37], average filter [24, 38], median filtering [39], minimum-maximum filter [40], linear predictive filter [41], and nonparametric modelling [42]. On the other hand, recursive techniques include the approximated median filter [43], single Gaussian [44], Kalman filter [45], Mixture of Gaussians (MoG) [23, 24], clustering-based models [46], and Hidden Markov Models (HMM) [47].

In this paper, the average filter technique [24, 38] is used for background modelling, as in [48], to extract moving areas. The average filter creates an initial background $B_0$ from the first $N$ frames as given in (1). The moving areas are obtained by subtracting the average background from the current frame as given in (2). To get a better result, a threshold filter is applied to the difference image to create a binary image, as calculated in (3). The threshold $\tau$ is chosen to be 50 because the background colour model is brighter than human skin. The average background's pixels are updated using (4) to remove any noise or newly static objects. The parameter $\alpha$ controls the speed of the updating process; in other words, it controls how quickly the average background is updated by the current frame. The value of $\alpha$ varies from 0 to 1; the value 0.001 is used in this paper to decrease the learning speed. The steps of hand extraction are shown in Figures 2, 3, 4, and 5. Consider

$$B_0(x, y) = \frac{1}{N} \sum_{i=1}^{N} F_i(x, y), \tag{1}$$

$$D_t(x, y) = \left| F_t(x, y) - B_{t-1}(x, y) \right|, \tag{2}$$

$$M_t(x, y) = \begin{cases} 1, & \text{if } D_t(x, y) > \tau, \\ 0, & \text{otherwise}, \end{cases} \tag{3}$$

$$B_t(x, y) = (1 - \alpha) B_{t-1}(x, y) + \alpha F_t(x, y), \tag{4}$$

where $F_t$ denotes the current frame, $B_t$ the background model at time $t$, and $M_t$ the binary foreground mask.
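For illustration, a minimal sketch of this stage in Python with OpenCV follows (the system itself is implemented in C#; the capture source, the number of initialisation frames, and all variable names are assumptions made for the example):

```python
import cv2
import numpy as np

ALPHA = 0.001   # learning rate alpha in (4)
TAU = 50        # binarisation threshold tau in (3)
N_INIT = 30     # assumed number of frames used to build B_0 in (1)

cap = cv2.VideoCapture(0)

# (1) initial background: pixel-wise mean of the first N_INIT frames
init = []
for _ in range(N_INIT):
    ok, frame = cap.read()
    if ok:
        init.append(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32))
background = np.mean(init, axis=0)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)

    # (2) absolute difference between the current frame and the model
    diff = cv2.absdiff(gray, background)

    # (3) binarise the difference image with threshold tau = 50
    _, mask = cv2.threshold(diff, TAU, 255, cv2.THRESH_BINARY)
    mask = mask.astype(np.uint8)  # foreground mask used in Section 4

    # (4) running-average update: B_t = (1 - alpha) B_{t-1} + alpha F_t
    cv2.accumulateWeighted(gray, background, ALPHA)

    cv2.imshow("foreground", mask)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```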

4. Contour Extraction Algorithm

The contour is a curve that connects all points surrounding a specific part of an image that has the same colour or intensity. To identify hand gestures, the contours of all objects present in the threshold image are detected, as shown in Figure 6. Then, the biggest contour, the one with the largest calculated area, which represents the hand's area, is selected, as shown in Figure 7. Suzuki's algorithm [49] is used to find the hand contour. The filtered contour points are then approximated by another contour with fewer points. Contour approximation is the process of finding key vertex points [50]; it is used to speed up the calculations and reduces memory consumption.
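Continuing the sketch above, this stage maps directly onto OpenCV primitives (`cv2.findContours` implements Suzuki's border-following algorithm [49]; the approximation factor of 0.01 is an illustrative choice, not a value from the paper):

```python
import cv2

# `mask` is the uint8 foreground image from the previous sketch
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
if contours:
    # keep the contour with the biggest area, assumed to be the hand
    hand = max(contours, key=cv2.contourArea)

    # approximate the contour with fewer vertices to speed up matching
    # and reduce memory consumption
    epsilon = 0.01 * cv2.arcLength(hand, True)
    hand_approx = cv2.approxPolyDP(hand, epsilon, True)
```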

5. Template Matching Algorithm

The template matching algorithm [19], called the $1 algorithm, is mainly used in handwriting recognition by comparing stored templates with the user input. The $1 algorithm has many desirable features: it is scale independent and rotation independent, requires only simple mathematical operations, achieves a high detection rate, allows the user to train any number of gestures, and consumes few resources. All of these features make $1 a good choice for matching the extracted hand contour.

In this paper, the template matching ($1) algorithm is modified to satisfy real-time constraints for recognizing hand gestures. This is achieved by comparing the saved contour templates with the biggest contour extracted from the current frame. The best matching template represents the desired contour of the current frame. The modified algorithm is based on four main stages.

5.1. Rebuilding the Point Path

This stage makes the comparison operation independent of the number of points saved in a template contour during the training phase. Before the comparison operation, the algorithm rebuilds any stored contour template with $n$ points into another contour defined by $N$ equally spaced points. When the value of $N$ is too small, the precision ratio will decrease; when the value of $N$ is too big, the processing time will increase. The best value of $N$ is therefore a compromise between the two; in this paper, the value of $N$ is chosen to be 80. Figure 8 shows different sets of hands versus different point paths.
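A sketch of this resampling step, mirroring the pseudocode of the original $1 recognizer [19] (Python is used for illustration, and the helper names are ours):

```python
import math

def path_length(points):
    # total stroke length along consecutive points
    return sum(math.dist(points[i - 1], points[i])
               for i in range(1, len(points)))

def resample(points, n=80):
    # rebuild `points` as n equally spaced points; n = 80 as in the paper
    interval = path_length(points) / (n - 1)
    acc = 0.0
    pts = list(points)
    new_points = [pts[0]]
    i = 1
    while i < len(pts):
        d = math.dist(pts[i - 1], pts[i])
        if acc + d >= interval:
            # interpolate a new point at the exact spacing
            t = (interval - acc) / d
            q = (pts[i - 1][0] + t * (pts[i][0] - pts[i - 1][0]),
                 pts[i - 1][1] + t * (pts[i][1] - pts[i - 1][1]))
            new_points.append(q)
            pts.insert(i, q)  # q becomes the next segment's start point
            acc = 0.0
        else:
            acc += d
        i += 1
    if len(new_points) < n:  # guard against floating point shortfall
        new_points.append(pts[-1])
    return new_points
```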

5.2. Rotation Based on Indicative Angle

In this stage, the indicative angle (the angle formed between the centroid of the gesture and the first point of the gesture) is calculated. After that, the gesture is rotated until this angle becomes zero. Figure 9 shows hands at different angles.
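A corresponding sketch of the rotation step (the `centroid` helper is ours):

```python
import math

def centroid(points):
    xs, ys = zip(*points)
    return sum(xs) / len(xs), sum(ys) / len(ys)

def rotate_to_zero(points):
    # rotate the gesture about its centroid so that the indicative
    # angle (centroid -> first point) becomes zero
    cx, cy = centroid(points)
    theta = math.atan2(points[0][1] - cy, points[0][0] - cx)
    cos_t, sin_t = math.cos(-theta), math.sin(-theta)
    return [((x - cx) * cos_t - (y - cy) * sin_t + cx,
             (x - cx) * sin_t + (y - cy) * cos_t + cy)
            for x, y in points]
```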

5.3. Scale and Translate

At this step, the algorithm scales all gestures to a standard square. The scaling is nonuniform and is applied to all candidates and all templates. After the scaling operation, each gesture is translated to a reference point. To simplify the subsequent operations, all points are translated so that the gesture's centroid lies at the origin $(0, 0)$.
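A sketch of both operations (`centroid` is the helper from the previous sketch; the square side length of 250 is an illustrative choice, not a value from the paper):

```python
def scale_to_square(points, size=250.0):
    # nonuniform scaling of the gesture into a size x size square
    xs, ys = zip(*points)
    w = (max(xs) - min(xs)) or 1.0   # avoid division by zero
    h = (max(ys) - min(ys)) or 1.0
    return [(x * size / w, y * size / h) for x, y in points]

def translate_to_origin(points):
    # move the gesture so that its centroid lies at the origin (0, 0)
    cx, cy = centroid(points)
    return [(x - cx, y - cy) for x, y in points]
```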

5.4. Find Optimal Angle for Best Matching

At this stage, all templates and all candidates have been rebuilt, rotated, scaled, and translated. Each candidate $C$ is compared to each template $T_i$ using (5), the average Euclidean distance between corresponding points, and the template with the least distance is chosen:

$$d_i = \frac{1}{N} \sum_{k=1}^{N} \sqrt{\left(C[k]_x - T_i[k]_x\right)^2 + \left(C[k]_y - T_i[k]_y\right)^2}. \tag{5}$$
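A sketch of this final comparison, assuming the candidate and all templates have already passed through the three preprocessing steps above (the dictionary mapping command labels to point lists is an assumed structure; the full $1 recognizer additionally searches a small range of rotations with a golden section search for the optimal matching angle, a refinement omitted here for brevity):

```python
import math

def path_distance(candidate, template):
    # average point-to-point Euclidean distance, as in (5)
    return sum(math.dist(c, t)
               for c, t in zip(candidate, template)) / len(candidate)

def recognize(candidate, templates):
    # `templates` maps a command label (e.g., "forward") to its
    # preprocessed point list; return the label of the closest one
    return min(templates,
               key=lambda label: path_distance(candidate, templates[label]))
```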

6. Experimental Results

The proposed system is implemented in the C# programming language and tested with the Webots [32] simulator virtual environment and the boebot robot, as shown in Figure 10. The computer used in the experiments has an AMD Quad-Core processor (FX 4-Core Processor Black Edition, 3.8 GHz).

In the experiments, five gestures (forward, right, stop, left, and backward) are used to control robot movements, as shown in Figure 11. The test video stream was grabbed from a web camera at a resolution of 320 × 240. The detection algorithm is connected to the Webots robot simulator using socket programming and controls the movements of the robot in the virtual environment. The hand gestures were recorded with different scales, rotations, and illuminations and with a simple background (i.e., without any objects in the background). The experiment and implementation of the algorithm can be found in [51].
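The paper specifies only that socket programming links the detector to the simulator; a minimal illustrative sketch follows, with the address, port, and command encoding all assumed:

```python
import socket

# send a detected gesture to the simulator side over TCP
with socket.create_connection(("127.0.0.1", 5000)) as conn:
    conn.sendall(b"FORWARD\n")   # hypothetical command string
```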

The detection speed of the $1 algorithm [19] reaches 0.04 minutes per gesture, and the error rate increases if the background has objects with many edges or objects darker than skin colour, as shown in Figure 12.

Table 1 shows the accuracy of the proposed hand gesture detection algorithm. As given in the table, the detection rate is 100% for the forward, backward, stop, and right gestures, while the detection rate for the left gesture is 93%. The average processing time is 455 ms, and the average detection rate is 98.6%.

Table 2 shows the performance of the proposed system versus some previous approaches in terms of number of postures, recognition time, recognition accuracy, frame resolution, number of test images, scale, rotation, light changes, and background. The proposed algorithm outperforms all the other approaches in terms of recognition accuracy, at 98.6%, and its recognition time is better than those of all the other approaches except the system in [4]. In addition, the performance of the proposed system is not affected by changes in scale, light, rotation, or background.

7. Conclusions

This paper proposes a system for hand gesture detection using the average background algorithm for background subtraction and the $1 algorithm for hand template matching. The $1 algorithm is modified to satisfy real-time constraints for recognizing hand gestures. Five hand gestures are detected and translated into commands that can be used to control robot movements. The simulation results show that the proposed algorithm achieves a 98.6% detection rate and low recognition time under different lighting conditions, scales, rotations, and backgrounds. Such a hand gesture recognition system provides a robust solution for real-life HCI applications.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.