Abstract

Hand gesture recognition has become more and more popular in applications such as intelligent sensing, robot control, and smart guidance. In this paper, an inertial-sensor-based hand gesture recognition method is proposed. The proposed method obtains the trajectory of the hand using a position estimator, in which the attitude estimate is used to produce the velocity and position estimates. A particle filter (PF) is employed to estimate the attitude quaternion from gyroscope, accelerometer, and magnetometer measurements. The filter is improved through its resampling method, which makes the original filter converge much faster. After smoothing, the trajectory is converted to low-definition images, which are then sent to a backpropagation neural network (BP-NN) based recognizer for matching. Experiments on real-world hardware are carried out to show the effectiveness and uniqueness of the proposed method. Compared with representative methods using accelerometers or vision sensors, the proposed method is shown to be fast, reliable, and accurate.

1. Introduction

With the development of mobile platforms, applications of hand gesture recognition have become more and more popular. For instance, Google Inc. released its Google Project Glass in 2012, which utilizes a very simple hand gesture recognition system. Basically, hand gesture recognition systems can be classified into two groups: vision-based and inertial-sensor-based [1].

As proposed in [2, 3], a vision-based hand gesture recognition system tracks the features of the user's hand, analyzes the user's motion, and outputs the final interpreted command. The most widely used recognition methods are artificial-intelligence-based [4–6], for example, the neural network, the support vector machine, and the hidden Markov model. This kind of recognition system is relatively reliable, but it spends too much time on modeling and matching. According to stereo vision theory [7], more than one camera is needed to accurately estimate the three-dimensional attitude and velocity of an object. However, such a configuration significantly increases the economic cost of the system, including the core processors and the data acquisition hardware. The time complexity of data processing and computation is also high, which limits the development of applications on mobile platforms.

Other recognition methods are inertial-sensor-based [8]. These methods usually use low-cost MEMS accelerometers to detect the motion of the human hand [9–13]. Gyroscopes can also be used to measure the attitude of the user's hand in order to analyze the gesture more accurately [14, 15]. By combining gyroscopes and accelerometers, the real-time velocity can be determined, which is helpful for hand gesture recognition [16, 17]. However, the accuracy of these methods still needs improvement, since the observability of the gesture may not be satisfactory when only one sensor output is used.

The above methods have been developed thanks to advances in computer vision and integrated sensor fusion. In vision-based hand gesture recognition, feature extraction and learning are always important. There are many related techniques, such as the Scale-Invariant Feature Transform (SIFT [18]), Gradient Location and Orientation Histogram (GLOH [19]), and Convolutional Neural Network (CNN [20]). Extracting motion from a picture requires analyzing the correspondence between the current frame and the last frame, for example, with RANdom Sample Consensus (RANSAC [21]). However, these techniques are computationally expensive. In fact, by using inertial sensors, the object's motion can be computed with much simpler frameworks. In recent years, attitude and position estimation based on MEMS sensors has been extensively studied [22], providing new ways of obtaining the hand's motion [23].

The main purpose of this research is to find a way that adopts only inertial sensors to determine the correct human hand gesture. According to existing papers published between 2012 and 2016, many of them use learning techniques to model the gesture from sensor data captured at a single moment. Such an approach is not fully reliable and accurate, because a gesture can only be well determined if the history data is also included in the analysis. To implement this, trajectory determination is vital. Therefore, this paper first computes the trajectory of the human hand by inertial integration of attitude, velocity, and position. It has to be pointed out that the attitude estimation significantly influences the results of velocity and position; hence a PF is introduced for accurate and convergent estimates of the attitude quaternion. In this paper, we mainly have the following contributions:

(1) We use inertial sensors, including gyroscopes and accelerometers, to estimate the position of the user's hand. The first step is to fuse these sensor outputs into the attitude quaternion using the PF. The velocity and position update is then performed using the ZUPT-aided equations.

(2) After recording the trajectory of the hand, the trace is saved as a low-definition image, which is further sent into a BP-NN based gesture recognition system proposed in [24].

(3) The proposed method is systematically verified by experiments showing its advantages in estimation accuracy and feasibility of implementation. It is also compared to a representative accelerometer-based gesture recognition algorithm, which gives proof of its superiority in success rate.

Figure 1 shows the structure of the recognition system of our paper. This paper is structured as follows: Section 2 introduces the trajectory estimation method. Section 3 presents the BP-NN based hand gesture recognition method. Experiments, simulations, and results are given in Section 4. Section 5 contains the concluding remarks.

2. Relative Position Estimation

2.1. Attitude Determination

In this paper, rotation vectors and quaternions are used for attitude determination. The rotation vector is an effective form of rotational representation; its direction is the eigenvector of the direction cosine matrix (DCM) associated with the unit eigenvalue, and the DCM can be given by [25]

$$C = I + \frac{\sin\phi}{\phi}\left(\boldsymbol{\phi}\times\right) + \frac{1-\cos\phi}{\phi^{2}}\left(\boldsymbol{\phi}\times\right)^{2} \tag{1}$$

where $C$ is the DCM, $I$ is the identity matrix, $(\boldsymbol{\phi}\times)$ is the skew-symmetric matrix of $\boldsymbol{\phi}$, and $\boldsymbol{\phi}$ is the rotation vector, which can be defined by

$$\boldsymbol{\phi} = \left(\phi_x,\ \phi_y,\ \phi_z\right)^{T} \tag{2}$$

where $\phi_x$ is the projection of $\boldsymbol{\phi}$ on the $x$-axis, $\phi_y$ is the projection of $\boldsymbol{\phi}$ on the $y$-axis, and $\phi_z$ is the projection of $\boldsymbol{\phi}$ on the $z$-axis. The differential equation of the rotation vector can be given by [25]

$$\dot{\boldsymbol{\phi}} = \boldsymbol{\omega} + \frac{1}{2}\boldsymbol{\phi}\times\boldsymbol{\omega} + \frac{1}{\phi^{2}}\left(1 - \frac{\phi\sin\phi}{2\left(1-\cos\phi\right)}\right)\boldsymbol{\phi}\times\left(\boldsymbol{\phi}\times\boldsymbol{\omega}\right) \tag{3}$$

where $\boldsymbol{\omega}$ is the real-time angular velocity measured by the gyroscope. Generally speaking, (3) can be simplified as

$$\dot{\boldsymbol{\phi}} \approx \boldsymbol{\omega} + \frac{1}{2}\boldsymbol{\phi}\times\boldsymbol{\omega} \tag{4}$$

By integrating (4), the real-time rotation vector can be calculated. The most important characteristic of the rotation vector is that it can effectively compensate for the noncommutativity error. In 1971, Bortz proposed an attitude estimation method that used the rotation vector to avoid coning error [26]. Twelve years later, Miller proposed a discrete-rotation-vector-based method with three samples in one integration period [27]. Savage summarized several compensation methods in [28, 29]. According to Savage's theory, the three-sample rotation vector method supposes that the angular velocity can be given by a second-order polynomial within one updating period $[t_{k-1}, t_k]$ of length $T$:

$$\boldsymbol{\omega}(t) = \mathbf{a} + 2\mathbf{b}\left(t - t_{k-1}\right) + 3\mathbf{c}\left(t - t_{k-1}\right)^{2} \tag{5}$$

where, with the period divided into three equal subintervals of length $h = T/3$ producing the gyroscope angular increments $\Delta\boldsymbol{\theta}_1$, $\Delta\boldsymbol{\theta}_2$, $\Delta\boldsymbol{\theta}_3$, the coefficients $\mathbf{a}$, $\mathbf{b}$, $\mathbf{c}$ can be given by [25]

$$\mathbf{a} = \frac{11\Delta\boldsymbol{\theta}_1 - 7\Delta\boldsymbol{\theta}_2 + 2\Delta\boldsymbol{\theta}_3}{6h},\quad \mathbf{b} = \frac{-2\Delta\boldsymbol{\theta}_1 + 3\Delta\boldsymbol{\theta}_2 - \Delta\boldsymbol{\theta}_3}{2h^{2}},\quad \mathbf{c} = \frac{\Delta\boldsymbol{\theta}_1 - 2\Delta\boldsymbol{\theta}_2 + \Delta\boldsymbol{\theta}_3}{6h^{3}} \tag{6}$$

Expanding (4) with a Taylor series, we have

$$\boldsymbol{\phi}\left(t_k\right) = \boldsymbol{\phi}\left(t_{k-1}\right) + T\dot{\boldsymbol{\phi}}\left(t_{k-1}\right) + \frac{T^{2}}{2!}\ddot{\boldsymbol{\phi}}\left(t_{k-1}\right) + \frac{T^{3}}{3!}\dddot{\boldsymbol{\phi}}\left(t_{k-1}\right) + \cdots \tag{7}$$

Supposing $\boldsymbol{\phi}(t_{k-1}) = \mathbf{0}$, the multiorder derivatives of the rotation vector can be given by

$$\dot{\boldsymbol{\phi}}\left(t_{k-1}\right) = \mathbf{a},\qquad \ddot{\boldsymbol{\phi}}\left(t_{k-1}\right) = 2\mathbf{b},\qquad \dddot{\boldsymbol{\phi}}\left(t_{k-1}\right) = 6\mathbf{c} + \mathbf{a}\times\mathbf{b},\ \ldots \tag{8}$$

Inserting (8) into (7), the discrete form of (4) can be obtained [28]:

$$\boldsymbol{\phi}\left(t_k\right) = \Delta\boldsymbol{\theta} + \frac{33}{80}\Delta\boldsymbol{\theta}_1\times\Delta\boldsymbol{\theta}_3 + \frac{57}{80}\Delta\boldsymbol{\theta}_2\times\left(\Delta\boldsymbol{\theta}_3 - \Delta\boldsymbol{\theta}_1\right) \tag{9}$$

where $\Delta\boldsymbol{\theta} = \Delta\boldsymbol{\theta}_1 + \Delta\boldsymbol{\theta}_2 + \Delta\boldsymbol{\theta}_3$ is the vector sum of the angular increments. Equation (9) is called the three-sample rotation vector algorithm. Once the rotation vector is calculated, it can be converted to a quaternion:

$$q = \begin{bmatrix} \cos\left(\phi/2\right) \\[4pt] \dfrac{\boldsymbol{\phi}}{\phi}\sin\left(\phi/2\right) \end{bmatrix} \tag{10}$$

where $\phi = \left\|\boldsymbol{\phi}\right\|$ denotes the length of the rotation vector. Multiplying the quaternions from the initial time, the DCM from the hand frame to the geographical coordinate system can be given by [28]

$$C_b^n = \begin{bmatrix} q_0^2+q_1^2-q_2^2-q_3^2 & 2\left(q_1q_2-q_0q_3\right) & 2\left(q_1q_3+q_0q_2\right) \\ 2\left(q_1q_2+q_0q_3\right) & q_0^2-q_1^2+q_2^2-q_3^2 & 2\left(q_2q_3-q_0q_1\right) \\ 2\left(q_1q_3-q_0q_2\right) & 2\left(q_2q_3+q_0q_1\right) & q_0^2-q_1^2-q_2^2+q_3^2 \end{bmatrix} \tag{11}$$

where $b$ represents the hand (body) frame while $n$ represents the North-East-Down (NED) coordinate system. The DCM forms the mathematical platform of the Strapdown Inertial Navigation System (SINS).
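To make the update concrete, the following Python sketch implements the three-sample rotation vector of (9), its conversion to a quaternion as in (10), and the quaternion accumulation. It assumes three gyroscope angular increments per update period and is an illustration under those assumptions, not the authors' implementation.

```python
# Hedged sketch: three-sample rotation vector update and quaternion accumulation.
import numpy as np

def three_sample_rotation_vector(d_theta_1, d_theta_2, d_theta_3):
    """Coning-compensated rotation vector from three angular increments (rad), as in (9)."""
    d_theta = d_theta_1 + d_theta_2 + d_theta_3
    coning = (33.0 / 80.0) * np.cross(d_theta_1, d_theta_3) \
           + (57.0 / 80.0) * np.cross(d_theta_2, d_theta_3 - d_theta_1)
    return d_theta + coning

def rotation_vector_to_quaternion(phi):
    """Convert a rotation vector to a unit quaternion [w, x, y, z] as in (10)."""
    angle = np.linalg.norm(phi)
    if angle < 1e-12:                      # avoid division by zero for tiny rotations
        return np.array([1.0, 0.0, 0.0, 0.0])
    axis = phi / angle
    return np.concatenate(([np.cos(angle / 2.0)], axis * np.sin(angle / 2.0)))

def quaternion_multiply(q, r):
    """Hamilton product q * r, used to accumulate the attitude from the initial time."""
    w1, x1, y1, z1 = q
    w2, x2, y2, z2 = r
    return np.array([w1*w2 - x1*x2 - y1*y2 - z1*z2,
                     w1*x2 + x1*w2 + y1*z2 - z1*y2,
                     w1*y2 - x1*z2 + y1*w2 + z1*x2,
                     w1*z2 + x1*y2 - y1*x2 + z1*w2])
```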

The MEMS gyroscope usually has a bias in its output, which leads to drift in the integrated attitude. Thus, the accelerometer is usually used to compensate for the drift. In engineering practice, observers and optimal filtering techniques are usually adopted to improve the accuracy of the attitude estimation [23]. In the next subsection, we introduce an improved particle filter for such state estimation.

2.2. Particle Filtering for Attitude Estimation

The inertial attitude determination can be acquired from the equations in the last subsection. However, in real practice, other sensors such as the accelerometer and magnetometer are integrated with the gyroscope for more accurate and stable results. We now introduce the particle filtering algorithm, which was proposed in [30]. Assume that the discrete state space has the following model:

$$x_k = f\left(x_{k-1}\right) + w_{k-1},\qquad z_k = h\left(x_k\right) + v_k \tag{12}$$

where $x_k$, $z_k$ represent the state vector and observation vector, respectively, and $w_k$, $v_k$ are independent zero-mean white Gaussian noises (WGNs). Then the particle filtering has the following calculation procedure:

(1) Initialization: at $k = 0$, we draw $N$ sample points $x_0^{(i)}$ and weights $w_0^{(i)}$ from the importance function, where $i = 1, \ldots, N$. Using

$$w_k^{(i)} \propto w_{k-1}^{(i)}\,\frac{p\left(z_k \mid x_k^{(i)}\right) p\left(x_k^{(i)} \mid x_{k-1}^{(i)}\right)}{q\left(x_k^{(i)} \mid x_{k-1}^{(i)}, z_k\right)}$$

we may compute each particle's weight, where $q(\cdot)$ denotes the importance function and $p(\cdot)$ is the probability density function; the former density in the numerator denotes the likelihood function while the latter is the state transition one.

(2) Forecast: at time epoch $k$, we forecast the particles using the state model, where the process noise is subject to the probability density function $p(w_k)$.

(3) Update: the importance of the particles is updated using the likelihood of the new measurement and then normalized using norm 1 (so that the weights sum to one).

(4) Resampling: given the threshold number of particles $N_{\mathrm{th}}$, we can calculate the effective number of particles by

$$N_{\mathrm{eff}} = \frac{1}{\sum_{i=1}^{N}\left(w_k^{(i)}\right)^{2}}$$

When $N_{\mathrm{eff}} < N_{\mathrm{th}}$, we resample the particles and reset their weights. Then, using

$$\hat{x}_k = \sum_{i=1}^{N} w_k^{(i)} x_k^{(i)}$$

the final estimated state can be computed. In this paper, the resampling technique is chosen as residual resampling.
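A minimal Python sketch of this forecast/update/resample cycle is given below. The state model `f`, measurement model `h`, and the noise covariances are placeholders for the attitude models of this section; the residual resampling follows the description above, and the snippet is an illustration rather than the exact filter used in the experiments.

```python
# Illustrative particle-filter cycle with residual resampling (not the authors' code).
import numpy as np

def effective_particle_number(weights):
    """N_eff = 1 / sum(w_i^2), compared against a threshold before resampling."""
    return 1.0 / np.sum(weights ** 2)

def residual_resample(particles, weights):
    """Residual resampling: keep floor(N*w_i) copies, draw the remainder randomly."""
    n = len(weights)
    counts = np.floor(n * weights).astype(int)
    residual = n * weights - counts
    n_rest = n - counts.sum()
    if n_rest > 0:
        residual /= residual.sum()
        extra = np.random.choice(n, size=n_rest, p=residual)
        counts += np.bincount(extra, minlength=n)
    indices = np.repeat(np.arange(n), counts)
    return particles[indices], np.full(n, 1.0 / n)

def particle_filter_step(particles, weights, z, f, h, q_cov, r_cov, n_threshold):
    """One forecast/update cycle; returns particles, weights, and the weighted estimate."""
    # Forecast: propagate each particle through the state model with process noise.
    dim = q_cov.shape[0]
    particles = np.array([f(p) + np.random.multivariate_normal(np.zeros(dim), q_cov)
                          for p in particles])
    # Update: weight by the Gaussian likelihood of the measurement, then normalize (norm 1).
    innov = z - np.array([h(p) for p in particles])
    r_inv = np.linalg.inv(r_cov)
    weights = weights * np.exp(-0.5 * np.einsum('ij,jk,ik->i', innov, r_inv, innov))
    weights /= weights.sum()
    # Resample when the effective particle number falls below the threshold.
    if effective_particle_number(weights) < n_threshold:
        particles, weights = residual_resample(particles, weights)
    return particles, weights, np.average(particles, axis=0, weights=weights)
```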

The presented scheme is a sequential Monte Carlo suboptimal method. In attitude estimation, the quaternion can be used as the state vector, and the accelerometer-magnetometer combination can be used for the measurement model. The measurement equation, that is,

$$z_k = \begin{bmatrix} \mathbf{f}^{b} \\ \mathbf{m}^{b} \end{bmatrix} = \begin{bmatrix} C_n^b\left(q\right)\mathbf{g}^{n} \\ C_n^b\left(q\right)\mathbf{m}^{n} \end{bmatrix} + v_k$$

where $\mathbf{g}^{n}$ and $\mathbf{m}^{n}$ are the gravity and local magnetic reference vectors, shows that the direction cosine matrix is a quadratic function of the attitude quaternion (see (11)). Hence there are nonlinearities inside the measurement model, and it is very suitable to use particle filtering for state estimation.
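As an illustration of this measurement model, the following sketch predicts the stacked accelerometer/magnetometer observation from a quaternion. The reference vectors `g_n` and `m_n` are example values (not taken from the paper), and the sign convention of the accelerometer term may differ depending on how the specific force is defined.

```python
# Sketch of the nonlinear measurement model h(q): gravity and magnetic reference
# vectors rotated into the body frame by C_n^b(q), which is quadratic in q.
import numpy as np

def dcm_body_from_nav(q):
    """C_n^b as the transpose of the quaternion-derived C_b^n of (11)."""
    w, x, y, z = q
    c_b_n = np.array([
        [w*w + x*x - y*y - z*z, 2*(x*y - w*z),         2*(x*z + w*y)],
        [2*(x*y + w*z),         w*w - x*x + y*y - z*z, 2*(y*z - w*x)],
        [2*(x*z - w*y),         2*(y*z + w*x),         w*w - x*x - y*y + z*z]])
    return c_b_n.T

def measurement_model(q,
                      g_n=np.array([0.0, 0.0, 9.8]),    # example gravity reference (NED)
                      m_n=np.array([0.55, 0.0, 0.83])): # example magnetic reference (NED)
    """Stacked accelerometer/magnetometer prediction in the body frame."""
    c_n_b = dcm_body_from_nav(q)
    return np.concatenate((c_n_b @ g_n, c_n_b @ m_n))
```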

With the above algorithm, we may improve the conventional nonlinear estimation results that are mainly generated by the Extended Kalman Filter (EKF [31]). Here we define the state variable as the attitude quaternion, $x = q$. The state propagation model is given in the last subsection, and the variance information has been systematically derived; related materials can be found in [32]. Then, with the presented approach, the filtering can be recursively continued.

2.3. Velocity and Position Determination

The differential equation of velocity can be given by [28]

$$\dot{\mathbf{v}}^{n} = C_b^n \mathbf{f}^{b} - \left(2\boldsymbol{\omega}_{ie}^{n} + \boldsymbol{\omega}_{en}^{n}\right)\times\mathbf{v}^{n} + \mathbf{g}^{n} \tag{13}$$

where $\mathbf{v}^{n}$ is the velocity of the hand, $\mathbf{f}^{b}$ is the acceleration measured by the accelerometer, $\boldsymbol{\omega}_{ie}^{n}$ is the rotational angular velocity of the Earth in NED, $\boldsymbol{\omega}_{en}^{n}$ is the angular velocity of NED relative to the Earth-Centered, Earth-Fixed (ECEF) coordinate system, and $\mathbf{g}^{n}$ is the vector of gravity, which can be written as

$$\mathbf{g}^{n} = \left(0,\ 0,\ g\right)^{T} \tag{14}$$

where $g$ is the local gravitational acceleration. Usually, MEMS gyroscopes cannot sense the rotational angular velocity of the Earth, and the term $\left(2\boldsymbol{\omega}_{ie}^{n} + \boldsymbol{\omega}_{en}^{n}\right)\times\mathbf{v}^{n}$ is much smaller than $C_b^n\mathbf{f}^{b}$ in most low-speed cases. So (13) can be simplified in this paper as follows:

$$\dot{\mathbf{v}}^{n} \approx C_b^n \mathbf{f}^{b} + \mathbf{g}^{n} \tag{15}$$

In accordance with (15), the discrete form of the velocity update equation can be given by [29]

$$\mathbf{v}^{n}\left(t_k\right) = \mathbf{v}^{n}\left(t_{k-1}\right) + C_b^n\left(t_{k-1}\right)\int_{t_{k-1}}^{t_k}\left[I + \left(\Delta\boldsymbol{\theta}\left(t\right)\times\right)\right]\mathbf{f}^{b}\left(t\right)dt + \mathbf{g}^{n}T \tag{16}$$

where $C_b^n(t_{k-1})$ is the DCM at the moment $t_{k-1}$ and $\Delta\boldsymbol{\theta}(t) = \int_{t_{k-1}}^{t}\boldsymbol{\omega}\,d\tau$ is the angular increment, which is a function of time. Let

$$\Delta\mathbf{v}_{sf}^{b} = \int_{t_{k-1}}^{t_k}\left[I + \left(\Delta\boldsymbol{\theta}\left(t\right)\times\right)\right]\mathbf{f}^{b}\left(t\right)dt \tag{17}$$

and (16) can be written as

$$\mathbf{v}^{n}\left(t_k\right) = \mathbf{v}^{n}\left(t_{k-1}\right) + C_b^n\left(t_{k-1}\right)\Delta\mathbf{v}_{sf}^{b} + \mathbf{g}^{n}T \tag{18}$$

Obviously, the most significant item of (17) is the rotation-related integral, which can be derived as follows:

$$\int_{t_{k-1}}^{t_k}\left(\Delta\boldsymbol{\theta}\left(t\right)\times\right)\mathbf{f}^{b}\left(t\right)dt = \frac{1}{2}\Delta\boldsymbol{\theta}\times\Delta\mathbf{v} + \Delta\mathbf{v}_{scul} \tag{19}$$

where $\Delta\boldsymbol{\theta}$ and $\Delta\mathbf{v}$ are the total angular and velocity increments within the period. Let $\Delta\mathbf{v}_{scul}$ be the sculling motion item. According to [29], the three-sample-based sculling motion item can be given by

$$\Delta\mathbf{v}_{scul} = \frac{33}{80}\left(\Delta\boldsymbol{\theta}_1\times\Delta\mathbf{v}_3 + \Delta\mathbf{v}_1\times\Delta\boldsymbol{\theta}_3\right) + \frac{57}{80}\left[\Delta\boldsymbol{\theta}_2\times\left(\Delta\mathbf{v}_3 - \Delta\mathbf{v}_1\right) + \Delta\mathbf{v}_2\times\left(\Delta\boldsymbol{\theta}_3 - \Delta\boldsymbol{\theta}_1\right)\right] \tag{20}$$

where $\Delta\boldsymbol{\theta}_i$ and $\Delta\mathbf{v}_i$ are the angular and velocity increments of the three subintervals. With (16), (17), (18), (19), and (20), the real-time velocity can be calculated. The position of the hand in NED can also be calculated by integrating the velocity, which can be given by

$$\mathbf{p}^{n}\left(t_k\right) = \mathbf{p}^{n}\left(t_0\right) + \int_{t_0}^{t_k}\mathbf{v}^{n}\left(t\right)dt \tag{21}$$

Here, we suppose that the initial position vector $\mathbf{p}^{n}(t_0)$ is the origin, and the real-time position of the hand is calculated and recorded. The recorded trace is projected onto the vertical plane, and the projected trace is saved as an image, which can be recognized later using the BP-NN.
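The following Python sketch illustrates the discrete update of (16)–(21) under the stated assumptions (three gyroscope/accelerometer increment pairs per period, simplified NED model). It is a simplified illustration, not the exact implementation used in the experiments.

```python
# Hedged sketch of the discrete velocity/position update of Section 2.3.
import numpy as np

def velocity_increment_body(d_theta, d_v, d_theta_s, d_v_s):
    """Specific-force increment with rotation and three-sample sculling compensation.

    d_theta, d_v      : summed angular / velocity increments over the period
    d_theta_s, d_v_s  : the three sub-interval increment vectors (lists of arrays)
    """
    sculling = (33.0 / 80.0) * (np.cross(d_theta_s[0], d_v_s[2]) + np.cross(d_v_s[0], d_theta_s[2])) \
             + (57.0 / 80.0) * (np.cross(d_theta_s[1], d_v_s[2] - d_v_s[0])
                                + np.cross(d_v_s[1], d_theta_s[2] - d_theta_s[0]))
    return d_v + 0.5 * np.cross(d_theta, d_v) + sculling

def update_velocity_position(v, p, c_b_n, dv_sf_b, g_n, T, zupt=False):
    """Simplified NED update following (18) and (21); when ZUPT fires, the
    velocity is zeroed and the position is left unchanged."""
    if zupt:
        return np.zeros(3), p
    v_new = v + c_b_n @ dv_sf_b + g_n * T
    p_new = p + 0.5 * (v + v_new) * T        # trapezoidal position integral
    return v_new, p_new
```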

In real applications, the velocity and position estimates may diverge due to the bias of the accelerometer. Thus, the zero-velocity update (ZUPT) is introduced to overcome this disadvantage [33]. That is to say, when the measured acceleration is less than a preset threshold (usually the absolute value of the accelerometer's bias), the acceleration is not integrated, and neither is the position.
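A minimal sketch of such a zero-velocity detector is given below; the threshold value is only an example and should be set from the bias of the actual accelerometer.

```python
# Minimal ZUPT detection sketch: skip integration when the gravity-compensated
# specific force stays below the (example) accelerometer-bias threshold.
import numpy as np

def is_stationary(accel_body, c_b_n, g_n, bias_threshold=0.05):
    """True when |C_b^n f^b + g^n| < threshold, i.e. the hand is treated as still."""
    return np.linalg.norm(c_b_n @ accel_body + g_n) < bias_threshold
```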

3. Recognition of Gesture

3.1. The Structure of BP-NN

The backpropagation neural network is a Multilayer Neural Network (MNN) which has at least three layers, and each layer of the BP-NN consists of several neurons. Each neuron in the hidden layer is connected with all the neurons in both the front layer and the rear layer, while there is no connection between neurons within a layer. When training the BP-NN, the activation values of the neurons propagate from the input layer to the output layer. To lower the error between the expected output values and the feedback values, the backpropagation algorithm (BPA) is used to adjust the weights between the neurons from the output layer back to the input layer. Just like feedback control in Modern Control Theory, the error is decreased iteration after iteration. When the error is less than a predetermined threshold, the training process stops and the trained BP-NN can be used for gesture recognition. In this paper, a three-layer BP-NN is used for recognition; it is illustrated in Figure 2.

3.2. The Training Algorithm of BP-NN

According to neural network theory, the basic training method can be given as follows.

Let $f(\cdot)$ be the activation function of the input layer, hidden layer, and output layer. In this paper, the Sigmoid Transfer Function (STF) $f(x) = 1/(1+e^{-x})$, the Linear Transfer Function (LTF) $f(x) = x$, and the Hyperbolic Tangent Sigmoid Transfer Function (HTSTF) $f(x) = \tanh(x)$ are used as activation functions.
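For reference, these three activation functions can be written in a few lines of Python:

```python
# The three candidate activation functions compared in the experiments (numpy sketch).
import numpy as np

def sigmoid(x):        # STF
    return 1.0 / (1.0 + np.exp(-x))

def linear(x):         # LTF
    return x

def tanh_sigmoid(x):   # HTSTF
    return np.tanh(x)
```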

The training process of the BP-NN is actually an optimization problem. The training algorithm based on the Gradient Descent Method (GDM) can be given by

$$\Delta w_{ij} = \eta\,\delta_j\,o_i \tag{22}$$

where $\eta$ is the training efficiency, $o_i$ denotes the real output of the $i$th neuron, and $\delta_j$ is the error of the $j$th neuron. For an output-layer neuron with sigmoid activation, $\delta_j$ can be given by

$$\delta_j = \left(d_j - o_j\right)o_j\left(1 - o_j\right) \tag{23}$$

where $d_j$ is the ideal output of the $j$th neuron and $o_j$ is the real output of the $j$th neuron.
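A numpy sketch of one such gradient-descent update for a three-layer network with sigmoid units is given below; the variable names (`w_hidden`, `w_out`) are illustrative rather than the paper's notation, and the learning rate is only an example.

```python
# Hedged sketch of one GDM backpropagation step for a three-layer sigmoid network.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gdm_step(x, d, w_hidden, w_out, eta=0.1):
    """One update: x is the grid feature vector, d the ideal (target) output."""
    h = sigmoid(w_hidden @ x)                # hidden-layer activations
    o = sigmoid(w_out @ h)                   # real outputs o_j
    delta_out = (d - o) * o * (1.0 - o)      # output-layer error terms, as in (23)
    delta_hid = (w_out.T @ delta_out) * h * (1.0 - h)
    w_out = w_out + eta * np.outer(delta_out, h)       # delta_w = eta * delta_j * o_i, as in (22)
    w_hidden = w_hidden + eta * np.outer(delta_hid, x)
    return w_hidden, w_out
```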

3.3. Grid-Based Feature Extraction

As utilized in [34], the grid statistical feature extraction method is very popular. In this paper, the trajectory image is divided into grids. We use the digits from 0 to 9 as the ideal hand gestures. The divided digits 0 and 8 are shown in Figure 3.
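As an illustration, the grid features can be computed as in the following sketch; the grid size used here is an assumed example, not necessarily the one used in the experiments.

```python
# Sketch of grid statistical feature extraction: the trajectory image is split
# into a coarse grid and the per-cell mean pixel value (stroke-pixel ratio for a
# binary image) is used as one feature.
import numpy as np

def grid_features(image, rows=8, cols=8):
    """Return a rows*cols feature vector of per-cell mean pixel values."""
    h, w = image.shape
    features = np.zeros(rows * cols)
    for r in range(rows):
        for c in range(cols):
            cell = image[r * h // rows:(r + 1) * h // rows,
                         c * w // cols:(c + 1) * w // cols]
            features[r * cols + c] = cell.mean()
    return features
```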

4. Experiment

4.1. Platform Setup

The proposed algorithm fuses the inertial sensor data into the attitude quaternion and then computes the velocity and position. Here the position history can well describe the trajectory of the human hand. For this purpose, we designed an experimental platform for validation. In Figure 4, the tower development platform with the NXP Kinetis MK60DN512 microcontroller is presented. The platform has a core processing speed of 100 MHz along with SDIO, SPI, UART, WIFI, and CAN bus interfaces, which allows for data acquisition and logging from wearable inertial sensors. A miniature inertial sensor module, including an MPU6000 gyroscope-accelerometer combination and an HMC5983 magnetometer, is attached to the designed platform using an RS232 cable free of electromagnetic interference (see Figure 5). The module measures 3 cm × 3 cm, making it flexible enough to be mounted on a human hand. The data polling rate is set to 500 Hz for the MPU6000 and 220 Hz for the HMC5983. Apart from this, the miniature module can also produce reference attitude outputs in quaternion form. According to the reference manual of this product, its attitude estimation is accomplished by an effective EKF. In the later comparisons, the results are compared with these reference Euler angles.

4.2. Attitude, Velocity, and Position Estimation

The initial attitude is set as . The initial variance of the state vector is defined as . In the particle filtering design, we set while the threshold is . The raw sensor data is shown in Figure 6 while the generated attitude outputs are shown in Figure 7. We can see that the proposed attitude estimator has basically the same performance as the reference system.

The calculated attitude is then used for the velocity and position integration. After the position integration, the trajectories are saved as images. The first 30 trajectories are used as training samples. Printed characters are also added to the set of training samples, as shown in line 4 of Figure 8.

4.3. Neural Network

The Neural Network Toolbox of MATLAB is utilized for training the BP-NN. In this section, the Levenberg-Marquardt Method (LMM) and GDM are adopted as training algorithms. The Mean Square Error (MSE) is used as the evaluation standard for the performance of the trained BP-NN. The performances of the BP-NNs are compared with different training algorithms and activation functions, as shown in Figures 9 and 10 and Tables 1 and 2.

As can be seen in Figures 9 and 10, the performance of LMM is better than GDM in this case. MSEs from different combinations are grouped into Table 1.

The MSEs for the test data are given in Table 2.

Clearly, GDM is not reliable in this case because its success rate is too low for real applications. The STF-LMM combination shows the best performance among all the combinations and is therefore used for the recognition component of the proposed system.

4.4. Gesture Recognition Results

With the designed system shown above, we carry out several comparison experiments against a recent representative method, proposed by Xu et al., in which an accelerometer is adopted for gesture recognition [13]. The advantage of the proposed method is that it logs the history of the gesture movements, so the hand gesture can be determined more accurately. We first generate several gestures with the designed platform and then use the aforementioned parameters and the trained BP-NN to verify the success rate of both methodologies. The general results are summarized in Table 3.

We can see that, for some instant gestures, the two methods show little macroscopic difference. However, when the gesture becomes slow and its recognition relies on the history of the motion, the proposed method shows clear superiority. This verifies the parameters and models described above and also proves the feasibility and efficiency of the proposed algorithm.

5. Conclusion

In this paper, we propose a hand gesture recognition system that combines inertial sensors and a BP-NN. The rotation vector method is used to compensate for the noncommutativity errors in the determination of attitude, velocity, and position. Particle filtering is introduced to generate accurate attitude outputs from these sensors. Through the real-world experiments, the best activation function and training algorithm for the BP-NN are determined successfully. The final tested performance of the BP-NN proves the effectiveness of our method. However, it should be noted that the robustness of our method has not been verified yet. In future research, we will combine different hand gesture recognition methods to develop a new, more robust, and more accurate recognition system.

Conflicts of Interest

The authors declare no conflicts of interest regarding this manuscript.

Acknowledgments

This research was supported by the National Natural Science Foundation of China (Grant no. 61450010).