Handwriting Recognition in Free Space Using WIMU-Based Hand Motion Analysis
We present a wireless inertial measurement unit (WIMU)-based hand motion analysis technique for handwriting recognition in three-dimensional (3D) space. The proposed handwriting recognition system does not constrain the writing space; users are free to write characters in free space. It uses hand motion analysis to segment hand motion data from a WIMU device that incorporates magnetic, angular rate, and gravity (MARG) sensors and a sensor fusion algorithm, automatically distinguishing segments that represent handwriting from nonhandwriting data in continuous hand motion data. A dynamic time warping (DTW) recognition algorithm is used to recognize handwriting in real-time. We demonstrate that a user can write freely in the air using an intuitive WIMU as an input and hand motion analysis device to recognize handwriting in 3D space. The experimental results for recognizing handwriting in free space show that the proposed method is effective and efficient and could be applied to other natural interaction techniques, such as computer games and real-time hand gesture recognition applications.
1. Introduction
Recent advances in computing technology and user interfaces have led to remarkable growth in interactive applications, such as gestural interfaces in virtual environments. In day-to-day life, the use of gestural devices, which use hand motion data to control interactive interfaces, has increased over the years. Vision-based 3D gesture interaction interfaces, such as Microsoft Kinect and Leap Motion, use a depth camera to track a user’s hand in a constrained environment. The user must perform the gesture within the device's field of view to interact with the applications, which limits the range of motion. Vision-based techniques also suffer from occlusion problems, limited wearability, computational costs, and sensitivity to lighting conditions.
Low-cost sensor devices and user-friendly human-computer interaction (HCI) techniques based on inertial sensing, such as gesture recognition, activity recognition, motion tracking, and handwriting recognition, are being developed rapidly. Compared to conventional keyboard- and touch screen-based input methods, handwritten character recognition in three-dimensional (3D) space using inertial sensors is an emerging technique. Many researchers have proposed handwritten character recognition using accelerometer-based devices. Most of this research is limited to handwritten digits, simple gestures, or character recognition in two-dimensional (2D) space.
Generally, accelerometers show similar acceleration signals for different hand motions, and acceleration data vary among users, which decreases the overall recognition performance of such systems. For complex gestures such as the English alphabet, 3D accelerometer-based techniques are ineffective and inefficient because English letters are more complex than digits and simple gestures and include similarly shaped characters. However, using an inertial sensor-based device, users can freely write characters in 3D space. We propose an inertial sensor-based system for handwriting recognition in 3D space without a constrained environment or writing space. The system uses a WIMU device for hand gesture tracking and motion analysis to recognize handwritten lowercase letters of the English alphabet and digits in free space.
The WIMU device is embedded with MARG sensors and a sensor fusion algorithm, which provides intuitive and accurate hand motion data recognition through linear accelerations, angular velocities, and orientations of the moving hand. The user can write in free space on an imaginary screen using our WIMU motion sensor. In this paper, we present a two-stage approach for spotting and recognizing handwriting in 3D space. Our proposed approach uses hand motion analysis for automatic segmentation of hand motion data to spot significant handwriting data. Furthermore, we implement a multidimensional handwriting recognition algorithm using DTW, which measures similarity by computing the distance between two signals that might vary in time or speed, to recognize handwriting in real-time.
The remainder of this paper is organized as follows. Section 2 briefly describes existing inertial sensor-based handwriting recognition methods and related works. We present the WIMU motion sensor in Section 3. In Section 4, we explain the WIMU device as a handwriting interface that automatically spots handwriting segments and performs handwriting recognition. Experimental results, including a user-independent test, are presented in Section 5, and Section 6 concludes our paper.
2. Related Work
Many researchers have used vision-based sensors in hand gesture recognition applications [1–3]. Inertial sensor-based HCI techniques for gesture recognition and handwriting recognition have also been used by many researchers [4–6]. Compared to vision-based sensors, inertial sensor-based techniques have the advantages of being ubiquitous and having low latency and computation cost. A microelectromechanical system (MEMS) accelerometer-based, nonspecific-user hand gesture recognition system is presented in . Akl et al.  presented an accelerometer-based gesture recognition system that used DTW with affinity propagation methods. Liu et al.  used a single three-axis accelerometer and the DTW algorithm for gesture recognition. They evaluated their system using more than 4000 samples for eight gesture patterns collected from eight users. Choi et al.  presented pen-style hardware with an accelerometer to recognize the 10 Arabic numerals. They used both the hidden Markov model (HMM) and DTW for recognition. An accelerometer-based pen device for online handwritten digit recognition using DTW is also presented in .
An accelerometer-based digital pen for 2D handwritten digit and gesture trajectory recognition applications is presented in , which extracts the time and frequency-domain features from the acceleration signals and then identifies the most important features to reduce the feature dimensions. The reduced features were then sent to a trained probabilistic neural network for recognition. Reference  presented a study on the effectiveness of combining acceleration and gyroscope data on mobile devices for gesture recognition using classifiers with dimensionality constraints. An inertial-measurement-unit-based pen and its associated trajectory recognition algorithm for handwritten digits and gesture recognition are presented in [14, 15].
The fusion of multiple sensors has been adopted to enhance the gesture tracking and recognition performance of such systems. Liu et al.  presented a method for fusing data from inertial and vision depth sensors within the framework of an HMM for hand gesture recognition. Reference  proposed the fusion of a MEMS inertial sensor and a low-resolution vision sensor for 2D gesture tracking and recognition. Fusion of a three-axis accelerometer and multichannel electromyography sensors was used to recognize sign language in . However, the fusion of multiple sensors increases the computational load and cost of a system.
In hand gesture recognition or handwriting recognition using inertial sensors, spotting and recognizing the significant data from the continuous data streams are very important. Reference  presented a two-stage approach for the spotting task. The first stage preselects signal sections likely to contain specific motion events using a simple similarity search. Those preselected sections are further classified in a second stage by exploiting the recognition capabilities of HMMs. Amma et al.  proposed a two-stage approach for spotting and recognizing handwriting gestures. The spotting stage uses a support vector machine to identify the data segments that contain handwriting. The recognition stage then uses HMMs to generate a text representation from the motion sensor data.
Many frameworks for 3D spatial gesture recognition systems exist. HMMs are widely used in gesture recognition methods [21–23]. Chen et al.  presented a 6D gesture recognition system using different tracking technologies for command and control applications and compared the effectiveness of various features derived from different tracking signals, using a statistical feature-based linear classifier as a simple baseline and the HMM-based recognizer in both user-dependent and user-independent cases. However, HMMs require large training data sets to form a statistical model for recognition, and their computational complexity increases with an increase in the dimensions of the feature vectors. The DTW-based hand gesture recognition algorithm has also been used by many researchers [25–28]. DTW works even with only one training data set, and it is easy to execute, computationally efficient, and accurate for time-series data. Reference  proposes a DTW-based recognition algorithm for online handwriting and gesture recognition using an inertial pen.
3. WIMU Motion Sensor
We designed a custom-made wireless motion sensor around a 9-axis MEMS sensor (InvenSense MPU9150), which incorporates a triaxis 16-bit gyroscope, a triaxis 16-bit accelerometer, and a triaxis 13-bit magnetometer with selectable ranges up to ±2000 °/s, ±8 g, and ±1200 μT, respectively. The accelerometer, gyroscope, and magnetometer provide the accelerations, angular velocities, and magnetic signals generated by hand motion. All these sensors are connected to a microcontroller (STMicroelectronics STM32F103RET6) that collects and processes the data. The inertial sensor data are transmitted to a PC using a Bluetooth transceiver (Panasonic iPAN1321) or a USB connection. Figure 1 shows our custom-made wireless motion sensor. The wireless motion sensor data are acquired and processed at 100 Hz.
The WIMU motion sensor is programmed with a sensor fusion algorithm that enables it to precisely and accurately track user hand motions in 3D space. A quaternion complementary filter algorithm is implemented to obtain the 3D attitude of the device in quaternion format . The quaternion complementary filter algorithm uses the calibrated sensor data as input and produces the quaternion as output.
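As an illustration of the idea only, the sketch below shows one update step of a simple quaternion complementary filter: the gyroscope rate is integrated, and a proportional correction pulls the estimate toward the gravity direction measured by the accelerometer. It is not the authors' filter. The function names, the gain `kp`, the (w, x, y, z) quaternion order, and the convention that a level, stationary sensor reads +1 g on its z axis are all assumptions, and the magnetometer term that a MARG filter would add to stabilize yaw is omitted.

```python
import math

def gravity_direction(q):
    """Unit gravity direction in the sensor frame implied by q = (w, x, y, z)."""
    w, x, y, z = q
    return (
        2.0 * (x * z - w * y),
        2.0 * (w * x + y * z),
        w * w - x * x - y * y + z * z,
    )

def complementary_update(q, gyro, accel, dt, kp=2.0):
    """One filter step: integrate the gyro rate, nudged toward measured gravity.

    q     -- current unit quaternion (w, x, y, z)
    gyro  -- calibrated angular velocity in rad/s
    accel -- calibrated acceleration (only its direction is used)
    kp    -- hypothetical correction gain, not a value from the paper
    """
    w, x, y, z = q
    ox, oy, oz = gyro
    ax, ay, az = accel
    norm = math.sqrt(ax * ax + ay * ay + az * az)
    if norm > 0.0:
        ax, ay, az = ax / norm, ay / norm, az / norm
        vx, vy, vz = gravity_direction(q)
        # The cross product of measured and predicted gravity is the tilt error.
        ox += kp * (ay * vz - az * vy)
        oy += kp * (az * vx - ax * vz)
        oz += kp * (ax * vy - ay * vx)
    # Quaternion kinematics: q_dot = 0.5 * q * (0, omega).
    w2 = w + 0.5 * dt * (-x * ox - y * oy - z * oz)
    x2 = x + 0.5 * dt * (w * ox + y * oz - z * oy)
    y2 = y + 0.5 * dt * (w * oy - x * oz + z * ox)
    z2 = z + 0.5 * dt * (w * oz + x * oy - y * ox)
    n = math.sqrt(w2 * w2 + x2 * x2 + y2 * y2 + z2 * z2)
    return (w2 / n, x2 / n, y2 / n, z2 / n)
```

At the 100 Hz rate stated above, `dt` would be 0.01 s. A larger `kp` trusts the accelerometer more and the gyroscope less.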
4. WIMU Handwriting Interface
In real-time handwriting recognition using an inertial motion sensor, the first problem is spotting the significant handwriting data from continuous hand motion data. The second problem is recognizing complex, similarly shaped letters within a large number of classes. Users have different styles for handwriting and dynamic variations in timing and speed for each character. Even a single user has variations in time and speed for a single character, which makes 3D handwriting recognition a difficult task. To overcome these problems of identifying significant hand motion data, intraclass variations, and interclass similarity, we propose a two-stage approach for handwriting in 3D space using a WIMU motion sensor. Figure 2 shows the block diagram of our segmentation and classification process for handwriting recognition in 3D space.
4.1. Handwriting Features Extraction
Inertial sensor measurements commonly contain noise, sensor drift, cumulative errors, and gravity-induced errors that produce inaccurate output; hence, preprocessing steps such as calibration and filtering are necessary to eliminate noise and errors from the inertial signals. The calibration procedure reduces the sensitivity and offset errors in the raw signals using scale factors and biases for the triaxis accelerometer, gyroscope, and magnetometer to obtain calibrated signals, and low-pass filtering reduces the high-frequency noise in the calibrated signals.
The acceleration data consist of two components: motion-induced acceleration and gravity. The gravity component is treated as noise and removed because it does not depend on the user’s hand motion. To compensate for gravity, we compute its expected direction from the quaternion output of the quaternion complementary orientation filter, as given in (1).
The gravitational acceleration is then subtracted from the measured acceleration, as shown in (2), to obtain the motion-induced acceleration in the sensor frame.
After calibration and filtering, the WIMU motion sensor provides accelerations, angular velocities, and the 3D attitude of the device as a quaternion; these are the feature parameters generated by hand movements for further processing and analysis.
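The gravity compensation step can be sketched as follows. This is a minimal example under stated assumptions rather than the authors' exact formulation: the quaternion is taken as a unit quaternion in (w, x, y, z) order, the accelerometer is assumed to report in units of g and to read +1 g on the z axis when level and stationary, and the function names are hypothetical.

```python
def gravity_in_sensor_frame(q):
    """Expected gravity direction (unit vector, in g) for a unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    return (
        2.0 * (x * z - w * y),
        2.0 * (w * x + y * z),
        w * w - x * x - y * y + z * z,
    )

def remove_gravity(accel_g, q):
    """Subtract the expected gravity from a calibrated accelerometer sample (in g).

    What remains is the motion-induced acceleration in the sensor frame.
    """
    gx, gy, gz = gravity_in_sensor_frame(q)
    ax, ay, az = accel_g
    return (ax - gx, ay - gy, az - gz)
```

For a level, stationary sensor, `remove_gravity((0.0, 0.0, 1.0), (1.0, 0.0, 0.0, 0.0))` leaves no residual acceleration, which is the behaviour the text describes.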
4.2. Handwriting Segmentation
The hand gesture data from the WIMU motion sensor consists of symbolic and continuous data and can be considered as an organized sequence of segments. Segmentation of continuous hand motion data simplifies the process of handwriting classification in free space. Our approach uses hand motion analysis for handwriting segmentation. In real-time, we process the accelerometer and gyroscope data from the motion sensor to segment the continuous hand motion data into handwriting motion segments and nonhandwriting segments, using both the motion detection and nonmotion detection approaches.
The angular velocity and linear acceleration of a hand motion are the two controlling parameters; they provide the information needed to determine the beginning and end boundaries of handwriting segments. We calculate the norms of the linear acceleration and angular velocity data from user hand motions using (4) and (5) to segment hand motion data and spot significant handwriting segments. This approach assumes that the angular velocity and linear acceleration of the hand decrease when a user begins and ends handwriting.
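A small sketch of the norm computation, assuming the norm in question is the Euclidean norm of each three-axis sample; the function names are hypothetical.

```python
import math

def vector_norm(v):
    """Euclidean norm of one three-axis sample."""
    return math.sqrt(sum(c * c for c in v))

def motion_norms(lin_acc_samples, gyro_samples):
    """Per-sample norms of the linear acceleration and angular velocity streams."""
    acc_norm = [vector_norm(a) for a in lin_acc_samples]
    gyro_norm = [vector_norm(w) for w in gyro_samples]
    return acc_norm, gyro_norm
```

Reducing each sample to a single magnitude makes the later thresholding independent of the direction in which the hand moves or rotates.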
Using a magnitude threshold-based hand motion analysis method, we segmented the hand motion data into handwriting segments and nonhandwriting segments. For accelerometer-based segmentation, we calculated an acceleration threshold from the filtered accelerometer signals recorded while the user is stationary. A small constant is used as the threshold, and the handwriting motion data are segmented according to whether the acceleration norm is greater or less than it. However, such a loose condition produces unexpected segmentations of the hand motion.
In our empirical tests, we observed that using only an acceleration threshold produced unexpected motion segmentations; therefore, we also used a temporal threshold to avoid them. Segments that fell within the same temporal threshold were combined into a single segment. We used 450 ms as the temporal threshold for our empirical tests. A handwriting motion was assumed to have stopped if the pause in motion lasted longer than the temporal threshold. Thus, we used acceleration and temporal thresholds in our motion detection approach to separate the hand motion data into two kinds of segments, handwriting and nonhandwriting.
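One way to read the acceleration and temporal thresholds is sketched below: samples whose norm exceeds the acceleration threshold are marked as motion, and motion runs separated by a pause no longer than the temporal threshold are merged. The 450 ms pause and the 100 Hz sampling rate come from the text; the function name, the merging rule, and the threshold value passed in are assumptions.

```python
def segment_by_threshold(norm, threshold, fs_hz=100.0, max_pause_s=0.45):
    """Return (start, end) sample index pairs, end exclusive, of motion segments."""
    active = [v > threshold for v in norm]

    # Collect raw runs of consecutive above-threshold samples.
    segments = []
    start = None
    for i, is_active in enumerate(active):
        if is_active and start is None:
            start = i
        elif not is_active and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(active)))

    # Merge runs whose separating pause is within the temporal threshold.
    max_gap = int(round(max_pause_s * fs_hz))
    merged = []
    for seg in segments:
        if merged and seg[0] - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], seg[1])
        else:
            merged.append(seg)
    return merged
```

With these defaults, a dip in acceleration shorter than 45 samples does not split a letter, while a longer pause ends the segment.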
Similarly, for gyroscope-based segmentation, we determined a gyro threshold from the filtered gyroscope signals, and a small constant such as 20 °/s was used. Hand motion data are segmented when the angular velocity norm is less than the gyro threshold. Unlike accelerometer-based segmentation, gyroscope-based segmentation did not produce any unexpected segmentations. Thus, in our empirical tests we determined that gyroscope-based segmentation provides higher accuracy than accelerometer-based segmentation.
Our system therefore combines the two: accelerometer-based segmentation, which uses acceleration and temporal thresholds to spot significant motion data while avoiding unexpected segmentations, proposes the gesture segments, and the more accurate gyroscope-based segmentation verifies and validates them. Figure 3 shows our combined approach for spotting significant hand motion data.
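The validation step could look roughly like the sketch below. The text does not specify how the gyroscope confirms an accelerometer-based segment, so the rule used here (keep a segment if at least half of its samples have an angular velocity norm at or above the gyro threshold) and the parameter `min_active_fraction` are purely hypothetical; only the 20 °/s threshold is taken from the text.

```python
def validate_with_gyro(acc_segments, gyro_norm, gyro_threshold=20.0,
                       min_active_fraction=0.5):
    """Keep accelerometer-based segments that the gyroscope also sees as motion.

    acc_segments -- list of (start, end) index pairs, end exclusive
    gyro_norm    -- per-sample angular velocity norm in deg/s
    """
    kept = []
    for start, end in acc_segments:
        window = gyro_norm[start:end]
        if len(window) == 0:
            continue
        active = sum(1 for v in window if v >= gyro_threshold)
        # Hypothetical rule: enough of the segment must show rotation.
        if active / len(window) >= min_active_fraction:
            kept.append((start, end))
    return kept
```

A segment triggered only by accelerometer noise, with the hand not rotating, would be dropped by this check.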
4.3. Handwriting Recognition
For handwriting recognition in 3D space, we implemented a multidimensional real-time handwriting recognition algorithm using DTW, which has been widely used for human body movement recognition. In our system, the DTW algorithm computes the distance between two gesture signals obtained from the WIMU motion sensor, each represented as multidimensional time-series data that can vary in time or speed. This method is simple and effective for interactive applications such as handwriting and gesture recognition systems.
The quaternion output from the WIMU motion sensor is transformed into a sequence of Euler rotation angles. We use roll, pitch, and yaw, in addition to the accelerations and angular velocities, as feature parameters to efficiently track and classify handwriting in a meaningful and intuitive way. The distance estimation on the orientation data is efficient and allows good discrimination among complex and similarly shaped handwritten characters. We used a min-max normalization method to compensate for scale and magnitude variations due to individual differences in writing. The time-series sequence of hand motion data obtained from the WIMU motion sensor is formed from these feature parameters.
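The two preprocessing steps in this paragraph can be sketched as follows. The sketch assumes a unit quaternion in (w, x, y, z) order and the common Z-Y-X (yaw, pitch, roll) Euler convention; the paper does not state which rotation sequence it uses, and applying min-max normalization per feature over one recorded sequence is also an assumption. Function names are hypothetical.

```python
import math

def quaternion_to_euler(q):
    """Convert a unit quaternion (w, x, y, z) to (roll, pitch, yaw) in radians."""
    w, x, y, z = q
    roll = math.atan2(2.0 * (w * x + y * z), 1.0 - 2.0 * (x * x + y * y))
    # Clamp to guard against rounding slightly outside asin's domain.
    sin_pitch = max(-1.0, min(1.0, 2.0 * (w * y - z * x)))
    pitch = math.asin(sin_pitch)
    yaw = math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))
    return roll, pitch, yaw

def min_max_normalize(sequence):
    """Scale every feature column of a sequence of feature vectors to [0, 1]."""
    columns = list(zip(*sequence))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in sequence
    ]
```

Normalizing each feature to a common range keeps a large, fast writer and a small, slow writer comparable before distances are computed.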
During real-time handwriting recognition, the DTW recognition algorithm computes the similarity between the input data and the templates. The input handwriting data are assigned to the class with the minimum warping distance, provided that this distance satisfies the threshold value of that class. Otherwise, the input handwriting data are rejected.
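The accept-or-reject rule can be sketched as below. This is a minimal illustration, assuming that "matches the threshold" means the minimum distance does not exceed the class threshold; the function name, the dictionary layout, and the pluggable `distance` argument are hypothetical.

```python
def classify(sample, templates, thresholds, distance):
    """Return (label, distance) for the nearest template, or (None, distance) if rejected.

    templates  -- dict mapping class label to its template
    thresholds -- dict mapping class label to its acceptance threshold
    distance   -- callable returning a dissimilarity, e.g. a DTW distance
    """
    best_label = None
    best_dist = float("inf")
    for label, template in templates.items():
        d = distance(sample, template)
        if d < best_dist:
            best_label, best_dist = label, d
    # Accept only if the nearest class is also close enough.
    if best_label is not None and best_dist <= thresholds[best_label]:
        return best_label, best_dist
    return None, best_dist
```

A toy usage with scalar "templates" and an absolute-difference distance: `classify(1.0, {"a": 0.0, "b": 10.0}, {"a": 2.0, "b": 2.0}, lambda s, t: abs(s - t))` accepts class `"a"`, while an input of `4.0` is rejected because its nearest class is farther than the threshold.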
Given two time-series hand motion sequences of different lengths, finding the similarity between them using DTW requires a distance matrix containing the Euclidean distances between all pairs of points of the two sequences. The warping matrix is then defined recursively from this distance matrix.
We then calculate the optimal (minimum) total warping distance between the two sequences after alignment.
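A compact sketch of the standard DTW recursion described here: each cell of the warping matrix adds the local Euclidean distance to the cheapest of its three predecessor cells. The names `euclidean`, `dtw_distance`, and `D` are illustrative and are not the paper's notation; each sequence is assumed to be a list of equal-length feature tuples.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def dtw_distance(a, b):
    """Minimum total warping distance between sequences a and b."""
    n, m = len(a), len(b)
    inf = float("inf")
    # D[i][j] is the best cost of aligning a[:i] with b[:j].
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = euclidean(a[i - 1], b[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]
```

Because a sample may be matched to several samples of the other sequence, the same letter written at two different speeds still yields a small distance.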
To speed up the computation of the DTW algorithm, it is common to restrict the warping path. Common global constraints are the Sakoe-Chiba band and the Itakura parallelogram. The typical constraints on the warping path are as follows:
(i) Monotonicity. The warping path should not move backwards. It must be monotonically increasing.
(ii) Continuity. The increment in a warping path is limited such that no elements are skipped in a sequence.
(iii) Boundaries. The start and end elements in the warping path are fixed.
If a warping window is specified, then only the pairs of elements whose indices differ by no more than the size of the warping window are solved.
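The warping window can be sketched as a Sakoe-Chiba-style band on the DTW table: only cells whose row and column indices differ by at most the window size are filled. This is an illustrative sketch, not the authors' implementation; widening the band to the length difference of the two sequences, so that the end cell stays reachable, is a design choice made here.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))

def dtw_distance_windowed(a, b, window):
    """DTW distance restricted to index pairs (i, j) with |i - j| <= window."""
    n, m = len(a), len(b)
    # The band must be at least as wide as the length difference.
    w = max(window, abs(n - m))
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0  # boundary condition: paths start at the first elements
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = euclidean(a[i - 1], b[j - 1])
            # Steps of at most one index keep the path monotonic and continuous.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]  # boundary condition: paths end at the last elements
```

With a window of 0 and equal lengths, the alignment is forced onto the diagonal; widening the window trades computation for tolerance to timing differences.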
5. Experimental Results
We demonstrated our WIMU motion sensor-based handwriting interface using the 26 lowercase letters of the English alphabet and digits. The system runs on a PC with a 3.40 GHz Intel Core i7 CPU and 16 GB of memory. The wireless motion sensor communicates with the PC via a Bluetooth interface. The WIMU motion sensor is equipped with a button; the user presses it to start a gesture and releases it to end the gesture. The combination of accelerations, angular velocities, and Euler rotation angles as feature parameters for handwriting recognition with automatic segmentation of the hand motion data into significant motion segments allows users to produce affordance input in 3D space using the WIMU motion sensor.
5.1. Template Database for Handwriting
We stored the template database for each handwritten English lowercase letter and digit in XML file format from training samples for handwriting recognition. In the handwriting training process, a threshold value and template are computed for each handwritten English lowercase letter and digit. Figure 4 shows the handwriting trajectories we used in this system for English lowercase letters and digits.
After training, the DTW recognition algorithm classifies unknown multidimensional time-series hand motion data by calculating the similarity between the input and each handwriting template in the database. The threshold value for each template class filters false positives in the recognition stage. The unknown input data segment is rejected if no match is found in the template database. If a new template class is added to the template database or an existing one is removed, we need only train and compute the threshold value for the new template class, which reduces the training time.
5.2. User Study: Test of Digits and English Lowercase Letters
We conducted a user-independent experiment to test handwriting recognition and evaluate the efficiency of the WIMU motion sensor as an input device for handwriting in free space. The user-independent test was conducted with eleven male and nine female participants aged between 25 and 35 years. We instructed participants on the handwriting trajectories for English lowercase letters and digits and allowed them to practice writing in free space with the WIMU motion sensor before beginning the experiment.
For 3D handwritten digit recognition, we asked each participant to write each digit (0 to 9) by pressing the button on the WIMU motion sensor and releasing the button upon completion of each digit; this was repeated 5 times in free space without any limitations or restrictions for writing. Thus, using twenty participants, we tested each digit 5 times for a total of 1000 inputs.
Table 1 shows the confusion matrix for the 3D handwritten digit recognition experiment. The columns are recognized digits, and the rows are the actual input digits in 3D space. The DTW-based recognition algorithm achieved an average recognition rate of 99.5% using a leave-one-out cross-validation method. Figure 5 shows the precision and recall of the proposed method for each digit.
Similar to the 3D digit handwriting recognition experiment, we asked participants to practice writing using the given handwriting trajectories for English lowercase letters in free space until they were comfortable with the WIMU motion sensor. Then we asked the twenty participants to write each of the 26 English lowercase letters 5 times. Thus, we tested a total of 2600 handwritten English lowercase letters with the template database for the user-independent experiment in free space.
Table 2 shows the confusion matrix for the 3D handwritten English lowercase letter recognition experiment. The columns are recognized letters, and the rows are the actual input letters in 3D space. The DTW-based recognition algorithm achieved an average recognition rate of 98.69% using a leave-one-out cross-validation method. Figure 6 shows the average precision and recall of the proposed method for each English lowercase letter.
5.3. User Study: Spotting and Recognition Test for Words
To write words in free space, the user writes the individual letters continuously, one after another. The user starts writing a word by pressing the button on the WIMU motion sensor at the beginning of the first letter and releases the button on completion of the last letter. Figure 7 shows a user writing the word “velab” in free space using our proposed approach.
We evaluated the segmentation accuracy of the three methods (accelerometer-based, gyroscope-based, and combined). Accuracy was measured per sample point using continuous hand motion data containing both handwriting and nonhandwriting data from twenty participants. All twenty participants were asked to write the word “velab” in free space 5 times at different speeds. Figure 8 shows the accuracy of the segmentation methods from our empirical test for spotting handwriting and nonhandwriting data in continuous hand motion data. The combination of accelerometer- and gyroscope-based segmentation achieves an accuracy of 98.00%, which is higher than the accelerometer-based (94.44%) and gyroscope-based (96.11%) methods. Accuracy, given in (9), is the ratio between the number of true results (true positives and true negatives) and the total number of cases examined (true positives, true negatives, false positives, and false negatives):
$$\mathrm{Accuracy} = \frac{\mathrm{TP} + \mathrm{TN}}{\mathrm{TP} + \mathrm{TN} + \mathrm{FP} + \mathrm{FN}} \quad (9)$$
Figure 9 shows the false positive rate of the segmentation methods based on our experimental test. Our approach for spotting significant hand motion data using both accelerometer and gyroscope-based segmentation achieves a low false positive rate of 4.5%, whereas methods based on accelerometer or gyroscope segmentation achieve false positive rates of 12.5% and 8.75%, respectively.
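The per-sample-point evaluation can be sketched as below: each sample is labelled handwriting or nonhandwriting by the segmentation method and compared with a ground-truth label. Accuracy follows the definition in the text; the false positive rate uses the standard definition FP / (FP + TN), which the text does not spell out and is therefore an assumption, as is the function name.

```python
def segmentation_scores(predicted, truth):
    """Return (accuracy, false_positive_rate) from per-sample boolean labels.

    predicted -- True where the method marked a sample as handwriting
    truth     -- True where the sample really is handwriting
    """
    tp = tn = fp = fn = 0
    for p, t in zip(predicted, truth):
        if p and t:
            tp += 1
        elif not p and not t:
            tn += 1
        elif p and not t:
            fp += 1
        else:
            fn += 1
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    negatives = fp + tn
    false_positive_rate = fp / negatives if negatives else 0.0
    return accuracy, false_positive_rate
```

Reporting both numbers matters because a method can reach high accuracy on data dominated by handwriting samples while still mislabelling many nonhandwriting samples.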
Our empirical test thus shows that the combined approach to segment continuous hand motion data into handwriting and nonhandwriting data using both accelerometer and gyroscope data provides higher accuracy and a lower false positive rate than either method used alone. This method for handwriting segmentation is simple and effective for real-time use. Figure 10 shows the recognition accuracy in the empirical test for each letter in the word “velab” using a combination of accelerations, angular velocities, and hand rotation angles as feature parameters for DTW-based handwriting recognition in free space.
We also tested the system on different words written in free space to evaluate the system's performance. We asked all twenty participants to write 40 words, which included common TV channels and digital/Internet services and together contained all of the letters of the English alphabet. Each participant wrote each word in lowercase five times in free space using the proposed approach.
Figure 11 shows the average precision and recall of each letter in words written in free space using the WIMU motion sensor. The results were obtained by feeding the detected segments to the recognizer. The DTW-based recognition algorithm achieved an average recognition rate of 97.95%.
5.4. Qualitative Evaluation and Discussion
We asked each participant a series of questions, each answered on a scale from 1 to 10, to evaluate the user experience of the proposed handwriting in free space using the WIMU motion sensor. The questions were as follows: (1) natural: does it feel like a natural user interface? (2) intuitive: how intuitive is the system? (3) easy: how easy is it to use and adapt quickly to the proposed interface? (4) comfort: how much physical ease and freedom does it offer compared to other handwriting methods, such as touchscreen or vision-based approaches? Figure 12 shows the average qualitative evaluation results obtained from the twenty participants.
The results show that our interface is simple and effective for handwriting in free space with a natural user interface that provides ease of use. The automatic segmentation of hand motion data by analyzing accelerations and angular velocities to indicate the start and end of significant handwriting data operates similarly to the pen-up, pen-down concept in conventional handwriting systems. The approach of using user hand motion constraints (acceleration and angular velocity) for segmentation in real-time reduces the redundant segmentation of hand motion data compared to accelerometer-based and gyroscope-based methods for segmentation of time-series data.
The proposed system recognizes the hand motion trajectories by sensing and spotting handwriting segments, and the DTW-based recognition algorithm, which compensates for variations in speed and size, performs the handwriting recognition. The hand motion data obtained from acceleration, angular velocities, and orientation data increases the handwriting recognition rate in free space. This allows even a naive user to adapt quickly and easily to the proposed interface for handwriting in free space.
Compared to individual letters, the performance for words written continuously in free space using the WIMU motion sensor varies because users write at different speeds. The main limitation of our system is that, to spot handwriting and nonhandwriting data, the proposed approach assumes that the acceleration and angular velocity of a hand motion decrease when a user begins and finishes handwriting. This requires users to slow down for a fraction of a second after each letter in a word so that the letter spotted during segmentation can be recognized.
6. Conclusion
We have presented an interactive, inertial sensor-based interface for handwriting in free space. Hand motion analysis, which detects significant handwriting segments using hand motion constraints measured by the WIMU motion sensor, increases our system’s handwriting recognition rate. The DTW-based handwriting recognition algorithm, using a combination of accelerations, angular velocities, and orientation data to recognize handwriting in 3D space, is effective and efficient in real-time. Our experimental results show that the proposed method is suitable for interactive gesture-based applications. Users can effectively express their intentions in a virtual environment beyond vision, touch screen, keyboard, and mouse interactions.
The proposed interface system could also be used in other natural interaction techniques, such as in computer games and real-time user activity recognition applications. Although our proposed method is effective for spotting and recognizing hand motion data, it assumes that the acceleration and angular velocity of a hand motion decrease when a user begins and finishes handwriting, which requires users to slow their handwriting speed. Thus, we plan to further investigate the issue of handwriting speed and improve our system to recognize sequences of words.
The authors declare that there are no competing interests regarding the publication of this paper.
This research was supported by the MSIP (Ministry of Science, ICT and Future Planning), Korea, under the Global IT Talent support program (IITP-2015-R0110-15-2003) supervised by the IITP (Institute for Information and Communication Technology Promotion) and the Chung-Ang University Research Scholarship Grants in 2014.
A. Benbasat and J. Paradiso, “An inertial measurement framework for gesture recognition and applications,” in Gesture and Sign Language in Human-Computer Interaction, I. Wachsmuth and T. Sowa, Eds., vol. 2298, pp. 9–20, Springer, Berlin, Germany, 2002.
J. K. Oh, C. Sung-Jung, B. Won-Chul et al., “Inertial sensor based recognition of 3-D character gestures with an ensemble classifiers,” in Proceedings of the 9th International Workshop on Frontiers in Handwriting Recognition (IWFHR-9 2004), pp. 112–117, Tokyo, Japan, October 2004.
S. Zhou, Z. Dong, W. J. Li, and C. P. Kwong, “Hand-written character recognition using MEMS motion sensing technology,” in Proceedings of the IEEE/ASME International Conference on Advanced Intelligent Mechatronics (AIM '08), pp. 1418–1423, IEEE, Xi'an, China, August 2008.
J.-S. Wang, Y.-L. Hsu, and C.-L. Chu, “Online handwriting recognition using an accelerometer-based pen device,” in Proceedings of the 2nd International Conference on Advances in Computer Science and Engineering, pp. 229–232, 2013.
S. Kratz, M. Rohs, and G. Essl, “Combining acceleration and gyroscope data for motion gesture recognition using classifiers with dimensionality constraints,” in Proceedings of the 18th International Conference on Intelligent User Interfaces (IUI '13), pp. 173–178, Santa Monica, Calif, USA, March 2013.
S. Vikram, L. Li, and S. Russell, “Handwriting and gestures in the air, recognizing on the fly,” in Proceedings of the CHI, p. 21, Paris, France, April-May 2013.