Mathematical Problems in Engineering

Volume 2018, Article ID 2404089, 9 pages

https://doi.org/10.1155/2018/2404089

## Modified Dynamic Time Warping Based on Direction Similarity for Fast Gesture Recognition

Correspondence should be addressed to TaeYong Kim; kimty@cau.ac.kr

Received 25 August 2017; Revised 12 December 2017; Accepted 21 December 2017; Published 22 January 2018

Academic Editor: Tae Choi

Copyright © 2018 Hyo-Rim Choi and TaeYong Kim. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

#### Abstract

We propose a modified dynamic time warping (DTW) algorithm that compares gesture-position sequences based on the direction of the gestural movement. Standard DTW does not specifically consider the two-dimensional characteristics of the user’s movement, so its sequence comparison needs to be improved for gesture recognition. The proposed gesture-recognition system compares the position sequence of the input gesture with the gesture positions saved in the database and selects the most similar gesture by filtering out unrelated gestures. The suggested algorithm uses the cosine similarity of the movement direction at each moment to calculate the difference and reflects the characteristics of the gesture movement by using the ratio of the Euclidean distance and the proportional distance to the calculated difference. Selective spline interpolation helps to counter the decline in recognition when the gesture sequence is too short. In experiments with public databases (MSRC-12 and G3D), the suggested algorithm showed improved performance on both databases compared to other methods.

#### 1. Introduction

In human-computer interaction (HCI), replacing the mouse and keyboard with the user’s voice and movement as input mechanisms is a popular topic in present-day research, driven by substantial improvements in hardware performance and by new sensor technology. Analyzing the movement obtained from the sensor to determine the user’s intention is an important step in these types of HCI. HCI gesture-recognition technology mostly consists of pattern-recognition technology. Recognition is mainly divided into two processes: the first extracts the characteristics of a pattern, and the second categorizes the extracted features. A computer typically acquires a user’s characteristics directly from a sensor or by processing the acquired data. The characteristics acquired from a gesture are sequential data, and pattern-recognition technologies are required to categorize them.

Methods such as dynamic time warping (DTW) and the hidden Markov model (HMM) are used to analyze sequential data, and research studies seek to improve these methods [1]. The DTW algorithm was developed to match sequences of different lengths. The algorithm creates a cost table over the elements of the two sequences and compares the sequences using dynamic programming, which recursively selects and saves the minimum cost. HMM is a probabilistic model that uses the transition probability of sequence data [2]. A neural network (NN) is a computing system modeled on the human brain and nervous system. Recently, deep learning methods (convolutional neural networks, recurrent neural networks) have provided reasonable results in computer vision research, while research continues on improving their applicability to gesture recognition [3, 4]. The major challenge in using deep learning for gesture recognition is the effective representation of the gesture movements.

Deep learning-based methods require a remarkably large database for training in order to obtain adequate results during the testing phase. Moreover, because all the processing takes place within the hidden layers, it is difficult for the researcher to analyze the training process. In contrast, DTW is flexible, requires only a small database during the training phase, and has a matching process that can be inspected directly, which makes it a convenient tool for matching and for analyzing the extracted features.

Matching based on the DTW algorithm requires less training data and provides steadier results than a probability-based algorithm. Owing to these strengths, DTW can be applied to various areas that use sequence data, such as gestures [5], voice [6], handwritten letters [7], and signatures [8]; moreover, favorable results can be achieved without a large amount of learning data. DTW [1], edit distance with real penalty (ERP) [9], and edit distance on real sequence (EDR) [10] consider each datum rather than the shape of the sequence trajectory. Angular metric for shape similarity (AMSS) [11] and longest common subsequence (LCSS) [12] are less influenced by outliers; however, AMSS requires preprocessing and is more sensitive to short vibrations. Among methods that consider the sequence shape, Derivative DTW (DDTW) [13] compares the shape by using differential sequences. As DDTW considers only the shape, its performance varies significantly with the characteristics of the database.

This study introduces a modified DTW algorithm based on direction similarity (DS) that calculates the similarity while considering both the movement and shape of gestures. The suggested gesture-recognition system treats the hand-position data acquired from the camera as sequence data and, if the sequence is too short, lengthens it by interpolation. After both the position and size of the acquired sequence are normalized, the gesture is recognized with the DTW algorithm, which reflects the direction characteristics to detect fast gestures.

#### 2. DTW and Classification of Gestures

##### 2.1. Classic DTW

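The DTW algorithm can be defined as a pattern-matching algorithm that permits nonlinear contraction and expansion along the time axis. To calculate the similarity of two sequences $X = (x_1, \ldots, x_n)$ and $Y = (y_1, \ldots, y_m)$ of lengths $n$ and $m$, respectively, a nonlinear warping path $W$ is set, and the path that minimizes the distance of $W$ is determined. $W$ is defined as follows:

$$W = \{w_1, w_2, \ldots, w_K\}, \quad w_k = (i_k, j_k), \tag{1}$$

and the function $d$ that calculates the distance of $w_k$ is defined as follows:

$$d(w_k) = \lVert x_{i_k} - y_{j_k} \rVert. \tag{2}$$

The minimum of the total distance of the warping path is calculated as follows:

$$\mathrm{DTW}(X, Y) = \min_{W} \sum_{k=1}^{K} d(w_k). \tag{3}$$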

Evaluating (3) directly requires numerous calculations because every feasible path must be considered; dynamic programming can be used instead to solve the equation efficiently. When DTW is applied to a recognition system, the accuracy and efficiency of the calculation increase under the following four constraints: an endpoint constraint that aligns the start and end points of the input pattern with those of the reference pattern; a monotonicity constraint that requires the optimal path to advance monotonically; a global-path constraint that limits the permitted area for matching the input pattern and reference pattern; and a local constraint that limits the steps by which the path may reach a node, to prevent excessive contraction or expansion [14].
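The dynamic-programming evaluation of (3) under these constraints can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function name `dtw_distance`, the `window` parameter (a band around the diagonal standing in for the global-path constraint), and the three permitted local steps are assumptions made for the example, and the local distance is taken to be the Euclidean distance between points.

```python
import math


def dtw_distance(x, y, window=None):
    """Classic DTW distance between two sequences of points (tuples).

    `window` is a hypothetical band half-width around the diagonal;
    None means no global-path constraint is applied.
    """
    n, m = len(x), len(y)
    if n == 0 or m == 0:
        raise ValueError("sequences must be non-empty")

    # The band must be at least |n - m| wide so that the end point is reachable.
    w = max(n, m) if window is None else max(window, abs(n - m))

    # cost[i][j]: minimum accumulated distance aligning x[:i] with y[:j].
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0  # endpoint constraint: the path starts at (x[0], y[0])

    for i in range(1, n + 1):
        # Global-path constraint: only visit cells inside the band.
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = math.dist(x[i - 1], y[j - 1])  # Euclidean local distance
            # Monotonic local steps: vertical, horizontal, or diagonal.
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])

    # Endpoint constraint: the path ends at (x[-1], y[-1]).
    return cost[n][m]
```

For example, `dtw_distance([(0, 0), (1, 0), (2, 0)], [(0, 0), (1, 0), (1, 0), (2, 0)])` is zero because the repeated point in the second sequence is absorbed by the warping. The table has $n \times m$ cells, so the cost is quadratic in the sequence lengths instead of growing with the number of feasible paths.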

The application of the four constraints of DTW is depicted in Figure 1. The optimum path, taking the local path constraint into account, is calculated as follows: